Evaluating Interfaces for Privacy Policy Rule Authoring

Clare-Marie Karat¹, John Karat¹, Carolyn Brodie¹, and Jinjuan Feng²

¹IBM T.J. Watson Research Center, Hawthorne, NY 10532 USA
ckarat, jkarat, [email protected]

²Towson University, Baltimore, MD 21272 USA
[email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2006, April 22–27, 2006, Montréal, Québec, Canada. Copyright 2006 ACM 1-59593-178-3/06/0004...$5.00.
ABSTRACT
Privacy policy rules are often written in organizations by a team of people in different roles. Currently, people in these roles have no technological tools to guide the creation of clear and implementable high-quality privacy policy rules. High-quality privacy rules can be the basis for verifiable automated privacy access decisions. An empirical study was conducted with 36 users who were novices in privacy policy authoring to evaluate the quality of rules created and user satisfaction with two experimental privacy authoring tools and a control condition. Results show that users presented with scenarios were able to author significantly higher quality rules using either the natural language with a privacy rule guide tool or a structured list tool as compared to an unguided natural language control condition. The significant differences in quality were found in both user self-ratings of rule quality and objective quality scores. Users ranked the two experimental tools significantly higher than the control condition. Implications of the research and future research directions are discussed.
Author Keywords
Privacy, privacy policies, natural language interfaces, social and legal issues, design process.

ACM Classification Keywords
H5.2. Information interfaces and presentation: User Interfaces. K4.1. Public policy issues: Privacy.

INTRODUCTION
The rapid advancement of the use of information technology in industry, government, and academia makes it much easier to collect, transfer, and store personal information (PI) around the world. This raises challenging questions and problems regarding the use and protection of PI [18, 22]. Questions of who has what rights to information about us for what purposes become more important as we move toward a world in which data is aggregated [19]. It is now technically possible to identify revealing information about almost anyone. As stated by Adams and Sasse [2]: “Most invasions of privacy are not intentional but due to designers’ inability to anticipate how this data could be used, by whom, and how this might affect users.” Deciding how we can design usable and effective privacy technology for the future involves philosophical, legal, cultural, business, social policy, human performance, and practical dimensions. Since at its core the goals of HCI and user-centered design are to understand the user, the user’s tasks or goals, and the social and physical context in which the user is interacting with the system, all of these dimensions can be considered within the scope of creating usable and effective privacy solutions [16].

Our research focus assumes a view of privacy as “the right of an individual to control use of their personal information by organizations in daily life” rather than as “the right to individual isolation” [3, 20, 22, 29, 30]. Organizations commonly provide a description of what kind of information they will collect and how they will use it in privacy policies, and also have internal policies and procedures that guide the use of sensitive information within the organization. In some areas (e.g., the collection and use of health care information in the US or movement of personal information across national boundaries in Europe) such policies are required by law, though the content of the policy is not generally specified in legislation. While differences in privacy legislation in different regions of the world and different domains (e.g., health care, banking, government) [6, 7, 18, 20, 31] will continue to exist for some time, there is considerable consensus around a set of high-level privacy principles for information technology [22, 25].
While privacy policies are not new to organizations [4, 28, 29], very little has been done to implement the policies through technology; privacy policy enforcement remains largely a human process. Said another way, policies are currently not directly connected to the procedures for implementing them. There are emerging standards for privacy policies on websites [10], but these address
machine-readable policy content without specifying how the policy might be implemented. Our research focus is on how people in organizations can create a wide range of policies that can be understood easily, and on how technology might enable the policies to be enforced and audited for compliance. Our long-term goal is the creation of technology that enables people to have a logical and verifiable thread from the privacy policies that they author to access decisions regarding data held in an organization’s system configuration. This will lower the risk of inappropriate use of PI. Such privacy policy technology will enable organizations to manage personal information appropriately, audit for compliance with privacy policies, and preserve individual rights.
The current situation with respect to privacy in organizations is described in a Forrester report [12]. This research reveals a mismatch between consumer demands for privacy and enterprise practices in industry. According to the report, although customer concerns about privacy remain high, the majority of executives (58%) believe privacy issues are addressed extremely well by their companies. Most executives do not know whether their customers check the privacy policies, and few see the need to enhance their privacy practices. Research by Jupiter [14] and survey research in the Asia-Pacific region [26] complement these results. This research highlights a gap between the protection individuals expect and what organizations currently provide. We suggest that privacy protection must extend across the network into enterprise processes, and that there is a need to audit data collection and sharing mechanisms. We agree that technology design generally reflects the concerns of society [1], and believe we are experiencing a shift toward a greater concern for privacy in IT design.
Research Purpose and Background
Privacy policy rules are often written in organizations by a team of people including representatives from human resources, business process owners, information technology security, and specialists in the policy and compliance areas. Currently, people in these roles have little or no technology support to guide the creation of enforceable privacy policy rules that are clear and implementable. A high-quality privacy policy rule is one that people can understand and that can in turn be put into practice through the organization’s people, business processes, and technology.
Research results from the International Association of Privacy Professionals (IAPP) indicate that 98% of companies have privacy policies. Organizations often have both internal policies, which state more specific rules about information handling within the organization, and external policies, which describe the policy in more general terms intended to inform external audiences such as customers, patients, or citizens about the use of their information. In this work we focus on internal policies, largely because they describe actual data handling procedures in organizations. There has been progress in standardizing privacy policy rules [5, 10, 24]. The rules have been found to have a fairly specific structure which describes who can use what information for what purposes. Organizations generally have a number of internal privacy policies; some address the use of data about internal employees, and others address the use of data about individuals with whom the organization interacts (e.g., customers, patients, clients, citizens). Any policy includes a number of rules governing the use of data subjects’ information. The elements of a privacy policy rule include:

• data user: the person who requests access to the PI,
• data element: the personal information to be accessed,
• purpose: the reason for using the PI,
• use: the action that will be taken regarding the PI,
• condition: a constraint regarding the use of the data element (for example, the data element may only be used if the data subject has opted in to receive marketing information), and
• obligation: a commitment that must be fulfilled regarding use of the PI, such as to notify the data subject or delete the PI after 30 days.

The first four of these elements can be said to be required of any good privacy policy rule, and the last two are optional. The data user who accesses the data may be acting in a particular role in regard to a purpose. For example, doctors may read protected health information for medical treatment and diagnosis. In many privacy policies and legislation, granting or denying access incurs an obligation on the data user or the organization to take additional actions. For example, a medical researcher may read protected health information for medical research if the patient has previously explicitly authorized release (i.e., a condition) and the patient is notified within 90 days of the release of information (i.e., an obligation). A rule with this structure can be represented directly as a small data record, as sketched below.
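To make this structure concrete, the following sketch shows one way a rule with these six elements could be represented in code. It is illustrative only: the field names and example values are ours and are not drawn from EPAL, XACML, or any deployed system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PrivacyRule:
    """One privacy policy rule. The first four fields are required of any
    good rule; conditions and obligations are optional."""
    data_user: str     # who requests access to the PI (often a role)
    data_element: str  # the personal information to be accessed
    purpose: str       # the reason for using the PI
    use: str           # the action taken regarding the PI
    conditions: List[str] = field(default_factory=list)   # constraints on use
    obligations: List[str] = field(default_factory=list)  # follow-up commitments

# Hypothetical encoding of the medical-research example from the text.
rule = PrivacyRule(
    data_user="medical researcher",
    data_element="protected health information",
    purpose="medical research",
    use="read",
    conditions=["patient has explicitly authorized release"],
    obligations=["notify the patient within 90 days of release"],
)
```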
We will use a brief scenario to highlight the requirements for future privacy technology. People in organizations need to be able to read and discuss the organization’s privacy policy rules in natural language to define and gain consensus about the content of the rules, educate employees about the privacy aspects of their roles, and inform the members of the organization about current rules and any changes. It is also critical that the organization is able to communicate the privacy policy rules in an appropriate level of detail to the organization’s clients, constituents, and customers who can then decide whether and how to interact with the organization. Next, people must be able to implement these privacy policy rules in the organization’s information technology systems in a logical and verifiable manner that maintains the intended content of the privacy policy rules. This would allow an organization to have a clear and logical flow from the natural language text of the
policies through to the implementation in the organization’s systems for privacy access decisions, and compliance auditing of logs regarding those access decisions.
In order to obtain this logical end-to-end flow, it is critical that people know how to author clear and implementable rules. These rules must have sufficient content and definition to enable them to be mapped reliably, for example, to specific roles, actions, data elements, and purposes within an organization. Many privacy policy rules today are vague and cannot be implemented as written; much privacy legislation suffers from the same problem. This situation brings to mind the adage “garbage in, garbage out”. If tools could be created to help people author clear and implementable rules, the opportunities for improvements in legislation, organizational policies and practices, and privacy protection for individuals would be significant. Therefore, an important research goal is to determine how to enable people to write high-quality privacy rules.

A step in this direction is to create usable and useful tools that facilitate the creation of privacy policies in natural language that can be transformed into machine-readable XML code. The research reported here describes an empirical study we conducted as part of a larger effort to design a system which facilitates privacy policy authoring, implementation, and compliance monitoring for organizations across their heterogeneous configurations [17]. We envision a privacy policy workbench that enables people to write rules in natural language which are parsed by the workbench and transformed into machine-readable code. This research builds on previous research conducted in natural language processing support for developing policy-governed software systems and on deriving semantic models from privacy policies [e.g., 8, 21]. The privacy policy is written initially in natural language. The workbench then employs natural language shallow parsing technology [23] to parse the rules and extract the elements of the privacy rules, using a defined syntax for privacy policy rules, grammars written specifically for privacy rules that reflect the defined syntax, and data dictionaries for policy domains (e.g., health care, banking, government). The policy rules and identified policy elements are then stored in a database. Experts in the organization can review, update, and finalize the rules through a representation of the rules provided in natural language. When the rules are finalized, they are transformed into XML code. The privacy policy workbench can use the machine-readable code described above to implement the policy across an organization’s information technology configuration. Once implemented, the policy enforcement engine makes decisions about requests to access personal information and logs data about those decisions. The log data is analyzed to determine compliance with privacy policies and to answer queries from people about the use of their personal information.

In this research, we focused on the task of authoring privacy policy rules to understand how best to facilitate the authoring process. We conducted interview sessions and design walkthrough sessions of an early prototype version of the privacy policy workbench with target users from North America, Europe, and Asia. These sessions enabled us to define user profiles for the people in the roles who author privacy policies. We also identified design requirements for an authoring tool [17].

We then created two different tools for authoring privacy policy rules as part of a mid-fidelity prototype of a privacy policy workbench for privacy policy authoring, implementation, and compliance auditing. The two methods enabled in these tools were Natural Language with a Guide (NL with Guide) and Structured Entry from Element Lists (Structured List). In the first method, people could write privacy policy rules using a guide for the rule as an aid (see Figure 1). The guide identified the six possible elements in a privacy rule, along with a suggested, though not required, ordering of elements for readability. The second method allowed people to click on one or more elements in each of six lists of elements to create a privacy rule in the form of a sentence (see Figure 2). We created the two authoring methods based on emerging and adopted standards for the elements necessary in privacy policy rules; the international standards activities include efforts related to the Enterprise Privacy Authorization Language (EPAL) within W3C [5] and XACML with a privacy profile within OASIS [24].
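To make the authoring-to-XML pipeline described above concrete, here is a deliberately naive sketch of the parsing step. The real workbench uses shallow parsing with privacy-specific grammars and data dictionaries; this toy version instead matches a single fixed sentence pattern, and both the pattern and the XML layout are our own illustration.

```python
import re
import xml.etree.ElementTree as ET

# Toy grammar for rules of the form:
#   "<data user> can <use> <data element> for the purpose of <purpose>."
RULE_PATTERN = re.compile(
    r"(?P<data_user>.+?) can (?P<use>\w+) (?P<data_element>.+?)"
    r" for the purpose of (?P<purpose>.+?)\.?$"
)

def rule_to_xml(sentence: str) -> str:
    """Extract the four required rule elements and serialize them as XML."""
    match = RULE_PATTERN.match(sentence)
    if match is None:
        raise ValueError("rule does not fit the toy grammar")
    rule = ET.Element("rule")
    for element, value in match.groupdict().items():
        ET.SubElement(rule, element).text = value
    return ET.tostring(rule, encoding="unicode")

print(rule_to_xml("Pharmacists can use current drug information for the "
                  "purpose of checking for drug interactions with new "
                  "prescriptions."))
```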
Figure 1. The natural language with guide method of authoring privacy policy rules.
The two privacy policy authoring methods received very positive ratings in scenario-guided design walkthrough sessions of the privacy policy workbench prototype conducted with target users [17]. The prototype was created using dynamic HTML. This type of prototype is referred to as “Wizard of Oz” as it is realistic and allows people to
understand and provide feedback on the design ideas while there is little real-time functionality underlying it. In these sessions, target users of the privacy workbench observed the use of the workbench to create privacy policy rules following the storyline in the scenario. Our team collected qualitative and quantitative feedback from the target users on the design and capabilities of the workbench.
Figure 2. The structured list method of authoring privacy policy rules.
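As an illustration of the structured-list idea, the sketch below assembles a rule sentence from elements selected in the six lists. The element values and the sentence template are simplified stand-ins of our own, not the prototype’s actual data or rendering logic.

```python
def build_rule(data_user, use, data_element, purpose,
               conditions=(), obligations=()):
    """Render selected rule elements as a natural language sentence."""
    cond = (" if " + " and ".join(conditions)) if conditions else ""
    obl = (", and " + "; ".join(obligations)) if obligations else ""
    return (f"{data_user} can {use} {data_element} "
            f"for the purpose of {purpose}{cond}{obl}.")

print(build_rule(
    data_user="Pharmacists",
    use="use",
    data_element="current drug information",
    purpose="checking for drug interactions with new prescriptions",
))
# -> Pharmacists can use current drug information for the purpose of
#    checking for drug interactions with new prescriptions.
```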
The next logical step was to evaluate the privacy policy rule authoring methods in hands-on use by people attempting to write rules related to particular situations.

EVALUATING POLICY AUTHORING
An empirical laboratory study was conducted to compare the quality of rules written and user satisfaction with the two experimental privacy policy authoring tools the team designed and illustrated in the privacy workbench prototype. In order to provide a baseline comparison for the two methods (NL with Guide and Structured List), we added a control condition that allowed users to enter privacy policies as text in any format with which they were satisfied (Unguided NL). The control condition presented a blank text-processing application window in which the participant authored rules. This most closely approximates what we found people currently do to write privacy rules and is thus a typical baseline condition for this task. We had two hypotheses for the study:

1) The two authoring tools (NL with Guide, shown in Figure 1, and Structured List, shown in Figure 2) will enable people to write better quality privacy rules and with higher satisfaction than the control condition tool (Unguided NL, similar to Figure 1 except without the Guide).

2) There will be no difference in the quality of rules created or user satisfaction between the NL with Guide and Structured List tools.

Experimental design

A repeated measures design was employed in the study; each participant completed one task scenario in each of the three conditions. For each task they were given a different scenario (health care, finance, or government related). The order of the presentation of scenarios was counterbalanced across all participants and conditions.

Thirty-six employees of a large IT company were recruited through email to participate in the study. The participants had no previous experience in privacy policy authoring or implementation. While it is generally important to utilize target users (i.e., representatives of the group of people who will use the system) in laboratory studies, we elected to use privacy policy novices in this study for two reasons. First, the team’s earlier work with customers suggested that privacy policy authoring is handled by teams with a variety of backgrounds. Many of the team members have no specific training in privacy policy rule authoring. This suggests that authoring methods should be suitable for a fairly wide audience. Second, the population of people with experience in authoring privacy policies is currently very small due to the emerging nature of the privacy field. They are part of a rare group of specialists and would be extremely difficult and expensive to recruit for an empirical laboratory study. We expect this to change over time; privacy policy rule authors will become skilled as they gain access to the sorts of tools we are creating. Thus we felt it practical and appropriate to begin the evaluation of the methods we were designing with a general audience in order to understand general design issues related to usability, quality, and effectiveness. The authoring methods must then be validated through hands-on evaluation sessions with target users in organizations as the design for the system is refined and the effort moves forward in an iterative user-centered design process.

All participants started with a privacy rule task in the control condition (Unguided NL). Then half of the participants completed a similar task in the NL with Guide condition, followed by a third task in the Structured List condition; the other half completed the Structured List condition followed by the NL with Guide condition. The control condition needed to be run first so that the participants were not biased by the new methods provided in the experimental conditions. Since the participants received no feedback on the rules they had written, we believe that there was a very limited order effect from having all participants complete the control condition first. We hypothesized that the difference between the control and experimental conditions would be large, and we were very interested in the differences that might emerge between the two experimental conditions. All participants completed pre-session consent and information forms, the tasks in the three conditions, and a general debriefing in an hour or less. They received a coupon for lunch in exchange for their participation.
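For concreteness, here is a minimal sketch of how such an assignment could be generated. The exact condition-to-scenario pairing scheme below is our own illustration; the paper does not specify it.

```python
SCENARIOS = ["health care", "finance", "government"]

def assign(participant_id: int):
    """Control condition always first; the two experimental conditions
    alternate order across participants; scenarios rotate so that each
    scenario appears in each condition equally often."""
    if participant_id % 2 == 0:
        order = ["Unguided NL", "NL with Guide", "Structured List"]
    else:
        order = ["Unguided NL", "Structured List", "NL with Guide"]
    rotation = participant_id % 3
    scenarios = SCENARIOS[rotation:] + SCENARIOS[:rotation]
    return list(zip(order, scenarios))

for pid in range(4):
    print(pid, assign(pid))
```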
In each task, we instructed participants to compose a number of privacy rules for an organization for a predefined task scenario. A sample of the scenarios provided to participants is given in Figure 3 below. Participants worked on three different scenarios across the three tasks. We developed the scenarios in the context of three privacy-sensitive domains: health care, government, and banking. Each scenario contained five or six privacy rules, including one condition and one obligation. For example, one privacy rule for the scenario in Figure 3 might be “Pharmacists can use current drug information for the purpose of checking for drug interactions with new prescriptions.”

The Privacy Policy for DrugsAreUs

Our business goals are to answer customer questions when they call in (Customer Service), fulfill orders for prescriptions while protecting against drug interactions (Pharmacists), and to provide customers valuable information about special offers (Marketing). In order to make sure our customers’ privacy is protected, we make the following promises concerning the privacy of information we collect at DrugsAreUs. We will only collect information necessary to provide quality service. We will ask the customers to provide us with full name, permanent address and contact information such as telephone numbers and email addresses, and a variety of demographic and personal information such as date of birth, gender, marital status, social security number, and current medications taken. On occasions where we need to verify a customer’s identity, Customer Service Reps will only use the social security number to do so. Our pharmacists will use the current medication information when processing new orders to check for drug interactions.

We will make reports for our internal use that include age and gender breakdowns for specific drug prescriptions, but will not include other identifying information in the reports and will delete them after five years. At times we might use customer information in the creation of other reports for internal analysis, but these will not include individually identifying information. For example, our research department might access customer data to produce reports of particular drug use by various demographic groups. If the customer indicates that they are willing to receive information about special offers, Marketing will provide customer names and addresses to our partner organizations but will not include any other details of our customer records.

Figure 3. Sample scenario for privacy policy authoring.

To evaluate the quality of the privacy policy rules written and user satisfaction with them, we collected and analyzed several different types of data. We recorded the time that participants took to complete each task and collected the privacy rules that participants composed. We also collected, through questionnaires, participants’ satisfaction with the quality of the rules created and their preference rankings of the three tools at the end of the study session.
In order to compare the quality of the rules participants created under different conditions and scenarios, we developed a standard metric for scoring the rules. We counted each element of a rule as one point; a basic rule of four compulsory elements therefore had a score of four, and a scenario that consisted of five rules, including one condition and one obligation, had a total possible score of 22. We counted the number of correct elements that participants specified in their rules and divided that number by the total score possible for the specific scenario, thus creating a standardized metric. We allowed a wide variety of word forms and sentence structures to be counted as correct (e.g., allowing participants to use various terms for “medical information” in the scenario in Figure 3). We did not subtract from the score if participants included extra elements. This scoring method created a standardized score of the percentage of elements correctly identified and included in the privacy rules written by participants. We could then compare the scores for participants’ rules across the different scenarios and conditions.
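A minimal sketch of this standardized scoring, assuming the correct elements in a participant’s rules have already been counted against the scenario’s answer key (the function and its arguments are our own illustration):

```python
def quality_score(correct_elements: int, rules_in_scenario: int,
                  optional_elements: int = 2) -> float:
    """Fraction of scenario elements correctly included: each rule has four
    compulsory elements, plus any conditions/obligations in the scenario."""
    total_possible = 4 * rules_in_scenario + optional_elements
    return correct_elements / total_possible

# A five-rule scenario with one condition and one obligation is scored
# out of 22 points, as in the text.
print(quality_score(correct_elements=17, rules_in_scenario=5))  # ~0.77
```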
To further explore the concept of high-quality privacy policy rules, we completed additional analyses of the rules. We analyzed the missing elements in rules written by participants in the three conditions to determine any patterns in these errors. We also used the Flesch readability score [11], a metric well known in the HCI, educational, and government domains, to assess the readability of the privacy policy rules that participants created. The Flesch formula is run against a passage of text to compute a readability score based on its difficulty level; the higher the score, the more readable the text. The Flesch metric is used to determine whether materials for the general public are written at the appropriate educational level of difficulty, or in test development to ensure that passages of text are written at equivalent levels of difficulty [13]. To follow up on the Flesch analysis with a more in-depth level of analysis, we also scored the rules on the complexity of the elements in the sentence structure.
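For reference, the standard Flesch Reading Ease formula underlying this score (not reproduced in the paper itself) is:

\[
\mathrm{FRE} = 206.835 \;-\; 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) \;-\; 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]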
RESULTS AND DISCUSSION
There was a significant difference in the task completion time across the three conditions (F(2, 70) = 4.58, p < 0.05). Mean participant time on task was 910 seconds for the Unguided NL condition, 814 for NL with Guide, and 992 for Structured List (see Figure 4). Post-hoc tests showed that the NL with Guide method took significantly less time than the Structured List method. There was no significant difference between the Unguided NL method and the other two methods. The longer time for the Structured List method may have been due to the fact that the element lists were pre-populated with items that the user had to become familiar with during the task. In the two natural language conditions, users were free to use their own words. In these two conditions, the NL with Guide method gave the user valuable aid in the
composition of the rules and increased user productivity as compared to the Unguided NL method.
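As an aside, the repeated measures analyses reported in this section can be reproduced from per-participant data with standard tools. The sketch below assumes statsmodels and uses simulated placeholder times whose means match the reported values; it is not the authors’ analysis code.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
conditions = ["Unguided NL", "NL with Guide", "Structured List"]
data = pd.DataFrame({
    "participant": np.tile(np.arange(36), 3),
    "condition": np.repeat(conditions, 36),
    "time_s": np.concatenate([
        rng.normal(910, 120, 36),   # simulated; means from the paper
        rng.normal(814, 120, 36),
        rng.normal(992, 120, 36),
    ]),
})

# Repeated measures ANOVA of time on task across the three conditions.
print(AnovaRM(data, depvar="time_s", subject="participant",
              within=["condition"]).fit())
```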
A repeated measures test with post-hoc analyses indicated that participants were more satisfied with the quality of the rules created with the NL with Guide method or the Structured List method as compared with the Unguided NL method (F(2, 70) = 6.54, p < 0.005). On a scale of 1 to 7, with 7 indicating highest overall satisfaction, participants’ mean satisfaction scores were 3.8 for Unguided NL, 4.9 for NL with Guide, and 4.6 for Structured List (see Figure 5). There was no significant difference between the NL with Guide method and the Structured List method. In initial use, users perceived they were creating higher quality rules with the two experimental methods as compared to the control condition.
We also examined the correlation between perceived rule quality and objective rule quality. For the Unguided NL and the Structured List conditions, the correlations (.14 and .26, respectively) showed a weak positive relationship between perceived and objective measures; for the NL with Guide condition, the correlation (-.20) was negative. These correlations are relatively weak and account for a small amount of the variance. The data suggest that there is wide variation between the standards that participants used to evaluate the quality of the rules and the objective metric that we used to score the rules. This result is an interesting area for future research.
Figure 4. Mean time on task (in seconds) for privacy rules created using the three authoring methods.
Figure 5. Mean participant satisfaction ratings (1 = lowest, 7 = highest) of the quality of rules created using the three authoring methods.
Figure 6. Mean objective quality scores for the privacy rules created using the three authoring methods.
A statistical test of the rule quality scores calculated using the standard objective metric found a significant difference between the three conditions (F(2, 70) = 44.3, p < 0.001) (see Figure 6). Post-hoc tests showed that the NL with Guide method and the Structured List method helped users create rules of significantly higher quality than the Unguided NL method. There was no significant difference between the NL with Guide method and the Structured List method. Using the Unguided NL method, participants correctly identified about 42% of the elements in the scenarios, while the NL with Guide method and the Structured List method users correctly identified 75% and 80% of the elements, respectively. Considering that the participants were novice users, some of the improvement might have been the result of learning in the first trial. However, we did not provide feedback on rule quality, and we attribute most of the improvement to the authoring methods themselves rather than to learning in the first trial.
We examined the data in more depth by separating the total number of elements identified by participants into the six privacy policy element categories, to examine whether there were any categories that caused particular difficulties (see Figure 7). A two-way repeated measures test with post-hoc analysis suggested that the NL with Guide method and the Structured List method significantly outperformed the Unguided NL method across all categories except the Condition category. There was no significant difference between the NL with Guide condition and the Structured List condition for any of the six categories. For the Unguided condition, we found that participants were more likely to identify the Data Element than the Data User (t(35) = -4.45, p < 0.001), suggesting that privacy policy novices tended to focus on ‘what data can be accessed’ rather than ‘who should have access to the data’. Both the NL with Guide and Structured List methods helped the users create more complete rules for all element categories except Conditions as compared to the Unguided NL method. The Condition element may warrant additional design work to make it clear and understandable.
Figure 7. Mean percentage of elements included by users in privacy rules for each data category for the three methods.
Figure 8. Mean Flesch readability scores for the three authoring methods; a higher score denotes greater readability.
We examined the readability of the policies created. Jensen and Potts [13] found that privacy policies posted on the Web were generally not easy to read. We adopted the Jensen and Potts [13] measurement approach and used the Flesch readability score to evaluate the readability of the rules composed in the study. A repeated measures test suggested that there was a significant difference in the readability of the rules composed in the three conditions (F(2, 70) = 15.89, p