PrivacyGuide: Towards an Implementation of the EU GDPR on Internet Privacy Policy Evaluation

Welderufael B. Tesfay
Goethe University Frankfurt, Frankfurt am Main, Germany
[email protected]

Peter Hofmann
Goethe University Frankfurt, Frankfurt am Main, Germany
[email protected]

Toru Nakamura
KDDI R&D Laboratories Inc., Saitama, Japan
[email protected]

Shinsaku Kiyomoto
KDDI R&D Laboratories Inc., Saitama, Japan
[email protected]

Jetzabel Serna
Goethe University Frankfurt, Frankfurt am Main, Germany
[email protected]
ABSTRACT
Nowadays, Internet services have dramatically changed the way people interact with each other, and many of our daily activities are supported by those services. Statistical indicators show that more than half of the world's population uses the Internet, generating about 2.5 quintillion bytes of data on a daily basis. While such a huge amount of data is useful in a number of fields, such as medical and transportation systems, it also poses unprecedented threats to users' privacy. This is aggravated by the excessive data collection and user profiling activities of service providers. Yet, regulations require service providers to inform users about their data collection and processing practices. The de facto way of informing users about these practices is through the use of privacy policies. Unfortunately, privacy policies suffer from bad readability and other complexities which make them unusable for their intended purpose. To address this issue, we introduce PrivacyGuide, a privacy policy summarization tool inspired by the European Union (EU) General Data Protection Regulation (GDPR) and based on machine learning and natural language processing techniques. Our results show that PrivacyGuide is able to classify privacy policy content into eleven privacy aspects with a weighted average accuracy of 74% and to further shed light on the associated risk level with an accuracy of 90%.
KEYWORDS
privacy policy; privacy notice; machine learning; text summarization

ACM Reference Format:
Welderufael B. Tesfay, Peter Hofmann, Toru Nakamura, Shinsaku Kiyomoto, and Jetzabel Serna. 2018. PrivacyGuide: Towards an Implementation of the EU GDPR on Internet Privacy Policy Evaluation. In IWSPA'18: 4th ACM International Workshop on Security And Privacy Analytics, March 19-21, 2018, Tempe, AZ, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3180445.3180447

© 2018 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5634-3/18/03. https://doi.org/10.1145/3180445.3180447

1 INTRODUCTION
These days, there are over a billion websites on the World Wide Web¹ which are accessed by almost half of the world's population². As a result, huge amounts of data are being collected and synthesized by service providers and spying entities on a daily basis³. When users disclose these digital footprints about themselves, they often have no control over what companies will do with them, which results in a huge information asymmetry. To address this issue, regulatory bodies have devised compliance guidelines and responsibilities for service providers that aim to preserve users' informational self-determination rights. As such, privacy policies have emerged as the de facto transparency boards that service providers use to communicate their information processing practices. Moreover, privacy policies also serve as binding legal agreements between website operators and their users [22]. While some studies indicate that readable privacy notices increase users' trust in service providers [1, 16], a number of other research works have shown that privacy policies suffer from bad readability [3, 18, 21]; as a result, users rarely read them. This low reading rate is mainly due to their complexity, their use of technical jargon, and their excessive length. In fact, server-side observations show that only 1% or less of all users click on privacy policies [13]. Eye-tracking experiments in a different study also back up these server-side observations [20]. Reading privacy policies is not only cumbersome, but also so time-consuming that it carries a substantial economic cost. McDonald and Cranor [15] showed that if a user were to read the privacy policy of every service she visits on the Internet, she would on average need 244 hours annually, which is slightly more than half of the average time a user spent on the Internet at that time. This is even more complicated and time-consuming in today's Internet services infrastructure, where the number of online services has more than doubled since then and data is also shared with "third parties" which have their own privacy policies [5]. While the Internet continues to grow, service providers and other data processors are continuously being challenged to abide by data protection regulations. One such regulation is the newly adopted European Union (EU) General Data Protection Regulation (GDPR) [8].

¹ http://www.internetlivestats.com/total-number-of-websites/. Accessed October 2017
² www.thenextweb.com/contributors/2017/04/11/current-global-state-internet/. Accessed September 2017
³ www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily. Accessed September 2017
The GDPR takes, among other things, transparency and user consent as core components, wherein the responsibilities of the data processor and the rights of the user are specified. In particular, Article 13 specifies the information that must be provided to the data subject at the time their personal data is collected. Moreover, Article 12 states that information must be provided in "a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child". Our motivation in this work is thus mainly influenced by the GDPR. We aim at supporting Internet users by simplifying the readability of privacy policies. We take the requirements stated in the GDPR as a basis and provide a privacy policy benchmarking tool that considers privacy aspects and, through the use of risk indicators, draws users' attention to the most relevant parts of the privacy policy. To achieve this, we first introduce a categorization of data based on an interpretation of the GDPR. Second, we propose a risk-based approach to privacy policy benchmarking wherein we use supervised machine learning techniques to extract the most relevant content of lengthy privacy policies. We then employ a three-level risk classification (Green, Yellow, Red) to simplify the interpretation of the privacy policy and draw users' attention to the risk indicators.
1.1 Outline
The rest of the paper is organized as follows: Section 2 analyzes the related work. Section 3 elaborates on the research methodology and the assessment criteria used in this work. The PrivacyGuide architectural approach is then introduced in Section 4, followed by experimental results and discussion in Section 5. Finally, in Section 6, we conclude the paper and point out future directions of research.

2 RELATED WORK
The challenges associated with data collection and processing practices, as well as those related to privacy policies, have been addressed from different perspectives by a number of research works. Due to their diverse practices, in this paper we summarize the related work in three categories as described below.

2.1 Completeness of Privacy Policies
Costante et al. [4] proposed a method for evaluating the completeness of privacy policies using Natural Language Processing (NLP) and Machine Learning (ML) techniques, where a privacy policy is said to be complete if it contains the descriptions that should be covered in privacy policies, such as how cookies are handled. Their work is based on relevant documents such as the Organization for Economic Co-operation and Development (OECD) guidelines⁴ and the Safe Harbor Framework [9]. Following a similar direction, Guntamukkala et al. [11] also proposed a method for evaluating the completeness of privacy policies. In their work, the authors highlighted that their core contribution was mainly a new evaluation criterion called the goal-based approach. However, they did not discuss how to evaluate the privacy risk associated with a given privacy policy. We considered these approaches relevant and a motivation for our work, despite the fact that they are pre-GDPR and did not consider the evaluation of privacy risk.

2.2 Evaluation of Privacy Risk from Privacy Policies
Terms of Service; Didn't Read (ToS;DR)⁵ is a community-based project whose purpose is to evaluate privacy policies by crowd-sourcing; it also provides a browser add-on. However, as pointed out by Zimmeck et al. [24], crowd-sourcing based methods suffer from a lack of participation. Still, Zimmeck et al. [24] used the results of ToS;DR to derive privacy aspects and develop Privee, a machine learning and NLP based tool. Nevertheless, as pointed out by Zaeem et al. [23], Privee also has drawbacks related to the ambiguity of natural language and the inconsistency of evaluations. In this regard, they defined new criteria for privacy policies and created a corpus by setting up a small team that evaluated the consistency of privacy policies. However, neither Zimmeck et al. nor Zaeem et al. based their criteria on any regulation. Moreover, some of Zaeem et al.'s criteria, such as Credit Card Number information, cannot naturally be applied to evaluate non-finance related web services.

2.3 Privacy Notices
The idea of privacy notices is comparable to the idea of nutrition labels, i.e., without reading the whole privacy policy, users can get an idea of how a company will collect and process their data by skimming through privacy notices [12]. Studies by Kelley et al. [12] and Gluck et al. [10] have shown that the use of condensed and standardized privacy notices has a positive effect on users' awareness of privacy practices. This holds true especially for improving accuracy and handling time, as Kelley et al. outlined in their study: while full-policy-text formats performed the worst, standardized formats led to the best results [12]. Crowd-sourcing approaches like ToS;DR follow a similar pattern to privacy notices: privacy policies are condensed into a short list of positive and negative aspects and judged with an overall grade. Based on the reviews of a high number of contributors, this can lead to valuable and fine-grained interpretations of privacy policies. Due to the ambiguity of natural language, these results may even be more precise than those of automated classifications [24]. However, despite the positive aspects of privacy notices and crowd-sourcing approaches, both attempts suffer from similar problems, namely: 1) Lack of coverage: the community-driven effort in ToS;DR has rated 68 web services since its launch in 2012, and even though positive and negative aspects are provided, only 11 of them are finally judged with an overall grade. 2) Permanently outdated ratings: since companies frequently update their privacy policies, crowd-sourcing approaches seem rarely promising, especially when it comes to scalability. 3) Difficulties with legal documents: privacy policies often contain technical and vague language. As Wilson et al. [22] state, even experts may not always agree on the interpretation of policies, thus crowd workers cannot be expected to perform better.

⁴ http://www.oecd.org/sti/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm
⁵ https://tosdr.org/index.html. Accessed Dec 2017
3 METHODOLOGY
This section elaborates on the methodology followed in this paper. Inspired by previous research works on privacy policy summarization [4, 23, 24] and by the EU GDPR, the implementation of a machine-learning based proof of concept is proposed. First, with the support of legal experts and an extensive analysis of the EU GDPR, a set of privacy policy assessment criteria is defined. Second, a privacy corpus based on the defined criteria is created and used as an input for classification. Finally, the presentation of the evaluation results is defined.
Figure 1: Abstract view of the PrivacyGuide architecture

3.1 Defining Assessment Criteria
The privacy aspects of the EU GDPR needed to be interpreted before they could be used as assessment criteria for privacy policies. While the EU GDPR sets the legal boundaries of privacy policies, it does not reflect the wording and formulation that are commonly used in them. Research works such as Costante et al. [4] have defined their privacy aspects mainly based on regulations; others used empirical techniques such as quantitative user studies [24] or qualitative expert interviews [23]. For the definition of the classification categories (based on the privacy aspects), we compared the privacy aspects of the EU GDPR (cf. Table 1) with the classification categories that have been used by previous studies. Article 9 is used to assess how service providers promise to treat especially privacy-sensitive information types. Once the main categories were defined, a set of application privacy policies was analyzed by a group of privacy experts; as a result, three risk levels were determined for each of the categories, namely the privacy aspects. Table 2 shows a sample of the risk level association, which was then used as a basis for the privacy policy corpus creation.
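For concreteness, the eleven privacy aspects and the three-level risk scale can be thought of as a small data model. The following sketch is ours and not taken from the PrivacyGuide code base; it merely fixes names for the concepts used in the rest of the paper (Java 16+ for the record syntax):

```java
// Illustrative data model for the assessment criteria; all identifiers are
// ours, not taken from the actual PrivacyGuide implementation.
public class AssessmentModel {

    // The eleven privacy aspects derived from the GDPR (cf. Table 1).
    enum PrivacyAspect {
        DATA_COLLECTION, PROTECTION_OF_CHILDREN, THIRD_PARTY_SHARING,
        DATA_SECURITY, DATA_RETENTION, DATA_AGGREGATION, CONTROL_OF_DATA,
        PRIVACY_SETTINGS, ACCOUNT_DELETION, PRIVACY_BREACH_NOTIFICATION,
        POLICY_CHANGES
    }

    // The three-level risk scale (cf. Table 2 and the GUI color coding).
    enum RiskLevel { GREEN /* low */, YELLOW /* medium */, RED /* high */ }

    // One labeled text passage of the privacy corpus (cf. Section 3.2).
    record LabeledFragment(String text, PrivacyAspect aspect, RiskLevel risk) {}
}
```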
3.2 Privacy Policy Corpus Preparation
Supervised learning techniques take as input a training data set that must be manually prepared, so that afterwards the algorithm can learn and classify further input data. Thus, in order to build up the privacy corpus, 45 privacy policies were collected, corresponding to the 45 most accessed websites in Europe according to the results provided by Alexa. In order to perform the manual labeling task, 35 participants were recruited: 68.57% male and 31.43% female, all of them holding at least a bachelor's degree, with ages ranging between 21 and 29 years. They were provided with a description of the task, detailed instructions, and explanations for each of the eleven categories and their potential risk levels (including descriptive examples). Finally, in order to facilitate the labeling task, they were provided with an annotation form with drop-downs and text passages describing the categories. On average, each participant extracted 12.66 text passages from a single privacy policy, resulting in 443 text passages assigned to a privacy category and classified with a risk level.

3.3 Classification
Since the problem space was defined as a multi-class problem (i.e., eleven privacy aspects and three associated risk levels for each of the categories), four well-known classifiers that fulfill this criterion, namely Naive Bayes, Support Vector Machine (SVM), Decision Tree (DT), and Random Forest, were implemented. We then compared the results using established evaluation metrics, i.e., precision, recall, and finally F-measure as the weighted harmonic mean of both.
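As an illustration of how such a comparison can be run with WEKA (the toolkit used in Section 4.1), the sketch below cross-validates the four classifiers on an ARFF export of the vectorized corpus. The file name and class-attribute position are our assumptions, and J48 stands in for the generic Decision Tree:

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;      // WEKA's SVM implementation
import weka.classifiers.trees.J48;          // a decision tree (C4.5)
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierComparison {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the labeled privacy corpus.
        Instances data = DataSource.read("privacy-corpus.arff");
        data.setClassIndex(data.numAttributes() - 1); // assume class is last

        Classifier[] classifiers = {
            new NaiveBayes(), new SMO(), new J48(), new RandomForest()
        };
        for (Classifier c : classifiers) {
            Evaluation eval = new Evaluation(data);
            // 10-fold cross-validation, as used for PAPE in Section 4.1.
            eval.crossValidateModel(c, data, 10, new Random(1));
            System.out.printf("%s: P=%.2f R=%.2f F=%.2f%n",
                    c.getClass().getSimpleName(),
                    eval.weightedPrecision(), eval.weightedRecall(),
                    eval.weightedFMeasure());
        }
    }
}
```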
3.4 Privacy Policy Evaluation Presentation
Two information levels have been considered: a high-level information view (shown by default), comprised of indicators representing each privacy aspect (category) and the resulting evaluation (risk level); and a detailed information level, providing the privacy aspect description, the meaning of the risk level, and finally the privacy policy fragment associated with the privacy aspect and its evaluation.
4 PRIVACYGUIDE ARCHITECTURE
The main goal of PrivacyGuide is to create a condensed and visualized summary of a given privacy policy. The visualized summary consists of the relevant privacy aspects associated with the identified level of risk. Figure 1 provides a high-level view of the PrivacyGuide architecture; its core components are described next.
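As a reading aid, the sketch below shows how these components could chain together (cf. Figure 1). All type and method names are ours, not taken from the actual code base:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative end-to-end flow of the PrivacyGuide pipeline.
public class PipelineSketch {

    interface ContentExtractor { String extract(String url) throws Exception; }
    interface Preprocessor     { List<String> toCandidateFragments(String text); }
    interface AspectPredictor  { String predictAspect(String fragment); }              // PAPE
    interface RiskPredictor    { String predictRisk(String aspect, String fragment); } // RiPE

    static List<String[]> run(String url, ContentExtractor extractor,
                              Preprocessor pre, AspectPredictor pape,
                              RiskPredictor ripe) throws Exception {
        List<String[]> summary = new ArrayList<>();
        String plainText = extractor.extract(url);            // strip boilerplate
        for (String fragment : pre.toCandidateFragments(plainText)) {
            String aspect = pape.predictAspect(fragment);     // first engine
            String risk = ripe.predictRisk(aspect, fragment); // second engine
            summary.add(new String[] { aspect, risk, fragment });
        }
        return summary; // rendered as the dashboard of Figure 2
    }
}
```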
4.1 Core Components
PrivacyGuide has been implemented in Java as a standalone application. Its core components have been implemented using machine learning APIs from the Waikato Environment for Knowledge Analysis (WEKA) of the University of Waikato (New Zealand)⁶. In more detail, the core components consist of a policy content extraction mechanism, a pre-processing component, and two consecutive prediction engines.
Content Extraction: The aim of this component is to extract the plain text of a privacy policy that is embedded in an online website.

⁶ WEKA is an open source software available at: http://www.cs.waikato.ac.nz/ml/weka/index.html
Table 1: Identified privacy aspects

Privacy aspect | Description | Source (GDPR [8]/literature)
Data collection | Service providers (SPs) are required to collect data only for the intended purpose | Art. 5 (1), 6; Recitals 32-50, 58, 60 et seq.
Protection of children | Information related to children should be treated with utmost caution | Art. 6 (1) lit. f, 8, 12 (1); Recitals 38, 58, 65
Third-party sharing | The SP should inform the user if her personal data is shared with third parties | Art. 13 (1) lit. e, 14 (1) lit. e; Recital 61
Data security | Companies are required to implement state-of-the-art data security mechanisms | Art. 32; Recital 78
Data retention | Companies need to determine the storage period of personal data depending on the purposes for which they hold the information | Art. 17, 13 (2), 14 (2)
Data aggregation | Does the service provider collect or share aggregated personal data? | [23]
Control of data | Mainly, the SP is required to provide full control of data to the user (deleting, modifying or transferring to another provider) | Art. 13 (2) lit. b, 14 (2) lit. b, 15, 16, 17, 18; Recitals 63 et seq.
Privacy settings | SPs should provide best privacy settings by default, or at least allow the user to make modifications as needed | Art. 25; Recital 78
Account deletion | The user shall have the right to obtain from the controller the erasure of personal data concerning her without undue delay | Art. 17
Privacy breach notification | In the event of a data breach, the data subject has the right to be notified of the incident, and the possible remedies have to be communicated | Art. 12, 34, 40
Policy changes | SPs are requested to inform users in a transparent and understandable way when privacy policies are changed | Art. 12
Table 2: Sample privacy aspects and associated risk

Privacy aspect | Low risk | Medium risk | High risk
Third-party sharing | The policy explicitly mentions that either no third-party sharing is done or that it only happens with the user's permission for the intended purposes. | Information about third-party sharing and its purposes (not limited to the intended purpose) is clearly stated. | User information may be shared with a non-specific number of third parties with non-specific purposes or agreements, or no third-party sharing practices are mentioned.
Account deletion | Full deletion: all account information can be deleted by the user herself or it is deleted within a month after user action. | Personal information is deleted on the user's specific request. | No deletion intention is stated or no deletion information is provided.
To achieve this, we used the Java library Boilerpipe by Kohlschütter et al. [14] and extracted the main content from the web page associated with the given URL. The library uses SVM and Decision Tree machine learning algorithms to extract the plain text by deleting all unnecessary elements as well as the source code tags. The main output is the privacy policy's main content, in the form of policy fragments.
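Boilerpipe's standard entry point is a one-liner; a minimal sketch follows (the extractor choice and the URL are ours, not necessarily what PrivacyGuide uses internally):

```java
import java.net.URL;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

public class PolicyTextExtraction {
    public static void main(String[] args) throws Exception {
        // Fetch the page and keep only the main textual content; navigation,
        // markup and other boilerplate are discarded by the extractor.
        URL url = new URL("https://example.com/privacy"); // placeholder URL
        String policyText = ArticleExtractor.INSTANCE.getText(url);
        System.out.println(policyText);
    }
}
```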
Pre-processing and feature selection: This component receives the policy fragments produced by the content extraction mechanism and pre-processes them. The resulting text is manipulated by string operations and split up into single sentences using regular expressions. We reduced the workload by filtering the policy fragments using a keyword catalog for the eleven classification categories (cf. Section 3.1). The keywords were selected from those identified in the (manually labeled) training data. The rationale behind this process was to reduce as much as possible the overall number of instances that needed to be classified. An example of the keyword selection is shown in Table 3. Although texts and documents contain a high number of words that characterize them, only a subset of these words are useful candidates for the classification task. WEKA enables the classification of texts and documents by offering options for filtering and pre-processing. The central filter used to convert a text into a set of attributes is the StringToWordVector filter. This method converts a string (e.g., a sentence or document) into a vector that contains all words as attributes and represents their occurrences. Furthermore, stop word removal, tokenization, and stemming are also performed in this phase. Term frequency-inverse document frequency (tf-idf) is used for feature selection to consider the most relevant words: words are ranked depending on their frequency in the actual document (tf) compared to their frequency in all other documents (idf).
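A hedged sketch of this vectorization step with WEKA's StringToWordVector follows; the concrete stemmer, stop word list and file name are our assumptions (the paper does not specify them):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.stemmers.LovinsStemmer;
import weka.core.stopwords.Rainbow;
import weka.core.tokenizers.WordTokenizer;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextVectorization {
    static Instances vectorize(Instances raw) throws Exception {
        StringToWordVector filter = new StringToWordVector();
        filter.setLowerCaseTokens(true);             // case normalization
        filter.setTokenizer(new WordTokenizer());    // tokenization
        filter.setStemmer(new LovinsStemmer());      // stemming (our choice)
        filter.setStopwordsHandler(new Rainbow());   // stop word removal (WEKA 3.8 API)
        filter.setTFTransform(true);                 // term frequency ...
        filter.setIDFTransform(true);                // ... times inverse document frequency
        filter.setInputFormat(raw);
        return Filter.useFilter(raw, filter);
    }

    public static void main(String[] args) throws Exception {
        Instances raw = DataSource.read("privacy-fragments.arff"); // hypothetical file
        raw.setClassIndex(raw.numAttributes() - 1);
        System.out.println(vectorize(raw).numAttributes() + " attributes");
    }
}
```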
Table 3: Example of keywords used for filtering the policy fragments

Privacy aspect | Keywords
Data collection | IP, personal, information, health, sensi, devic, mail, name
Protection of children | child, age, 18, 16, 13, guard, parent
Third-party sharing | share, sell, third, partner, provide
Data security | secur, measure, administrat, safe, tech, SSL, proce
Data retention | retent, stor, durat, long, period
Data aggregation | aggregat, combin, source, multi, similar, other
Control of data | access, update, review, delete, control, add, corr
Privacy settings | opt-in, opt-out, choice, select, prefer, object, choose
Account deletion | account, withdraw, delete
Privacy breach notification | breach, secur, 72, noti
Policy changes | change, noti, policy, mail
Prediction Engines: The output of the pre-processing component is a high number of candidate sentences for each of the privacy aspects, which needs to be reduced to the most relevant ones. To this end, we implemented the four well-known supervised machine learning classifiers indicated in Section 3.3 and defined a two-step multi-class approach as described below.
Privacy Aspect Prediction Engine (PAPE): After filtering the training and testing data, classifiers for all eleven privacy aspects were trained using the built-in methods of WEKA. Besides training and predicting, the Classifier class offers an option to directly print out evaluation metrics. Due to the relatively small training set, we chose a cross-validation technique, whereby the data set is partitioned into k equally sized subsets (folds) and the algorithm's performance is measured by applying it to each of the k subsets. We chose 10 folds, as this has been shown to work well in similar tasks [19].
Risk Prediction Engine (RiPE): Once the privacy aspect is predicted by the PAPE component, the next step is to identify the associated risk class as shown in Table 2, which contains a sample of the risk level definitions. For transparency purposes, the user should be provided with the relevant sentence on which the risk classification is based; therefore, the best representative needs to be extracted. The assignment of a risk class depends on a specific probability, which is used to determine the best representative candidate sentence. Similar to PAPE, the RiPE component also implements 10-fold cross-validation to compute the evaluation metrics.

Figure 2: Summarized policy representation GUI

4.2 GUI for Summarized Policy Presentation
We consider the Graphical User Interface (GUI) one of the most important elements of the proposed tool. The objective of this component is, on the one hand, to provide a simplified presentation of the privacy policy and, on the other hand, to draw users' attention to potential privacy risks. We examined previous research regarding the effectiveness of different formats for representing privacy indicators [2, 6]. Thus, we defined our design goals by considering requirements such as user-friendliness, risk indicators, use of icons, and a simplified interface. Following this, we designed a user interface consisting of a dynamic dashboard (cf. Figure 2) that presents high-level information summarized by categories and represented by icons, together with a legend whose color changes are associated with the resulting risk level, i.e., green (low risk), yellow (medium risk) or red (high risk). Since one of the goals is to increase understanding and trigger users' interest in privacy policies, we provide users with detailed information, which is made accessible by hovering over the icons. The detailed information view includes information associated with the risk class, a short description of its meaning and consequences, and the closest original sentence or paragraph on which the assessment was performed.
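Returning to the RiPE component above: picking the best representative sentence by prediction probability can be sketched with WEKA's standard prediction API. This is illustrative only and assumes the candidate sentences were vectorized compatibly with the trained risk model:

```java
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

public class BestRepresentative {
    // Among the candidate sentences of one privacy aspect, return the one
    // whose predicted risk class has the highest probability, so that it can
    // be shown to the user as the basis of the assessment.
    static Instance pickBest(Classifier riskModel, Instances candidates)
            throws Exception {
        Instance best = null;
        double bestConfidence = -1.0;
        for (int i = 0; i < candidates.numInstances(); i++) {
            Instance cand = candidates.instance(i);
            double[] dist = riskModel.distributionForInstance(cand);
            int predicted = (int) riskModel.classifyInstance(cand);
            if (dist[predicted] > bestConfidence) {
                bestConfidence = dist[predicted];
                best = cand;
            }
        }
        return best;
    }
}
```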
5 EXPERIMENTAL RESULTS AND DISCUSSION
In this section we present the experimental evaluation of PrivacyGuide. To validate it, PrivacyGuide is applied to the privacy policies of the next ten websites in the Alexa list (cf. Table 5). We measured the computational performance (cf. Figure 3) and the time it takes to produce the privacy aspect classification and risk labeling results (cf. Figure 4). The results were obtained by running PrivacyGuide on a Microsoft Surface Pro 3 device with Windows 10 Pro, 8 GB RAM, an Intel Core i5-4300U CPU @ 1.90 GHz (boost: 2.50 GHz) and an ADSL2+ Internet connection. Before building the final model, we ran the experiments on the four algorithms (cf. Section 3.3).
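For reference, per-task timings of the kind plotted in Figures 3 and 4 can be collected with simple wall-clock measurements; the task boundaries below are ours, not the paper's actual measurement harness:

```java
public class TaskTimer {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        // ... load the privacy policy (network fetch) ...
        long t1 = System.nanoTime();
        // ... pre-process and classify the policy fragments ...
        long t2 = System.nanoTime();
        System.out.printf("loading: %.1f ms, classification: %.1f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }
}
```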
Figure 3: Computational performance per task

Figure 4: Response time: a comparison between different optimization approaches
While Naive Bayes and SVM seem to be robust on the available training and testing data, Decision Tree and Random Forest showed low precision. This could be due to both the size of the training data and the presence of some "noise" in it. Naive Bayes and SVM have relatively similar performance; however, SVM takes considerably more time than Naive Bayes, thus the final model and the results indicated in Table 4 are based on the Naive Bayes classifier. Furthermore, Naive Bayes performed even better in the risk classification phase, resulting in an F-measure of 90%. Figure 3 shows the results of the computational performance of PrivacyGuide as applied to a new sample of privacy policies. By applying it to ten different privacy policies for validation (listed in Table 5), we observed that the time needed for all steps decreases with the number of iterations. While the data and policy pre-processing tasks seem to take constant amounts of time, this effect is remarkably strong for the policy loading and classification tasks. The overall result indicates that PrivacyGuide can help users save a significant amount of the time needed to go through lengthy privacy policies (e.g., as opposed to the effort reported in [15]). Nevertheless, the time needed for loading a privacy policy can be affected by a variety of external factors, such as the website's response time, the Internet connection, the bandwidth, the amount of content that is displayed on the same page, and many more. The second performance test compares the effectiveness of the different optimization approaches (cf. Figure 4). While the "workload shifting" and pre-loading methods already have some influence on the performance, the parallelization approach yielded the biggest reduction in computing time. The application of all three approaches reduced the overall computing time from an average of 29200.1 ms to less than 2 seconds (1738.9 ms). Even though there might remain plenty of room for further improvements, we decided not to invest in further performance enhancements at the expense of stability. As a result of the corpus preparation and classification, we were able to observe that the eleventh privacy aspect (cf. Table 4), i.e., data breach notification, had an unexpected outcome.
Table 4: Classification performance

Privacy Aspect | Recall (%) | Precision (%) | F1 (%)
Data Collection | 79 | 78 | 79
Protection of children | 91 | 100 | 95
Third-party sharing | 76 | 66 | 71
Data security | 71 | 88 | 78
Data retention | 68 | 59 | 63
Data aggregation | 64 | 63 | 64
Control of data | 63 | 79 | 70
Privacy settings | 69 | 60 | 64
Account deletion | 58 | 68 | 63
Policy changes | 87 | 91 | 89
Data breach notification | 0 | 0 | 0
In 2012, the European Commission ran a public consultation on "Improving the Network and Information Security in the EU" between July 23 and October 15 [7]. In total, the consultation received 160 responses. In this consultation, 57% of the respondents expressed that they had experienced information security incidents over the previous year that had a serious impact on their activities. Following this, the GDPR imposes a rather strict requirement regarding the right of the data owner to be notified of a data breach at the controller or processor of the data. The GDPR defines a personal data breach as a "...breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to, personal data transmitted, stored or otherwise processed" [8]. As such, we took privacy breach notification as one aspect in the privacy policy analysis. However, the results are not only surprising but also alarming, in that none of the 45 assessed privacy policies stated data breach notification plans.
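The parallelization approach mentioned in the optimization comparison above is not detailed in the paper; one plausible scheme, sketched here under that assumption, is to classify the candidate fragments of different privacy aspects concurrently in a thread pool:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClassification {
    // Runs one classification task per privacy aspect in a fixed thread pool.
    static void classifyAll(List<Callable<String>> perAspectTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            for (Future<String> result : pool.invokeAll(perAspectTasks)) {
                System.out.println(result.get()); // e.g. "Third-party sharing -> Yellow"
            }
        } finally {
            pool.shutdown();
        }
    }
}
```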
6 SUMMARY AND FUTURE WORK
In this paper we introduced a simplified and visualized privacy policy summarization tool based on machine learning and natural language processing techniques, taking as a basis the latest privacy principles and developments as highlighted in the EU GDPR. PrivacyGuide has been developed as an aid to support end-users in reading and understanding privacy policies.
Table 5: The ten privacy policies used for validation

# | Company name | Privacy policy link
1 | bet365 | https://help.bet365.com/en/privacy-policy
2 | Mailchimp | https://mailchimp.com/legal/privacy/
3 | The Sun News | http://www.newsprivacy.co.uk/single/
4 | Northern and Shell Group (Sunday Express) | http://www.express.co.uk/privacy
5 | SAP SE | https://www.sap.com/corporate/en/legal/privacy.html
6 | Unicredit | https://www.unicreditgroup.eu/en/info/privacy.html
7 | Gruppo Mondadori | http://www.mondadori.com/privacy-policy-eng
8 | Telecom Italia | http://www.telecomitalia.com/tit/en/footer/Privacy.html
9 | The Mirror | http://www.mirror.co.uk/privacy-statement/
10 | Tumblr | https://www.tumblr.com/policy/en/privacy
PrivacyGuide should simplify the way users inform themselves about the data processing practices of companies on the Internet and lead them to make more informed decisions about their data sharing practices. Even though PrivacyGuide has been designed mainly for end-users, we believe that it can also be valuable for industrial, scientific or governmental purposes. One may think of a data privacy officer wanting better insight into her company's privacy policy readability and risk perception for conformity and compliance with the EU GDPR. Future research directions aim at extending our work by: i) analyzing additional privacy aspects and enriching our privacy policy corpus; ii) combining PrivacyGuide with personalized privacy settings prediction for better privacy preservation, as outlined in [17]; iii) validating our approach by performing user studies to further assess the tool in terms of usability, acceptability, expected accuracy and, consequently, its potential for adoption by users.

ACKNOWLEDGMENTS
The authors would like to thank the text annotation task participants. The work is partially supported by the H2020 EU Project CREDENTIAL under Grant No. 653454.
REFERENCES
[1] Gaurav Bansal and Fatemeh Zahedi. 2008. The moderating influence of privacy concern on the efficacy of privacy assurance mechanisms for building trust: A multiple-context investigation. ICIS 2008 Proceedings (2008), 7.
[2] Christoph Bier, Kay Kühne, and Jürgen Beyerer. 2016. PrivacyInsight: the next generation privacy dashboard. In Annual Privacy Forum. Springer, 135-152.
[3] Rochelle A Cadogan. 2011. An imbalance of power: the readability of internet privacy policies. Journal of Business & Economics Research (JBER) 2, 3 (2011).
[4] Elisa Costante, Yuanhao Sun, Milan Petković, and Jerry den Hartog. 2012. A machine learning solution to assess privacy policy completeness (short paper). In Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society. ACM, 91-96.
[5] Lorrie Faith Cranor. 2012. Necessary but not sufficient: Standardized mechanisms for privacy notice and choice. J. on Telecomm. & High Tech. L. 10 (2012), 273.
[6] Lorrie Faith Cranor, Praveen Guduru, and Manjula Arjula. 2006. User interfaces for privacy agents. ACM Transactions on Computer-Human Interaction (TOCHI) 13, 2 (2006), 135-178.
[7] EC. 2013. Proposal for a DIRECTIVE OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL concerning measures to ensure a high common level of network and information security across the Union. (2013). http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52013PC0048&from=EN
[8] EU. 2016. REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). (2016).
[9] Henry Farrell. 2003. Constructing the international foundations of e-commerce: The EU-US Safe Harbor Arrangement. International Organization 57, 2 (2003), 277-306.
[10] Joshua Gluck, Florian Schaub, Amy Friedman, Hana Habib, Norman Sadeh, Lorrie Faith Cranor, and Yuvraj Agarwal. 2016. How Short Is Too Short? Implications of Length and Framing on the Effectiveness of Privacy Notices. In Symposium on Usable Privacy and Security (SOUPS).
[11] Niharika Guntamukkala, Rozita Dara, and Gary Grewal. 2015. A Machine-Learning Based Approach for Measuring the Completeness of Online Privacy Policies. In Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on. IEEE, 289-294.
[12] Patrick Gage Kelley, Lucian Cesca, Joanna Bresee, and Lorrie Faith Cranor. 2010. Standardizing privacy notices: an online study of the nutrition label approach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1573-1582.
[13] Ron Kohavi. 2001. Mining e-commerce data: the good, the bad, and the ugly. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 8-13.
[14] Christian Kohlschütter, Peter Fankhauser, and Wolfgang Nejdl. 2010. Boilerplate detection using shallow text features. In Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, 441-450.
[15] Aleecia M McDonald and Lorrie Faith Cranor. 2008. The cost of reading privacy policies. ISJLP 4 (2008), 543.
[16] George R Milne and Mary J Culnan. 2004. Strategies for reducing online privacy risks: Why consumers read (or don't read) online privacy notices. Journal of Interactive Marketing 18, 3 (2004), 15-29.
[17] Toru Nakamura, Welderufael B Tesfay, Shinsaku Kiyomoto, and Jetzabel Serna. 2017. Default Privacy Setting Prediction by Grouping User's Attributes and Settings Preferences. In Data Privacy Management, Cryptocurrencies and Blockchain Technology. Springer, 107-123.
[18] Robert W Proctor, M Athar Ali, and Kim-Phuong L Vu. 2008. Examining usability of web privacy policies. Intl. Journal of Human-Computer Interaction 24, 3 (2008), 307-328.
[19] KA Ross, CS Jensen, R Snodgrass, CE Dyreson, and L Chen. 2009. Cross-Validation. Encyclopedia of Database Systems (2009).
[20] Nili Steinfeld. 2016. "I agree to the terms and conditions": (How) do users read privacy policies online? An eye-tracking experiment. Computers in Human Behavior 55 (2016), 992-1000.
[21] Ali Sunyaev, Tobias Dehling, Patrick L Taylor, and Kenneth D Mandl. 2014. Availability and quality of mobile health app privacy policies. Journal of the American Medical Informatics Association 22, e1 (2014), e28-e33.
[22] Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N Cameron Russell, et al. 2016. The Creation and Analysis of a Website Privacy Policy Corpus. In ACL (1).
[23] Razieh Nokhbeh Zaeem, Rachel L German, and K Suzanne Barber. [n. d.]. PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining. ([n. d.]).
[24] Sebastian Zimmeck and Steven M Bellovin. 2014. Privee: An Architecture for Automatically Analyzing Web Privacy Policies. In USENIX Security Symposium. 1-16.