Metadata Protection Scheme for JPEG Privacy & Security using Hierarchical and Group-based Models
Jonathan Lepsøy
Norwegian University of Science and Technology
[email protected]
Suah Kim, Dessalegn Atnafu, Hyoung Joong Kim
Graduate School of Information Security, Korea University
{suahnkim,desatne}@gmail.com, [email protected]
Abstract—Privacy and intellectual rights management is becoming an important topic. Existing social network services (SNS) have many limitations in meeting consumers’ expectations of privacy and intellectual rights management. Metadata management in SNS is heterogeneous and susceptible to misuse of personally identifying and copyright information. There are cases where consumers are not informed about how their metadata is shared and used, and where copyright information is removed from images without the owner’s consent. In this paper, we propose two metadata protection models, a hierarchical model and a group-based model. The models are analyzed and evaluated with respect to the interests of users and services.
The first, second and third authors have contributed equally. The fourth author is the corresponding author.
I. INTRODUCTION
SNS have become extremely popular with the development of technology. Better availability of Internet connections, applications and services, and handheld devices has sharply increased the usage of services where users distribute enormous amounts of data every day. This data includes, but is not limited to, images, videos, audio and document files. Users share these files with each other and with services for different purposes. However, many users are not aware of the metadata that is transmitted along with those files. Metadata provides context about the file. It includes, but is not limited to, personal information about the user (e.g. name, address, gender, GPS location, and contact information [1]) and intellectual property rights information (e.g. copyrights). Metadata is a very powerful tool for organizing and searching through the growing libraries of image, audio and video content that users are producing and consuming, as well as for protecting intellectual property [2]. On the other hand, misuse of metadata could compromise a user’s privacy in unexpected ways. Different services apply their own policies to handle users’ metadata. The International Press Telecommunications Council (IPTC) has carried out a survey of metadata handling methods on 15 major social network services [3]. According to the study, Facebook, Twitter and Flickr remove metadata contents including copyright information. However, Google+ and Tumblr keep all metadata embedded in user images [3].
Such independent efforts can expose users to privacy recycling, inconvenience users who have to understand a variety of privacy policies for different services, and make it difficult for auditing groups to monitor misuse of metadata. A standard way of handling metadata is required to prevent such misuse.
There are international efforts to standardize privacy protection in information and communication technology. ISO/IEC 29100 is one of the existing standards; it provides a high-level framework for the protection of personally identifiable information (PII) [4]. The Privacy & Security initiative is the most recent effort in the ISO/IEC Joint Photographic Experts Group (JPEG) to assist transparent communication between industry and consumers regarding privacy and intellectual property rights issues [5]. The initiative is working to provide a broad set of tools to protect the privacy of images and metadata, maintain data integrity, and protect intellectual property rights. The first workshop was held in October 2015 to gather interested industries and experts, scope out the relevant technologies, and better understand the needs of the industry, users, and policies. The activity is expected to be recommended to be part of a new standard called Privacy & Security.
In this paper, we analyze the effect of the partial inclusion and exclusion of metadata on privacy and intellectual rights. We also propose and analyze metadata protection schemes based on hierarchical and group-based models.
II. CASES OF METADATA MISUSE
There are countless ways of utilizing personal data for malicious purposes. In particular, we can divide such misuse into two main categories: invasion of privacy and copyright infringement. Today, the way SNS and services handle metadata varies from service to service.
This, in combination with long and hard-to-read privacy policies, makes it hard for users to stay informed about how their data is being handled. From the IPTC study discussed in the introduction [3], it appears that most SNS choose to simply delete all information from the images, which admittedly is the easiest approach. This way, the users’ metadata is not distributed to other users and thus not subject to misuse by an adversary. There are, however, several issues with such a solution. It is not obvious what each SNS does with the metadata that is removed from the images, or whether it is stored and/or sold. Further, all copyright information is removed from the images as well, which can hurt professionals who use SNS to advertise their work. In this section we outline situations that may occur in daily life when using services that most people are accustomed to.
Fig. 1. Example of location tracking using the YouTube API [6].
A. YouTube - location tracking
The first example originates from “Cybercasing the Joint: On the Privacy Implications of Geo-Tagging” [6], to which we refer the reader for a more detailed explanation of the experiment. An outline of the experiment can be seen in Fig. 1. YouTube provides an API that is versatile and simple to integrate into one’s own code. Notably, it gives the programmer many different search options in addition to uploading, downloading, etc. In this experiment, a program was written to search YouTube for recently posted videos that contained a set keyword and whose GPS location matched the selected coordinates or lay within a set radius of them. From this information, the home location of the video owner was extracted, and a further search was executed to detect which of the users would not be home in the near future, based on the other videos they had posted. Had the GPS location data been removed, this would not have been possible. Similar experiments could be performed on images by exploiting their metadata. Such information is often revealed not on purpose, but due to lack of knowledge.
B. Facebook - copyright infringement
Within the field of photography, copyright and ownership of images are essential to prevent copyright infringement and loss of revenue.
Most SNS today remove all metadata from the images uploaded to their websites, which is the easiest way of coping with the problems outlined in the YouTube example from the previous subsection. This approach, however, raises issues with copyright. Fig. 2 shows how an image can be copied without its copyright information through a popular SNS like Facebook, even without ill intent. If the adversary uses the image for commercial purposes, this is a copyright infringement and can cause substantial financial losses for the owner.
III. PROPOSED MODELS
As a means of overcoming the failures described in Section II, “Cases of metadata misuse”, we propose two metadata access control models for uploading images to SNS. It is nearly impossible to design one ideal model because each service has different requirements and needs. We therefore propose two models so that each service can choose the model that best suits its needs based on the features offered.
The first proposed model is called the hierarchical model. It is based on lattice-based access control [7]. In this model, the privacy setting is organized in hierarchical levels such that the metadata accessible by the service is limited by the numerical level. In other words, the lower the privacy setting, the more metadata is accessible.
The second proposed model is called the group-based model. It is based on role-based access control [8]. In this model, privacy settings are set depending on several predefined contextual groups. Each group has its own privacy setting, which limits the metadata accessible by the service.
To aid the services in their choice, we outline each model’s features. Even though the models differ substantially in functionality, they are designed with the same goals in mind, i.e. increased transparency, collection limitation, and consent and choice [4]. By using such models, the service receives opt-in consent from the user rather than providing an opt-out option.
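As an illustration only, and not a normative part of the proposal, the two models can be sketched in Python as lookup tables of privacy templates that are applied before metadata reaches the service. Everything concrete here is an assumption made for the example: the three levels, the group names, the metadata field names (real images carry Exif, IPTC or XMP properties), and the helper functions apply_level and apply_group.

```python
from enum import IntEnum


class PrivacyLevel(IntEnum):
    """Hierarchical model: a higher level exposes less metadata."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed templates: the fields the service may read at each level.
# Each set is a subset of the set for the level below it.
LEVEL_TEMPLATES = {
    PrivacyLevel.LOW: {"creator", "copyright", "timestamp", "camera", "gps"},
    PrivacyLevel.MEDIUM: {"creator", "copyright", "timestamp"},
    PrivacyLevel.HIGH: set(),
}

# Assumed templates for the group-based model: one setting per
# contextual group, with no ordering between groups.
GROUP_TEMPLATES = {
    "family": {"creator", "copyright", "timestamp", "gps"},
    "friends": {"creator", "copyright", "timestamp"},
    "colleagues": {"creator", "copyright"},
}


def filter_metadata(metadata, allowed):
    """Keep only the fields that the chosen template allows."""
    return {key: value for key, value in metadata.items() if key in allowed}


def apply_level(metadata, level):
    """Hierarchical model: filter by the user's chosen privacy level."""
    return filter_metadata(metadata, LEVEL_TEMPLATES[level])


def apply_group(metadata, group):
    """Group-based model: filter by the user's chosen target group."""
    return filter_metadata(metadata, GROUP_TEMPLATES[group])


# Hypothetical metadata of one uploaded photo.
photo = {
    "creator": "A. Photographer",
    "copyright": "(c) 2015 A. Photographer",
    "timestamp": "2015-10-20T12:00:00",
    "camera": "ExampleCam",
    "gps": (37.59, 127.03),
}

shared_with_service = apply_level(photo, PrivacyLevel.MEDIUM)
shared_with_colleagues = apply_group(photo, "colleagues")
```

In this sketch the user's choice of level or group is the opt-in step: only the fields named in the selected template are passed on, and all other fields are dropped before upload.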
Obtaining consent on an opt-in basis is likely to improve data minimization and reduce the risk of a privacy breach.
Fig. 2. Example of copyright infringement through SNS.
A. Hierarchical model
The hierarchical model is highly intuitive. Fig. 3 shows an example of a three-level hierarchical model (outlined in the green box). The number of levels is generally up to the implementing party, but it is advisable to keep it relatively small to avoid confusion. Each level can be viewed as a privacy setting template that removes part of the metadata from the image. The highest level removes all metadata, whereas the lowest level retains all of it. The removal is generally hidden from the user to avoid presenting excess information. However, an extension as shown in Fig. 3 (outlined in the blue box) can give the user an option to manually remove or retain additional metadata. This way, the user can also view which metadata is being removed.
Fig. 3. Hierarchical model (green area) with advanced optional selection (blue area).
B. Group-based model
In the group-based model there is no concept of levels, but rather the concept of context. The privacy setting is defined for a set of predefined groups (e.g. friends, family, colleagues). The metadata removed for each group should be based on what information the user wants to share with that group. For example, a user may want to share location information with family for making a photo album, but not with colleagues. Upon upload, the user decides which group is the target audience and selects the group accordingly. An extension to the group-based model attempts to make it more transparent by basing the templates on the service features that retaining the metadata enables, while showing the accompanying risks. This way the user is made immediately aware of the implications and can decide based on this information rather than just a level of security. A graphical display of this model is shown in Fig. 4.
IV. ANALYSIS
For standard models to be adopted, several considerations have to be made, with regard to both the user and the service. The impact on the user experience in terms of complexity of usage must be minimal.
In addition, the user must be enabled to make informed decisions based on sufficient, but not excessive, information. Furthermore, for a service to implement such a model, it is important that the service’s interests remain satisfied. In this section we analyse how the models compare to each other as well as to an existing approach.
These days many SNS remove all metadata upon upload to protect users’ privacy [3]. In most cases users are not explicitly informed about this practice. In addition, what happens to the deleted metadata is not made clear. This violates the principle of “Openness, transparency and notice”, which is one of the most important principles [4]. With more transparency, users would be better informed and thus able to make informed decisions regarding their privacy. In addition, this practice violates the principle of “Consent and choice”. It might be argued that the user consents by agreeing to the terms and conditions, but it has been shown that few people read these statements and that they are often hard to read [9]. Through this practice, SNS also violate the principle of “Individual participation and access”. This principle states that PII principals should be allowed to access and review their PII and should be given the possibility to amend, correct or remove their PII from the service, and also from potential third parties to whom personal data has been disclosed.
A. Evaluation of each model
The hierarchical model is user-friendly and has comprehensive privacy settings. However, even though the user has a notion of privacy level, they do not know which metadata is shared and they have little freedom of choice. Furthermore, a user may be inclined to choose a higher security option than needed when offered a level-based decision. To work around some of these issues, an extension was developed that lets users change the template manually. This allows for personal participation and presents the user with the exact metadata being shared. With the extension, the model becomes more transparent and may allow the user to make a more informed choice of security level.
SNS like Facebook provide users with different groups of contacts, and their concern is not necessarily limited to protection of data, but extends to management of information distribution. In this case, levels might not be a relevant way of distinguishing between privacy choices. The group-based model provides a more logical solution for such services and removes the concept of levels. This way, the issue of applying unnecessarily high settings is removed and the model is tailored to the service. The group-based model lacks transparency and personal participation unless users are allowed to define their groups and the related metadata themselves. The extension to this model was developed in an effort to increase transparency and usability for both the user and the service.
It shows the risks and benefits related to each group, which makes the model more informative and allows users to make full use of the features of the service. The main drawback of this extension is that the amount of information and the complexity of the settings could overwhelm users.
Fig. 4. Group-based model (green area) with risk and benefit extension (blue area).
Finally, it is important to compare the models to an existing approach. The common approach is complete deletion of metadata without notifying the user. The proposed models provide predefined templates that meet generic privacy requirements. In addition, the extensions empower users to choose exactly which metadata is deleted or shared. Further, they limit the collection of personal data, which is of utmost importance in a sharing world where profiling databases grow bigger by the minute.
V. CONCLUSION
In this paper, the vulnerabilities of the existing metadata handling methods are pointed out and analysed. The fact that different SNS apply diverse and uncoordinated methods of metadata management can endanger users. Neither sharing all metadata provided by the user nor removing everything solves the problem. However, a standardized metadata management model that can be adopted by a wide range of services helps to protect users’ PII and copyright information. In this paper, hierarchical and group-based models are proposed to address the limitations of the existing methods. The hierarchical model best suits image sharing services like Flickr, where images are accessible to anyone. The group-based model, on the other hand, suits SNS like Facebook, where users have different groups of contacts of a specific nature. Both models provide predefined templates for users to choose from. In addition, the proposed extension to each model provides a more transparent, flexible and informative solution. Each service can choose a model based on its needs. In the future, a hybrid approach will be developed that combines the best features of each model.
ACKNOWLEDGEMENT
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (NRF-2015R1A2A2A01004587). The authors would like to thank Dong Keming and Tariku Tsadik for helpful discussion.
REFERENCES
[1] “Extensible metadata platform (XMP) specification: Part 1, data model, serialization, and core properties,” Adobe Systems Incorporated, 2012.
[2] M. W. Group et al., “Guidelines for handling image metadata,” Version, vol. 1, no. 1, p. 46, 2009.
[3] “IPTC study shows some social media networks remove rights information from photos.” [Online]. Available: https://iptc.org/aboutiptc/media-releases/2013-03-12/
[4] “Information technology - security techniques - privacy framework,” ISO/IEC 29100, 2011.
[5] “JPEG privacy & security abstract and executive summary,” http://jpeg.org/items/20150910_privacy_security_summary.html, accessed: 2015-08-03.
[6] G. Friedland and R. Sommer, “Cybercasing the joint: On the privacy implications of geo-tagging,” in Proceedings of the 5th USENIX Conference on Hot Topics in Security, ser. HotSec’10. Berkeley, CA, USA: USENIX Association, 2010, pp. 1–8. [Online]. Available: http://dl.acm.org/citation.cfm?id=1924931.1924933
[7] R. S. Sandhu, “Lattice-based access control models,” Computer, vol. 26, no. 11, pp. 9–19, 1993.
[8] D. F. Ferraiolo and D. R. Kuhn, “Role-based access controls,” arXiv preprint arXiv:0903.2171, 2009.
[9] “Special Eurobarometer 431 data protection survey,” http://ec.europa.eu/public_opinion/archives/ebs/ebs_431_sum_en.pdf, 2015, accessed: 2015-10-20.