Privacy Aware eLearning environments based on Hippocratic database principles

Jasmin Azemović, Ph.D.
University „Džemal Bijedić“, Faculty of Information Technologies, Maršala Tita b.b, Mostar, Bosnia and Herzegovina, [email protected]

ABSTRACT
Ensuring privacy in modern information systems is of primary importance to their users, whose trust in and use of these environments depends on the degree of privacy provided. A solution to this problem can be found in the „Hippocratic Databases“ (HDB) concept. The idea is to apply the basic principles of the Hippocratic Oath to databases in order to provide data privacy and confidentiality. The implementation and advantages of this concept have been researched for business intelligence systems and health information systems, but not, until now, for eLearning systems. We have created a prototype model of an eLearning environment that fully implements the HDB principles. To prove the usability and viability of the model, we compared the performance of a production eLearning system with that of the prototype model. The results of this comparison are presented in this paper.

Keywords
Privacy, eLearning, Hippocratic Database, privacy-aware, performance

1. INTRODUCTION
The idea of Hippocratic databases (HDB) was inspired by the basic principles of the Hippocratic Oath, with the purpose of preserving privacy and secrecy in modern information systems. The initial research was presented by R. Agrawal, J. Kiernan, R. Srikant and Y. Xu (Hippocratic databases) [3]. These two goals are among the important issues in the analysis and design of such systems. The concept has been studied for business intelligence systems (Bhatti et al., 2008) and medical information systems [3], but not for eLearning systems. Our assumption is that it is possible to implement the HDB principles in eLearning systems and all their specific components, following the ten principles of HDB and recommended standards. The contribution of our research is a database model, based on all principles of Hippocratic databases, that satisfies the specific privacy and security needs of dynamic eLearning environments.

The major contribution, however, is proof of the usability of HDB principles in this type of information system, with an emphasis on performance when privacy policies are used. An approach often taken is to enforce privacy policies at the application level [2]: first, the application issues the query to the database and retrieves the result; then, the application scans the resulting records and filters out prohibited information (for example, by setting it to null). However, this approach leads to privacy leaks when applied at the cell level. Suppose, for example, that the student Denis has decided to hide his name from the list of exam results in mathematics, and that the system executes the following query: SELECT Name, LastName, Score FROM results WHERE subject = 'Math'. In this case the student's name will be hidden, but not his score, because the score is not covered by the privacy policy. It could very quickly be discovered that the row belongs to Denis, although his name is not on the list. The way policies for partial disclosure are defined should therefore be more flexible with respect to user needs.

2. PRIVACY, SECURITY AND ACCESS CONTROL ISSUES
The Internet shows constant growth in basic services for modern communication in all aspects of human life. Terms like eCommerce, eGovernment and eLearning are just a few of the many that have become commonplace in our lives, and all of them are products of the use of information technologies. It is important to note that most of these areas are still in the process of standardization. The use of information technologies in these areas produces a large amount of data every day. The amount of information in the world doubles every 20 months, and the number and size of databases is increasing even faster (Office of the Information and Privacy Commissioner, Ontario, Data Mining: Staking a Claim on Your Privacy) [14]. Besides this, the Internet with all its services enables the collection and storage of information about clients. The process can be fully automatic and can take place without the consent of the information donor (business clients, students, patients or regular web users). These facts put privacy and security issues among the first concerns in the analysis, design and implementation of modern information systems. Privacy, security and data access control used to be reserved for “big” databases and special-purpose systems. Now they are imperatives and standards that every database system needs to fulfill. Privacy is the right of every individual to decide when, how and how much information will be available for storage and exchange between systems (Alan Westin, Professor Emeritus of Public Law and Government, Columbia University) [1]. Examples of privacy violations are not rare; on the contrary, the number of unintentional and intentional violations is increasing. Examples of completely illegal use of collected information are easy to find. Here are some situations that can be classified as serious violations of privacy:
• The UK Ministry of Defence confirmed the theft of a laptop containing personal data of 600 000 applicants to the armed forces. The database contained information such as passport ID, driving license number and information about families (www.securitypark.co.uk, 21.02.2008).
• Kaiser, a leading US provider of health services, mistakenly sent 858 email messages containing personal ID attributes, including poll answers about patients' diseases, to the wrong email addresses (Washington Post, 10 August 2000).
• The Toysmart company sold all the data it had collected about its customers and their habits (www.ftc.gov/opa/2000/07/toysmart.htm).
• A pharmacy chain allowed another pharmaceutical company access to patients' prescription data. That information was used to send emails with marketing elements (Washington Post, 15 February 1998).
Access control is a mechanism with only two purposes: to prevent misuse of resources or, at least, to obtain full details about such an event. Its functionality is based on access control rights and operations defined in access control matrices. In a nutshell, access control defines who has access to specific resources and how. At the database level, these elements are implemented at the object and row level. That process is not easy to maintain. The DBA (database administrator) has the important role of specifying the access control list for each user or group, based on the position and responsibility of each individual. Keeping that model operational is very time consuming, expensive in resources and money, and a source of common mistakes. Current trends and solutions push the privacy issue down the priority list and leave it to company security policy to handle. The examples of privacy violation above show how that can end. Access control and security mechanisms should be part of the technology that provides and preserves the privacy of data.

3. HIPPOCRATIC DATABASE IN eLEARNING ENVIRONMENT
There is a growing trend in the use of all kinds of eLearning systems (LMS, CMS, eLearning, eUniversity). These systems collect a great amount of data, a large part of which is very sensitive because it concerns private matters of students and can be misused. New trends require putting the three elements discussed above (privacy, security and access control) first in all phases of developing eLearning information systems. Elements such as electronic student files, grades, exams and study mobility are very complex to build and maintain. This is because content, services and personal data must be protected from outside intruders, and because these systems also carry a risk of privacy violation by inside staff (administrative and educational staff). Solving this problem in eLearning environments will enable the creation of similar solutions in other areas such as eGovernment, eHealth and eCommerce. One solution is to apply research from the area of Hippocratic databases (HDB). The idea is inspired by the basic principles of the Hippocratic Oath, applied to database systems [3]:
“And about whatever I may see or hear in treatment, or even without treatment, in the life of human beings – things that should not ever be blurted out outside – I will remain silent, holding such things to be unutterable” – Hippocratic Oath
That research defined ten principles which, if used and implemented properly, can provide and guarantee the privacy of data.

A. Ten Hippocratic Database Principles
The ten guiding principles of Hippocratic databases, together with initial designs for providing limited disclosure and compliance auditing, were introduced in [3]:
1. Purpose Specification: For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information.
2. Consent: The purposes associated with personal information shall have consent of the donor of the personal information.
3. Limited Collection: The personal information collected shall be limited to the minimum necessary for accomplishing the specified purposes.
4. Limited Use: The database shall run only those queries that are consistent with the purposes for which the information has been collected.
5. Limited Disclosure: The personal information stored in the database shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information.
6. Limited Retention: Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected.
7. Accuracy: Personal information stored in the database shall be accurate and up-to-date.
8. Safety: Personal information shall be protected by security safeguards against theft and other misappropriations.
9. Openness: A donor shall be able to access all information about the donor stored in the database.
10. Compliance: A donor shall be able to verify compliance with the above principles.

At present there is no technical or commercial implementation of these or similar HDB principles. In an eLearning environment, implementing the HDB principles could prevent privacy violations involving students, educational staff and administrative staff, and could significantly simplify the administration of access control policy. Motives for accessing private and other operational data could include obtaining a student's grades from other courses in order to form a subjective picture of them, modifying grade data, or obtaining private email addresses, phone numbers, credit card information and so on. The scope for misuse is very wide.

B. HDB principles in eLearning environment
One of the contributions of this research is an efficient prototype model of one part of an eLearning environment that implements all ten HDB principles. To keep the model and its testing simple and clear to analyze, we tackled only the part of the environment that stores students' personal data; the model itself, however, does not depend on size. In the following we look at the HDB principles through our model, which is shown in Figure 1. Special emphasis is put on some “problematic” parts where HDB principles conflict with eLearning functionality.

Purpose specification
Modeling the first principle means that every record in the database should have a very precisely defined purpose. In eLearning environments that can be reporting, statistical analysis, entering grades, attending courses and so on. In our model, this principle is implemented through the student_purpose table. It sits between the students and purpose tables and creates a many-to-many relation, which is necessary when one record needs to have many purposes (Figure 1).

Figure 1. HDB eLearning model prototype
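The many-to-many link between student records and purposes can be illustrated with a small sketch. It is not the schema from Figure 1: only the table names student_purpose and purpose come from the text, while the table name student, all column names, the sample row and the helper functions are assumptions made for illustration, and SQLite is used only because it runs without a server.

```python
import sqlite3

# Hypothetical schema: only the table names student_purpose and purpose
# are taken from the text; every column name below is an assumption.
SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE purpose (
    purpose_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (record, purpose) pair.
CREATE TABLE student_purpose (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    purpose_id INTEGER NOT NULL REFERENCES purpose(purpose_id),
    PRIMARY KEY (student_id, purpose_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up student record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO student VALUES (1, 'Ana', 'Example')")
    conn.executemany(
        "INSERT INTO purpose VALUES (?, ?)",
        [(1, "reporting"), (2, "statistical analysis"), (3, "entering grades")],
    )
    # The same record is associated with two purposes.
    conn.executemany("INSERT INTO student_purpose VALUES (?, ?)", [(1, 1), (1, 3)])
    return conn


def purposes_for(conn, student_id):
    """Return the purpose names attached to a student record, sorted by name."""
    rows = conn.execute(
        "SELECT p.name FROM purpose AS p "
        "JOIN student_purpose AS sp ON sp.purpose_id = p.purpose_id "
        "WHERE sp.student_id = ? ORDER BY p.name",
        (student_id,),
    ).fetchall()
    return [name for (name,) in rows]


def is_allowed(conn, student_id, purpose_name):
    """True if the record may be used for the given purpose."""
    return purpose_name in purposes_for(conn, student_id)
```

With this layout, adding or withdrawing a purpose for a record is a single insert or delete in the junction table, and the record itself is never modified.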

Consent
The donor of information, in our case the student, has the full right to give or deny consent for the use of personal data, attribute by attribute, wherever the attribute is not essential for system functionality (essential attributes include ID attributes, first name and last name). Protected data, on the other hand, can be grades, phone number, email, credit card number and so on. The proposed model solves this problem using the object and attribute_consent tables. The model also makes it possible to give or deny consent for each attribute in any table (Figure 1).

Limited collection
The minimum amount of information about students that is necessary for the business process is defined by state laws and/or University policies. This principle cannot have a precise technical implementation without a universal set of rules about data collection. For the sake of this HDB principle we propose that the stored data be kept to a minimum. For example, information about place of birth or parents' names is irrelevant from the point of view of an eLearning system.

Limited use
The use of each query and/or stored procedure (or other object) in the data access layer is defined and tagged with its purpose in a corresponding table. In the model this aspect is implemented with the dataAccess_purpose object (Figure 1).

Limited disclosure
Generally it is very hard to define what counts as outside access in eLearning environments. Mostly they are closed systems within University boundaries, so there is no frequent need for outside access to students' data. However, there are Universities that are connected and open to the exchange of students, knowledge and educational staff. The whole of Europe is now in the Bologna process of reforming the educational system, and eLearning environments should support that process. Consider the following example. A student from Bosnia and Herzegovina (BiH) decides to continue his education in Greece. The eLearning system of the Greek University requests access to personal data in the database of the University in BiH. The purpose is to import relevant data about courses, grades and personal details in order to prepare the environment for the student's arrival. Our model supports this scenario with the external object. In our case the Greek University should not have access if there is no corresponding record in the external table (Figure 1).

Limited retention
The biggest problem in implementing HDB principles in eLearning systems is the information retention period. In a nutshell, this principle states that records should be erased from the database after they have fulfilled their purpose. In eLearning and eUniversity systems, however, a limited retention period is not defined. Imagine, for example, that when a student graduates or leaves, the University administration staff delete all electronic traces of the education process. That action would rule out all of the following operations: retroactive analysis, statistical analysis, issuing a diploma supplement and any other operation that involves the use of the student's data. The second contribution of this research is a model that offers a solution to this paradox (Figure 2): it keeps the data and, at the same time, satisfies the limited retention principle. We suggest that all of a student's data be de-normalized from the relational model into a data warehouse (DW). That process should be executed after the student graduates or leaves the University. The next step is to use public key cryptography to protect the data in the DW. One copy of the key should be held by the student and the other by the University, so that the data can be decrypted at the personal request of the student or with the student's permission. Importantly, the proposed model allows statistical analysis in the DW on the protected data without revealing the student's identity or any other personal data. This is possible because data that are non-critical and not important for personal identity are not encrypted, for example StudentID (a database-level identifying attribute), year of enrollment and grades in specific subjects. From that information it is not possible to learn the name, address, phone number or any other sensitive personal data (Figure 3).
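The kind of analysis that remains possible on the protected warehouse can be sketched with the values shown in Figure 3. In the sketch below the encrypted name fields are carried along as opaque strings and never read; the tuple layout and the function name are assumptions made for illustration, and no decryption key is involved at any point.

```python
from collections import defaultdict

# Rows as listed in Figure 3:
# (student_id, last_name_ciphertext, first_name_ciphertext, year, grade1, grade2).
# The two name fields are opaque ciphertext and are never interpreted here.
DW_ROWS = [
    (154894, "Heu^#7@", "js@sW#", 2006, 6, 8),
    (478954, "l;n$7^&#", "4587!#?", 2006, 10, 9),
    (114585, "1#*@(^#", "mqw#@", 2007, 7, 6),
    (786554, "N*&^%#", "34@#75", 2007, 8, 8),
    (325468, "V#(0*0-", "^@#($&", 2008, 10, 10),
]


def average_grades_by_year(rows):
    """Average Grade1 and Grade2 per year, using only unencrypted columns."""
    grades = defaultdict(lambda: ([], []))
    for _student_id, _last_name, _first_name, year, grade1, grade2 in rows:
        grades[year][0].append(grade1)
        grades[year][1].append(grade2)
    return {
        year: (sum(g1) / len(g1), sum(g2) / len(g2))
        for year, (g1, g2) in sorted(grades.items())
    }


if __name__ == "__main__":
    for year, (avg1, avg2) in average_grades_by_year(DW_ROWS).items():
        print(year, avg1, avg2)
```

The statistics are computed from the year and grade columns alone, which is the property the model relies on: the analyst needs neither the student's nor the University's key.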

Figure 2. eLearning retention period

Student ID   Last Name   First Name   Year   Grade1   Grade2
154894       Heu^#7@     js@sW#       2006   6        8
478954       l;n$7^&#    4587!#?      2006   10       9
114585       1#*@(^#     mqw#@        2007   7        6
786554       N*&^%#      34@#75       2007   8        8
325468       V#(0*0-     ^@#($&       2008   10       10

Figure 3. Encrypted data in the DW

Accuracy
Accuracy of the data, and prevention of tampering with database objects and with the data itself, can be provided using our existing model for tampering detection. That model is the result of one of our previous studies (Jasmin Azemović, Denis Mušić, Efficient model for detection data and data scheme tempering with purpose of valid forensic analysis) [6].

Safety
The safety of personal data is provided by a role-based access model. For example, financial data should be accessed only by the accounting department, and a grade in mathematics is of no importance to users from software engineering. Access to data beyond the need to know can lead to preconceived opinions about an individual or to misuse. In the model, this HDB principle is provided by the authorization and roles objects (Figure 1).

Openness
A student should at any time have access to all private and personal data, and to the data collected during the study process. This principle should be implemented in the user interface.

Compliance
The information donor should have insight into the usage and access history of personal or any other data. This principle provides transparency for students with regard to the application of the other HDB principles. Our model supports it with the student_access_log object, which is connected to other tables and collects the relevant data (Figure 1).

C. Prototype of HDB eLearning environment
In this part we look at the model that implements the HDB principles presented above in an eLearning prototype. This model was part of our previous research [5]. Figure 1 shows the relational model of one part of the information system. All ten principles were implemented at the database level, and access to data is provided through views. The model was tested and compared with a classic eLearning environment, with the emphasis on performance.
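How a view can enforce consent at the database level, while every read is also written to an access log, can be sketched as follows. This is an illustration under stated assumptions and not the prototype's actual schema: the names attribute_consent and student_access_log come from the text, whereas the student table, the view name student_disclosed, all columns, the sample data and the read_profile helper are hypothetical. Masking a non-consented attribute with NULL is only one possible enforcement style.

```python
import sqlite3

# Hypothetical schema; only attribute_consent and student_access_log
# are names taken from the text, everything else is an assumption.
SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, email TEXT, phone TEXT
);
CREATE TABLE attribute_consent (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    attribute  TEXT NOT NULL,
    granted    INTEGER NOT NULL CHECK (granted IN (0, 1)),
    PRIMARY KEY (student_id, attribute)
);
CREATE TABLE student_access_log (
    log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id  INTEGER NOT NULL,
    accessed_by TEXT NOT NULL,
    accessed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Callers read through the view, never the base table. An attribute is
-- returned only when a matching consent row with granted = 1 exists;
-- otherwise the CASE expression yields NULL.
CREATE VIEW student_disclosed AS
SELECT s.student_id, s.first_name, s.last_name,
       CASE WHEN EXISTS (SELECT 1 FROM attribute_consent AS c
                         WHERE c.student_id = s.student_id
                           AND c.attribute = 'email' AND c.granted = 1)
            THEN s.email END AS email,
       CASE WHEN EXISTS (SELECT 1 FROM attribute_consent AS c
                         WHERE c.student_id = s.student_id
                           AND c.attribute = 'phone' AND c.granted = 1)
            THEN s.phone END AS phone
FROM student AS s;
"""


def build_demo_db():
    """In-memory database with one made-up student and two consent rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO student VALUES (1, 'Ana', 'Example', 'ana@example.org', '000-000')"
    )
    conn.executemany(
        "INSERT INTO attribute_consent VALUES (?, ?, ?)",
        [(1, "email", 1), (1, "phone", 0)],
    )
    conn.commit()
    return conn


def read_profile(conn, student_id, accessed_by):
    """Record the access in the log, then read the profile through the view."""
    with conn:  # commits the log entry
        conn.execute(
            "INSERT INTO student_access_log (student_id, accessed_by) VALUES (?, ?)",
            (student_id, accessed_by),
        )
    return conn.execute(
        "SELECT student_id, first_name, last_name, email, phone "
        "FROM student_disclosed WHERE student_id = ?",
        (student_id,),
    ).fetchone()
```

Because the consent check lives in the view, every application that reads through it gets the same answer, and the log table gives the student a history of who accessed the profile.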

4. CHARACTERISTICS OF THE CURRENT eLEARNING ENVIRONMENT (DLWMS)
DLWMS¹ (Distance Learning Web Management System) is an in-house web-based solution whose functionality covers part of an LMS and a complete CMS. It was developed to support the distance learning model of study at the Faculty of Information Technology.

The first step in the research was to determine the current performance of the electronic learning system. This was necessary in order to compare the system without HDB elements against the system with them.

D. Indicators of system performance
Many parameters influence the speed with which user requests to an IT system are executed. In the case of DLWMS 2, one should bear in mind that it is a web-based eLearning environment, where slow performance is not always caused by the application itself. Figure 4 shows the architecture of the current system. Since it is a web application, one of the performance indicators is the client and its network resources. Whether a client connects via Wi-Fi, ADSL or GPRS is important, but it is not a relevant indicator for this study: the speed of the client's link is relative, and from the perspective of the system there is no way to influence it. The frontend component (Figure 4), on the other hand, is located in the home environment of the institution and represents the logic of the eLearning software system: Web and application servers. As this study deals with the database as a whole, the speed of code execution and the performance of the Web servers are likewise relative and not relevant indicators for this study. The only fact about this part of the current system that matters for measuring performance is that the frontend is connected to the backend component (the database) via a direct 1 Gbps connection. The current inbound link between the clients and our system is 10 Mbps, which clearly shows that client input is far smaller than the capacity of the link between the application servers and the databases. The component whose performance is to be measured is the database of the eLearning environment. The execution speed of the SQL code that the application calls to access data is the key component. On the basis of the measurements, reference parameters of the system are determined and later compared with the results obtained by measuring the performance of the HDB prototype model.

Figure 4. Architecture of DLWMS eLearning environment.

E. Measurement parameters

TABLE I. MEASUREMENT PARAMETERS
Parameter            Measurement unit
CPU time             %/sec.
Physical read        Pages/sec.
Physical write       Pages/sec.
Duration             Sec.
Request per second   User/sec.

¹ https://dl.fit.ba/dlwms2/

The above list of parameters is sufficient to determine the response rate of the basic system and to carry out its eventual optimization². On this basis we determine how fast the existing system is and, through statistical analysis of the obtained data, the reference values that serve as input parameters for further research. In order to obtain data that are as realistic as possible, the laboratory is the actual hardware/software infrastructure of the DLWMS 2 system, which consists of one primary and two secondary database servers hosting the DLWMS 2 database. Measurements were taken on the primary server. The database server consists of the following hardware and software components:
• 2 x AMD Opteron x64 (3 GHz per CPU)
• 4 GB RAM
• 2 x 36 GB SCSI
• Windows 2003 Server
• SQL Server 2008 R2
• Data and log files are physically separated onto two SCSI disks to improve performance, which is good practice for a database server.
• 16 GB database and 3000 users of the system
• Disks are regularly defragmented in order to reduce the number of disk read and write operations.
The measured elements were selected from those parts of the system where users' privacy is most vulnerable:
• Student Services
  o Overview of personal data, student achievement, status information, payment
• Access Control
  o Logging of data access activity
Measurements were carried out 3 times, at moments when the system had over 120 active users. The focus was on the loading of students' personal profiles by staff and the simultaneous logging of user activity. These were the two measured elements.

F. Results for Student Services part of DLWMS

TABLE II. STUDENT SERVICES AVERAGE RESULTS

TABLE III. ACCESS CONTROL AVERAGE RESULTS

As can be seen from the tables, the overall performance of the eLearning system at the Faculty of Information Technology is at a satisfactory level. All results were obtained in a production environment, and users did not complain about performance at any time during the measurements.

5. HDB PROTOTYPE MODEL PERFORMANCE MEASUREMENTS
Our model is based on two connected pieces of research. The first created an efficient model of an eUniversity system founded on the principles of Hippocratic databases [4]. The second is based on a relational model of HDB elements for one part of an eLearning system [5].

G. Results for HDB prototype
As in the first case, the measurement is performed three times, but with the following differences. Since this is a prototype, it is not possible to take measurements in real conditions. Instead, the model is subjected to a load of 250 generated by a load simulator, so that the measurement parameters are as close as possible to those of the DLWMS system. During the measurements, 21 simultaneous requests per second are simulated continuously. The measured parameters are the same (Table I).

TABLE IV. STUDENT SERVICES AVERAGE RESULTS

TABLE V. ACCESS CONTROL AVERAGE RESULTS

² Gartner, http://www.gartner.com

6. ANALYSIS OF THE RESULTS

After comparing the results obtained by measuring the reference parameters in the DLWMS 2 environment and in the HDB eLearning prototype, we can conclude the following:
• Average CPU load did not cross the threshold of 22% at any moment;
• The parameters (reading, duration and number of requests) deviate by approximately 40% from the reference values;
• A deviation of 40% is not a big problem if we consider that the peak values observed when measuring DLWMS 2 were in the range of 200% - 300% of the reference;
• Even when these extreme values were recorded, the system behaved within the allowed deviations in terms of performance (Figure 4).

Figure 4. DLWMS 2 and HDB comparison (Part 1)

The conclusion is that the comparison of the measured results for the inspection of personal data clearly showed that the HDB eLearning prototype successfully met all the requirements, which is empirically proven. Looking at the results for access control, we arrive at the following facts, which are clearly visible in Figure 5:
• Average CPU occupancy, as in the previous case, did not exceed 22%;
• The parameters (reading, writing, duration and number of requests) deviate in the range 10% - 30%, which is less than in the first group of measurements;
• Deviations of 10% - 30% have a negligible impact on performance;
• The system was at all times within the allowable deviation in terms of performance.

Figure 5. DLWMS 2 and HDB comparison (Part 2)

The conclusion is that the comparison of the measured results for access control clearly showed that the HDB eLearning prototype successfully met all the requirements, which is empirically proven.

7. FUTURE PLANS

Privacy and data security are currently a major issue, and the exponential growth in the use of social networks makes it even bigger. Users put and share their personal information in cloud-based environments where they have no influence over their personal data. The next big shift in information technologies is the cloud: operating systems, office tools, applications, SaaS, collaboration and more now all exist in cloud versions. Cloud computing has significant implications for the privacy of personal information as well as for the confidentiality of business and governmental information. While the storage of user data on remote servers is not new, the current emphasis on and expansion of cloud computing warrants a more careful look at its actual and potential privacy and confidentiality consequences. A considerable amount of cloud computing technology is already being used and developed in various flavors (e.g., private, public, internal, external, and vertical). Not all types of cloud computing raise the same privacy and confidentiality risks. The definitional borders of cloud computing are much debated today. For present purposes, cloud computing involves the sharing or storage by users of their own information on remote servers owned or operated by others and accessed through the Internet or other connections. Cloud computing services exist in many variations, including data storage sites, video sites, tax preparation sites, personal health record websites, photography websites, social networking sites, and many more. Taking all these facts into consideration, the future direction of HDB research is quite clear. We plan to proceed in the following directions:
• Analyze the ten principles and apply them to cloud-based systems
• Add new definitions if necessary
• Test cryptography usage in cloud-based environments
• Create a new model
• Test the prototype

The reason is that the classic HDB approach is not enough to protect privacy in cloud-based environments. One of the biggest issues is that data in data centers fall under the jurisdiction of the countries and governments where the data centers are located. This requires a different approach and some extra privacy assurance elements.

8. CONCLUSION

Privacy, security and access control are elements that are implemented at the object and row level of a database. Keeping that model operational is very hard, expensive in resources and money, and a source of common mistakes. Current trends and solutions push privacy issues down the priority list and leave them to company security policy to handle. Examples of privacy violations show how that can end. Access control and security mechanisms should be part of the technology that provides and preserves the privacy of data. The result of our research in the area of privacy preservation in eLearning environments is a normalized relational model. For simplicity, we have taken only the part of the environment that stores students' personal data, but the model itself does not depend on size. The model implements all ten principles of Hippocratic databases. To prove the usability and viability of the model, we compared the performance of the production eLearning system with that of the prototype model. We have shown that the application of the model is effective for the protection of privacy and that all measured parameters are within the permissible range.

9. REFERENCES

[1]. Alan Westin, Professor Emeritus of Public Law and Government, Columbia University
[2]. P. Ashley and D. Moore. Enforcing privacy within an enterprise using IBM Tivoli Privacy Manager for ebusiness
[3]. R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Hippocratic databases. In The 28th International Conference on Very Large Databases (VLDB)
[4]. D. Mušić, J. Azemović, Mohamed El-Zayat, „Component of the efficient eUniversity system“, 2009 The 2nd IEEE International Conference on Computer Science and Information Technology
[5]. V. Bevanda, J. Azemović, D. Mušić, „Privacy preserving in eLearning environment (Case of modelling Hippocratic database structure)“, 4th Balkan Conference in Informatics, Thessaloniki, Greece, 2009.
[6]. J. Azemović, D. Mušić, „Efficient model for detection data and data scheme tempering with purpose of valid forensic analysis“, ICCEA 2009, Manila, Philippines.
[7]. D. Mušić, J. Azemović, „Applying Case-based reasoning for mobile support in diagnosing infective diseases“, ICCDA 2009, Singapore.
[8]. D. Mušić, J. Azemović, E. Čatrnja, „Influence of learning communities and collaborative learning on students’ success“, ICSTE 2009, Chennai, India.
[9]. Borka Jerman-Blažić, Tomaž Klobučar: Privacy provision in e-learning standardized systems: status and improvements, Elsevier, Science Direct, Computer Standards & Interfaces 27 (2005), 561-578
[10]. Sabah S. Al-Fedaghi: Beyond Purpose-Based Privacy Access Control, Eighteenth Australasian Database Conference (ADC 2007), Ballarat, Australia. CRPIT, 63. Bailey, J. and Fekete, A., Eds. ACS. 23-32.
[11]. Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt, Limiting disclosure in hippocratic databases, Proceedings of the Thirtieth international conference on Very large data bases, p.108-119, August 31-September 03, 2004, Toronto, Canada
[12]. Norjihan Abdul Ghani, Zailani Mohd Sidek: Hippocratic Database: A Privacy-Aware Database, Proceedings of World Academy Of Science, Engineering and Technology Volume 32, August 2008, ISSN: 2070-3740
[13]. Jae-Gil Lee, Kyu-Young Whang, Wook-Shin Han, Il-Yeol Song: Hippocratic XML Databases: A Model and an Access Control Mechanism, Journal of Computer Systems Science and Engineering, Vol. 21, No. 6, pp. 395-404, Nov. 2006
[14]. Office of the Information and Privacy Commissioner. Data Mining: Staking a Claim on Your Privacy
[15]. Dragan Pleskonjić, Nemanja Maček, Borislav Đorđević, Marko Carić: Sigurnost računarskih sistema i mreža, Mikro knjiga 2007, ISBN 978-86-7555-305-2