A Proof-Carrying Code Based Framework for Social Networking


Sorren C. Hanvey

Advisor: Nestor Catano
University of Madeira
[email protected]

Madeira Interactive Technologies Institute, Portugal


Contents

1 Introduction  3
  1.1 Privacy  4
  1.2 Security  5
  1.3 Formal Methods  7
    1.3.1 Formal Specification and Modeling  7
    1.3.2 Formal verification  8
    1.3.3 Implementation  10
  1.4 Formal Methods for Privacy  10
    1.4.1 Modeling a system  10
    1.4.2 Abstraction and Refinement  11
    1.4.3 Policy Composition and Conflicting Requirements  11
  1.5 Matelas  11
    1.5.1 Access Permissions for Social-Networks in B  12

2 State of the Art  15
  2.1 Proof Carrying Code  15
  2.2 Formal Methods Tools  17
    2.2.1 SMT Solvers  17
    2.2.2 The Java Modeling Language (JML)  19
    2.2.3 Krakatoa/WHY Toolset  19

3 Proposed PCC Framework  22
  3.1 The Source Program  22
  3.2 The Compiler and Byte Code Modules  23
  3.3 Safety Policy  24
  3.4 Verification Condition Generator  24
  3.5 The Prover/Proof Checker  30
  3.6 The Certificate  31

Chapter 1

Introduction

Since their introduction, social network sites (SNSs) [11] have attracted millions of users, many of whom have integrated these sites into their daily practices. As of this writing, there are hundreds of SNSs, with various technological affordances, supporting a wide range of interests and practices. While their key technological features are fairly consistent, the cultures that emerge around SNSs are varied. Most sites support the maintenance of pre-existing social networks, but others help strangers connect based on shared interests, political views, or activities. Some sites cater to diverse audiences, while others attract people based on a common language or shared racial, sexual, religious, or nationality-based identity. Scholars from disparate fields have examined SNSs in order to understand the practices, implications, culture, and meaning of the sites, as well as users' engagement with them. We define social network sites as web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and traverse their list of connections and those made by others within the system. The nature and nomenclature of these connections may vary from site to site. What makes social network sites unique is not that they allow individuals to meet strangers, but that they enable users to articulate and make visible their social networks. This can result in connections between individuals that would not otherwise be made. On many of the large SNSs, participants are not necessarily "networking" or looking to meet new people; instead, they are primarily communicating with people who are already part of their extended social network.
To emphasize this articulated social network as a critical organizing feature of these sites, we label them "social network sites." While SNSs have implemented a wide variety of technical features, their backbone consists of visible profiles that display an articulated list of Friends who are also users of the system. Profiles are unique pages where one can "type oneself into being". After joining an SNS, an individual is asked to fill out forms containing a series of questions. The visibility of a profile varies by site and according to user discretion. Structural variations around visibility and access are one of the primary ways SNSs differentiate themselves from each other. After joining a social network site, users are prompted to identify others in the system with whom they have a relationship. Most SNSs require bidirectional confirmation for Friendship, but some do not. The term "Friends" can be misleading, because the connection does not necessarily mean friendship in the everyday vernacular sense, and the reasons people connect are varied. A number of issues pertaining to privacy and security have been identified in the use of such social networking sites; these are discussed in the sections below. One of the major issues addressed through the work presented here is that people have very little control over data collection and usage by third-party plug-ins. We describe a solution based on the concept of Proof-Carrying Code [29, 30], which allows the system to take control by specifying constraints on the ways in which its users' data can be used. These constraints are based on the social network model defined in Matelas [12], described later in this report, and an organization that builds a plug-in for this social network must provide formal proofs that its application meets these constraints. Checking the proof with an independent verifier demonstrates whether the constraints are respected by the third-party application. We discuss how our framework can be put into practice, and we present the technical aspects and decisions as and when required.

1.1 Privacy

The public display of connections is a crucial component of SNSs. The Friends list contains links to each Friend's profile, enabling viewers to traverse the network graph by clicking through the Friends lists. On most sites, the list of Friends is visible to anyone who is permitted to view the profile, although there are exceptions. Most SNSs also provide a mechanism for users to leave messages on their Friends' profiles or walls. This feature typically involves leaving "comments," although sites employ various labels for it. Popular press coverage of social network sites has emphasized potential privacy concerns, primarily regarding the safety of younger users [20, 25]. Researchers have investigated the potential threats to privacy associated with social network sites. In one of the first academic studies of privacy and social network sites, Gross and Acquisti (2005) [21] analyzed 4,000 Carnegie Mellon University Facebook profiles and outlined the threats posed by the personal information students include on the site, such as the ability to reconstruct users' social security numbers from details often found in profiles, like hometown and date of birth. Acquisti and Gross (2006) [1] argue that there is often a disconnect between students' desire to protect privacy and their behaviors, a theme also explored in Stutzman's (2006) [18] survey of Facebook users and Barnes's (2006) [33] description of the "privacy paradox" that occurs when teens are not aware of the public nature of the Internet. In analyzing trust on social network sites, Dwyer, Hiltz, and Passerini (2007) [15] argued that trust and usage goals may affect what people are willing to share: Facebook users expressed greater trust in Facebook than MySpace users did in MySpace, and were thus more willing to share information on the site. Privacy is also implicated in users' ability to control impressions and manage social contexts. Boyd [14] asserted that Facebook's introduction of the "News Feed" feature disrupted students' sense of control, even though the data exposed through the feed were previously accessible. Preibusch, Hoser, Gurses, and Berendt (2007) [34] argued that the privacy options offered by social network sites do not give users the flexibility they need to handle conflicts with Friends who hold different conceptions of privacy; they suggest a framework for privacy in social network sites that they believe would help resolve these conflicts. To overcome this problem, we propose to use formal methods techniques [31] to build a core social network application based on a Proof-Carrying Code [29] framework that enforces privacy and security social-network policies across any plug-ins for the social network. The key aspect of this ongoing work is the combined use of mathematical formalisms and formal methods tools to ensure that the implemented social-network application adheres to the stipulated privacy and security properties. The implementation of such a core social-network application poses several interesting research challenges.

1.2 Security

The Multiple Independent Levels of Security (MILS) [32, 9] architecture stipulates a layered approach to security. At its foundation is the MILS separation kernel, a small, real-time microkernel that implements the following functional security policies:

1. Information flow control: Information always flows from an authorized source to an authorized target.

2. Data isolation: Data in a partition is accessible to that partition only, and private data remains private.

3. Fault isolation: If a bug or virus damages a partitioned application, the damage might spread to other applications, but those applications should be given a way to become aware of the damage and to react to it.

The social network core application implementing PCC must realize these policies to prevent privacy breaches by a user using a third-party application. We can borrow from the MILS architecture and implement the above policies in our system as follows:

1. Information flow control: A partition can be defined as the data available to the plug-in user and his friends in the social network. In effect, we are controlling information flow across user profiles, as opposed to physical addresses as in the MILS architecture. In MILS, the Partitioned Information Flow Policy (PIFP) allows information to be relayed from one partition to another only if both partitions have been granted access to that information. Our B model for social networks already supports a Partitioned Information Flow Policy through social-network invariants which ensure that information is only shared with the people who have the correct access privileges over it. The concept of friendship grouping, i.e., assigning all of a user's friends on the social network to levels such as best friend, social friend, or acquaintance, allows us to control information flow, especially when information must be relayed automatically across partitions. For example, if any social friend of the user is granted some permission on content owned by the user, then the user's best friends will all have the same permission on that content, because best friends belong to a grouping at a higher privilege level than social friends. (Content here is data: text, video, a photo, etc.) PCC would be used to enforce the PIFP at run time. For example, if a user uses an application to automatically upload photos from his computer to the network, the PCC verifier will ensure that the application does not grant privileges over these photos to anyone not approved according to the user's preferences.

2. Data isolation: The concept of "partition" in real-time operating systems (RTOS) becomes the notion of "friendship" in the social network domain. We treat a plug-in called by a user as a clone of the user himself, awarding it the same access privileges as the user that called it, thereby ensuring that the plug-in only has access to the data accessible by that user.
Before the plug-in application quits, it must scrub any information it collected, so as to prevent privacy breaches, and all its access privileges are then revoked. Thus we can ensure data isolation within the PCC model.

3. Fault isolation: Faults in the traditional sense are handled at the programming-language level with the usual try-catch statements of Java or C. In the security sense, since the plug-in application only has the access privileges of the user that called it, it cannot gain access to any content outside its scope, thanks to the rigorous privacy and security policies modeled into the system. These policies are the safety rules against which the application is tested by the PCC verifier, and the application must satisfy them before it can be used.

The MILS architecture also specifies that the enforcement of these policies be Non-Bypassable, Always Invoked, Tamperproof and Verifiable. The requirement that policy enforcement be Verifiable is absolutely critical; we achieve it through the very way in which the system is modeled and implemented. With the use of the B-Method we can ensure that all policies are consistent and verified before implementation. For third-party applications, our PCC verifier ensures that all policies (safety rules) are adhered to. The use of PCC and our PCC verifier ensures that all third-party applications are properly vetted before being made available to the users, which in turn ensures that the policies are non-bypassable and always invoked.
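As a minimal illustration of the friendship-grouping rule described above, consider the following sketch. The level names, data structures and function names are our own assumptions for illustration, not part of the Matelas model:

```python
# Sketch of the friendship-grouping propagation rule: a permission granted
# at one friendship level is implied for every higher-privilege level.
# LEVELS, grant and allowed are illustrative names, not the Matelas model.

LEVELS = ["acquaintance", "social_friend", "best_friend"]  # ascending privilege

def grant(acl, content_id, level, op):
    """Grant operation `op` on `content_id` to `level` and all higher levels."""
    start = LEVELS.index(level)
    for higher in LEVELS[start:]:
        acl.setdefault((content_id, higher), set()).add(op)
    return acl

def allowed(acl, content_id, level, op):
    """Does `level` hold operation `op` on `content_id`?"""
    return op in acl.get((content_id, level), set())

acl = {}
grant(acl, "photo42", "social_friend", "view")
assert allowed(acl, "photo42", "best_friend", "view")       # propagated upward
assert not allowed(acl, "photo42", "acquaintance", "view")  # not downward
```

A PCC verifier checking a photo-upload plug-in would, in this picture, reject any code path that calls `grant` at a level the user's preferences do not allow.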

1.3 Formal Methods

Privacy means something different to everyone. One can argue for technology's dual role in privacy: new technologies raise new threats to privacy rights, and new technologies can help preserve privacy. Formal methods, as just one class of technology, can be applied to privacy, but privacy raises new challenges, and thus new research opportunities, for the formal methods community [36]. Formal methods differ from other design approaches through the use of formal verification schemes: the basic principles of the system must be proven correct before they are accepted. Traditionally, extensive testing is carried out to verify behavior, but testing supports only finite conclusions: it can show how a system behaves in the scenarios tested, but says nothing about its behavior outside those scenarios. In contrast, using formal methods, once a theorem is proven true it remains true, thereby guaranteeing the system's behavior. It is important to note that the use of formal methods does not remove the need for testing. Formal methods help identify errors in reasoning which would otherwise go unverified, but they do not fix bad assumptions in the design. Formal methods come in several flavors in software development and verification, e.g., formal specification, formal modeling, and deductive verification.

1.3.1 Formal Specification and Modeling

The formal specification of a system is a description of its behavior using a mathematical notation. The advantage of such a notation over the more ambiguous natural language and diagrams is that it is precise, and therefore more suitable for writing specifications. The major disadvantage is that most people understand natural language better than mathematical notation; a considerable amount of time and effort must be spent first learning the notation and then gaining experience in its use before its full benefits can be attained. A specification language may be used as a design tool and, if the notation is readable enough, as a documentation tool [10]. The problem with using mathematics alone is that specifications can become unmanageable and unreadable. In addition to the basic mathematical notation, a schema notation is therefore included to aid the structuring of specifications. This provides a framework for the textual combination of sections of mathematics using schema operators, many of which match equivalent operators in the mathematical notation. As well as the formal text, a specification may contain natural language that explains the mathematical description. However, if there is a conflict between the two descriptions, the formal one is chosen, since it provides the more precise specification. Formal specifications are therefore often written in a specialized specification language with precise semantics. There are a number of such languages, including Z [10], JML [26, 27] and B [2]. These languages describe the states of a system by means of mathematical objects such as sets, relations and functions, relying on the concepts of discrete mathematics. Pre-conditions and post-conditions, which are logical formulae, describe properties before and after the transition between these states. These languages predominantly deal with sequential systems. There are a number of languages that deal with concurrent systems, such as CSP, CCS, statecharts, temporal logic and I/O automata. These languages generally have much simpler states, described using simple domains such as the integers, but more complicated transition functions, often described using sequences, trees or partial orders of events [17]. A formal specification is vital for the application of formal verification techniques, as it can be compared logically with the system model.
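The pre-condition/post-condition style of these specification languages can be imitated in ordinary code. The following sketch is a hypothetical run-time contract checker of our own devising, not Z, JML or B:

```python
# A run-time contract checker imitating the pre-/post-condition style of
# specification languages such as Z, JML or B (illustrative only).
def contract(pre, post):
    def wrap(f):
        def inner(*args):
            assert pre(*args), "pre-condition violated"
            result = f(*args)
            assert post(result, *args), "post-condition violated"
            return result
        return inner
    return wrap

# Pre-condition: the list is non-empty.
# Post-condition: the result is an element of the list and bounds it above.
@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: r in xs and all(r >= x for x in xs))
def maximum(xs):
    return max(xs)

assert maximum([3, 1, 4]) == 4
```

The difference, of course, is that a specification language proves these conditions for all inputs ahead of time, whereas the checker above only detects violations as they occur.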

1.3.2 Formal verification

Formal verification is the process of checking whether a design satisfies some requirements (behavioural properties). We are concerned with the formal verification of designs that are formally specified. As described earlier, the specification of a system describes a number of states and the transition logic between them. States and the transitions between them constitute finite state machines (FSMs). The entire system is an FSM, obtained by composing the FSMs associated with each component. Given a present state, the next state of an FSM can be written as a function of its present state and transition logic. As stated above, specification relies on discrete functions. Discrete functions can be represented conveniently by binary decision diagrams or multi-valued decision diagrams; we use these diagrams to represent the transition functions, the inputs, the outputs and the states of the FSMs. The two most popular methods for automatic formal verification are model checking and theorem proving.

Model Checking. Model checking [4] is the most successful approach that has emerged for verifying requirements. A model-checking tool accepts system requirements (the model) and a specification that the final system is expected to satisfy. The tool outputs yes if the given model satisfies the given specification, and generates a counterexample otherwise. The counterexample details why the model does not satisfy the specification; by studying it, one can pinpoint the source of the error in the model, correct the model, and try again. The idea is that by ensuring that the model satisfies enough system properties, we increase our confidence in the correctness of the model. The system requirements are called models because they represent requirements or design. There is no standard language for defining models, since requirements for systems in different application domains vary greatly in size, structure, complexity, nature of system data, and operations performed. Most real-time embedded or safety-critical systems are control-oriented and lay more emphasis on the dynamic behaviour of the system than on the structure of, or operations performed on, the internal data the system maintains. For control-oriented systems, finite state machines are widely accepted and used. Revisions of the FSM have yielded extended finite state machines (EFSMs), which are better suited to real-life industrial systems. Most model-checking tools have their own rigorous formal language for defining models, but most are some variant of the EFSM. Model checking, though useful, has a few disadvantages. The major one is that it generally does not work on systems with infinite state spaces: it must check every state in the space using finite state machines, and FSMs cannot handle infinite state spaces. These problems can sometimes be worked around, either by combining model checking with principles based on abstraction or induction, or by providing a partial proof by restricting the space [16]. A major threat that model checking faces is the "state space explosion" problem. This mainly occurs in systems with many interacting components or data structures over wide-ranging values; in such systems the number of states can be extremely large. This results in significantly larger time and space requirements, and there is a risk that the computer runs out of memory before the process completes. Two main approaches have been proposed to cope with the state explosion problem: symbolic algorithms and partial order reduction.

Theorem Proving. Theorem proving [19] concerns computer programs that show that some statement (the conjecture) is a logical consequence of a set of statements (the axioms and hypotheses). Theorem proving is used in a wide variety of domains, given an appropriate formulation of the problem as axioms, hypotheses, and a conjecture. The language in which these are written is referred to as the logic. A specification written in a formal language can be manipulated by the theorem prover; this formality is the underlying strength of theorem proving, as there is no ambiguity in the statement of the problem. The problem at hand must be described precisely and accurately, and this process in itself can lead to a clearer understanding of the problem domain. The proofs produced describe how and why the conjecture follows from the axioms and hypotheses. The proof output is not only a convincing argument that the conjecture is a logical consequence of the axioms and hypotheses; it often also describes a process that may be implemented to solve some problem. Automated theorem proving (ATP) systems are enormously powerful computer programs, capable of solving immensely difficult problems, but their application and operation need to be guided by an expert in the domain of application in order to solve problems in a reasonable amount of time. The interaction may be at a very detailed level, where the user guides the inferences made by the system, or at a much higher level, where the user determines intermediate lemmas to be proved on the way to the proof of a conjecture. There is often a synergetic relationship between ATP system users and the systems themselves [19]: the system needs a precise description of the problem written in some logical form; the user is forced to think carefully about the problem in order to produce an appropriate formulation, and hence acquires a deeper understanding of the problem; the system attempts to solve the problem; if successful, the proof is a useful output; if unsuccessful, the user can provide guidance, try to prove some intermediate result, or examine the formulae to ensure that the problem is correctly described; and so the process iterates. ATP is thus a technology well suited to situations where a clear-thinking domain expert can interact with a powerful tool to solve interesting and deep problems. One major advantage that theorem provers have over model checkers is that they can deal with infinite state spaces: using mathematical methods such as induction, they can prove that properties hold over infinite domains relatively simply.
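The model-checking loop described above, which explores every reachable state of a finite machine and reports a counterexample trace when a property fails, can be sketched for a toy FSM. The state space and property below are invented purely for illustration:

```python
from collections import deque

def check(initial, successors, invariant):
    """Breadth-first exploration of a finite state space. Returns None if the
    invariant holds in every reachable state, else a counterexample trace."""
    queue = deque([(initial, [initial])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace  # counterexample: a path from the initial state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None

# Toy model: a counter mod 8 whose "bad state" is 5.
trace = check(0, lambda s: [(s + 1) % 8], lambda s: s != 5)
assert trace == [0, 1, 2, 3, 4, 5]  # counterexample trace found
```

Because `seen` must hold every reachable state, this sketch also makes the state-space-explosion problem tangible: memory grows with the size of the reachable space.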

1.3.3 Implementation

Once the system has been modeled, it is implemented by converting the specification into code. This may be done in a number of ways, from manual translation to automatic code generation with tools such as Atelier B [5].

1.4 Formal Methods for Privacy

The following are just a few examples of how formal methods can help overcome some of the major hurdles faced today in ensuring privacy.

1.4.1 Modeling a system

Traditionally one models a system, its environment, and the interactions between the two, while simply making assumptions about the environment in which the system operates. We cannot make assumptions about an adversary the way we might about other failures; we must include the adversary as part of the system's environment. Privacy involves three entities: the data holder (e.g., the social network), an adversary, and the data subject (i.e., the user). Here we notice the inherent difference between security and privacy: in security, the entity in control of the system also has an inherent interest in its security; in privacy, the system is controlled by the data holder, but it is the data subject who benefits from privacy. Formal methods akin to proof-carrying code (discussed later in this report), which requires the data holder to provide an easy-to-check certificate to the data subject, might be one way to address this kind of difference. Privacy requires modeling the different relationships among the (minimally) three entities. Complications arise because relationships do not necessarily enjoy simple algebraic properties, and because relationships change over time.

1.4.2 Abstraction and Refinement

Methods that successively refine a high-level specification into a lower-level one, until executable code is reached, rely on well-defined correctness-preserving transformations. Some privacy-relevant properties, such as secrecy, are not trace properties; thus, while a specification may satisfy a secrecy property, a refinement of that specification might not. The use of formal methods such as the B-Method adopted here allows us to ensure that this does not happen.

1.4.3 Policy Composition and Conflicting Requirements

Because different components of a system might be governed by different policies, and one system might be governed by more than one policy, we must also provide methods for compositional reasoning. Consider two components, A and B, and privacy policies, P1 and P2: if A satisfies P1 and B satisfies P2, we must be able to define rules concerning the composition of A and B with respect to P1, P2, and P1 ◦ P2. Trustworthy computing requires balancing privacy with security, reliability, and usability. These properties can at times have conflicting requirements, and we need a formal understanding of the relationships among them. For example, we want auditability for security (a record of which user performs which actions) to determine the source of a security breach. However, auditability is at odds with anonymity, a desired aspect of privacy. To achieve reliability, especially availability, systems often replicate data at different locations; replicas increase the likelihood that an attacker can access private data, and make it harder for users to track and manage their data. These trade-offs must be studied and formalized to gain a better understanding of the system.
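As a toy illustration of the composition question, policies can be treated as predicates over system events, with composition as conjunction. This is a deliberate simplification of P1, P2 and P1 ◦ P2, and every name below is illustrative:

```python
# Policies modeled as predicates over events; composition is conjunction.
# A simplification of the P1, P2, P1 ◦ P2 discussion; real policy languages
# are far richer than boolean predicates.
def compose(p1, p2):
    return lambda event: p1(event) and p2(event)

auditability = lambda e: e.get("logged", False)   # security wants a log entry
anonymity    = lambda e: e.get("actor") is None   # privacy wants no named actor

both = compose(auditability, anonymity)

# An audited event naming its actor satisfies auditability but not anonymity:
event = {"logged": True, "actor": "alice"}
assert auditability(event) and not anonymity(event)
assert not both(event)   # the composed policy exposes the conflict
```

Even this crude encoding shows why the auditability/anonymity trade-off discussed above needs formalizing: the conjunction of the two policies can be unsatisfiable for whole classes of events.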

1.5 Matelas

The PCC framework builds on the ongoing work on Matelas [12], a B predicate-calculus definition for social networking that models social-network content, privacy policies, friendship relations, and how these relations affect users' policies. This definition is used to construct a sound social-network core application that adheres to the privacy and security policies defined. Matelas proposes to use the refinement calculus [22, 23] to build the core application. This work poses a number of fascinating research questions [12]: • How can we construct a viable, secure social network? • How can we model the trade-off between the "buzzness" of a social network and the privacy restrictiveness placed on user content?


• What are all the privacy and security issues that a secure social network imposes? • How can one guarantee that personal data is secure when one is no longer prepared to assume that the maintainers of the application are totally honest? • How can one enforce authentication and access control mechanisms that are trustable, whilst reducing the responsibilities of the social network systems in enforcing such policies?

1.5.1 Access Permissions for Social-Networks in B

The following presents ongoing work [12] on modelling social networks in predicate calculus. It uses B notation [2] to present the proposed model. The B social-network abstract specification models social-network content, friendship relations, and privacy on content. Privacy issues have generated a number of theories and approaches [35]. Nonetheless, as stated by Anita L. Allen in [3], "while no universally accepted definition of privacy exists, definitions in which the concept of access plays a central role have become increasingly commonplace". Following Allen's approach, privacy is modeled with the aid of a relation that registers users' access privileges on social-network resources, and a content-ownership relation. The abstract specification model for social networks defined in Matelas starts by distinguishing the independent aspects of social networks: user content and privacy issues, the friendship relation, how user content is affected by friendship relations, external plug-ins, and the user interface. Starting from the abstract specification, the model is refined into a social-networking core system. A first abstract model defined in Matelas views the system as composed of users and "raw content", representing the photos, videos, or text that a person has on his personal page. Four relations concerning raw contents are modelled at this level: content, visibility, ownership, and access privileges. The "content" relation associates a person with all raw contents currently on the person's page; each user owns some of the content on his page. The "visible" relation associates a person with visible raw content, i.e., those raw contents a user is allowed to view at some point. Only raw contents for which a user has "view" privilege can be visible. The "content" relation contains the "visible" relation.
The “view” privilege and other types of privileges (e.g., edition of a particular content) are to be defined in the access privileges relation act. Elements in act might be defined as triplets (rc, op, pe) stating that person pe has op privilege on raw content rc. In B language notation a triplet (a, b, c) is written a 7→ b 7→ c. Based on the above the following invariant properties of the abstract model state were defined [12], (1) the owner owner(rc) of a raw content rc has all

1.5 Matelas

13

Figure 1.1: Proposed system architecture. [12]

privileges over it¹, (2) each raw content owned by a user is in the user’s page content, (3) a raw content is visible to a user only when the user has the “view” privilege over it, and (4) all of a user’s visible raw content is on the user’s page.

(1) ∀rc.(rc ∈ rawcontent ⇒ (∀op.(op ∈ OPS ⇒ (rc ↦ op ↦ owner(rc)) ∈ act)))
(2) ∀rc.(rc ∈ rawcontent ⇒ (owner(rc) ↦ rc) ∈ content)
(3) ∀(rc, pe).(rc ∈ rawcontent ∧ pe ∈ person ⇒ ((pe ↦ rc) ∈ visible ⇒ (rc ↦ view ↦ pe) ∈ act))
(4) visible ⊆ content

The model also defines actions (so-called “operations”) for creating, transmitting, making visible, hiding, editing, commenting on, and removing a raw content. As an example, the operation by which a user removes from his page a raw content owned by some other user is shown in Table 1.1. The pre-condition requires that the user in question not be the owner; a user can only remove visible raw content. The SELECT clause has two cases. The first is when the rc to be eliminated is not the only one present on pe’s page; the second is the opposite. Since the web page of each person in the system must have at least one content, pe must be deleted from the system in the latter case. In B notation, C ◁ r and r ▷ C denote the restriction of a relation r to a subset C

¹ OPS is the set of privilege types in the system.


Table 1.1: Operation for removing a raw content [12]

remove_rc(rc, pe) =
PRE
    rc ∈ RAWCONTENT ∧ rc ∈ rawcontent ∧
    pe ∈ PERSON ∧ pe ∈ person ∧
    pe ↦ rc ∈ visible ∧ pe ≠ owner(rc)
THEN
    SELECT pe ∈ dom(content − {pe ↦ rc}) THEN
        visible := visible − {pe ↦ rc} ‖
        content := content − {pe ↦ rc} ‖
        act := act − {rc ↦ view ↦ pe}
    WHEN pe ∉ dom(content − {pe ↦ rc}) THEN
        visible := {pe} ⩤ visible ‖
        content := {pe} ⩤ content ‖
        act := act ⩥ {pe} ‖
        person := person − {pe}
    END
END

of its domain and its range, respectively. Similarly, C ⩤ r and r ⩥ C denote the restriction of a relation r to the elements of its domain or range, respectively, not belonging to C. [12]
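To make these operators concrete, the following sketch (hypothetical Java helpers, not part of the Matelas development) encodes a binary relation as a set of pairs and implements the two anti-restriction operators used in the second branch of remove_rc:

```java
import java.util.*;
import java.util.stream.Collectors;

public class BRelationOps {
    // A binary relation is encoded as a set of two-element lists (pairs).
    // {c} <<| r : domain anti-restriction -- drop pairs whose first component is in c.
    static Set<List<String>> domSubtract(Set<String> c, Set<List<String>> r) {
        return r.stream().filter(p -> !c.contains(p.get(0))).collect(Collectors.toSet());
    }

    // r |>> c : range anti-restriction -- drop pairs whose second component is in c.
    static Set<List<String>> ranSubtract(Set<List<String>> r, Set<String> c) {
        return r.stream().filter(p -> !c.contains(p.get(1))).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // visible = {pe |-> rc1, qe |-> rc2}; removing person pe as in remove_rc:
        Set<List<String>> visible = Set.of(List.of("pe", "rc1"), List.of("qe", "rc2"));
        Set<List<String>> after = domSubtract(Set.of("pe"), visible);
        System.out.println(after.equals(Set.of(List.of("qe", "rc2")))); // prints true
    }
}
```

The `{pe} ⩤ visible` substitution of remove_rc corresponds to `domSubtract(Set.of("pe"), visible)`: every pair belonging to the departing person is dropped in one step.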

Chapter 2

State of Art

2.1 Proof Carrying Code

Proof-Carrying Code (PCC) [29, 30] is a technique for ensuring that a third-party application built to plug into an existing system adheres to all of the policies defined by the developers of that system. The code consumer establishes a set of safety rules that guarantee safe behavior of programs, and the code producer creates a formal safety proof that establishes, for the untrusted code, adherence to the safety rules. The code consumer can then check that the proof is valid and that the plug-in is safe to execute. The typical process of generating and using proof-carrying code is shown in Figure 2.1 below.

Figure 2.1: Overview of Proof Carrying Code Framework [8]

The framework in the figure above works as follows: the Code Producer develops a plug-in and feeds its Source Program into a Compiler.


This Compiler translates the Source Program into Byte Code, which feeds into the Verification Condition Generators on both the Code Producer and Code Consumer sides. The Verification Condition Generators (VCGen) accept the Byte Code and the Safety Policy that the Code Consumer has established, and generate a set of Verification Conditions (VCs): conditions that must hold true to prove that the plug-in adheres to the Safety Policy. The Code Producer must prove that the VCs are satisfied. They do so by feeding the VCs into a Prover that discharges all the VCs and generates a Certificate of Proof. This Certificate feeds into the Proof Checker along with the VCs generated by the Code Consumer. Based on the information in the Certificate, the Proof Checker attempts to discharge all the VCs. If it is successful, the plug-in is safe to execute and the Byte Code provided to the Code Consumer is executed. There are many advantages in using PCC [29]:

• Almost the entire burden of ensuring security is shifted to the code producer. The host only has to perform a fast, simple, and easy-to-trust proof-checking process.

• PCC programs are tamper-proof, in the sense that any modification will result either in a proof that is no longer valid or in one that does not correspond to the enclosed program. In both cases the program will be rejected.

• No trusted third parties are required, because PCC checks intrinsic properties of the code and not its origin. In this sense, PCC programs are self-certifying.

• PCC is completely compatible with other approaches to untrusted-code security.

Therefore, when a system obtains from third-party developers a piece of software meant as a plug-in, it needs to ascertain that the application is safe to execute, that is, that it accesses only the information or files it has access to and respects the private invariants of the system to which it is linked.
The primary components on the receiver side in implementing this methodology are (1) the Safety Policy and (2) the PCC Verifier. The safety policy provides the third-party developer (TPD) with a set of safety rules that their code (the third-party code, TPC) must obey, and an interface to the system. These rules are derived from the invariants defined in the core application under development. Their format depends on the design of the PCC verifier and on decisions made during its development, in particular on the dependencies between the rule format and the development platforms of both the core application and the TPC. It would be necessary to define a translator to convert the safety proof provided by a TPD before feeding it into our verifier. This may not be feasible if the dependencies on the development platform are high. We therefore plan to restrict the


possible platforms of development to Java/JML or C with ACSL, both of which are discussed later in this document. The PCC Verifier checks the third-party code (TPC) against the safety rules and ensures that they are all adhered to.

2.2 Formal Methods Tools

2.2.1 SMT Solvers

The SMT-LIB [7] initiative was set up with the goal of establishing an online library of benchmarks for Satisfiability Modulo Theories (SMT) systems. A logical formula whose satisfiability is decided with respect to certain common background theories is said to be satisfiable modulo those theories. Common theories include the theories of integers, real numbers, and arrays or lists. The main reason for establishing the library is that there have been substantial advances in the field of satisfiability modulo theories, and these benchmarks provide a means to evaluate and compare the various SMT provers. In SMT-LIB, a benchmark is a closed SMT-LIB formula with additional information specified in a benchmark declaration. In addition to the formula itself, a benchmark declaration contains a reference to its background logic and an optional specification of additional sort, function and predicate symbols. Benchmark declarations can also contain user-defined attributes and their values, formalized again as annotations. In the SMT-LIB format, logical formulas are assumed to be checked for satisfiability, not validity. An SMT-LIB benchmark is fed to an SMT solver, e.g., Yices. The solver must distinguish among:

1. a procedure’s underlying logic (e.g., first-order, modal, temporal, second-order, etc.);

2. a procedure’s background theory, the theory against which satisfiability is checked;

3. a procedure’s input language, the class of formulas the procedure accepts as input.

In order to compare these solvers and to promote their standard, in 2005 the SMT-LIB initiative set up an annual competition known as SMT-COMP. In this competition, solvers are scored on their speed and accuracy in checking various SMT-LIB benchmarks, in divisions based on different background theories.

Yices

Recent breakthroughs in boolean satisfiability (SAT) solving have enabled new approaches to software verification. SAT solvers can handle problems with millions of clauses and variables such as those encountered in varied domains.
SAT solving has thus become a major tool in automated analysis of finite systems. Satisfiability modulo theories (SMT) generalizes SAT by adding a number of


useful first-order theories, such as those related to equality reasoning and arithmetic. An SMT solver is a tool for deciding the satisfiability of formulas in these theories. SMT solvers enable the application of bounded model checking to infinite systems. Yices [6] is such an SMT solver, developed by SRI, that decides the satisfiability of arbitrary formulas containing uninterpreted function symbols with equality, linear real and integer arithmetic, scalar types, recursive datatypes, tuples, records, extensional arrays, fixed-size bit-vectors, quantifiers, and lambda expressions. Yices integrates an efficient SAT solver based on the Davis-Putnam-Logemann-Loveland (DPLL) algorithm with specialized theory solvers that handle the first-order theories. A core theory solver handles equalities and uninterpreted functions. It is complemented by satellite solvers for other theories such as arithmetic, bit vectors, or arrays.

Figure 2.2: Yices Architecture [6]

The SAT-solving algorithm used in Yices is a modern variant of DPLL. However, the Yices DPLL implementation is considerably more flexible than traditional SAT solvers, in order to support efficient interaction with the core and satellite solvers. The latter are allowed to dynamically create literals and add clauses during the DPLL search, and to send explanations to the SAT solver when they detect inconsistencies or propagate implied literals. SMT-LIB benchmarks and Yices together could be used as the theorem prover for our proof verifier. The safety proof provided by the TPD would be added to the SMT-LIB benchmarks, which would then feed into Yices. Yices would take as input the Verification Conditions generated by the framework and, based on the new benchmarks, would verify whether the proof holds for our safety rules. In this approach we would need to build a component to translate the safety proofs provided to us into the format required to add a benchmark to the SMT library.

2.2.2 The Java Modeling Language (JML)

JML is a specification language for Java that provides support for B. Meyer’s design-by-contract principles [28]. The idea behind the design-by-contract methodology is that a contract exists between a class and its clients. The client must guarantee certain conditions, called pre-conditions, to be able to call a method of the class. In return, the class guarantees certain conditions, called post-conditions, that will hold after the method is called. JML specifications use Java syntax and are embedded in Java code within specially marked comments /*@ ... @*/ or after //@. A simple JML specification for a Java class consists of pre- and post-conditions added to its methods, and class invariants restricting the possible states of class instances. Specifications for method pre- and post-conditions are embedded as comments immediately before method declarations. JML predicates are first-order logic predicates formed of side-effect-free Java boolean expressions and several specification-only JML constructs. Because of this side-effect restriction, Java operators like ++ and -- are not allowed in JML specifications. JML provides notations for forward and backward logical implication, ==> and <==, and for universal and existential quantification, (\forall T x; P ==> Q) and (\exists T x; P && Q) respectively. JML supports the use of several mathematical types, such as sets, sequences, functions and relations, in specifications. JML specifications are inherited by subclasses: subclass objects must satisfy super-class invariants, and subclass methods must obey the specifications of all super-class methods that they override. This ensures behavioural sub-typing, that is, a subclass object can always be used (correctly) where a super-class object is expected. In the following, we briefly review JML specification constructs and the common JML tools. The reader is invited to consult [27] for a full introduction to JML.
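A minimal design-by-contract example in this style is shown below. The class and its contract are illustrative (they are not part of the Matelas model); since JML annotations live in comments, the file compiles and runs as ordinary Java:

```java
// A small JML-annotated class: the requires clause is the client's
// obligation, the ensures clause the method's guarantee.
public class Wallet {
    private /*@ spec_public @*/ int balance;

    //@ public invariant balance >= 0;

    public Wallet(int initial) { balance = initial; }

    /*@ requires amount > 0 && amount <= balance;
      @ ensures balance == \old(balance) - amount;
      @*/
    public void withdraw(int amount) {
        balance = balance - amount;
    }

    public int getBalance() { return balance; }

    public static void main(String[] args) {
        Wallet w = new Wallet(10);
        w.withdraw(3);                      // pre-condition holds: 0 < 3 <= 10
        System.out.println(w.getBalance()); // prints 7
    }
}
```

A JML checker would verify that withdraw’s body establishes the ensures clause whenever the requires clause holds, and that the invariant balance >= 0 is preserved.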

2.2.3 Krakatoa/WHY Toolset

The Krakatoa tool [13] is part of the Why platform for deductive program verification, developed by the ProVal research group. Krakatoa is the front-end dedicated to Java source code. The general approach is to annotate the source with formal specifications and then to generate Verification Conditions (VCs for short): logical formulas whose validity implies the soundness of the code with respect to the given specifications. The VCs can then be discharged using one or several theorem provers. The main originality of this platform is that a large part is common to C and Java. In particular, there is a unique, stand-alone VC generator called Why, which is able to output VCs in the native syntax of many provers, either automatic or interactive. The overall architecture is presented below.


Figure 2.3: Krakatoa/WHY Toolset [13]

Krakatoa expects as input a Java source file annotated with the Krakatoa Modeling Language (KML for short). KML is largely inspired by the Java Modeling Language; like JML, KML specifications are given as annotations in the source code. KML also shares many features with the ANSI/ISO C Specification Language (ACSL), the specification language of the Frama-C platform. A key idea of Krakatoa/Why is that Why is used as a programming language in which to program the semantics of Java. We use the Krakatoa tool to translate the social-network core application and the third-party plug-in into the WHY language. The WHY tool implements a programming language designed for the verification of sequential programs. This is an intermediate language to which existing programming languages can be compiled and from which verification conditions can be computed. The verification-condition generator implemented by WHY is based on classical Hoare-logic rules. The Why tool accepts the translations of the social-network core application and plug-in and generates verification conditions that feed into the prover. The following are the benefits and limitations identified in using the Krakatoa/WHY tool in our framework. Krakatoa can handle the translation of the following JML components into the WHY language:


• Method Specification: The Krakatoa tool allows one to certify that a JML-annotated method of a Java program meets its specifications. It focuses on verifying the soundness of implementations with respect to the pre-/post-conditions given in the specification.

• Class Invariants: Class invariants are declared at the level of class members and have the form /*@ invariant id: e; @*/, stating that property e must hold for the current object [13]. The Krakatoa tool is capable of proving that class invariants, as well as post-conditions, hold at the end of a method’s execution, provided that the invariants and pre-conditions were valid at its beginning.

• Ghost Variables: The purpose of ghost code is to compute additional information in order to monitor the program’s execution. Ghost variables are additional, specification-only variables that can be added to method bodies [13]. They are declared with /*@ ghost T x = e; @*/ (where T is the variable’s type) and can be assigned with /*@ ghost x = e; @*/.

Krakatoa cannot handle the translation of the following JML components into the WHY language:

• Model Variables: Model fields are like member variables that can only be used in behavior specifications. Here is an example of a model-field declaration: //@ public model JMLEqualsSet person; This component of JML is essential in the translation of the B model into JML, as discussed later in this document.

• Specification Library for JML: This library contains descriptions of the semantics used in JML specifications. Krakatoa has been found to be unable to correctly use these libraries to translate JML into the WHY language.

The above limitations would need to be considered before a decision is made on whether the Krakatoa/WHY Toolset can be used within the proposed framework.

Chapter 3

Proposed PCC Framework

Following is a more in-depth look at the proposed PCC Framework based on Figure 2.1, and at how our work aims to implement each module defined in the framework.

3.1 The Source Program

The framework shall accept from the third-party developers the native code of the plug-in. The choice of development platform used for this plug-in is of utmost importance and can greatly affect how successfully the PCC framework works. The primary design objectives for any development language adopted should be:

1. rigorous definition
2. simple semantics
3. security
4. verifiability
5. bounded resource requirements

The language in which the plug-in is developed and specified must have a rigorous definition so as to provide enough expressive power. It would also be necessary to provide the ability to make strong assertions about the use of variables, so that they can be more easily monitored. The language should use relatively simple semantics. This reduces ambiguity and allows for simpler tools and testing, in turn allowing for greater reliability. The language should be able to check the rules it possesses effectively. The language must also behave in a way that maintains the integrity of the system, hence the need for a secure language.


One of the main requirements on the language is that programs written in it should be verifiable; therefore the code should be amenable to scrutiny by mathematical methods. This means that program flow should be easy to follow, with most pieces of code having only one point of entry and few points of exit. The language must observe bounds on time and storage space, and it must be possible to predict such bounds before execution. In particular, this means that dynamic storage cannot be used, and loops should be checked to ensure termination. Based on the above requirements, the third-party development platforms might need to be restricted to Java/JML and C/ACSL. At present the framework is based around Java as the programming language and JML for the specification of properties.

3.2 The Compiler and Byte Code Modules

The compiler would take as input a Java source file annotated with JML specifications, together with its Java class file as produced by a non-optimizing compiler and containing debug information. This process needs to be independent of any specific Java compiler. It would output the third-party application as Byte Code accessible to the code consumers. BML is the bytecode encoding of an important subset of JML. The latter is a rich specification language tailored to Java source programs that allows one to specify complex functional properties over them. Thus, BML inherits the expressiveness of a subset of JML and allows one to encode potentially complex functional and security policies over Java bytecode programs. The class files augmented with the BML specification must be executable by any implementation of the JVM specification. Because the JVM specification does not allow inlining of user-specific data in the bytecode instructions, BML annotations are stored separately from the method body. BML encoding thus differs from the encoding of JML specifications, where annotations are written directly in the source text as comments at a particular point in the program, or accompany a particular program structure. The relation between the verification conditions over bytecode and over source code can be used to build an alternative PCC architecture that can deal with complex security policies. In the case of a non-trivial security policy, neither an automatic inference of the specification nor an automatic generation of the proof is possible. In those cases, the equivalence can be exploited by the code producer to generate the certificate interactively over the source code. Because of the equivalence between the proof obligations over source and bytecode programs, their proof certificates are also the same, and thus the certificate generated interactively over the source code can be sent along with the code.

3.3 Safety Policy

The Safety Policy shall consist of the safety rules and an interface to the library of pre-defined methods. The safety rules shall be in a format recognizable by the Verification Condition Generators (VCGen), i.e., JML. The social-network core model built in B would be translated into a corresponding model in JML. This translation would be performed by a B-to-JML translator built specifically for this purpose, because the tools employed in the framework might have certain limitations as to what kinds of formal specification they can handle (e.g., KML). As an example of how a B model translates into a JML model, the following correspondences show how notations in either modeling language are related:

B Model               JML Model
Machine               Class
Machine Refinement    Sub-Class
PRE                   requires
Operation Body        ensures + \old
INVARIANT             invariant
Operation             JML Method specification
ANY x                 \forall x

This JML model would allow the operations defined to be extracted into a library accessible to third-party developers for use when building their applications. This library would include all privacy- and security-critical operations that might be performed over the social network, such as transmitting, creating or editing content. Any change to access privileges that the plug-in might effect could only be made through the library of operations. The JML model would also be used to define the interface to the library of operations. The interface would employ the concept of design by contract: for every operation defined in the library, the plug-in developer is provided with a set of pre-conditions that must be satisfied before the operation is called, and a set of post-conditions that the operation call guarantees. Therefore the burden of proof on the third-party developers of the plug-in would consist of proving that any and all pre-conditions of an operation are satisfied before it is called.
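As an illustration, a fragment of such a library interface might look as follows. This is a hand-written sketch, not generated output: the class name SocialNetworkCore, the helper methods (upload, grantView, isVisible, ownerOf) and the concrete Java representation of the relations are all assumptions; only the requires/ensures shape of remove_rc follows the design-by-contract scheme described above, mirroring the B PRE and substitution of Table 1.1.

```java
import java.util.*;

// Hypothetical sketch of the translated library interface: the B PRE
// becomes a requires clause, the B substitution an ensures clause.
public class SocialNetworkCore {
    private final Map<String, String> owner = new HashMap<>();  // rc -> owning person
    private final Set<List<String>> visible = new HashSet<>();  // pairs (pe, rc)

    public void upload(String pe, String rc) {
        owner.put(rc, pe);
        visible.add(List.of(pe, rc));
    }

    public void grantView(String pe, String rc) { visible.add(List.of(pe, rc)); }

    /*@ requires isVisible(pe, rc) && !pe.equals(ownerOf(rc));  // from the B PRE
      @ ensures  !isVisible(pe, rc);                            // from the B substitution
      @*/
    public void remove_rc(String rc, String pe) {
        visible.remove(List.of(pe, rc));
    }

    public /*@ pure @*/ boolean isVisible(String pe, String rc) {
        return visible.contains(List.of(pe, rc));
    }

    public /*@ pure @*/ String ownerOf(String rc) { return owner.get(rc); }

    public static void main(String[] args) {
        SocialNetworkCore sn = new SocialNetworkCore();
        sn.upload("alice", "rc1");
        sn.grantView("bob", "rc1");              // bob was granted view access
        sn.remove_rc("rc1", "bob");              // pre-condition holds: bob is not owner
        System.out.println(sn.isVisible("bob", "rc1")); // prints false
    }
}
```

A plug-in calling remove_rc would have to discharge the requires clause at the call site; the ensures clause is what it may assume afterwards.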

3.4 Verification Condition Generator

The VCGen CP/CC modules generate a set of Verification Conditions (VCs), accepting as input the Byte Code and the safety policy. VCs are used to attempt to establish that certain properties hold for a given subprogram. If a post-condition is added to the specification, the VCGen will also generate VCs that require


the user to show that the post-condition will hold for all possible paths through the subprogram. A verification condition (VC) is a logical formula expressing the correctness of a path between two cut points, where a cut point is a pre-condition, a post-condition or another assertion. Each of these VCs would be written in the form of a list of hypotheses and a list of conclusions with an implication between them. VCs are produced on a subprogram-by-subprogram basis, with files containing lists of VCs being written for each subprogram. For each subprogram, two corresponding files might be generated. The first contains a list of declarations for the types, functions, constants and variables used in the VCs. The other contains various rules about the behavior of the functions and additional information, such as the first and last allowed values of types. If each of these VCs can be proved true for a procedure, then that procedure is correct. The process of deriving VCs is illustrated through the following example. The third party must provide invariants Inv = [P, Inv1, ..., InvN, Q] which must hold at particular points of the plug-in program Prog = [I1, ..., IN]. P is the pre-condition of the third-party plug-in and Q its post-condition. These invariants are used to calculate the verification condition (VC) for the plug-in. As a general rule, if Ik is a method-call instruction, then the invariant Invk implies the pre-condition of Ik, that is, Invk => Ik.PRE. We use the weakest-precondition calculus to calculate the VC of a program. The verification condition of a program Prog = [I1, ..., IN] with respect to the list of invariants Inv = [P, Inv1, ..., InvN, Q] is WP(I1;I2;...;IN, Q), in which ";" represents sequential composition of statements. Therefore, the plug-in must verify P => WP(I1;I2;...;IN, Q) to be safe to run. The weakest-precondition rules for calculating VCs follow.
The "Assignment Rule" and the "Composition Rule" below are the usual Dijkstra predicate-transformer rules for assignment and statement composition. The "Assignment Rule" says that in order for the post-condition Q to be true after the assignment "x = E", the property "Q[E/x]" must be true before the assignment; "Q[E/x]" is "Q" in which every free occurrence of "x" has been replaced by "E", a terminating expression in the language without side effects. The "Method Call Rule" gives the weakest precondition of a call to a method "m()": in order for the post-condition "Q" to be true after the method call, the pre-condition of "m()" should be true before the call, and either the post-condition of "m()" implies the post-condition "Q", or (when it does not) the conjunction of the pre-condition and the post-condition of "m()" implies "Q".


Assignment Rule:
    WP(x = E, Q) = Q[E/x]

Method Call Rule (logic form):
    WP(m(), Q) = ( (m().POST => Q) & m().PRE )
              || ( (m().POST /=> Q) & m().PRE & ( m().PRE & m().POST => Q ) )

Method Call Rule (let form):
    WP(m(), Q) = let r = m().PRE, t = m().POST in
                 ( (t => Q) & r ) || ( (t /=> Q) & r & ( r & t => Q ) )
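These predicate-transformer rules can be mechanized directly. The sketch below (hypothetical names; the actual VCGen works on bytecode and symbolic formulas, not on concrete states) models a predicate as a boolean function over a variable-to-value state and implements the Assignment and Composition rules:

```java
import java.util.*;
import java.util.function.*;

// A toy weakest-precondition transformer over states modelled as maps.
public class Wp {
    interface Pred extends Predicate<Map<String, Integer>> {}

    // Assignment Rule: WP(x = E, Q) = Q[E/x] -- evaluate Q in the state
    // where x has been given the value of E.
    static Pred assign(String x, Function<Map<String, Integer>, Integer> e, Pred q) {
        return s -> {
            Map<String, Integer> s2 = new HashMap<>(s);
            s2.put(x, e.apply(s));
            return q.test(s2);
        };
    }

    // Composition Rule: WP(S1; S2, Q) = WP(S1, WP(S2, Q)).
    // Each statement is represented by its own predicate transformer.
    static Pred seq(UnaryOperator<Pred> s1, UnaryOperator<Pred> s2, Pred q) {
        return s1.apply(s2.apply(q));
    }

    public static void main(String[] args) {
        Pred post = s -> s.get("x") == 5;  // Q: x == 5
        // WP(x = x + 1; x = x * 5, x == 5) -- should hold exactly when x == 0.
        Pred wp = seq(q -> assign("x", s -> s.get("x") + 1, q),
                      q -> assign("x", s -> s.get("x") * 5, q),
                      post);
        System.out.println(wp.test(Map.of("x", 0))); // prints true
    }
}
```

The real VCGen performs the same unfolding symbolically, producing a formula (the VC) rather than testing concrete states.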

The translation of the Method Call Rule to Yices follows. The expression (translation-pre m) is the pre-condition of method m written in Yices, and (translation-post m) the post-condition of m written in Yices.

Example. Let us suppose we want to verify the plug-in implemented by the method install_plugin below, written in Java.

public class MyPlugIn {
  /*@ PRECONDITION true;
    @ POSTCONDITION Q;
    @*/
  public void install_plugin(SocialNetwork sn) {
    Integer ow1 = sn.create_content();  // I1
    Integer rc1 = sn.create_rc();       // I2
    sn.upload_rc(ow1, rc1);             // I3
    Integer pe1 = sn.create_content();  // I4
    sn.transmit_rc(rc1, ow1, pe1);      // I5
  }
}

In the following we calculate the verification condition VC for the plug-in, using Yices notation to show how VC is calculated.

VC = (true => WP(I1;I2;I3;I4;I5, Q))

WP(I1;I2;I3;I4;I5, Q) = WP(I1, WP(I2, WP(I3, WP(I4, WP(I5, Q)))))

(define Q5
  (let ( (r5::bool (translation-pre  (transmit_rc sn rc1 ow1 pe1)))
         (t5::bool (translation-post (transmit_rc sn rc1 ow1 pe1))) )
    (or (and (=> t5 Q) r5)
        (and (not (=> t5 Q)) r5 (=> (and r5 t5) Q)))))
;;
WP(I1, WP(I2, WP(I3, WP(I4, Q5))))


It reduces to the expression below.

(define Q5
  (let ( (r5::bool (translation-pre  (transmit_rc sn rc1 ow1 pe1)))
         (t5::bool (translation-post (transmit_rc sn rc1 ow1 pe1))) )
    (or (and (=> t5 Q) r5)
        (and (not (=> t5 Q)) r5 (=> (and r5 t5) Q)))))
;;
(define Q4
  (let ( (r4::bool (translation-pre  (create_content sn)))
         (t4::bool (translation-post (create_content sn))) )
    (or (and (=> t4 Q5) r4)
        (and (not (=> t4 Q5)) r4 (=> (and r4 t4) Q5)))))
;;
(define pe1 (result (create_content sn)))
;;
WP(I1, WP(I2, WP(I3, Q4)))

It reduces to the expression below.

(define Q5
  (let ( (r5::bool (translation-pre  (transmit_rc sn rc1 ow1 pe1)))
         (t5::bool (translation-post (transmit_rc sn rc1 ow1 pe1))) )
    (or (and (=> t5 Q) r5)
        (and (not (=> t5 Q)) r5 (=> (and r5 t5) Q)))))
;;
(define Q4
  (let ( (r4::bool (translation-pre  (create_content sn)))
         (t4::bool (translation-post (create_content sn))) )
    (or (and (=> t4 Q5) r4)
        (and (not (=> t4 Q5)) r4 (=> (and r4 t4) Q5)))))
;;
(define pe1 (result (create_content sn)))
;;
(define Q3
  (let ( (r3::bool (translation-pre  (upload sn ow1 rc1)))
         (t3::bool (translation-post (upload sn ow1 rc1))) )
    (or (and (=> t3 Q4) r3)
        (and (not (=> t3 Q4))


             r3 (=> (and r3 t3) Q4)))))
;;
WP(I1, WP(I2, Q3))

It reduces to the expression below.

(define Q5
  (let ( (r5::bool (translation-pre  (transmit_rc sn rc1 ow1 pe1)))
         (t5::bool (translation-post (transmit_rc sn rc1 ow1 pe1))) )
    (or (and (=> t5 Q) r5)
        (and (not (=> t5 Q)) r5 (=> (and r5 t5) Q)))))
;;
(define Q4
  (let ( (r4::bool (translation-pre  (create_content sn)))
         (t4::bool (translation-post (create_content sn))) )
    (or (and (=> t4 Q5) r4)
        (and (not (=> t4 Q5)) r4 (=> (and r4 t4) Q5)))))
;;
(define pe1 (result (create_content sn)))
;;
(define Q3
  (let ( (r3::bool (translation-pre  (upload sn ow1 rc1)))
         (t3::bool (translation-post (upload sn ow1 rc1))) )
    (or (and (=> t3 Q4) r3)
        (and (not (=> t3 Q4)) r3 (=> (and r3 t3) Q4)))))
;;
(define Q2
  (let ( (r2::bool (translation-pre  (create_rc sn)))
         (t2::bool (translation-post (create_rc sn))) )
    (or (and (=> t2 Q3) r2)
        (and (not (=> t2 Q3)) r2 (=> (and r2 t2) Q3)))))
;;
(define rc1 (result (create_rc sn)))
;;
WP(I1, Q2)

It reduces to the expression below.


(define Q5
  (let ( (r5::bool (translation-pre  (transmit_rc sn rc1 ow1 pe1)))
         (t5::bool (translation-post (transmit_rc sn rc1 ow1 pe1))) )
    (or (and (=> t5 Q) r5)
        (and (not (=> t5 Q)) r5 (=> (and r5 t5) Q)))))
;;
(define Q4
  (let ( (r4::bool (translation-pre  (create_content sn)))
         (t4::bool (translation-post (create_content sn))) )
    (or (and (=> t4 Q5) r4)
        (and (not (=> t4 Q5)) r4 (=> (and r4 t4) Q5)))))
;;
(define pe1 (result (create_content sn)))
;;
(define Q3
  (let ( (r3::bool (translation-pre  (upload sn ow1 rc1)))
         (t3::bool (translation-post (upload sn ow1 rc1))) )
    (or (and (=> t3 Q4) r3)
        (and (not (=> t3 Q4)) r3 (=> (and r3 t3) Q4)))))
;;
(define Q2
  (let ( (r2::bool (translation-pre  (create_rc sn)))
         (t2::bool (translation-post (create_rc sn))) )
    (or (and (=> t2 Q3) r2)
        (and (not (=> t2 Q3)) r2 (=> (and r2 t2) Q3)))))
;;
(define rc1 (result (create_rc sn)))
;;
(define Q1
  (let ( (r1::bool (translation-pre  (create_content sn)))
         (t1::bool (translation-post (create_content sn))) )
    (or (and (=> t1 Q2) r1)
        (and (not (=> t1 Q2)) r1 (=> (and r1 t1) Q2)))))

(define VC::bool
  (and Q5


       (and Q4 (and Q3 (and Q2 Q1)))))

One possible solution that would need to be explored is the use of the Krakatoa and WHY tools to form this module. It would work as follows:

1. The Krakatoa tool translates the JML model and the plug-in code into the WHY language, which feeds into the WHY tool. The JML model need only be translated once and stored on file, unless changes are made to it. The plug-in code might merely satisfy the pre-conditions defined for the operations it calls, but it might also include additional specifications defining its own pre- and post-conditions. In the second case these would need to be considered as well in order to prove the plug-in’s conformance to our safety policy.

2. The WHY tool accepts the input from the Krakatoa tool in the WHY language. Based on the translations of the JML model and the plug-in code, the WHY tool generates a set of verification conditions, i.e., conditions that must be satisfied to prove that the plug-in adheres to the social network’s safety policy. These verification conditions are in a language understood by the prover described below.

3.5 The Prover/Proof Checker

The next component is the theorem prover. The theorem prover uses the logic provided by us (e.g., SMT-LIB benchmarks), either built in or user-defined in the safety-proof file, to reduce complex formulas to simpler ones and discharge the majority of the VCs automatically. Theorem proving deals with showing that some statement (the conjecture) is a logical consequence of a set of statements (the axioms and hypotheses). The specification, written in a formal language, is manipulated by the theorem prover. The VC at hand must have been described precisely and accurately, and this process in itself leads to a clearer understanding of the problem domain. The proofs produced by this method should describe how and why the conjecture follows from the axioms and hypotheses. The output must not only be a convincing argument that the conjecture is a logical consequence of the axioms and hypotheses; if the output is negative, it must also describe a process that can be used to diagnose the problem. Once the Proof Checker verifies that all generated VCs have been satisfied, it informs the Execution module that the byte code is safe to execute. A possible implementation of this module is the use of the Yices prover. The Yices prover is based on SMT-LIB benchmarks. The prover accepts the verification conditions generated by the VCGen modules and automatically


proves whether all these conditions hold. If all the verification conditions are proved to hold, the plug-in adheres to the social network’s safety policy and can safely be integrated into the social network. If not all the verification conditions can be satisfied, it means either that the plug-in violates one of the policies defined in the interface or that some of the conditions cannot be discharged automatically and need to be proved manually.

3.6 The Certificate

Depending on the provers chosen to implement the Prover and Proof Checker modules, a certificate may or may not be generated. The certificate consists of the proof steps the prover must execute to discharge all the VCs. This stems from the fact that the degree of automation varies between provers; some require more interactive proofs. For example, Yices is a completely automated prover and can discharge all satisfiable VCs without user input, whereas Coq [24] requires more user interaction. Therefore, if Yices were used as the Proof Checker, a certificate would not be necessary, whereas with Coq it would be.
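To illustrate what such recorded proof steps look like, the following is a minimal Coq proof script for an invented lemma. Each tactic line is exactly the kind of explicit step a certificate would have to carry so that the proof checker can replay the proof:

```coq
(* Illustrative lemma; the proof steps, not the statement, are the point. *)
Lemma vc_example : forall P Q : Prop, P /\ Q -> Q /\ P.
Proof.
  intros P Q [HP HQ]. (* introduce P, Q and split the conjunction *)
  split; assumption.  (* prove each conjunct from the hypotheses *)
Qed.
```

With a fully automated prover such as Yices, none of these intermediate steps need to be shipped: the checker simply reruns the decision procedure on the VCs.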

Bibliography

[1] A. Acquisti and R. Gross. Imagined communities: Awareness, information sharing, and privacy on the Facebook. 2006.

[2] J.-R. Abrial. The B-Book: Assigning Programs to Meanings. Cambridge University Press, 1996.

[3] A. L. Allen. Uneasy Access: Privacy for Women in a Free Society. Rowman and Littlefield, 1988.

[4] Rajeev Alur. Model checking: From tools to theory. In Orna Grumberg and Helmut Veith, editors, 25 Years of Model Checking, volume 5000 of Lecture Notes in Computer Science, pages 89–106. Springer Berlin/Heidelberg, 2008.

[5] Atelier B. http://www.atelierb.eu/index en.html.

[6] B. Dutertre and L. de Moura. The Yices SMT solver. Technical report, Computer Science Laboratory, SRI International, 2010.

[7] C. Barrett, A. Stump, and C. Tinelli. The SMT-LIB Standard, Version 2, March 2010.

[8] Gilles Barthe, Pierre Crégut, Benjamin Grégoire, Thomas P. Jensen, and David Pichardie. The MOBIUS proof carrying code infrastructure. In FMCO, pages 1–24, 2007.

[9] C. Boettcher, R. DeLong, J. Rushby, and W. Sifre. The MILS component integration approach to secure information sharing. In 27th IEEE/AIAA Digital Avionics Systems Conference (DASC), 2008.

[10] Jonathan Bowen. Formal Specification and Documentation Using Z: A Case Study Approach. Revised 2003.

[11] D. Boyd and N. B. Ellison. Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1)(11), 2007.

[12] Néstor Cataño and Camilo Rueda. Matelas: A predicate calculus common formal definition for social networking. In ASM, pages 259–272, 2010.

[13] Claude Marché and Jean-Christophe Filliâtre. The Krakatoa tool for deductive verification of Java programs. Winter School on Object-Oriented Verification, Viinistu, Estonia, 2009.

[14] D. Boyd. Facebook's privacy trainwreck: Exposure, invasion, and social convergence.

[15] C. Dwyer, S. R. Hiltz, and K. Passerini. Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. 2007.

[16] E. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 1999.

[17] E. Clarke and J. Wing. Formal methods: State of the art and future directions. ACM Computing Surveys, 28, December 1996.

[18] F. Stutzman. An evaluation of identity-sharing behavior in social network communities. Journal of the International Digital Media and Arts Association, 2006.

[19] Geoff Sutcliffe, University of Miami. Overview of automated theorem proving.

[20] A. George. Living online: The end of privacy? New Scientist, 2569, September 2006.

[21] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In Workshop on Privacy in the Electronic Society (WPES), pages 71–80, 2005.

[22] J. He, C. A. R. Hoare, and J. W. Sanders. Data refinement refined. In European Symposium on Programming (ESOP), pages 187–196, 1986.

[23] C. A. R. Hoare. Proof of correctness of data representations. Acta Informatica, 1:271–281, 1972.

[24] INRIA. The Coq Proof Assistant Reference Manual, 8.1 edition, 2006.

[25] J. Kornblum and M. B. Marklein. What you say online could haunt you. USA Today, March 2006.

[26] G. T. Leavens, A. L. Baker, and C. Ruby. Preliminary design of JML: A behavioral interface specification language for Java. ACM SIGSOFT Software Engineering Notes, 31(3):1–38, 2006.

[27] G. T. Leavens, E. Poll, C. Clifton, Y. Cheon, C. Ruby, D. Cok, P. Müller, J. Kiniry, and P. Chalin. JML reference manual. http://www.eecs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_toc.html, 2008.

[28] B. Meyer. Applying "design by contract". Computer, 25(10):40–51, October 1992.

[29] G. C. Necula. Proof-carrying code. In Symposium on Principles of Programming Languages (POPL), pages 106–119, Paris, January 1997.

[30] George Necula and Peter Lee. Research on proof-carrying code for untrusted-code security. In Proceedings of the 1997 IEEE Symposium on Security and Privacy, page 204, 1997.

[31] A. Robinson and A. Voronkov. Handbook of Automated Reasoning. MIT Press, 2001.

[32] John Rushby. Design and verification of secure systems. In 8th ACM Symposium on Operating System Principles, volume 15, pages 12–21, 1981.

[33] S. Barnes. A privacy paradox: Social networking in the United States. 2006.

[34] S. Preibusch, B. Hoser, S. Gürses, and B. Berendt. Ubiquitous social networks—opportunities and challenges for privacy-aware user modelling. June 2007.

[35] F. D. Schoeman. Philosophical Dimensions of Privacy: An Anthology. Cambridge University Press, 1984.

[36] Jeannette M. Wing and Michael Carl Tschantz. Formal methods for privacy. In Formal Methods, 2009.