A Dynamic Knowledge Based Approach to the Problem of Deduction in a Non-Statistical Multilevel Secure Database

Michael Anderson
Computer Science and Engineering Department
University of Connecticut
260 Glenbrook Road, Storrs, Connecticut 06269-3155
[email protected]

Abstract

Of all research problems associated with the security of data in a multilevel secure database, perhaps the most difficult and least understood is the problem of controlling indirect inference of sensitive data. The phenomenon of inferring sensitive data through the combination of non-sensitive data and real-world knowledge has been termed deduction. We do not agree with the prevailing attitude that all such deduction is unpreventable and believe that our dynamic knowledge-based approach to this problem provides a high degree of security with less restrictive information flow than previously achieved. Our dynamic knowledge-based system is instrumental in the prevention of deduction, given sufficient knowledge of a domain. As a query is presented to the database, our system attempts to use the knowledge in the KB, combined with the data that would be released if the query were permitted, to deduce sensitive data as a user moderately knowledgeable in the domain might. If such data is found to be so deducible, the system has detected a possible undesirable channel for sensitive data disclosure.

[Keywords: security, inference detection, knowledge-based interface]

1 Introduction

Of all research problems associated with data security, perhaps the most difficult and least understood is the problem of controlling indirect disclosures of sensitive data. The control of inference, as this form of disclosure is termed, has proved either unfruitful or impossible: the inference controls that have been postulated are either easy to circumvent, too difficult to implement, or so restrictive that they severely constrain the information flow of the database. In fact, the bulk of inference research has focused on the inference problem in a statistical database [Denning79, Denning83, Adam89]. To fully appreciate the magnitude of this problem, it is important to note that the inference problem in a statistical database is a subset of a much broader problem, namely the inference of sensitive information in any database that contains data of different classifications [Denning83]. Clearly, this broader problem affects a much wider spectrum of databases; any database that contains data of different classification levels (that is, any multilevel secure database) is vulnerable to such an attack.

Data contained in a database will, by definition, relate to entities in the real world, and the relationships between these entities can often provide a means of inferring sensitive data, even if that data is not itself contained in the database. Security controls that assign data to different classifications cannot, by themselves, prevent a user from making such an inference about the database through the use of real-world knowledge. The phenomenon of inferring sensitive data by combining non-sensitive data in a database with such real-world knowledge has been termed deduction [Wiseman90]. This distinguishes it from more traditional inference control research that does not concern itself with such external knowledge. We believe that a dynamic knowledge based approach applied to the problem of deduction can provide a high degree of security with less restrictive information flow than previously achieved.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. CIKM '93 - 11/93/D.C., USA © 1993 ACM 0-89791-626-3/93/0011...$1.50

2 Proposed Solution

The deduction problem, with respect to multilevel secure databases, seems amenable to solution by the integration of a knowledge base (KB) of real-world information with a database of facts. The KB should contain knowledge about the facts in the database including relevant properties and relationships of not only the attributes of each relation but each entity in the domain of these attributes. Previous work in this area has concentrated on Entity-Relationship Graphs that only deal with relationships between attributes [Hinke89, Bonyun90]. It is important to realize that this approach to deduction prevention is too restrictive; finer grain semantics are required to achieve minimally restricted information flow.

For example, consider the PATIENT relation presented in Table 1. If the obvious protection scheme were implemented, all (and only) values of the DISORDER attribute would be unavailable at all times to any user not privy to that information. The deduction problem arises in this case because of the implicit relationship between PRESCRIPTION and DISORDER. In some cases, a reasonably knowledgeable medical worker who is denied access to the DISORDER attribute might still be able to deduce the value of a DISORDER given the corresponding value of PRESCRIPTION. For example, given the PRESCRIPTION value "theophyllin", this user might reasonably deduce that the patient in question has asthma.

Traditionally [Bonyun90, Haigh90, Su90, Henning89, Hinke89], if the possibility of such deductions were deemed undesirable, control was accomplished by statically analyzing the DB to find these channels and then permanently raising the classification level of the attributes from which sensitive data could be deduced to the level of this sensitive data. In the example, deduction would be remedied by the classification of the entire PRESCRIPTION attribute at the same level as the DISORDER attribute. Although this does prevent such deductions, overclassifying non-sensitive data in this manner restricts access to such data and compromises the usefulness of the database.

If, however, the database was somehow given the knowledge that "theophyllin is used in the treatment of asthma", then the example deduction could be prevented by simply not allowing that particular value of PRESCRIPTION to be divulged while still allowing some of its other values to be released. What distinguishes this particular value of PRESCRIPTION from others is that its use can often be linked to a particular disorder. This opens the possibility of deducing the value of DISORDER from the value of PRESCRIPTION. Such would not be the case if the value of PRESCRIPTION were, say, "darvon", a drug used to alleviate pain in any number of disorders.

Our dynamic knowledge based system is instrumental in the prevention of such deductions, given sufficient knowledge of a domain, via its ability to make inferences from such knowledge. As a query is presented to the database, this inference mechanism attempts to use the knowledge in the KB, combined with the fact (or facts) that would be released if this query were permitted, to deduce sensitive data as a moderately knowledgeable user might. If such data is found to be so deducible, the system has detected a possible undesirable channel for sensitive data release. The goal of the current system is the detection of single queries that could be instrumental, in concert with real-world knowledge, in the deduction of sensitive information in a relational database. Issues concerning problems of queries made over time and actions to take upon detection are presented as future research topics.

NAME           DISORDER   PRESCRIPTION    THERAPY
DOE, JOHN      …          THEOPHYLLIN     …
SMITH, PETER   …          ACE INHIBITOR   …
JONES, ANN     …          …               DIETARY
KELLY, ALEX    …          INSULIN         …
DAVIS, TOM     …          …               …

Table 1: Patient Relation (DISORDER attribute sensitive; shaded cells in the original are illegible)

3 Problem Dimensions

Central to the current work are the various dimensions of the problem and their subsequent constraint: the type of database to be protected, the nature of the KB and its internal representation, the character of the inference mechanism and its algorithms, and the integration of the system with its database.

3.1 Database Dimension

Currently, our work focuses on the relational database model as it seems to be the emerging de facto standard. Although research in semantic and object-oriented database models is attempting to infuse semantics in the data, we believe that the richness of the semantics required by our approach and the fact that we are dealing with knowledge that is external to the database precludes their use. This does not mean that our system will not work with such models, but that the knowledge required is itself external to the database and resides in a separate knowledge base. In actuality, our goal is database model independence. That is, we hope to provide a general system that, with simply a change of database interface, is usable by all database systems.

The current work expands the traditional domain of inference research to include not only statistical databases but any multilevel secure database. We will restrict our focus to medical information systems that are considered "at high risk" with respect to the privacy and confidentiality of their data [Biskup90]. Again, this is not to imply that our system will only be applicable in this domain. Although the KB will necessarily be domain dependent, the inference mechanism will not be, and, therefore, a system in a new domain can be readily created by changing the contents of the knowledge base. An example relation from a medical database (see Table 1) will be used to illustrate our approach.

3.2 Classification Scheme Categories

Ideally, we seek a system that is minimally restrictive of all its data and that incurs a reasonably small amount of inferencing overhead to achieve this minimal restriction. The finer grain semantics proposed generate two promising classification schemes, the first termed Sensitive Attribute Classification and the second termed No Attribute Classification.

By Sensitive Attribute Classification we mean that only the attribute that is considered sensitive is hidden. Left at this rather large grain level, this scheme offers no protection from deduction but protects all sensitive data from direct inspection: sensitive data is maximally restricted while non-sensitive data is maximally unrestricted. Adding the proposed protection from deduction to this scheme incurs the cost of making access to non-sensitive data somewhat more restrictive but, unlike the classification of the entire attribute, only minimally so. This is so because only certain values of the non-sensitive deductive attribute must be hidden, namely those from which sensitive data can be inferred.

By No Attribute Classification we mean that no attributes are entirely hidden; that is, classification takes place solely at the value level. On its own, this scheme obviously affords no protection to any of the data but is maximally unrestrictive of both sensitive and non-sensitive data. Under the proposed approach, the only data values that are hidden are those which are deemed most sensitive and data from which these most sensitive values can be inferred. This is minimally restrictive of both sensitive and non-sensitive data and assumes that not all data under a sensitive attribute is equally sensitive.

In comparison, both classification schemes are minimally restrictive of non-sensitive data. Sensitive Attribute Classification has the disadvantage that it is maximally restrictive of sensitive data, while No Attribute Classification will require a larger KB in order to denote which data is deemed sensitive and could exact a greater inferencing penalty. These trade-offs must be considered when choosing a dynamic classification scheme. We will consider both schemes in further discussion of our approach.
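As a concrete illustration of the two schemes, the following sketch shows how each might filter a PATIENT tuple before release. This is not the paper's implementation; the KB contents, the ambiguity threshold, and all function names are hypothetical.

```python
# Hypothetical real-world knowledge: drugs and the disorders they indicate.
DEDUCIBLE_FROM = {
    "THEOPHYLLIN": {"ASTHMA"},                          # narrow use: deducible
    "DARVON": {"MIGRAINE", "ARTHRITIS", "BACK_PAIN"},   # broad use: ambiguous
}

AMBIGUITY_THRESHOLD = 2  # candidate disorders needed to mask the true one

def deducible(prescription: str) -> bool:
    """A PRESCRIPTION value leaks DISORDER if it links to too few disorders."""
    disorders = DEDUCIBLE_FROM.get(prescription, set())
    return 0 < len(disorders) < AMBIGUITY_THRESHOLD

def sensitive_attribute_classification(row: dict) -> dict:
    """Hide the whole DISORDER attribute, plus deducible PRESCRIPTION values."""
    out = dict(row)
    out["DISORDER"] = None
    if deducible(row.get("PRESCRIPTION", "")):
        out["PRESCRIPTION"] = None
    return out

def no_attribute_classification(row: dict, sensitive_values: set) -> dict:
    """Hide only sensitive DISORDER values and the values that deduce them."""
    out = dict(row)
    if row.get("DISORDER") in sensitive_values:
        out["DISORDER"] = None
        if deducible(row.get("PRESCRIPTION", "")):
            out["PRESCRIPTION"] = None
    return out

row = {"NAME": "DOE, JOHN", "DISORDER": "ASTHMA", "PRESCRIPTION": "THEOPHYLLIN"}
print(sensitive_attribute_classification(row))
print(no_attribute_classification(row, sensitive_values={"ASTHMA"}))
```

Note how the trade-off described above appears directly: the second scheme needs the extra `sensitive_values` set (a larger KB) in exchange for releasing non-sensitive DISORDER values.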

3.3 Query Dimension

The richness of SQL allows a wide range of queries to be presented to a database. We will focus on single, straightforward requests to the database for non-sensitive data (and sensitive data when considering the No Attribute Classification scheme) from the PATIENT relation in the hope that the extension to more complicated queries will be easier once these are better understood. Although it will be a necessary component of a complete deduction prevention system, no attempt will be made to incorporate a temporal dimension to the problem. Our first concern is to show the feasibility of our approach; queries presented over time will be considered in future work.

3.4 User Dimension

It is necessary to specify the role of the user in the current work. This is so because we need to create a KB that models the real-world knowledge that this user possesses in this domain. The level of classification and domain chosen to explore the problem of deduction point to a user that is moderately well-versed in medical knowledge but is not of the level of an MD. Further, this user must have a need-to-know that corresponds to the classification level specified. Such a user can be considered as a high level medical assistant that manages prescription medicines and therapies.

It is important that the KB reflect the knowledge of this user with some fidelity. On the one hand, if the KB is "all-knowing" in its domain and contains more knowledge than the user it is attempting to model, then deductions will be detected that the user could not make, and the system will be more restrictive of its data than is warranted. On the other hand, if it is the case that the KB's knowledge is a subset of this user's, then it is not clear that the system can protect its data sufficiently. It should also be noted that a more knowledgeable user will be more likely to make deductions than a less knowledgeable one; the possibility of deduction is greater and it is bound to be less preventable. A system whose KB correctly models such a user will, as a result, be more frequently constrained in the data it releases than a system that models a less knowledgeable user, but no more than necessary.

3.5 Uncertainty Dimension

Two types of uncertainty need to be addressed in the current work: uncertainty of knowledge and uncertainty of deduction. Uncertainty of knowledge pertains to the level of confidence that can be attributed to facts in the KB by the detection system. Not all facts are known with 100% confidence and, further, many facts only pertain a certain percentage of the time. The medical domain is particularly subject to such uncertain knowledge. Uncertainty of deduction pertains to the level of confidence that can be attributed to a deduction drawn by a user. Obviously, conclusions drawn through deduction are not all equal in degree of validity.

The current work considers the certainty of facts in the system knowledge base to be absolute. This constraint eventually needs to be removed but does not reflect on the viability of the system proposed. It is a knowledge representation and inference complication that can be ignored for the moment without undermining the main thrust of the approach. Deduction uncertainty, on the other hand, is more germane to this thrust and is modeled by ambiguity in the inferencing process. That is, the more ambiguous the inferences made by the system are, the more uncertain any deduction made by a user, based on these inferences, will be. For example, given the PRESCRIPTION value "darvon", the inferencing process of the system will deduce many possible disorders, so many that a user making the same deduction will not be able to choose between them and, therefore, will have high uncertainty associated with any deduction he/she dares to make. This will not be the case if the PRESCRIPTION value was "theophyllin" because it is used in relatively few disorders and especially in asthma. Low uncertainty in this deduction would be reflected in the few disorders that the system would be able to infer given the PRESCRIPTION value "theophyllin". (Note that the bias of "theophyllin" towards "asthma" is a function of KB certainty and is not represented in the current work.)

3.6 Knowledge Base Dimension

The knowledge base is comprised of the relationships between various values in the domain of each attribute in a given database as well as the relationships between each attribute. Such knowledge is, by necessity, domain dependent. That is, only the relationships required by the particular set of attributes of the database and their range of values are included. This knowledge is stored separately from the data in the database. This is the case because, by nature, such knowledge will not fit neatly into a relational database model. A more flexible structure is required to contain the declarative information denoted. Although, intuitively, we might expect the KB to contain facts that pertain specifically to the deducibility of a fact from another, a more general KB structure has been decided upon because of potential for use beyond the current one postulated.

It should be noted that, even though the domain of a database is constrained, the construction of a useful KB will put great demands upon available resources. It is important that the costs involved in such an undertaking be amortized across as large a range of uses as possible. Such uses could include more intelligent interfacing with a user [Anderson90, Anderson91], a broader range of integrity and consistency enforcement, and, in general, support for a new generation of intelligent DB applications [Morgenstern87]. Many knowledge representation issues must be addressed. The exact structure of each piece of information, how these pieces are linked, and the overall structure of the KB must all be specified.

3.6.1 Lk Knowledge Representation

To alleviate the ambiguities of natural language and facilitate the symbolic manipulation of knowledge, a more standardized internal representation is required by the system. To this end, Lk [Anderson90, Shin91, Anderson91] has been devised. Lk is a knowledge representation formalism that attempts to combine the rigor of formal logic with the structural flexibility of frames [Schank82]. The main structure of the language used in this proposal is the concept term or c-term. A c-term is a representation of some entity, event, or state. It has a concept label and can be followed by a restriction, which is itself a series of concept-connector:c-term pairs (each pair called a restrictor). For example, the fact that "Asthma displays the symptom of swollen bronchi" could be represented as the c-term DISORDER[val: ASTHMA, symptoms: SYMPTOM[val: SWOLLEN_BRONCHI]]. Here DISORDER functions as the concept label and everything contained in the uppermost square brackets is its restriction. val: ASTHMA and symptoms: SYMPTOM[val: SWOLLEN_BRONCHI] are restrictors, with val and symptoms functioning as concept-connectors that connect the c-terms ASTHMA and SYMPTOM[val: SWOLLEN_BRONCHI] to DISORDER. A fragment of the KB for the example domain is presented in Figure 1 with its underlying structure presented in Figure 2. The example KB contains isa relations that represent inheritance links, c-term restrictor constraints that relate attributes of the DB, and facts that relate values of these attributes.

3.7 Inference Mechanism Dimension

The inference mechanism is completely dependent on the choice of knowledge representation paradigm. Given that a frame-based representation has been chosen, a heuristic search mechanism proves useful. Issues of search space constraint, choice of heuristic, inferencing method, and search paradigm all must be addressed. It is important to note that the deduction problem as defined lends itself to goal-directed inference in that the value(s) of the sensitive data that needs protection will be known to the system as it proceeds with its inferencing. This information can be used to guide the inferencing process and to keep extraneous inferencing down to a minimum.

In the current system, a search algorithm based on the A* algorithm [Nilsson80] is used in an attempt to find a linkage between the non-sensitive data that would be released if a query were allowed to run to its completion and the sensitive data that is being protected from deduction. The system, starting with the yet-to-be-released data, iteratively makes a number of plausible transformations of this data into related facts using knowledge in the KB. These new facts are compared with the sensitive data and the new fact that is conceptually closer to it is chosen for the next round of transformation. If, at some point, there is a match between a new fact and the sensitive data, a linkage has been found between the non-sensitive and sensitive data and a possible channel for deduction has been detected.

3.7.1 Inference In Lk

The process of inferencing in Lk is one of syntactically manipulating c-terms and is therefore domain independent. Restrictors can be introduced or inherited and the c-terms they contain can themselves be manipulated, concept labels can be substituted for more specific or more general ones, c-terms can be imbedded inside other c-terms that have the appropriate constraints, and, furthermore, any c-term that is contained in the restriction of another c-term can be released to act as an autonomous c-term. These transformations are accomplished by the application of a number of inferencing rules that are an integral component of Lk itself and are performed by the inferencing process associated with the Lk knowledge representation language.

Downward Label Substitution allows a concept label of a c-term to be more specifically denoted. This is a plausible inference that traverses an isa link in a backward direction. In the example, DRUG_THERAPY can be substituted for THERAPY.

Upward Label Substitution allows a concept label of a c-term to be more generally denoted. This is a valid inference that traverses an isa link in a forward direction. In the example, PRESCRIPTION can be substituted for DRUG_THERAPY.

Membership Identification allows a c-term to be identified as a member of a set. This is a valid inference that combines related c-terms. For example, if it is known that THEOPHYLLIN is a member of the set DRUG, we can infer DRUG[val: THEOPHYLLIN].

Concept Connection Identification allows a matching c-term to fill a restrictor of another c-term. This is a plausible inference that attempts to fill slots of c-terms. For example, DRUG[val: THEOPHYLLIN] can fill the drug slot of DRUG_THERAPY.

Restrictor Introduction allows the free exchange of restrictors between c-terms with identical concept labels. This is a plausible inference that introduces filled slots into a c-term. For example, given DEFECT[val: ASTHMA] and DEFECT[symptoms: SYMPTOM[val: SHORTNESS_OF_BREATH]], we can infer DEFECT[val: ASTHMA, symptoms: SYMPTOM[val: SHORTNESS_OF_BREATH]].

Restrictor Release allows the promotion of a c-term that is imbedded in the restriction of another c-term. For example, the restrictor DRUG[val: THEOPHYLLIN] can be released from the c-term THERAPY[drugs: DRUG[val: THEOPHYLLIN]].

3.7.2 Conceptual Distance Heuristic

A heuristic ranking of a given transformation can be formed by combining a rating of the similarity of that transformation with respect to the sensitive data and the accumulated cost of deriving that transformation. This heuristic can then be used to drive the modified A* search algorithm in its search for a linkage between non-sensitive and sensitive data.

We propose the notion of conceptual distance [Anderson90, Anderson91, Shin93] as a heuristic means of focusing the attention of the inference mechanism. The conceptual distance between two c-terms is defined as the ranking of the set of similarities shared between them combined with the cost incurred by the transformation rules applied to produce the first. Similarity metrics include matching concept labels, subset and superset relationships between concept labels, concept labels matching restrictors, and matching restrictions. These similarities are combined into a set and this set is ranked in accordance with the strength and numbers of similarities it contains. Transformational cost is determined by considering the validity or plausibility of each rule used in the transformation of a c-term and summing up the cost of these. This heuristic provides a measure of how closely related two c-terms are: their conceptual closeness. We are most interested, in the case of finding a linkage between sensitive and non-sensitive data, in the conceptual distance between the fact(s) that would be released by a given query and the sensitive data that we are attempting to protect from possible deduction.
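The search just described can be sketched as a best-first expansion driven by a distance estimate. The following is an illustrative toy, not the paper's system: the KB contents, the two stand-in rules, and the crude mismatch-count "conceptual distance" are all hypothetical simplifications.

```python
import heapq

# Toy KB. Facts are (concept label, value) pairs; ISA drives
# Upward Label Substitution, LINKS stands in for the slot-filling rules.
ISA = {"PRESCRIPTION": "DRUG", "DRUG_THERAPY": "THERAPY", "DEFECT": "DISORDER"}
LINKS = {
    ("DRUG", "THEOPHYLLIN"): ("THERAPY", "THEOPHYLLIN"),
    ("THERAPY", "THEOPHYLLIN"): ("SYMPTOM", "SWOLLEN_BRONCHI"),
    ("SYMPTOM", "SWOLLEN_BRONCHI"): ("DEFECT", "ASTHMA"),
}

def successors(fact):
    label, val = fact
    out = []
    if label in ISA:       # Upward Label Substitution
        out.append((ISA[label], val))
    if fact in LINKS:      # stand-in for Concept Connection Identification
        out.append(LINKS[fact])
    return out

def distance(fact, goal):
    """Crude stand-in for conceptual distance: count of mismatched parts."""
    return (fact[0] != goal[0]) + (fact[1] != goal[1])

def find_linkage(released, sensitive, limit=50):
    """Best-first search from the about-to-be-released fact to the sensitive one."""
    frontier = [(distance(released, sensitive), 0, released, [released])]
    seen = set()
    while frontier and limit > 0:
        _, cost, fact, path = heapq.heappop(frontier)
        if fact == sensitive:
            return path                      # deduction channel detected
        if fact in seen:
            continue
        seen.add(fact)
        limit -= 1
        for nxt in successors(fact):
            heapq.heappush(
                frontier,
                (cost + 1 + distance(nxt, sensitive), cost + 1, nxt, path + [nxt]))
    return None                              # no linkage within the limit

path = find_linkage(("PRESCRIPTION", "THEOPHYLLIN"), ("DISORDER", "ASTHMA"))
print(path)
```

The `limit` parameter plays the role of the system-defined bound mentioned later: if no linkage is found within it, the data can simply be released.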

DEFECT isa DISORDER
INJURY isa DISORDER
DISEASE isa DISORDER

PHYSICAL_THERAPY isa THERAPY
DRUG_THERAPY isa THERAPY

PRESCRIPTION isa DRUG

DEFECT[symptoms: SYMPTOM]
SYMPTOM[treatments: THERAPY]
DRUG_THERAPY[drugs: DRUG, dose: DOSAGE]
DEFECT[val: ASTHMA, symptoms: SYMPTOM[val: SHORTNESS_OF_BREATH], SYMPTOM[val: SWOLLEN_BRONCHI]]
SYMPTOM[val: SWOLLEN_BRONCHI, treatments: THERAPY[drugs: DRUG[val: THEOPHYLLIN], dose: DOSAGE[mg: 300, freq: DAILY]]]

Figure 1: Knowledge Base Fragment
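A c-term is just a labeled node with a list of restrictor pairs, so the Figure 1 fragment maps naturally onto nested data structures. The encoding below is a hypothetical illustration (the paper's prototype was written in Common Lisp); the class and field names are mine.

```python
from dataclasses import dataclass, field

@dataclass
class CTerm:
    """An Lk concept term: a concept label plus a restriction, which is
    a list of (concept-connector, c-term) restrictor pairs."""
    label: str
    restrictors: list = field(default_factory=list)

    def __str__(self):
        if not self.restrictors:
            return self.label
        inner = ", ".join(f"{c}: {t}" for c, t in self.restrictors)
        return f"{self.label}[{inner}]"

# "Asthma displays the symptom of swollen bronchi" from Section 3.6.1:
fact = CTerm("DISORDER", [
    ("val", CTerm("ASTHMA")),
    ("symptoms", CTerm("SYMPTOM", [("val", CTerm("SWOLLEN_BRONCHI"))])),
])

# The isa links of Figure 1 as a simple parent map.
ISA = {"DEFECT": "DISORDER", "INJURY": "DISORDER", "DISEASE": "DISORDER",
       "PHYSICAL_THERAPY": "THERAPY", "DRUG_THERAPY": "THERAPY",
       "PRESCRIPTION": "DRUG"}

print(fact)  # DISORDER[val: ASTHMA, symptoms: SYMPTOM[val: SWOLLEN_BRONCHI]]
```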

[Figure 2: Knowledge Base Fragment Structure. A graph of the isa links (DISEASE, INJURY, and DEFECT under DISORDER; PHYSICAL_THERAPY and DRUG_THERAPY under THERAPY; PRESCRIPTION under DRUG) and the has links connecting DISORDER, SYMPTOM, THERAPY, DRUG, and DOSAGE.]


4 System Integration

The KB and its inference mechanism must be integrated with the user interface and database. Since it is currently assumed that the relational database model will be used in conjunction with this system, the main juncture between the deduction detection system and the database will necessarily be a communication link between the database's SQL dialect and the detection system's knowledge representation. The system's architecture is presented in Figure 3.

A user first presents a query to the system. The Deduction Detector (DD) intercepts this query before it is presented to the SQL interface. (It should be noted that, although the DD is currently only concerned with the detection of deduction, all matters of data security should be eventually handled by this module.) The query is modified to request any sensitive data in the relation and this modified query is presented to the SQL interface. This module formulates the data request, gathers the appropriate tuples from the database, and returns these to the DD. The DD translates the data in the tuple into Lk representation using information in the original query. These new facts will correspond to all the non-sensitive data that would be released by the user's query and the sensitive data that is being protected. The DD then calls on the inferencing mechanism to attempt to link the non-sensitive and sensitive data. If such a linkage is found, within some system-defined time and ambiguity limits, the possibility of deduction has been found and controls for such should be invoked. The following section presents an example of the system's operation.

[Figure 3: System Architecture. The user's Query passes to the Deduction Detector [Security], which sends a Modified Query to the SQL interface and database and returns a Controlled Response to the user.]

4.1 Example Query

The following example query is based on all previous assumptions. Further, it assumes that Sensitive Attribute Classification is being used and therefore the entire attribute DISORDER is deemed sensitive and all its values need to be protected from deduction. A discussion of the differences encountered when No Attribute Classification is used follows.

The user first issues an SQL query:

SELECT PRESCRIPTION FROM PATIENT WHERE NAME = "doe,john"

The Deduction Detector (DD) intercepts the query and modifies it to include sensitive data by attaching sensitive attributes to the SELECT statement:

SELECT PRESCRIPTION, DISORDER FROM PATIENT WHERE NAME = "doe,john"

The DD then submits this modified query to the SQL interface, which in turn submits a request for the data to the DB and gathers the returned data into the tuple

(THEOPHYLLIN, ASTHMA)

The SQL interface returns this tuple to the DD and the new about-to-be-released fact and sensitive fact are generated:

PRESCRIPTION[val: THEOPHYLLIN] & DISORDER[val: ASTHMA]

The DD then fills in details it knows about the sensitive fact in general:

DISORDER[val: ASTHMA, symptoms: SYMPTOM[val: SHORTNESS_OF_BREATH], SYMPTOM[val: SWOLLEN_BRONCHI]]

This provides more information that can be used by the heuristic in choosing the most likely candidate c-terms for further transformation.
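The DD's first step, rewriting the user's SELECT so that the sensitive attribute is also fetched, can be sketched as follows. This is an illustrative simplification: a real DD would use an SQL parser rather than string surgery, and the `SENSITIVE` map is hypothetical.

```python
# Map each relation to its sensitive attributes (hypothetical KB content).
SENSITIVE = {"PATIENT": ["DISORDER"]}

def attach_sensitive(query: str) -> str:
    """Rewrite a single-table SELECT to also request sensitive attributes."""
    select, rest = query.split(" FROM ", 1)
    table = rest.split()[0]
    cols = [c.strip() for c in select[len("SELECT "):].split(",")]
    for attr in SENSITIVE.get(table, []):
        if attr not in cols:
            cols.append(attr)
    return f"SELECT {', '.join(cols)} FROM {rest}"

q = 'SELECT PRESCRIPTION FROM PATIENT WHERE NAME = "doe,john"'
print(attach_sensitive(q))
# SELECT PRESCRIPTION, DISORDER FROM PATIENT WHERE NAME = "doe,john"
```

The sensitive column is only added so the linkage check can run; it is stripped again before any controlled response reaches the user.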

PRESCRIPTION[val: THEOPHYLLIN]
    PRESCRIPTION is a DRUG and Upward Label Substitution application produces
DRUG[val: THEOPHYLLIN]
    DRUG_THERAPY constraints and Concept Connection Identification application produces
DRUG_THERAPY[drugs: DRUG[val: THEOPHYLLIN]]
    DRUG_THERAPY is a THERAPY and Upward Label Substitution application produces
THERAPY[drugs: DRUG[val: THEOPHYLLIN]]
    SYMPTOM constraints and Concept Connection Identification application produces
SYMPTOM[treatments: THERAPY[drugs: DRUG[val: THEOPHYLLIN]]]
    SWOLLEN_BRONCHI symptom and Restrictor Introduction application produces
SYMPTOM[val: SWOLLEN_BRONCHI, treatments: THERAPY[drugs: DRUG[val: THEOPHYLLIN]], dose: DOSAGE[mg: 300, freq: DAILY]]
    ASTHMA defect and Concept Connection Identification application produces
DEFECT[val: ASTHMA, symptoms: SYMPTOM[val: SWOLLEN_BRONCHI], SYMPTOM[val: SHORTNESS_OF_BREATH]]
    DEFECT is a DISORDER and Upward Label Substitution produces
DISORDER[val: ASTHMA]

Figure 4: Linkage between PRESCRIPTION and DISORDER
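Two of the rule applications in Figure 4 can be sketched over a minimal c-term encoding. This is a hypothetical illustration, not the paper's code: c-terms are encoded here as `(label, [(connector, cterm)])` tuples and only the toy isa links needed for the example are included.

```python
# Toy isa links from Figure 1.
ISA = {"PRESCRIPTION": "DRUG", "DRUG_THERAPY": "THERAPY", "DEFECT": "DISORDER"}

def upward_label_substitution(cterm):
    """Valid inference: replace a concept label with its isa parent."""
    label, restrictors = cterm
    if label in ISA:
        return (ISA[label], restrictors)
    return None  # no isa link to traverse

def restrictor_release(cterm):
    """Plausible inference: promote each embedded c-term to stand alone."""
    _, restrictors = cterm
    return [sub for _, sub in restrictors]

start = ("PRESCRIPTION", [("val", ("THEOPHYLLIN", []))])
print(upward_label_substitution(start))
# ('DRUG', [('val', ('THEOPHYLLIN', []))])

therapy = ("THERAPY", [("drugs", ("DRUG", [("val", ("THEOPHYLLIN", []))]))])
print(restrictor_release(therapy))
# [('DRUG', [('val', ('THEOPHYLLIN', []))])]
```

Because the rules are purely syntactic manipulations of the tuple structure, they carry no medical knowledge themselves, which is the domain-independence property claimed for Lk inferencing in Section 3.7.1.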

The DD then calls on its inference mechanism to attempt to link these facts. This linking process is accomplished by applications of a number of the transformation rules previously discussed and is detailed in Figure 4. The bold text in the figure indicates the changes made to the previous fact in the current transformation. The effect of the heuristic can be seen in Figure 4. The transformation of the PRESCRIPTION c-term to the first SYMPTOM c-term is produced by blind search since these c-terms are all significantly conceptually distant from the sensitive DISORDER c-term. From this point on, each new transformation in the path is conceptually closer to the sensitive c-term due to applications of the heuristic. Since an unambiguous path can be found from the non-sensitive c-term to the sensitive c-term, a linkage is found between the non-sensitive and sensitive data and the possibility of deduction has been detected, warranting the invocation of deduction controls. If such a link had not been found within a system-defined limit, the data returned from the SQL interface (less the sensitive data) could simply be returned to the user as requested. If a set of disorders, including the patient's actual disorder, could be linked with the non-sensitive data, it would have to be determined if "enough" ambiguity (system defined) was present to mask the sensitive data.

It should be noted that this linkage is not found directly except in the best of cases. At each step of the transformation process, each rule is applied to the transformation currently deemed conceptually closest to the sensitive fact and a set of single-step transformations is produced. This new set is combined with all unexplored transformations, and the transformation of this set that is deemed conceptually closest to the sensitive data is heuristically chosen and the process is reiterated.

As stated previously, if No Attribute Classification was used, the system would follow a slightly different path of operations. When the sensitive data value was returned, the system would need to check how sensitive this value was. This sensitivity value would have to be stored in the KB. If the sensitivity of the value was above some system-defined limit, the inference mechanism would be called; otherwise, if the sensitivity of the value was below this limit, it could simply be released to the user. This procedure is based on the intuition that some of the values of the sensitive attribute are less sensitive than others, say that the patient is suffering from influenza versus depression.


5 System Status

A prototype system, based on the Sensitive Attribute Classification scheme, has been implemented on a Sun workstation in Common Lisp and interfaced with the Oracle relational database system. A small KB and DB pertaining to the medical domain have been provided, of which the examples in the current work are subsets. Future testing will be based on more realistically sized KBs and DBs. A number of queries have been tested, including the example presented in this work. These tests include queries that directly ask for sensitive data, queries that ask for non-sensitive data in which no linkage to sensitive data can be found, queries that ask for non-sensitive data in which a linkage to sensitive data can be found, and queries that find a linkage from the non-sensitive data to a set of data that includes the sensitive data. In all appropriate cases, deduction is detected. Such detection does contribute to query response time, but not unduly when this increase is weighed against the measure of security provided.

6 Detection and Control

6.1 Deduction Detection

Notice in the previous example, a given value of PRESCRIPTION will most often link with its corresponding value for DISORDER. In fact, the only times such a linkage could not be produced would be when the link is not known by the KB or when the value of DISORDER is *unknown*. The latter might occur if the attending physician was not sure of the disorder but was simply prescribing something for a symptom that the patient suffered. If we are to invoke deduction controls whenever such a link occurs, we will seemingly be controlling almost every value of PRESCRIPTION and, therefore, circumventing the minimal restriction of non-sensitive data policy we desire! What actually occurred in the example was that a single value of DISORDER, asthma, linked to the given value of PRESCRIPTION, theophyllin. If the value of PRESCRIPTION had been, say, darvon, then it would have linked to any value of DISORDER that involves pain. To be sure, the actual value of the sensitive attribute DISORDER would be among those linked (if known to the system) but the certainty of the user's deduction would be severely curtailed if there was a sufficient number of other, more or less equally likely, possible values of DISORDER that linked with this value of PRESCRIPTION. Assuming Sensitive Attribute Classification, this leads to the conclusion that deduction controls need only be invoked when

1. there is an insufficient number of values of the sensitive attribute that link with the current value(s) for the non-sensitive attribute(s), or

2. some value(s) of the sensitive attribute that link with the current value(s) of the non-sensitive attribute are substantially more likely to be associated with it.

(If No Attribute Classification was used, the sensitivity level of the sensitive value would have to be beyond a system-defined level before any of these conditions were checked.) The first condition is considered in the current work: this inferencing ambiguity directly influences the detection of deduction. Deduction detection is only indicated if the number of values linked is below some system-defined limit. The second condition requires that each value have an associated probability that corresponds to the likelihood that this non-sensitive value will actually be found in conjunction with the given sensitive value in the relation. Such uncertainty of knowledge is not modeled in the current KB and therefore this condition is not considered. The inclusion of such uncertainty is an obvious next step in our research.

So far, we have considered how to handle deduction detection when a query releases a single new fact. A simple extension to this method is postulated to handle cases where more than one fact is released by a single query (e.g., PRESCRIPTION and THERAPY). It is based on the intuition that the sensitive values that could be linked to both simultaneously are exactly those values that are in the intersection of the values that can be linked to each individually. First, each new fact is individually linked with values from the sensitive attributes and each sensitive value linked to it is collected into a set. The intersection of these sets produces the final set that contains sensitive values that are consistent with all the non-sensitive values. This set is then checked for sufficient numbers to mask the sensitive value. If so, the non-sensitive values are released; otherwise deduction control is invoked.

6.2 Deduction Controls
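The ambiguity test and the multi-fact intersection extension can be sketched as follows. This is an illustrative Python sketch only: the KB is reduced to a plain mapping from released facts to the sensitive values they link with, and the ambiguity limit is an assumed parameter.

```python
def linked_values(fact, kb):
    """Set of sensitive values the KB can link to one released fact."""
    return set(kb.get(fact, []))

def deduction_detected(new_facts, kb, ambiguity_limit=3):
    """Detection is indicated when too few sensitive values remain
    consistent with ALL facts a query would release (condition 1)."""
    candidate_sets = [linked_values(f, kb) for f in new_facts]
    consistent = set.intersection(*candidate_sets) if candidate_sets else set()
    return len(consistent) < ambiguity_limit, consistent

# Toy KB: which DISORDER values link to each non-sensitive fact.
KB = {
    "PRESCRIPTION:darvon": {"migraine", "back pain", "arthritis", "asthma"},
    "THERAPY:inhaler":     {"asthma", "bronchitis"},
}

# Individually ambiguous, jointly revealing:
print(deduction_detected(["PRESCRIPTION:darvon"], KB))   # enough cover values
print(deduction_detected(["PRESCRIPTION:darvon",
                          "THERAPY:inhaler"], KB))       # intersection narrows to asthma
```

The second call illustrates why the intersection matters: each released fact alone leaves sufficient ambiguity, but their combination isolates a single sensitive value.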

The current work is strictly concerned with the detection of the possibility of deduction. This is the case because deduction controls are domain dependent whereas the system proposed can be used in any domain with a change of KB. The typical types of controls that could be invoked on the detection of possible deduction are:

1. withhold the non-sensitive attribute's value and state that the value is sensitive

2. withhold the non-sensitive attribute's value and state that there is no value

3. provide some innocuous, but incorrect, value instead

In the example domain of the current work, it is not clear which is the best choice. The first leads to the partial deduction that the DISORDER value is sensitive. Release of this simple fact could cause the patient harm. Further, the second and third choices could lead to dangerous circumstances in relation to the patient's well-being. What is clear is that the solution to the problem of what types of controls to invoke upon the detection of possible deduction needs to weigh a large number of domain-dependent factors against the sensitivity of the data being protected before any decision is made concerning their implementation.
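The three control types differ in what they leak or falsify, which a small dispatcher makes concrete. This is a hypothetical sketch, not part of the prototype; the enum names and response strings are invented for illustration.

```python
from enum import Enum

class Control(Enum):
    WITHHOLD_MARK_SENSITIVE = 1   # state that the value is sensitive
    WITHHOLD_CLAIM_ABSENT   = 2   # state that there is no value
    COVER_STORY             = 3   # provide an innocuous but incorrect value

def apply_control(control, attribute, cover_value=None):
    """Render a query response under one of the three control types.
    A domain-dependent policy must choose which control to use; this
    dispatcher only illustrates their differing information leakage."""
    if control is Control.WITHHOLD_MARK_SENSITIVE:
        return f"{attribute}: <sensitive>"    # leaks that something is hidden
    if control is Control.WITHHOLD_CLAIM_ABSENT:
        return f"{attribute}: no value"       # lies by omission
    if control is Control.COVER_STORY:
        return f"{attribute}: {cover_value}"  # lies outright
    raise ValueError(control)

print(apply_control(Control.WITHHOLD_MARK_SENSITIVE, "PRESCRIPTION"))
```

As the section notes, in a medical domain even the first option (admitting a value is sensitive) can itself be a partial disclosure, so no single control is universally safe.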

6.3 Temporal Deduction Detection and Control

Constraining the deduction problem to single queries is a strategy to help bring the main issues into sharper focus and show that detection of deduction is possible. It is obvious that this work will need to be extended to include queries presented over time. Since the facts released to users are persistent, it would be possible for users to ask a series of queries that would individually not be particularly useful in deduction but in combination could result in the deduction of sensitive information. Research in maintaining structured audit trails [Jajodia90] seems to point to a possible solution to the problem. Another possible solution lies in the maintenance of individual KBs for each user. These could contain the facts released to the user and could be used in tandem with the main KB when attempting to model the deductions possible by that particular user. In any case, this problem is left to future research.
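The per-user KB idea can be sketched by extending the intersection test over a user's accumulated history. This is a speculative sketch of the proposed future extension, not of implemented behavior; the class and function names are invented for illustration.

```python
class UserHistory:
    """Per-user record of released facts, consulted in tandem with the
    main KB so that a series of individually harmless queries can be
    checked as a combination."""
    def __init__(self):
        self.released = {}   # user -> set of facts already released

    def facts_for(self, user, new_facts):
        return self.released.get(user, set()) | set(new_facts)

    def record(self, user, new_facts):
        self.released.setdefault(user, set()).update(new_facts)

def temporal_check(user, new_facts, history, kb, ambiguity_limit=3):
    """Intersect candidate sensitive values over ALL facts the user
    would hold after this query, not just the newly released ones."""
    all_facts = history.facts_for(user, new_facts)
    candidates = [set(kb[f]) for f in all_facts if f in kb]
    consistent = set.intersection(*candidates) if candidates else set()
    return len(consistent) < ambiguity_limit   # True => possible deduction

KB = {
    "PRESCRIPTION:darvon": {"migraine", "back pain", "arthritis", "asthma"},
    "THERAPY:inhaler":     {"asthma", "bronchitis"},
}
h = UserHistory()
h.record("u1", ["PRESCRIPTION:darvon"])                  # earlier query, harmless alone
print(temporal_check("u1", ["THERAPY:inhaler"], h, KB))  # combination narrows to asthma
```

The design choice here mirrors the audit-trail proposal: the cost is per-user state that grows with every released fact, which is why the section leaves the problem to future research.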

7 Related Work

Traditionally, the topic of inference control in data security has focused on the problem of inferring confidential data via aggregation and cross-reference in a statistical database [Denning79, Denning83, Adam89, Ieong89, Matloff89]. Such a focus is natural in that the problem, constrained in this manner, seems to be more understandable and therefore solutions more forthcoming. It should be noted that this problem is only a subset of the problem of deduction in general. That is, inferences made from the cross-referencing of statistical summaries can be made only if the user has some previous knowledge about an object in the database (e.g., inferring a classified fact about someone given there is only one person in the database that has some number of unclassified attributes) and understands the semantics of the statistical functions used to produce these summaries. These assumptions, although implicit in all such research, have not been explicitly stated. We seek to explore the problem of inference beyond the tight limits previously imposed upon it. Specifically, the removal of the constraint imposed that requires the database in question to be statistical and the inclusion of real-world knowledge possessed by a user into the inference problem considerably extends the traditional boundaries.

The use of expert system technology has been explored in relation to the problem of inference. Notably, the Database Inference Controller [Buczkowski90] is an expert system tool for the static inspection of a multilevel secure database that attempts to determine the probability of inferring classified data from unclassified data. Although it does succeed in reducing the risk of inference, it does so at the cost of increased restriction of data in the database stemming from static classification at the level of the attribute. [Morgenstern87] is an early paper that deals with the problem of inference in a non-statistical database. It attempts to bring the rigor of the Bell and LaPadula [Bell73] model of security into the realm of inference control by introducing a series of formal notions to fully delineate the problem. A set of general algorithms is offered that, although interesting, are largely abstractions whose implementation is impractical. Other approaches designed to alleviate the inference problem recognize the need for some artificial intelligence techniques in its solution but at best these only provide for static detection of undesirable paths from unclassified data to classified data [Bonyun90, Haigh90, Su90, Henning89, Hinke89]. None provide the dynamic, query-time detection required for maximum data access and none consider a classification granularity finer than the attribute.

8 Conclusion

This research has considered the problem of deduction in a non-statistical secure database. It has produced a system in which the integration of real-world knowledge, in the form of a KB that statically simulates the knowledge a user has about the data in the database together with an inference process that can dynamically model the deduction processes of such a user, can alert the DB to the threat that released non-sensitive data could lead to the deduction of sensitive data. Such a system can provide a high degree of security with less restrictive information flow than previously achieved. We believe the approach can be generalized to include any secure database domain for which sufficient knowledge is available.

References

[Adam89] Adam, N.R. and Wortmann, J.C., "Security-Control Methods for Statistical Databases: A Comparative Study", ACM Computing Surveys, Vol. 21, No. 4, 1989.

[Anderson90] Anderson, M.E., Response Evaluation in a Dialogue Manager via the Computation of Conceptual Distance, Masters Thesis, The University of Connecticut, 1990.

[Anderson91] Anderson, M.E. and Shin, D.G., "Integrating an Intelligent Interface with a Relational Database for Two-Way Man-Machine Communication", Proceedings of the IEEE/ACM International Conference on Developing and Managing Expert System Programs, Washington, D.C., 1991.

[Bell73] Bell, D. and LaPadula, L.J., "Secure Computer Systems: Mathematical Foundations & A Mathematical Model", Technical Report ESD-TR-73-278, Vols. 1 & 2, Electronic Systems Division, USAF, 1973.

[Biskup90] Biskup, J., "Protection of Privacy and Confidentiality in Medical Information Systems: Problems and Guidelines", Database Security, III: Status and Prospects, Spooner, D.L. and Landwehr, C., editors, North-Holland, 1990.

[Bonyun90] Bonyun, D.A., "Using EXCESS as a Framework for Secure Databases", Database Security, III: Status and Prospects, Spooner, D.L. and Landwehr, C., editors, North-Holland, 1990.

[Buczkowski90] Buczkowski, L.J., "Database Inference Controller", Database Security, III: Status and Prospects, Spooner, D.L. and Landwehr, C., editors, North-Holland, 1990.

[Denning79] Denning, D.E. and Denning, P.J., "Data Security", ACM Computing Surveys, Vol. 11, No. 3, September 1979.

[Denning83] Denning, D.E. and Schlorer, J., "Inference Controls for Statistical Databases", IEEE Computer, July 1983.

[Haigh90] Haigh, J.T. et al., "The LDV Approach to Database Security", Database Security, III: Status and Prospects, Spooner, D.L. and Landwehr, C., editors, North-Holland, 1990.

[Henning89] Henning, R.R. and Simonian, R.P., "Security Analysis of Database Schema Information", Database Security, II: Status and Prospects, Landwehr, C.E., editor, North-Holland, 1989.

[Hinke89] Hinke, T.H., "Database Inference Engine Design Approach", Database Security, II: Status and Prospects, Landwehr, C.E., editor, North-Holland, 1989.

[Ieong89] Ieong, ...