Design Models and the Complexity of the Testing Problem for Distributed Systems

Joaquin Aguado and Anthony J. Cowling
Department of Computer Science, Sheffield University, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK

Abstract. A design framework for producing testable implementations for distributed systems is analysed. The framework is based on modelling individual processes as stream X-machines, with their associated testability results, and the computational complexities of two integration testing approaches are compared. For the bottom-up approach of using the communicating stream X-Machine system model it is shown that this testing problem is an exponential time one, while for the top-down approach of using the Data Flow Algebra model it is shown that the equivalent problem is a polynomial time one.

1. Introduction

Testing is a vitally important activity within software engineering, but it is also a difficult and time-consuming one. For instance, according to Koné et al. [1], telecommunication companies pay up to 50% of the overall development cost for testing their communication software. Distributed and concurrent systems pose particular problems for testing, as their high degree of non-determinism means that errors are not easily detected, and so traditional testing techniques are of limited use for such systems [2]. These techniques therefore need to be combined with formal methods of validation and verification, but the testing of systems modelled in terms of data flows and parallel components has proved difficult to represent in current languages (e.g. Petri Nets and pure CCS) [3]. This paper is concerned with an alternative approach to developing testable implementations of such systems, based on modelling the individual components within them as X-machines. The X-machine model was originally introduced by Eilenberg [4] more than twenty-five years ago, as an alternative to Finite State Machines, Pushdown Automata, Turing Machines and other models of that kind. It was not until 1988, however, that this form of abstract model was proposed as a possible specification language by Holcombe [5]. The X-machine is based on the concept of a finite state machine that controls actions on a data structure. The stream X-machine (SX-M for short) is a subclass of the X-machine model that was proposed by Laycock [6]. A basic model for assembling X-machines into a communicating system was introduced by Barnard et al. [7]. The communicating X-machines that they describe are X-machines with one or more input and output ports, where each output port of each machine is connected to an input port of another machine by a channel. A data item or a signal can be transmitted through a channel.
This basic model for communicating X-machines was later extended by Balanescu et al. [8] under the name Communicating Stream X-Machine Systems (CSXMS henceforth), in order to derive the input-output relationships for such systems (i.e. the output corresponding to an input sequence). A CSXMS consists of a collection of communicating machines together with a communication subsystem (i.e. a communication matrix). The growing interest in CSXMS models for distributed systems comes from the fact that a reliable technique for testing SX-Ms exists ([9] and [10]). This technique views testing as the activity of obtaining information from an Implementation Under Test (IUT), and it is based on the W-method [11] and the Wp-method [12]. The approach that these methods take is to test the response of the IUT to all the necessary sequences of input events that can be generated from the specification (which is expressed as an SX-M), so as to establish the conformance of the outputs from the IUT to this specification. Recent investigations have shown that this SX-M testing approach can be extended to the CSXMS. In particular, in [8] it is proved that the CSXMS model has the same computational power as the SX-M model, which in principle allows the application and extension of existing testing strategies for SX-Ms (e.g. in [11], [10] and [13]). This paper investigates the practicality of this testing approach, and in particular it addresses the issue of its computational complexity. It shows that in practice the direct conversion from one model to the other faces a combinatorial explosion problem, resulting in an SX-M with an exponential number of states. It also analyses the possibility of a particular reduction by means of a class of equivalence, namely independence, and this reduction is shown here to be an NP-complete problem.
Consequently, while the equivalence between CSXMS and SX-M is theoretically important, these results demonstrate that for practical purposes alternative approaches are required, in order to reduce either the search effort or the exponential explosion of the state space, or both. The paper therefore goes on to propose one alternative approach to providing a testing

method of lower complexity for distributed systems, while retaining the established benefits of the SX-M model for individual components. The hypothesis underlying this alternative approach is that the bottom-up nature of the CSXMS can be extended with extra information about the system behaviour from a complementary top-down model. The combination of both models can then guide the generation of testing sequences for the system, but at a lower cost than the direct transformation from CSXMS to SX-M. The eventual aim is to unify both specification formalisms into a design framework that can produce testable implementations, following the methods that have already been developed for SX-M. For the purposes of this paper, though, the analysis of this unified approach is restricted to developing a basic computational complexity model, so as to establish that in this approach the testing problem becomes a polynomial time one, rather than an exponential time one. The specific top-down model that is proposed for this purpose is that of a Data Flow Algebra for discrete data flow systems. This approach develops an algebra of data flows which can be considered orthogonal to process algebras, in the sense that it is data-oriented, but emphasises the flows of data through a system as a whole, rather than into and out of the individual processes within it. The data flow algebra model is also complementary to the CSXMS one, in that both of them represent explicit patterns of communication: in the form of communicating states and communicating transitions for the CSXMS, and in the form of actions (i.e. data flows) and sequences of actions for the data flow algebra. The rest of the paper is therefore organised as follows. Section 2 gives the basic definitions of the SX-M and CSXMS models. In Section 3 the behaviour of the CSXMS is discussed from the perspective of changes of configuration (i.e. changes of global states).
Section 4 discusses independence and dependence criteria with respect to the changes of configurations, introducing for CSXMS the concept of executions that has been commonly used for distributed systems, and showing how the optimisation in terms of compacting “equivalent” behaviours can be treated from a graph theoretical perspective. Section 5 presents the proofs of the main results, that the construction for transforming a CSXMS to an equivalent SX-M proposed in the literature [8] faces a combinatorial explosion problem, however it is approached, and so is impractical. Section 6 provides a brief introduction to the data flow algebra. In Section 7, the general design framework is presented for obtaining testable implementations for distributed systems, based on the combination of the CSXMS and data flow algebra approaches. Section 8 summarises the conclusions of the paper.

2. Stream X-machines and Communicating Stream X-machines Systems

The intuition behind an X-machine is straightforward: given a certain “computable” problem, a finite state machine with a set of states is created, and a basic data set is identified. The transitions between states carry processing functions that operate on this data. In this way, the X-machine integrates the control and the data processing while at the same time allowing them to be specified separately [9]; additionally, the data space is independent of the control structure [14]. The set of processing (partial) functions Φ is called the type of the machine. The type defines the basic operations that the machine is capable of executing over the data. The presence of partial functions and a data set introduces the possibility of identifying a number of classes of X-machines, defined by means of restrictions on the data set and on the type of the machine. In an SX-M the inputs and outputs are streams of symbols. The machine processes its input stream in order, and the next control-state is determined by the current control and memory states. At the same time, the machine produces an ordered stream of outputs and updates the internal memory [10].

Definition 1: A stream X-machine (SX-M) is a tuple (X, Σ, Γ, Q, M, Φ, F, I, T, m0) where:
• M is a possibly infinite set called the internal memory.
• The fundamental data set X is of the form X = Γ* x M x Σ*, where Σ and Γ are the input and output alphabets respectively.
• Q is the finite set of control states.
• Φ is the set of processing relations on X, Φ ⊆ P(X ↔ X). The type Φ is of the form [9]:
∀ g* ∈ Γ*, ∀ m ∈ M, φ(g*, m, ε) = ⊥
∀ g* ∈ Γ*, ∀ m ∈ M, ∀ h ∈ Σ, ∀ s* ∈ Σ*,
if ∃ m’ ∈ M, t ∈ Γ that depend on m and h then φ(g*, m, h::s*) = (g*::t, m’, s*)
otherwise φ(g*, m, h::s*) = ⊥
where ε denotes the empty stream. Each transition function φ depends on the current memory value m and the head (i.e. the first symbol) h of the input stream.
It adds a symbol t to the end of the output stream, produces a new memory value m’, and removes the head of the input stream.
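This stepwise behaviour can be illustrated with a small executable sketch (Python; the names `step` and `acc`, and the encoding of a type-function as a Python callable, are illustrative assumptions, not part of the model):

```python
# A minimal sketch of one stream X-machine step, assuming a type-function
# is modelled as: phi(memory, head) -> (output_symbol, new_memory) or None.

def step(phi, memory, in_stream, out_stream):
    """Apply phi to the head of the input stream, as in Definition 1:
    phi(g*, m, h::s*) = (g*::t, m', s*)."""
    if not in_stream:                      # phi(g*, m, empty stream) is undefined
        return None
    head, rest = in_stream[0], in_stream[1:]
    result = phi(memory, head)
    if result is None:                     # phi undefined for (m, h)
        return None
    t, new_memory = result
    return out_stream + [t], new_memory, rest

# Example type-function: add the input symbol to an accumulator and emit the total.
acc = lambda m, h: (m + h, m + h)

state = step(acc, 0, [1, 2, 3], [])
# state == ([1], 1, [2, 3]): output extended, memory updated, head consumed
```

The sketch makes explicit that a single step consumes the head of the input stream, appends one symbol to the output stream and updates the memory, exactly as in the type Φ above.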

• F : Q x Φ → P(Q) defines the next state function; this is often described by means of a state-transition diagram.
• I and T are the sets of initial and final states respectively: I ⊆ Q, T ⊆ Q.

The X-machine and associated models have been applied to a large number of specification and testing problems, e.g. in [3], [9], [10], [15], [16], [17], [18], [19] and [20]. The CSXMS is a relatively new specification formalism that has been developed for specifying distributed systems. It combines the advantages of the X-machines (i.e. modelling both the system and the data control structure while allowing them to be specified separately) with the possibility of sending and receiving messages among several of these abstract machines.

Definition 2: A CSXMS with n components is a tuple Wn = (R, CM, C0) where:
• R is the set of n = |R| components of the system, of the form Pi = (Λi, INi, OUTi, ini0, outi0) ∀ 1 ≤ i ≤ n. From now on such components are called communicating machines. Λi is an SX-M with memory Mi (for further details see definition 5). INi and OUTi are the input and output ports respectively of the ith communicating machine, with the property that INi, OUTi ⊆ Mi ∪ {λ} and λ ∉ Mi. The symbol λ is used to indicate an empty port. The initial port values are ini0 and outi0.
• CM is the set of matrices of order n x n which form the values of a matrix variable that is used for communication between the communicating machines, so for any C ∈ CM and any pair i, j, the value in C[i, j] is a message from the machine Pi ∈ R to the machine Pj ∈ R. Thus C[i, j] can be thought of as a variable that is used as a temporary buffer.
• All the messages passing between communicating machines Pi and Pj will be values from Mi or Mj (i.e. the memories of Λi and Λj). The symbol λ in the matrices is used to indicate that there is no message. The symbol @ is used to indicate a channel that is not going to be used (e.g.
from a communicating machine to itself), so that there is no need to send such kinds of messages. The elements of the matrices are therefore drawn from M ∪ {λ, @} where M = M1 ∪ M2 ∪ … ∪ Mn and λ, @ ∉ M.

• C0 is the initial communication matrix, defined by C0[i, j] = λ if communication is allowed between machines Pi and Pj, and C0[i, j] = @ otherwise (e.g. C0[i, i] = @).
• The ith communicating machine can only read from the ith column and write to the ith row of the matrix (see the definition of communicating functions in definition 4).

Definition 3: For any C ∈ CM, any value v ∈ M ∪ {λ} and any pair of indices 1 ≤ i, j ≤ n with i ≠ j:
• If C[i, j] = λ, an output variant of C, denoted by Cij ⇐ v, is defined as: (Cij ⇐ v)[i, j] = v and (Cij ⇐ v)[k, m] = C[k, m] ∀ (k, m) ≠ (i, j)
• If C[i, j] = v, an input variant of C, denoted by ⇐ Cij, is defined as: (⇐ Cij)[i, j] = λ and (⇐ Cij)[k, m] = C[k, m] ∀ (k, m) ≠ (i, j)

The variants in this definition effectively describe the allowable transitions from one matrix to another. The concept of variants is fundamental to the concepts of configuration and change of configuration described below.

Definition 4: A communicating machine is a tuple P = (Λ, IN, OUT, in0, out0) where Λ = (Σ, Γ, Q, M, Φ, F, I, T, m0) is a stream X-machine with the following properties:
• Σ and Γ are finite sets called the input and output alphabets respectively.
• The finite set of states Q is required to be partitioned as Q = Q’ ∪ Q”, with Q’ ∩ Q” = ∅, where Q’ is called the set of processing states and Q” the set of communicating states. If q’ ∈ Q’ then all the functions emerging from q’ are processing functions. If several functions can be applied in q’, one of them is arbitrarily chosen; otherwise (i.e. no function can be applied) the communicating machine blocks, and so does the entire system. If q” ∈ Q” then all the functions emerging from q” are communicating functions. If several functions can be applied in q”, one of them is arbitrarily chosen; otherwise the machine does not change state and waits until some function can be applied.
• M is a (possibly infinite) set called the memory.

• The type of the machine is the set Φ = Φ’ ∪ Φ”, where Φ’ is called the set of processing functions or ordinary functions and Φ” is the set of communicating functions, with Φ’ ∩ Φ” = ∅. The elements of Φ’ are relations (partial functions) of the form:
Φ’: IN x M x OUT x Σ → Γ x IN x M x OUT
(I) A processing function φ’ acts as in an ordinary SX-M, that is:
∀ x ∈ IN, ∀ m ∈ M, ∀ y ∈ OUT, φ’(x, m, y, ε) = ⊥
∀ x ∈ IN, ∀ m ∈ M, ∀ y ∈ OUT, ∀ h ∈ Σ, ∀ s* ∈ Σ*, ∀ g* ∈ Γ*,
if ∃ m’ ∈ M, t ∈ Γ, x’ ∈ IN, y’ ∈ OUT that depend on m, h and x then φ’(x, m, y, h::s*) = (g*::t, m’, x’, y’, s*)
otherwise φ’(x, m, y, h::s*) = ⊥
(II) A communicating function φ” of the form Φ”: IN x OUT x CM → IN x OUT x CM can be one of the following:
(II.A) An output move. This form is used by machine Pi to send a message to machine Pj, using C[i, j] as a buffer. The moves from the output port of Pi to C[i, j] are called output moves and are denoted by MVOi ∀ 1 ≤ i ≤ n. Thus MVOi = {mvoij | 1 ≤ j ≤ n, j ≠ i}, where mvoij: OUTi x CM → OUTi x CM is defined by:
∀ y ∈ OUTi, ∀ C ∈ CM, if ∃ j | j ≠ i and y ≠ λ and C[i, j] = λ (i.e. y is not empty and C[i, j] is empty) then mvoij(y, C) = (λ, Cij ⇐ y) (i.e. the output port is emptied and the result is the output variant Cij ⇐ y)
(II.B) An input move. This form is used by machine Pi to receive a message from machine Pj, using C[j, i] as a buffer. The moves from C[j, i] to the input port of Pi are called input moves and are denoted by MVIi ∀ 1 ≤ i ≤ n. Thus MVIi = {mviji | 1 ≤ j ≤ n, j ≠ i}, where mviji: INi x CM → INi x CM is defined by:
∀ x ∈ INi, ∀ C ∈ CM, if ∃ j | j ≠ i and x = λ and C[j, i] ≠ λ (i.e. the input port is empty and C[j, i] is not) then mviji(λ, C) = (C[j, i], ⇐ Cji) (i.e. the input port receives C[j, i] and the result is the input variant ⇐ Cji)
• The set of communicating functions Φ” is made up as follows: Φ” ⊆ MVOi ∪ MVIi ∀ 1 ≤ i ≤ n.
Since mvoij can be executed only when C[i, j] is empty and mviji can be executed only when C[j, i] is not empty, it is assumed in the model that all the operations concerning the same cell of the communication matrix are done under mutual exclusion.
• All the relations (partial functions) of Φ = Φ’ ∪ Φ” are extended to: ΦE: IN x M x OUT x CM x Σ → Γ x IN x M x OUT x CM. An extended relation (partial function) operates only on the variables for which the relation (partial function) is defined, leaving the rest unchanged. For uniform treatment, in what follows the same notation will be used (we will write Φ instead of ΦE), where the relations (partial functions) will be distinguished by their domains.
• The next state function is F : Q x Φ → P(Q) with dom(F) ⊆ (Q’ x Φ’) ∪ (Q” x Φ”).
• I ⊆ Q and T ⊆ Q are the sets of initial and terminal states respectively. m0 ∈ M is the initial memory.
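The output and input moves, together with the variants of Definition 3, can be sketched as follows (a minimal Python sketch under the assumption that the matrix is a list of lists and that λ and @ are encoded as the strings below; the names `mvo` and `mvi` mirror mvoij and mviji):

```python
# Sketch of the communication matrix and the moves mvo_ij / mvi_ji
# (Definitions 3 and 4). LAMBDA marks an empty cell; AT a disabled channel.
LAMBDA, AT = "lambda", "@"

def mvo(C, i, j, out_port):
    """Output move: Pi writes its output port into C[i][j] if the cell is empty."""
    if out_port != LAMBDA and C[i][j] == LAMBDA:
        C2 = [row[:] for row in C]
        C2[i][j] = out_port          # the output variant Cij <= v
        return LAMBDA, C2            # the output port is emptied
    return None                      # move not applicable

def mvi(C, j, i, in_port):
    """Input move: Pi reads C[j][i] into its input port if the port is empty."""
    if in_port == LAMBDA and C[j][i] != LAMBDA:
        C2 = [row[:] for row in C]
        C2[j][i] = LAMBDA            # the input variant <= Cji
        return C[j][i], C2
    return None

# Two machines: channel P0 -> P1 open, self-channels disabled with @.
C0 = [[AT, LAMBDA], [LAMBDA, AT]]
port, C1 = mvo(C0, 0, 1, "msg")      # P0 sends "msg" into C[0][1]
val, C2 = mvi(C1, 0, 1, LAMBDA)      # P1 receives it from C[0][1]
# val == "msg"; C2 is back to the empty matrix C0
```

Note how the guards reproduce the mutual-exclusion discipline of the model: `mvo` fires only on an empty cell and `mvi` only on a full one, so the two moves can never race on the same buffer.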

3. Changes of Configuration

To avoid confusion between the states and transitions of a single communicating machine and the states and transitions of the CSXMS (i.e. global states and global transitions), the following conventions are used henceforth. The term configuration is used for the set of all possible global states of the system. The term state is used for each component (i.e. for each communicating machine), and the term machine-state is used for the control state of an SX-M embedded in a component. The term transition refers to the allowable global transitions. The term event is used for a change of state inside a component. The terms next state function and type-function are used for SX-Ms.

Definition 5: The state of a component P = (Λ, IN, OUT, in0, out0) ∈ R of a CSXMS has the form cf = (m, q, x, y, s, g) where:
• m ∈ M is the value of the memory of P.
• q ∈ Q is the machine-state of P.
• x ∈ IN and y ∈ OUT are the values of the ports of P.
• s ∈ Σ* is the input sequence of P.
• g ∈ Γ* is the output sequence of P.

Definition 6: A configuration of a CSXMS Wn = (R, CM, C0) has the form CF = (cf1, cf2,…, cfn, C) where ∀ 1 ≤ i ≤ n, cfi = (mi, qi, xi, yi, si, gi) is the state of the ith component and C is the current communication matrix of the system. A configuration is initial iff ∀ cfi, mi = mi0, qi ∈ Ii and C = C0. A configuration is final iff ∀ cfi, si = ε and qi ∈ Ti.

To explain informally, a configuration of a CSXMS consists of the states of all the components together with the collection of messages in transit. Initial configurations are those in which each component is in an initial machine-state and the communication matrix is empty. A configuration is final if the input stream is empty and the machine-state is a final one for all the components. A change of configuration then involves the application of the type-functions φi in some of the communicating machines, as follows.

Definition 7: A change of configuration CF ⇒ CF’, with CF = (cf1, cf2,…, cfn, C) and CF’ = (cf’1, cf’2,…, cf’n, C’), where ∀ 1 ≤ i ≤ n, cfi = (mi, qi, xi, yi, si, gi) and cf’i = (m’i, q’i, x’i, y’i, s’i, g’i), is possible if CF ≠ CF’ and ∃ c0, c1, …, cn ∈ CM with c0 = C and cn = C’ such that ∀ 1 ≤ i ≤ n either
• cfi = cf’i and ci = ci-1, or
• ∃ φi ∈ Φi such that q’i ∈ Fi(qi, φi) and φi(xi, mi, yi, ci-1, hi) = (ti, x’i, m’i, y’i, ci), where si = hi::s’i, g’i = gi::ti, hi ∈ Σi ∪ {ε} and ti ∈ Γi ∪ {ε}.

A single change of configuration of a CSXMS is defined as a change of configuration in which one and only one type-function φi ∈ Φi is applied. A single change of configuration is denoted by CF ⇒1 CF’, or by CF ⇒1φ CF’ when it is appropriate to identify the type-function φ associated with the change.
The reflexive and transitive closure of ⇒ is denoted by ⇒* and the reflexive and transitive closure of ⇒1 is denoted by ⇒1*. The change of configuration concept has the following intuitive meaning in the literature ([8] and [21]). Let CF0 be the initial configuration when the system initiates its execution; from that point in time all the components may start their executions concurrently, changing from state to state while almost certainly sending and receiving messages. This is the reason why a configuration is defined as the Cartesian product of the local states of the components and the communication matrix at a given instant; nevertheless, the events can occur at different times and with different durations. As a result, a global observer who has access to the whole system can note that some components may be in specific states while others are in the course of a transition from one state to another (i.e. executing a function). In order to establish what the configuration CF(t) at time t is, let us assume that a change of state in a component occurs atomically at the end of each function, and that such a new state is the current state of the component until the next function terminates. In the same way, it is assumed that the matrix is modified (when required) at the end of the corresponding function. The key point about a change of configuration with respect to time (following Cowling et al. [21]) can be expressed as follows. If t is the time when CF is reached and if t’ > t is the closest following instant at which a component finishes the execution of a function, then CF’ is the configuration at time t’. For the components that do not change their states this does not necessarily mean that they have done nothing, but rather that they have not entirely completed executing a function.
The incomplete execution of functions does not affect the complete ones, mainly because the modifications of the communication matrix are done under mutual exclusion and the memories are local to each component. This fits with our assumption that every change of state and every modification to the communication matrix occurs atomically at the end of each function. Note. By contrast, the mechanism introduced in [21] to assure mutual exclusion (i.e. the macrofunction) cannot be interpreted in quite the same manner as the one just described, since the macrofunctions perform a protocol associated with the send and receive operations. Thus, these operations are composed of a number of functions, resulting in various “intermediate” states during which the matrix might be modified several times.

4. Dependence and Independence

There are two sets of type-functions that can be distinguished for a configuration CF:

• The set of functions that can be applied, namely the applicable functions, denoted by AplCF = {φ ∈ Φ | q’ ∈ F(q, φ)} ∀ cfi = (m, q, x, y, h::s, g) ∈ CF such that φ(x, m, y, C, h) is applicable, and
• The set of functions that cannot be applied, that is the inapplicable functions, denoted by InaCF = {φ | φ ∉ AplCF}.

Henceforth, the set of all extended relations (partial functions) of all the communicating machines in the system is denoted by ΦCSXMS, and as one might suspect AplCF ∪ InaCF = ΦCSXMS and AplCF ∩ InaCF = ∅. Our main interest, however, has to do with the orders in which the type-functions can be executed resulting in the same configuration. The elements of ΦCSXMS can be classified from another perspective as:
• Independent, which are the events that influence disjoint parts of the system, or
• Dependent, which are the events that affect common parts of the system.

The significance of this intuitive observation can be formalised in terms of four relations as follows.

Definition 8: For all φp, φq ∈ ΦCSXMS
(1) If φp and φq correspond to different arcs of the same machine then φp idp φq and φq idp φp.
(2) If φp = mvoij ∈ MVOi and φq = mviij ∈ MVIj then φp cdp φq and φq cdp φp.
(3) φp Dep φq ⇔ (φp idp φq ∨ φp cdp φq).
(4) φp Ind φq ⇔ not (φp Dep φq)
(5) If φp = φq then φp Ind φq

The relations idp, cdp, Dep and Ind are called respectively internal dependence, communication dependence, dependence and independence. Part (5) of the definition establishes that Ind is reflexive, and part (4) hence implies that Dep is not. Additionally, it is not hard to see that both Dep and Ind are symmetric; however, neither dependence nor independence is transitive. AplCF and Ind model two distinct aspects of a system, but they can be combined in order to achieve a better understanding of how the system operates. It is convenient to view AplCF as an aggregate composed of several not necessarily disjoint subsets, each one with elements (i.e.
type-functions) that are independent of each other. Furthermore, each subset should be maximal, in the sense that any function independent of all the others in a subset ought to be included in it. More formally:

Definition 9: The set of applicable functions of a given configuration CF can be partitioned as AplCF = A1 ∪ A2 ∪ … ∪ Am where ∀ 1 ≤ i ≤ m : ∀ φp, φq ∈ Ai, φp Ind φq; and there is no other φ ∉ Ai such that ∀ φp ∈ Ai, φ Ind φp. Each Ai is called an independent subset of AplCF and A1 ∪ A2 ∪ … ∪ Am is an independent partition of AplCF.

Lemma 1: For every AplCF there is one and only one independent partition.
Proof. Let n be the cardinality of AplCF. The proof is by induction over n.
Base cases: If n = 1, from definition 8 we have φ Ind φ, and so there is only one independent subset A1 = {φ}, resulting in a unique independent partition for AplCF. If n = 2 then there are two possibilities: if φp Ind φq then there is only one independent subset A1 = {φp, φq}, otherwise there are two independent subsets A1 = {φp} and A2 = {φq}. In either case there is a unique independent partition for AplCF.
Induction: Hypothesis: Let us assume that for any AplCF of cardinality n there is a unique independent partition A1, A2, …, Aj with 1 ≤ j ≤ n (note that the number of independent subsets is in the worst case equal to the number of applicable functions in AplCF). Let φ be a new type-function, which is added to AplCF by the following procedure. For each subset Ai with 1 ≤ i ≤ j, a new subset A’i is obtained. Initially A’i = {φ}; then ∀ φx ∈ Ai, if φx Ind φ then A’i = A’i ∪ {φx}. Clearly, A’i \ {φ} ⊆ Ai, and there are two cases. If A’i \ {φ} = Ai then A’i = Ai ∪ {φ} and the new subset A’i replaces Ai in the partition. If A’i \ {φ} ⊂ Ai then the new subset A’i is included in the partition without eliminating the original subset Ai. To show that the new partition is unique, let us suppose on the contrary that there is an independent subset Aw of Apl’CF = AplCF ∪ {φ} different from those obtained by the procedure.
Since the union of all the subsets must be equal to Apl’CF, the elements of Aw were included in several other subsets of AplCF. If φ ∉ Aw, by hypothesis Aw was in the partition before the insertion of φ, and because such a set has not been modified, Aw must also be in the partition of Apl’CF, which gives a contradiction. If, on the other hand, φ ∈ Aw then Aw \ {φ} ⊆ Ai, where by hypothesis Ai was in the partition of AplCF before the insertion of φ. Thus Ai

was revised in the procedure, and because φ ∈ Aw then ∀ φx ∈ Aw \ {φ}, φx Ind φ; this implies that Aw is in the partition, which also gives a contradiction. Finally, it is also direct that each subset obtained in this way is maximal.

Corollary 1: For ΦCSXMS there is one and only one independent partition.
Proof. Substitute ΦCSXMS for AplCF in lemma 1.

Definition 10: Given two independent partitions A = A1 ∪ A2 ∪ … ∪ Am and B = B1 ∪ B2 ∪ … ∪ Bn, A is said to be a sub-partition of B, denoted by A ≤ B, if and only if for all Ai ∈ A there is some Bj ∈ B such that Ai ⊆ Bj.

Lemma 2: If A = A1 ∪ A2 ∪ … ∪ Am and B = B1 ∪ B2 ∪ … ∪ Bn are the two independent partitions for AplCF and ΦCSXMS respectively, then A ≤ B.
Proof. From lemma 1 and corollary 1, A1 ∪ A2 ∪ … ∪ Am and B1 ∪ B2 ∪ … ∪ Bn must be unique. Now, let us assume the contrary, that there is some Ai ∈ A such that there is no Bj ∈ B with Ai ⊆ Bj. However, ∀ φp, φq ∈ Ai, φp Ind φq; thus if there is no other φ ∈ ΦCSXMS with φ ∉ AplCF and ∀ φi ∈ Ai, φ Ind φi, this implies Ai = Bj for some Bj in the independent partition of ΦCSXMS. On the contrary, if there is a φ ∈ ΦCSXMS with φ ∉ AplCF and ∀ φi ∈ Ai, φ Ind φi, then necessarily φ ∉ Ai and φ ∈ Bj; moreover ∀ φi ∈ Ai also φi ∈ Bj, thus Ai ⊂ Bj for some Bj in the independent partition of ΦCSXMS. In either case Ai ⊆ Bj, which gives a contradiction.
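The insertion procedure used in the induction step of the proof of lemma 1 can be sketched as follows (a Python sketch; the names `insert` and `ind`, and the toy dependence relation based on clause (1) of Definition 8, are illustrative assumptions):

```python
# One induction step from the proof of Lemma 1: add a new type-function phi
# to an existing independent partition. `ind(a, b)` is assumed to implement
# the (symmetric, reflexive) relation Ind.

def insert(partition, phi, ind):
    candidates = []
    for A in partition:
        compatible = {x for x in A if ind(x, phi)}
        if compatible == A:
            candidates.append(A | {phi})     # phi is independent of all of A
        else:
            candidates.append(A)             # A survives unchanged...
            candidates.append(compatible | {phi})  # ...plus the new subset A'
    # keep only maximal subsets, dropping duplicates and subsumed candidates
    maximal = [A for A in candidates if not any(A < B for B in candidates)]
    result = []
    for A in maximal:
        if A not in result:
            result.append(A)
    return result

# Toy relation: functions on the same machine are internally dependent.
machine = {"f1": 0, "f2": 0, "g1": 1}
ind = lambda a, b: a == b or machine[a] != machine[b]

p = [{"f1", "g1"}]
p = insert(p, "f2", ind)      # f2 depends on f1 but is independent of g1
# p == [{"f1", "g1"}, {"f2", "g1"}]
```

The final filtering step makes explicit the maximality requirement of definition 9, which the hand proof handles implicitly.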

Definition 11: An execution of a CSXMS is a maximal sequence E = (CF0, CF1, CF2,…), where CF0 is an initial configuration and ∀ i ≥ 0, CFi ⇒1 CFi+1. The sequence E is maximal if it is either infinite or ends in a terminal configuration. A partial execution EP is defined as any sub-sequence of an execution E, where E0 denotes the empty partial execution and EP ≠ E0. An associated (possibly partial) execution of a change of configuration CF ⇒ CF’ is a sequence ρ = (CF0, CF1, …, CFn) where ∀ 0 ≤ i ≤ n-1, CFi ⇒1 CFi+1, with CF = CF0 and CF’ = CFn.

Definition 12: An implied execution ΦE = (φ0, φ1, φ2,…) of an execution E = (CF0, CF1, CF2,…) is a sequence of applications of type-functions such that ∀ i ≥ 0, CFi ⇒1φi CFi+1. An implied partial execution ΦEP is defined in the same way for any partial execution. An implied (possibly partial) execution of a change of configuration CF ⇒ CF’ is a sequence Φρ = (φ0, φ1, …, φn-1) such that ∀ CFi, CFi+1 ∈ ρ we have CFi ⇒1φi CFi+1. It will be convenient to write CF ⇒Φρ CF’ to indicate a change of configuration and one of its associated implied executions. For brevity, in what follows we shall write ϖ(Φρ) to indicate the set that contains the same elements as the implied execution Φρ.

Definition 13: A change of configuration of a CSXMS is an independent change of configuration, denoted by CF ⇒IND Φρ CF’, if for all φp, φq ∈ Φρ it is true that φp Ind φq. An independent change of configuration is a proper independent change of configuration, denoted by CF ⇒MIND Φρ CF’, if and only if ϖ(Φρ) contains the same elements as one of the independent subsets of AplCF.

The idea of lemma 2 is illustrated by means of the abstract example presented in figure 1. There it is assumed that the independent partition for ΦCSXMS is given by B1 ∪ B2 ∪ B3 ∪ B4, and it is supposed that in configuration CF0 the independent partition for AplCF0 is A2 ∪ A4, with A2 = B2 and A4 ⊂ B4.
If a proper independent change of configuration of the form CF0 ⇒MIND Φ4 CF1 takes place, where ϖ(Φ4) = A4, then a new configuration CF1 is reached where AplCF1 = A2 ∪ A3. The proper independent changes to CF2 and CF3 continue in the same manner. The complexity properties of the problem of finding an independent partition for AplCF (IPP from now on) can be derived from the following graph theoretical perspective. The concept of an independent subset can be regarded as a clique, or as an independent vertex set, and thus IPP can be reduced to another well-known problem, namely the clique cover problem (CCP for short). The general strategy for transforming the IPP into CCP may be formulated as follows: (1) Each type-function of AplCF is represented as a vertex in the graph. (2) An edge is added between each pair of type-functions/vertices that are independent of each other. (3) Once the entire graph is constructed, all its cliques must be obtained.
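This three-step strategy can be sketched directly (a Python sketch; the Bron-Kerbosch algorithm is one standard way of carrying out the clique enumeration of step (3), although the papers cited below study several other techniques):

```python
# Sketch of the IPP -> CCP transformation: vertices are the applicable
# type-functions, edges join independent pairs, and the maximal cliques
# are the independent subsets of the independent partition.

def maximal_cliques(vertices, adj):
    """Bron-Kerbosch without pivoting; adj maps a vertex to its neighbours."""
    cliques = []
    def bk(R, P, X):
        if not P and not X:
            cliques.append(R)        # R is a maximal clique
            return
        for v in list(P):
            bk(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}
    bk(set(), set(vertices), set())
    return cliques

# Toy AplCF with the dependence structure of Definition 8: f1 and f2 share
# a machine (dependent); g is on another machine, independent of both.
vertices = ["f1", "f2", "g"]
adj = {"f1": {"g"}, "f2": {"g"}, "g": {"f1", "f2"}}

cliques = maximal_cliques(vertices, adj)
# the cliques {f1, g} and {f2, g} form the independent partition
```

On this toy graph the two maximal cliques coincide with the two independent subsets, matching the overlap permitted by definition 9 (both contain g).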


[Figure 1: The independent partition B1 ∪ B2 ∪ B3 ∪ B4 for ΦCSXMS, together with the independent partitions AplCF0 = A2 ∪ A4, AplCF1 = A2 ∪ A3 and AplCF2 = A1 ∪ A3 at configurations CF0, CF1 and CF2, linked by the proper independent changes CF0 ⇒MIND Φ4 CF1 with ϖ(Φ4) = A4 and CF1 ⇒MIND Φ2 CF2 with ϖ(Φ2) = A2.]

As a consequence, the cliques represent all the independent subsets of the independent partition of AplCF. Thus, all the vertices in the same clique must correspond to the set of type-functions in an independent subset of AplCF, and the total number of cliques in the graph must equal the number of independent subsets in the independent partition of AplCF. Several techniques for finding the cliques of a given undirected graph have been studied (in [22], [23], [24] and [25] among others).

5. Testing from CSXMS to Stream X-machines

The construction of an SX-M from a CSXMS Wn = (R, CM, C0) that is proposed by Balanescu et al. [8], following theorem 1 of that paper, can be summarised as SX-M = (Σ, Γ, Q, M, Φ, F, I, T, m0) where, writing Π k=1..n for the Cartesian product over the n components:
• M = (Π k=1..n (Mk x INk x OUTk)) x CM
• Σ = Π k=1..n (Σk ∪ {ε}) and Γ = Π k=1..n (Γk ∪ {ε})
• Q = Π k=1..n Qk
• Φ = Π k=1..n (Φk ∪ {ε}), where Φ : M x Σ ↔ Γ x M. The type Φ is of the form:
(φ1, φ2,…, φn)[((m1, in1, out1),…, (mn, inn, outn), C), (h1,…,hn)] = ((t1,…,tn), ((m’1, in’1, out’1),…, (m’n, in’n, out’n), C’))
if for each Λk either φk = hk = tk = ε (i.e. no function is applied) or φk(xk, mk, yk, C, hk::sk*) = (gk*::tk, xk’, mk’, yk’, C’) (i.e. the function was defined and applied), and if for all Λi, Λj with φi ≠ ε and φj ≠ ε, φi and φj are not communicating functions corresponding to the same matrix location.
• F : Q x Φ → P(Q) is defined by (q’1, q’2,…, q’n) ∈ F((q1, q2,…, qn), (φ1, φ2,…, φn)) where, if φk = ε then q’k = qk, otherwise q’k ∈ Fk(qk, φk).
• I = {(q1, q2,…, qn) | qk ∈ Ik} and T = {(q1, q2,…, qn) | qk ∈ Tk}


It is not hard to see that the SX-M obtained from the previous construction works in the same way as the CSXMS, with all the component functions φk acting in parallel. Moreover, it follows directly from the above construction and from definition 13 that every φ ∈ Φ in the new SX-M must correspond to an independent change of configuration of the CSXMS. Now changes of configuration can have different granularities, which range from a single change to a proper independent change of configuration. The requirements for a test method based on this kind of construction do not prescribe this granularity, and so there is a choice of how many type-functions of the CSXMS should be collapsed together to produce a new type-function for the equivalent SX-M. Let us analyse the two extremes of this choice. On the one hand, if we consider just single changes of configuration, the obvious drawback is that the resulting number of states for the SX-M grows exponentially. In other words, if each component of the CSXMS has at most s = |Qk| states, then the number of states for the new SX-M will be O(s^n), where n is the number of components of the system. On the other hand, in order to reduce the number of states as much as possible, we could collapse as many type-functions of the system as possible into one type-function of the new SX-M, which in the best of cases would give us O(s) states. Such type-functions thus correspond to proper independent changes of configuration, and so to determine them it is necessary to solve IPP. Unfortunately, with regard to complexity, it has been proved by Karp [26] (reported in [27]) that CCP is NP-complete, implying that IPP is too, and hence that it is intractable, since no member of the family of NP-complete problems is known to have a polynomial time algorithm [27].
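The exponential growth at the first extreme is easy to see concretely. A small illustrative sketch (the values of s and n are arbitrary):

```python
from itertools import product

# Hypothetical CSXMS: n components, each with s states.
s, n = 4, 6
component_states = [range(s) for _ in range(n)]

# A global state of the composed SX-M is an n-tuple of component states,
# so the state space is the Cartesian product of the component state sets.
global_states = list(product(*component_states))
print(len(global_states), s ** n)   # 4096 4096, i.e. the count is s**n
```

Doubling n squares the number of global states, which is the combinatorial state explosion referred to above.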

6. Data Flow Algebra

As described in the introduction, the approach that is proposed for avoiding this exponential explosion problem is to unify the bottom-up CSXMS model with a top-down model of the data flows within the system. The term data flow is commonly used for two very different kinds of distributed and concurrent systems, which for this purpose will be termed discrete and continuous data flow systems respectively. The common feature of both of these kinds of system is that they consist of concurrent processes that are connected by unidirectional data paths (usually referred to as channels), along which data flows between the processes. Where they differ is in the behaviour of the processes and the data flows with respect to time. In what is being called here a continuous data flow system (sometimes referred to as a signal flow system) there is a notional global system clock, such that at each notional clock tick some data will flow along each channel. Consequently, the behaviour of each process is essentially that at one clock tick it accepts items of data from each of its input channels, and then performs some computation such that at the next clock tick it is able to send items of data out on each of its output channels. By contrast, in what is being called here a discrete data flow system, there is not necessarily a global clock, because at any given time instant there will only be data flowing along a small number of the channels, rather than along all of them. Thus, many of the processes may (at least notionally) be inactive, and they are triggered by the arrival of an item of data. The activity that is triggered in this way may, however, involve waiting for a number of different items of data to arrive, possibly along different input channels, before finally some data is produced and output along one or more (but not necessarily all) of the output channels.
This difference between the two kinds of systems has an important consequence in the way that they are modelled. In a discrete data flow system the notion of the sequence in which the data flows occur, either for a specific process or across the system as a whole, is an essential feature, and one that must be represented in any formal model. By contrast, in a continuous data flow system the behaviour is such that the idea of a sequence of flows along different channels is not a particularly meaningful one, and there are other features that are much more important to the modelling process. In terms of the CSXMS model, the behaviour of the individual SX-M components is much closer to that of the processes in a discrete data flow system, and so it is the modelling of this type of data flow system that needs to be considered here. Hence, the data flow algebra that is described here is an algebraic model for the sequences of data flows within such a system. This data flow algebra (DFA from now on) was developed originally [28] as a formal equivalent to the graphical notation of data flow diagrams (DFDs, from now on) that were used in many of the early systems analysis and design methodologies. On their own, though, DFDs do not include sufficient information to enable a useful formal model to be built, and additional information has to be added about the sequencing of the data flows. This can either be done by annotating the data flows of a DFD, as in the object interaction diagrams of UML [29], or by using different kinds of diagrammatic representations, such as message sequence charts [30] or UML sequence diagrams. Thus, the DFA also provides a formal equivalent to these diagrammatic notations, in which the system dynamics are described in terms of regular expressions and production rules that define the allowable sequences of data flows between the component processes.


To do this, the DFA approach is based on the idea that the formal specification of any distributed system can be structured into three levels of abstraction:
• The lexical or topological layer defines the structure of the DFD, that is, the components and the connections between them.
• The syntactic layer is concerned with the sequences of data flows that can occur in the topological layer (i.e. within a system).
• The semantic layer specifies the values that are communicated in the sequences of the syntactic layer.

The topological layer is defined as the description of all the components and the data flows that can take place, independently of time; such a description can be given by means of a DFD. The DFA therefore provides a formal representation for such a description. Primitive actions correspond to one message flowing from one process (its source) to another process (its destination) via some channel. Such a primitive action will be written in the form: source ! channel ? destination. In particular, it is important to remark that an action models a complete data flow, rather than just one end of it (i.e. an input or an output, as in process algebras). One can describe the structure of a DFD simply by writing down all the actions (which can be viewed as a set of lexemes), if one uses such actions not only for describing data flows but also for representing the elements and the relationships among them. The relationship of this with the CSXMS model is that the processes can be viewed as SX-Ms and the data flows as the communications through the matrix. In order to prove the correctness of systems, an important feature is introduced into the DFA, which is the distinction between successful and failed termination [28]. First of all, consider a system which has stopped abnormally; such a situation is almost always referred to as a deadlock, and it is modelled in the DFA as an action, called the forbidden action, which is denoted by the constant symbol φ.
Secondly, consider the situation where no action takes place; such a situation is called the silent action and is denoted by the constant symbol ε. Let us define S as the set of sequences of actions and let s0, s1, s2, …, sm range over S. With respect to the syntactic layer, a sequence of actions is defined as an action or as a composition of actions. There are two basic forms of composition: sequential and alternative. Given two sequences s0 and s1 one writes:
• s0 ; s1 for the sequential composition of two sequences of actions, which yields a new sequence, where the semicolon is the sequencing operator and ε is the left and right identity for this operator.
• s0 | s1 for the choice between two alternative sequences of actions, which yields a new sequence, where the vertical bar is the alternation operator and the forbidden action φ is the left and right identity for this operator.

Sometimes the set I is used to stand for an indexing set (i.e. where sequences or actions are constructed over a subscript structure). One can write {ai | ai ∈ T, i ∈ I} for an indexed set of actions and {si | i ∈ I} for an indexed set of sequences. Thus, for a set of actions ai with index i, the sequence a1 ; a2 ; a3 ; … ; an-1 ; an is denoted ;_{i=1..n} ai, and similarly the sequence s1 ; s2 ; s3 ; … ; sn-1 ; sn with index i is denoted ;_{i=1..n} si. The sequential composition is not commutative; that is, the order in the sequence is important. The alternative composition over an indexed set I is denoted respectively, for a set of actions or for a set of sequences, by ∪_{i=1..n} ai and ∪_{i=1..n} si.

For any sequence of actions s, the occurrence of n repetitions of s is denoted s^n. The occurrence of at least one repetition, or of zero or more repetitions, of s is denoted respectively s+ and s*.

These operators give the basic algebra, but while this algebra can in principle describe any possible sequence of actions, there are practical situations when the analyst may want to denote the parallel occurrence of sequences. The parallel composition of two given sequences s0, s1 is denoted by s0 || s1. The operator does not contain a possibility of communication. Parallel composition is defined as the choice among all the possible interleavings of s0 and s1, where it is assumed that no two actions can occur simultaneously. The parallel composition over an indexed set I is denoted by ||_{i=1..n} ai and ||_{i=1..n} si.

In DFA, a basic sequence is a sequence of actions constructed only with the sequential composition operator. Then every sequence can be expressed in a fully distributed form: s1 | s2 | s3 | … | sn-1 | sn, where each component of the choice (i.e. each si) contains no alternations; in other words, each component is in fact a basic sequence. The fully distributed form of a given sequence is called the canonical form for an expression. Given any sequence, all the strings of actions that it generates are its associated basic sequences, and its canonical form is defined as the alternative composition of all of them. The alternative and parallel compositions are commutative. The axioms of the DFA are presented in table 1:

(s ; ε) = (ε ; s) = s : Silent action is the identity for the sequencing operator.
(s | φ) = (φ | s) = s : Forbidden action is the identity for the alternation.
(s || ε) = (ε || s) = s : Silent action is the identity for the parallel composition.
si ; (sj | sk) = (si ; sj) | (si ; sk) : Sequencing distributes over alternation.
(s | s) = s : The alternation operator is idempotent.
(si | sj) = (sj | si) : The alternation operator is commutative.
si | (sj | sk) = (si | sj) | sk : The alternation operator is associative.
(si || sj) = (sj || si) : Parallel composition is commutative.
si || (sj || sk) = (si || sj) || sk : Parallel composition is associative.
si || (sj | sk) = (si || sj) | (si || sk) : Parallel composition distributes over alternation.

Table 1
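Several of these axioms can be checked mechanically if a sequence is modelled by its canonical form, i.e. by the set of basic sequences it generates. The following sketch does this; the representation and all names are ours, and actions are treated as atomic symbols:

```python
# phi (deadlock) is the empty set of alternatives; epsilon is the single
# empty basic sequence.  A sequence = frozenset of tuples of actions.
PHI = frozenset()
EPS = frozenset({()})

def seq(s0, s1):          # sequential composition  s0 ; s1
    return frozenset(a + b for a in s0 for b in s1)

def alt(s0, s1):          # alternative composition s0 | s1
    return s0 | s1

def interleave(a, b):     # all interleavings of two basic sequences
    if not a: return {b}
    if not b: return {a}
    return ({(a[0],) + r for r in interleave(a[1:], b)} |
            {(b[0],) + r for r in interleave(a, b[1:])})

def par(s0, s1):          # parallel composition    s0 || s1
    out = set()
    for a in s0:
        for b in s1:
            out |= interleave(a, b)
    return frozenset(out)

s = frozenset({("a1",), ("a2",)})
assert seq(s, EPS) == seq(EPS, s) == s        # silent action identity for ;
assert alt(s, PHI) == alt(PHI, s) == s        # forbidden action identity for |
assert par(s, EPS) == s                       # silent action identity for ||
print(sorted(par(frozenset({("a1", "a2")}), frozenset({("b1",)}))))
# → [('a1', 'a2', 'b1'), ('a1', 'b1', 'a2'), ('b1', 'a1', 'a2')]
```

The last line shows parallel composition as the choice among all interleavings, exactly as defined above.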

The definition of the parallel composition assumes that the sequences are independent. Nevertheless, DFA can be used to construct composition operators that are similar to those of process algebras, where two sequences have to be synchronised, so that the common actions occur simultaneously. Given any two sequences of actions si and sj, their common alphabet is defined as C(si, sj) = {a | a ∈ si ∧ a ∈ sj} (i.e. the set of actions which are contained in both sequences). Let C be an alphabet (i.e. a set of actions); for any two actions a1 and a2 the synchronised composition of them over C is formally defined as:

a1 M a2 = if (a1 ∈ C ∧ a2 ∈ C) then if (a1 ≠ a2) then φ (deadlock) else a1

A given sequence s can be of three types with respect to the alphabet. If all its associated basic sequences (i.e. the sequences of the canonical form) contain at least one action of the alphabet, then the sequence is fully dependent on C. If at least one of its basic sequences contains one or more of the actions of C, then the sequence is partly dependent on C. If none of its basic sequences contains actions of C, then the sequence is independent of C. The operator DepC(s) generates the part of s that is dependent on C. In what follows the word dependent is used to refer to both fully and partly dependent, making the distinction only when it is needed. The axioms for the dependence operator are:

DepC(ε) = φ

DepC(φ) = φ
∀ a ∈ C, DepC(a) = a
∀ a ∉ C, DepC(a) = φ
i.e. DepC(a) = if (a ∈ C) then a else φ
DepC(si ; sj) = if (si is dependent on C) then
                  if (sj is dependent on C) then DepC(si) ; DepC(sj) else DepC(si) ; sj
                else if (sj is dependent on C) then si ; DepC(sj) else φ
DepC(si | sj) = DepC(si) | DepC(sj)

The operator IndC(s) generates the part of the sequence that is independent of C. If the sequence is partly dependent on C then:

IndC(ε) = ε
IndC(φ) = φ
∀ a ∉ C, IndC(a) = a
∀ a ∈ C, IndC(a) = φ
IndC(si ; sj) = IndC(si) ; IndC(sj)
IndC(si | sj) = IndC(si) | IndC(sj)

Moreover, there is the following invariant: for any sequence s and any alphabet C, s = DepC(s) | IndC(s). The operator that merges two synchronised sequences si and sj over a common alphabet C is denoted si M sj and is defined as:

si M sj = (DepC(si) MC DepC(sj)) | (IndC(si) || IndC(sj))

where the operator MC merges the dependent parts of both sequences. The operator MC is defined as follows:

si MC sj = if (si = φ or sj = φ) or (si is independent of C) or (sj is independent of C) then φ

Otherwise the operator is defined for alternation by a pair of symmetrical axioms, of which the left-hand version is:

(s1 ; (s2 | s3) ; s4) MC s5 = ((s1 ; s2 ; s4) MC s5) | ((s1 ; s3 ; s4) MC s5)

where IndC(s1) = s1 (which applies in the case where s1 = ε). For the sequential composition there is one axiom: ∀ a1, a2 ∈ C and ∀ s1, s2 such that IndC(s1) = s1 and IndC(s2) = s2,

(s1 ; a1 ; s3) MC (s2 ; a2 ; s4) = if (a1 ≠ a2) then φ
  else if (IndC(s3) = s3 and IndC(s4) = s4) then (s1 || s2) ; a1 ; (s3 || s4)
  else if (DepC(s3) = s3 and DepC(s4) = s4) then (s1 || s2) ; a1 ; (s3 MC s4)
  else φ

Nike [31] has demonstrated that the synchronised composition is commutative and associative:

si M sj = sj M si
(si M sj) M sk = si M (sj M sk)

The restriction operator extracts from a given specification (i.e. a sequence) certain sub-sequences depending on a given set of actions [32].
Originally, the restriction operator was defined with respect to a set of processes instead of a set of actions [28], in the following way. Let s be any sequence of actions and ps a set of processes. The restriction operator, denoted by s \ ps, produces a new sequence which includes only the actions that have an element of ps as their source or destination. More formally, let us define source(a) and destination(a) as a pair of functions which map any action to the DFD element (i.e. process) that corresponds to its source or destination respectively, and are undefined in any other case (i.e. when a flow comes from or goes to a terminator). For instance, source(b ! y ? c) = b and destination(b ! y ? c) = c. For the restriction operator the formal definition consists of the following set of axioms:

ε \ ps = ε
φ \ ps = φ
a \ ps = if (source(a) ∈ ps or destination(a) ∈ ps) then a else ε
(s1 ; s2) \ ps = (s1 \ ps) ; (s2 \ ps)
(s1 | s2) \ ps = (s1 \ ps) | (s2 \ ps)
(s1 || s2) \ ps = (s1 \ ps) || (s2 \ ps)

In the same way, in [33] the restriction operator s \ sa is defined as a function that maps any action of s into itself, if such an action is also in sa; otherwise the function maps it to the silent action. The set of axioms is the same, substituting ps by sa, with the exception of the third axiom, which now has to be rewritten as follows:

a \ sa = if (a ∈ sa) then a else ε

Since sa = {a | source(a) ∈ ps or destination(a) ∈ ps} and ps = {p | ∃ a ∈ sa with p = source(a) or p = destination(a)}, it is easy to see that both definitions of the restriction operator are in fact equivalent, and one can easily transform one form into the other. From now on, both forms are used interchangeably. The relationship between DFA and process algebras is that complete actions are the fundamental units here. However, the separate input and output halves of an action can be represented in terms of the restriction operator, and then reassembled in terms of the merge operation (but this is derived here, not a primitive as in process algebras). For this reassembly process Nike [33] has proved a reconstruction theorem: for any "well behaved" DFA specification, the result of applying the restriction operator to split it up into individual process specifications in this way, and then using the merge operator to reassemble it, is to recover exactly the original specification. The semantic layer has to do with the values that are communicated among elements; the specification in this layer has to provide the data type of the messages which can be sent/received through the channels.
In order to specify such values, a specification of how they are computed by their sources and used by their destinations will be required. Several formal methods can be used for the specification of the semantic layer; as a result, the DFA does not define it directly, and instead the grammar defined in the syntactic layer is used as an attribute grammar. Basically, research has been done on embedding OBJ and Z in the specifications produced at the syntactic layer; in particular, the OBJ approach was used in a case study of the alternating bit protocol [32]. There is a need to investigate other approaches that can be used to convert the grammar produced at the syntactic layer into an attribute grammar for the semantic layer. The semantic layer effectively defines the equivalent of what, in a CSXMS model, would be specified for the individual SX-M components.
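As a closing illustration of the syntactic layer, the restriction operator lends itself to a direct encoding. This is a sketch under our own representation (actions as (source, channel, destination) triples, sequences as sets of basic sequences), not part of the DFA notation itself:

```python
def restrict(s, ps):
    # Restriction s \ ps: keep only the actions whose source or destination
    # process lies in ps; other actions become silent and disappear.  Since
    # restriction distributes over |, it is applied per basic sequence.
    return frozenset(
        tuple(a for a in basic if a[0] in ps or a[2] in ps)
        for basic in s
    )

# two alternative basic sequences:  (b!y?c ; c!z?d)  |  (a!x?b)
spec = frozenset({
    (("b", "y", "c"), ("c", "z", "d")),
    (("a", "x", "b"),),
})
restricted = restrict(spec, {"d"})
print(restricted)  # only c!z?d involves process d; a!x?b restricts to epsilon
```

Only the action c ! z ? d survives in the first alternative, and the second alternative collapses to the silent action, matching the axioms above.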

7. Testing from CSXMS and DFA

To avoid the combinatorial aspects of the problem of testing a communicating system, the usual approach is to select specific "important" behaviours, using some criterion that reflects the purpose of the testing, and then concentrate on testing just these behaviours [34], [35]. For instance, in [36] the criterion that is used is that of conformance to particular protocol standards, as selected by experts for their relevance to a specific implementation. In the approach that is described here, the assumption that is made is that the individual components have already been tested using the well-established SX-M methods, and so the criterion that is needed is that of ensuring correct integration: that is, verifying that the components interact correctly when assembled into a complete system, as modelled by a CSXMS. As with the SX-M testing method, this criterion imposes a basic design-for-test requirement, which is that the IUT must be designed in such a way that the actual data flows between the components (as modelled by the communication matrix of the CSXMS) can be observed. The integration testing problem then consists of verifying that the sequencing of these data flows is correct, and in this approach the specification by which this correctness is determined will be provided by a DFA specification of the allowable sequences. Hence, the complexity of this problem will be determined by the number of comparisons to be made between the actions of the DFA specification that prescribe what data should flow, and the actual observed data flows within the IUT. This therefore provides a basis for estimating an upper bound to the computational complexity of this approach. The starting point for this estimation is that, given a DFA specification in its canonical form, the number of such actions can be approximated by the number of alternate sequences in the form, and the average length of each sequence. An upper bound on the length of a sequence can be approximated from the observation that it would be a very unusual specification in which the number of actions in a sequence exceeded the number of channels in the system. For a system with n processes, there is an upper bound that can be placed on the average fan-in or fan-out of the data channels to or from a process, and for systems whose underlying topology is at all tractable this upper bound will not grow with n, but can be treated as a constant, which will be denoted k. Hence, the number of channels in the system will be bounded above by kn, and this therefore acts also as an upper bound for the length of a sequence within a DFA specification. Deriving a realistic bound for the number of alternate sequences in a DFA specification is slightly more complex, as again one could have a system where the architecture was intractable, in the sense that each message arriving at any process in it could then give rise to k different choices of channel along which the next message would be sent out from that process. For such an architecture the total number of possible choices would therefore be O(k^n), but again the point here is that it is not the testing problem itself that would be intractable, but rather the architecture of the system itself, and such an architecture would usually be described as not being easily scalable. Thus, if the designers were attempting to create an architecture that was intended to be easily scalable, they would try to avoid such situations, and the case studies which have been investigated so far have suggested that a simpler set of assumptions is more appropriate to scalable architectures. These simpler assumptions are that the DFA specification will describe a number of different patterns of behaviour, where each pattern of behaviour will then be represented by a number of different sequences.
Typically, the number of patterns of behaviour will either be independent of the number of processes in the system, or at worst will grow linearly with it. The number of sequences that correspond to one pattern of behaviour will then take one of two forms. The simpler of these forms is that it will be proportional to the number of processes, so that in the worst case the total number of different sequences to be considered will be O(n^2), and hence the complexity of the testing problem will be O(n^3). The more complicated form is where a particular pattern of behaviour is expressed as a parallel composition, for here the corresponding sequences of actions in the canonical form will consist of all the possible interleavings of the actions concerned, and so for m actions, which must be selected in an order that is consistent with the sequences being interleaved, there will be an upper bound on the number of different sequences of the form 2^m. Fortunately, though, not all of these need to be tested, for many of the actions that are to be interleaved will be data flows between pairs of processes that are disjoint, and so the independence of such actions means that it is sufficient to just test one interleaving, rather than all of them. Hence, the only cases where all the different interleavings need to be tested are those where the actions being interleaved involve the same process, and there can only be n such situations (one for each process). For each such process a crude upper bound on the number of interleavings to be considered can be approximated by (2k)!, since 2k is the upper bound on the number of channels connected to a process. Hence, the number of different sequences to be tested for one pattern of behaviour would in this case be bounded above by (2k)! × n, which is O(n), and so again the complexity of the testing problem would be O(n^3).
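The bounds in this argument can be made concrete with a little arithmetic. In this sketch the values of n and k are chosen arbitrarily for illustration:

```python
from math import factorial

# Hypothetical scalable architecture: n processes, constant fan-in/out k.
n, k = 100, 3

channels_bound = k * n                         # upper bound on channel count,
seq_len_bound = channels_bound                 # hence on sequence length
interleavings_per_process = factorial(2 * k)   # crude (2k)! bound per process
sequences_bound = interleavings_per_process * n  # O(n) sequences per pattern

# total comparisons ~ patterns * sequences * length, i.e. O(n) * O(n) * O(n)
print(channels_bound, interleavings_per_process, sequences_bound)
# → 300 720 72000
```

The key point is that (2k)! is a (large) constant once k is fixed, so the overall cost grows as a polynomial in n rather than exponentially.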

8. Conclusions

The principal conclusion to be drawn from the results presented in this paper is that a simple bottom-up approach to trying to develop a testing strategy for distributed systems is not sufficient, as it involves problems that are inherently intractable, either through combinatorial state explosion or through the NP-complete nature of the independent partition problem. Thus, while the CSXMS model offers the advantage of being expressed in terms of components that can be tested individually using the SX-M methods, on its own it does not provide an acceptable basis for integration testing. The other main conclusion that can be drawn from this work is that the combination of the CSXMS model with a top-down specification method, namely the data flow algebra presented here, does appear to provide a more tractable basis for testing implementations. In particular, the analysis that has been presented of the computational complexity of this approach has shown that it depends on the architecture of the system that is to be tested, so that if this is designed to be scalable then the testing problem should be reducible to polynomial time, although it may still be an exponential time problem for architectures that exhibit either complex topologies or complex behaviours. Hence, this approach to specifying such systems can both help to guide the testing process, and characterise features of the system specification and behaviour that will give rise to problems of scale in the testing. This latter property should therefore help in identifying aspects of system architectures where simplification is desirable, although the precise details of how it would guide such simplification still require further investigation.


References

[1] O. Koné and R. Castanet. Test generation for interworking systems, Computer Communications, Vol. 23, pp. 642-652, 2000.
[2] S. Mauw and G. J. Veltink. Algebraic Specification of Communication Protocols, Cambridge Tracts in Theoretical Computer Science 36, 1993.
[3] F. Ipate. Theory of X-machines and Applications in Specification and Testing, Ph.D. Thesis, University of Sheffield, 1995.
[4] S. Eilenberg. Automata, Languages and Machines, Vol. A, Academic Press, N.Y., 1974.
[5] M. Holcombe. X-Machines as a Basis for Dynamic System Specification, Software Engineering Journal, Vol. 3, No. 2, pp. 69-76, 1988.
[6] G. Laycock. Theory and Practice of Specification Based Testing, Ph.D. Thesis, University of Sheffield, 1995.
[7] J. Barnard, J. Whitworth and M. Woodward. Communicating X-machines, Journal of Information and Software Technology, Vol. 38, pp. 401-407, 1996.
[8] T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe and C. Vertan. Communicating stream X-machines are no more than X-machines, Journal of Universal Computer Science, Vol. 5, No. 9, pp. 494-507, 1999.
[9] M. Holcombe and F. Ipate. Correct Systems: Building a Business Process Solution, Springer Verlag Series on Applied Computing, 1998.
[10] F. Ipate and M. Holcombe. An Integration Testing Method that is Proved to Find all Faults, Intern. J. Computer Math., Vol. 63, pp. 159-178, 1997.
[11] T. S. Chow. Testing Software Design Modeled by Finite-State Machines, IEEE Transactions on Software Engineering, Vol. 4, No. 3, pp. 178-187, 1978.
[12] S. Fujiwara, G. von Bochmann, F. Khendek, M. Amalou and A. Ghedamsi. Test Selection Based on Finite State Models, IEEE Transactions on Software Engineering, Vol. 17, No. 6, pp. 591-603, 1991.
[13] F. Ipate and M. Holcombe. Generating test sets from non-deterministic stream X-machines, Formal Aspects of Computing, Vol. 12, No. 6, pp. 443-458, 2000.
[14] G. Laycock. Introduction to X-machines, Research Report CS-93-13, Department of Computer Science, University of Sheffield, 1993.
[15] M. Fairtlough, M. Holcombe, F. Ipate, C. Jordan, G. Laycock and Z. Duan. Using an X-machine to model a video cassette recorder, Current Issues in Electronic Modelling, Vol. 3, pp. 141-151, 1995.
[16] M. Holcombe and F. Ipate. Almost all Software Testing is Futile, Research Report CS-95-03, Department of Computer Science, University of Sheffield, 1995.
[17] M. Holcombe and F. Ipate. Another look at Computability, Informatica, Vol. 20, pp. 359-372, 1996.
[18] K. Bogdanov, M. Fairtlough, M. Holcombe, F. Ipate and C. Jordan. X-machine Specification and Refinement of Digital Devices, Research Report CS-97-16, Department of Computer Science, University of Sheffield, 1997.
[19] F. Ipate and M. Holcombe. A method for refining and testing generalised machine specifications, Intern. J. Computer Math., Vol. 68, pp. 197-219, 1998.
[20] F. Ipate and M. Holcombe. Specification and Testing using Generalised Machines: a Presentation and a Case Study, Software Testing, Verification and Reliability, Vol. 8, pp. 61-81, 1998.
[21] A. J. Cowling, H. Georgescu and C. Vertan. A Structured Way to use Channels for Communication in X-machine Systems, Formal Aspects of Computing, Vol. 12, No. 6, pp. 485-500, 2000.
[22] R. E. Bonner. On some clustering techniques, IBM J. Res. Develop., Vol. 8, No. 1, pp. 22-32, January 1964.
[23] J. G. Auguston and J. Minker. An Analysis of Some Graph Theoretical Cluster Techniques, Journal of the ACM, Vol. 17, No. 4, pp. 571-588, October 1970.
[24] G. D. Mulligan and D. G. Corneil. Corrections to Bierstone's Algorithm for Generating Cliques, Journal of the ACM, Vol. 19, No. 2, pp. 244-247, April 1972.
[25] C. Bron and J. Kerbosch. Algorithm 457: Finding All Cliques of an Undirected Graph, Communications of the ACM, Vol. 16, No. 9, pp. 575-577, September 1973.
[26] R. M. Karp. Reducibility among combinatorial problems, in Complexity of Computer Computations (R. E. Miller and J. W. Thatcher, eds.), Plenum Press, NY, pp. 85-103, 1972.
[27] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, NY, 1979.
[28] A. J. Cowling. Dataflow Algebras as Formal Specifications of Data Flows, Research Report CS-95-18, Department of Computer Science, University of Sheffield, 1995.
[29] J. Rumbaugh, I. Jacobson and G. Booch. The Unified Modeling Language Reference Manual, Addison Wesley, 1999.
[30] ITU-T Standard Z.120, Message Sequence Chart, ITU, 1993.
[31] M. C. Nike. Using Dataflow Algebra as a Specification Method, Ph.D. Thesis, University of Sheffield, May 2000.
[32] A. J. Cowling and M. C. Nike. Using Dataflow Algebra to Analyse the Alternating Bit Protocol, in Software Engineering for Parallel and Distributed Systems (I. Jelly, I. Gorton and P. Croll, eds.), Chapman and Hall for IFIP, London, pp. 195-207, 1996.
[33] A. J. Cowling and M. C. Nike. Dataflow Algebra Specifications of Pipeline Structures, Department of Computer Science, University of Sheffield.
[34] S. L. Pfleeger. Software Engineering: Theory and Practice, Prentice-Hall, 1998.
[35] R. Cardell-Oliver. Conformance Tests for Real-Time Systems with Timed Automata Specifications, Formal Aspects of Computing, Vol. 12, No. 5, pp. 350-371, 2000.
[36] M. Phalippou and R. Groz. From Estelle Specifications to Industrial Test Suites, Using an Empirical Approach, Proceedings of the IFIP International Conference FORTE'90, Madrid, Spain, 1990.

