Subsumption of Program Entities for Efficient Coverage and Monitoring

Raul Santelices,† Saurabh Sinha,‡ and Mary Jean Harrold†
†College of Computing, Georgia Institute of Technology
‡IBM India Research Lab
E-mail: {raul|harrold}@cc.gatech.edu, [email protected]

ABSTRACT

Program entities such as branches, def-use pairs, and call sequences are used in diverse software-development tasks. Reducing a set of entities to a small representative subset through subsumption saves monitoring overhead, focuses the developer’s attention, and provides insights into the complexity of a program. Previous work has solved this problem only for entities of the same type, and only for some types. In this paper, we introduce a novel and general approach for subsumption of entities of any type based on predicate conditions. We discuss applications of this technique and outline future work.

Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging—Testing tools, Monitors

General Terms: Algorithms, Experimentation, Performance

Keywords: Subsumption, predicate conditions, entity hierarchies, coverage criteria

1. INTRODUCTION

Program entities such as statements, branches, function calls, and definition-use pairs (du-pairs) are used in a variety of software-related tasks, including testing, security, and program understanding. Entities are extracted automatically from programs using static-analysis tools and linked to intermediate representations, such as the interprocedural control-flow graph (ICFG) and the system dependence graph (SDG) [8], which represent various dependencies among these entities.

In software testing, entities are used to measure the coverage adequacy of a test suite. For example, a common practice in industry is to attempt to exercise all program statements. Theoretical and empirical studies have also identified other control-flow entities, such as branches, and data-flow entities, such as du-pairs, as alternatives that can yield higher fault-detection rates [6, 9]. Because control-flow and data-flow entities are complementary, studies have suggested combining them for testing [7].

Because the instrumentation needed to monitor entity coverage imposes overhead, previous work has produced methods to identify a minimal subset of entities of the same type that need to be instrumented [1, 2, 3, 4, 10], so that the coverage of any entity outside the subset is guaranteed by the coverage of some entity in the minimal subset. In this case, the entity in the minimal subset is said to subsume the entity outside it. To date, however, only limited work has been done to identify subsumption among entities of different types. We are aware of only one method that determines du-pairs subsumed by nodes [11], but this method is incomplete and is defined only intraprocedurally. Existing work on path conditions [12, 14] can serve as a starting point for subsumption of du-pairs and other entities by branches, but such approaches still need to account for destroying nodes, such as nodes that kill a definition.

The need for subsumption of entities of different types arises in a variety of scenarios. For instance, suppose a tester creates a test suite that satisfies the all-branches criterion for her program and then wants to measure the all-uses coverage achieved by that suite, to gain deeper insight into the confidence the suite provides. However, instrumenting the program to monitor all-uses coverage can impose considerable runtime overhead. Moreover, she might decide to measure this additional coverage only after the test cases have already been run. Hence, it is convenient, or even necessary, to analyze the branch-coverage reports to determine which du-pairs were incidentally covered.

In this paper, we present a general method to statically compute subsumption among entities of any type. We focus on the predicate conditions (i.e., sets of decisions or branches) that must be met to cover an entity. These conditions are always necessary for coverage; in some cases, however, they are not sufficient to guarantee that an entity is indeed covered. We show how to identify those cases.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SOQUA’06, November 6, 2006, Portland, OR, USA. Copyright 2006 ACM 1-59593-584-3/06/0011 ...$5.00.

2. TECHNIQUE

Our technique for computing subsumption among program entities consists of several steps. First, it performs static analysis to determine the conditions under which entities are covered during execution. Second, it uses these conditions to construct a condition table that records the condition for each entity under consideration. Third, because some conditions in the table are necessary but not sufficient for coverage of an entity, it determines which conditions are sufficient. Finally, using the condition table and this sufficiency information, the subsumption algorithm builds a hierarchy of entities. This section discusses these steps in turn.

2.1 Definitions and Static Analyses

An interprocedural control-flow graph (ICFG) is a collection of single-procedure control-flow graphs (CFGs) linked by interprocedural control-flow edges. Only valid paths are permitted: each edge entering a CFG from a call site must be matched, at the same call depth, by the edge returning from that CFG to the original call site. A cycle in the ICFG corresponds to an intraprocedural loop or a set of recursive calls. Each cycle is defined by a backedge, which is any edge whose head (sink) dominates its tail (source).
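The backedge definition can be checked mechanically. The following minimal Python sketch assumes a small hypothetical CFG stored as a successor map; the node names, the extra node 2, and the exact edges are our assumptions, modeled loosely on the loop version of Figure 1 (which is not reproduced here). It computes dominators with the classic iterative data-flow equations and reports every edge whose head dominates its tail.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> successor nodes


def dominators(succ: Graph, entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow: dom(n) = {n} plus the intersection of dom(p) over predecessors p."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    pred: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, targets in succ.items():
        for s in targets:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}  # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def backedges(succ: Graph, entry: str) -> List[Tuple[str, str]]:
    """Edges (tail, head) whose head (sink) dominates their tail (source)."""
    dom = dominators(succ, entry)
    return [(t, h) for t, targets in succ.items() for h in targets if h in dom[t]]


# Hypothetical CFG loosely modeled on Figure 1 with the backedge (6, 4).
CFG: Graph = {
    "entry": ["1"],
    "1": ["2", "3"],      # 1-T, 1-F
    "2": ["4"],
    "3": ["4"],
    "4": ["5", "6"],      # 4-T, 4-F
    "5": ["6"],
    "6": ["4", "exit"],   # 6-T (backedge), 6-F
}

print(backedges(CFG, "entry"))  # [('6', '4')]
```

The iterative formulation is chosen for brevity; for large ICFGs a faster dominator algorithm would normally be used instead.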

Entity                             1-T  1-F  4-T  4-F
(1, 3, x)                           ✕    ✓    ∗    ∗
(1, 6, x)                           ∗    ∗    ✕    ✓
(5, 6, x)                           ∗    ∗    ✓    ✕
(3, 5, y)                           ✕    ✓    ✓    ✕
((3, push), (6, pop))               ✕    ✓    ✕    ✓
((3, push), (5, pop), (6, pop))     ✕    ✓    ✓    ✕

Table 1: Condition table for example without loop.

Entity                             1-T  1-F  4-T  4-F  6-T  6-F
(1, 3, x)                           ✕    ✓    ∗    ∗    ∗    ✓
(1, 6, x)                           ∗    ∗    ∗    ✓    ∗    ✓
(5, 6, x)                           ∗    ∗    ✓    ∗    ∗    ✓
(3, 5, y)                           ✕    ✓    ✓    ∗    ∗    ✓
(6, 5, y)                           ∗    ∗    ✓    ∗    ✓    ✓
((3, push), (6, pop))               ✕    ✓    ∗    ✓    ∗    ✓
((3, push), (5, pop), (6, pop))     ✕    ✓    ✓    ∗    ∗    ✓

Table 2: Condition table for example with loop.

Figure 1: Example CFG with optional loop.

A node n_1 dominates a node n_2 if all paths from the program entry to n_2 include n_1; n_1 postdominates n_2 if all paths from n_2 to the program exit include n_1. In this paper, an ICFG node represents a maximal basic block (MBB), which has single entry and exit points and covers as many statements as possible. An MBB begins either at the entry of a procedure or at the target of a branching edge, and ends at the next predicate (i.e., a statement with more than one successor) or at an exit of the procedure.

An atomic entity is any entity wholly contained in a single node or edge. For example, all statements, branches, and procedure calls are atomic. An entity involving more than one statement can be atomic as well, if all its statements belong to the same node. A non-atomic entity is called a sequential entity, because an order¹ exists among the nodes that compose it. Examples of sequential entities are du-pairs and system-call sequences. A sequential entity may also have destroying nodes associated with it. For example, a redefinition (i.e., killing) of variable v on some path from a definition d to a use u is a destroying node for du-pair (d, u, v). A path between two ordered nodes in a sequential entity is destruction-free if no destroying node associated with that node pair lies along the path.

A predicate condition is a simplified version of a path condition [12], consisting of a finite disjunction of predicate terms. A predicate term is a conjunction of decisions that represents one or more paths in the ICFG from the entry to the exit point; we call these paths full paths. A term covers more than one full path (in fact, an infinite number) if it includes the decisions of one or more cycle backedges. A decision is denoted by a predicate name and a branch label. We name the predicates in a program P_1, P_2, ..., P_n.
If we normalize all predicates in the ICFG to have only two branches, the decisions for each predicate P_i are P_i-T and P_i-F. A term covering a cycle might include both decisions of a predicate. A straightforward approach for calculating the predicate condition of an entity uses the system dependence graph. We are currently working on a more efficient method for computing such conditions, which we will present in future work. In this paper, we assume that the predicate condition of any entity mentioned has been, or can be, computed.
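To make the disjunctive form concrete, here is a minimal Python sketch that assumes decisions are encoded as strings such as "4-T" and each term as a frozenset of decisions (both encodings are our assumptions, not the authors'). It checks whether the decisions exercised by one execution include every decision of some term. The example terms are our reading of the loop-free example and agree with the rows of Table 1. Because predicate conditions are only necessary in general, a True result means "possibly covered" unless the condition is known to be sufficient.

```python
from typing import FrozenSet, Iterable, List

Decision = str              # e.g. "4-T": predicate 4, true branch
Term = FrozenSet[Decision]  # conjunction of decisions
Condition = List[Term]      # disjunction of terms


def term(*decisions: Decision) -> Term:
    return frozenset(decisions)


def satisfies(taken: Iterable[Decision], cond: Condition) -> bool:
    """True if the decisions taken by an execution include every decision of some term."""
    taken_set = set(taken)
    return any(t <= taken_set for t in cond)


# Loop-free example: terms consistent with the rows of Table 1.
COND_1_6_X = [term("1-T", "4-F"), term("1-F", "4-F")]
COND_3_5_Y = [term("1-F", "4-T")]

print(satisfies(["1-F", "4-F"], COND_1_6_X))  # True
print(satisfies(["1-F", "4-T"], COND_1_6_X))  # False
```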

2.2 Condition Table

The intuition behind our condition-table-based technique is that if we can determine the condition for covering an entity e, we can check whether this condition also satisfies the conditions for the coverage of other entities, regardless of their type. Thus, an entity e_i is possibly subsumed by e if cond(e) ⟹ cond(e_i).

¹ More generally, a partial order.

We define a condition table in which each column represents a decision and each row represents the predicate condition of an entity. Each cell takes one of three values for the decision associated with its column: (1) ✓ marks the decision as a necessary part of all full paths in the condition; (2) ∗ marks the decision as part of some, but not all, full paths covering the entity; and (3) ✕ marks the decision as definitely not part of any full path in the condition. These values form a partially ordered set: ∗ < ✕ and ∗ < ✓, while ✓ and ✕ are incomparable.

Figure 1 shows a CFG with two types of entities: du-pairs and sequences of operations on a single stack. Without the backedge (6, 4), the du-pairs are (1, 3, x), (1, 6, x), (5, 6, x), and (3, 5, y), and the call sequences are ((3, push), (6, pop)) and ((3, push), (5, pop), (6, pop)).² Such sequences might need to be monitored for safety or security, because they cannot always be detected statically. Table 1 shows the condition table for these entities without backedge (6, 4). For du-pair (1, 6, x), either decision at node 1 lets definition (1, x) reach use (6, x); hence, both decisions are marked ∗ in the corresponding row. However, (1, x) is killed at node 5, so decision 4-T is marked ✕ and 4-F is marked ✓. When backedge (6, 4) is added, many conditions whose full paths previously had to exclude a decision marked ✕ can now reach that decision before the first entity node is reached or after the whole entity is covered; hence, those marks change to ∗ in Table 2. Table 2 shows that predicate conditions, despite being necessary conditions (i.e., covering e implies cond(e), or e ⟹ cond(e)), might not be sufficient conditions in the presence of cycles (i.e., cond(e) does not imply e).
An example is the condition of the newly formed du-pair (6, 5, y), which requires decision 4-T (✓) but, because of the cycle, also allows 4-F (∗). Covering these decisions does not guarantee that definition (6, y) is covered before use (5, y), because the predicate condition is also satisfied by paths that visit node 5 only before node 6 (the sequence (5, 6)). Consequently, the du-pair might not be covered, so the condition is not sufficient. This example illustrates the imprecision of a predicate condition in the presence of cycles. Another example is the predicate condition for du-pair (1, 6, x), which is no longer sufficient. Decision 4-T is now part of some full paths covering (1, 6, x), even though node 5 must be covered only after use (6, x) to avoid killing the definition; the condition, however, cannot guarantee the correct order of these events. The method to determine sufficiency is detailed in the next section.

² The second sequence might pop from an empty stack.
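A row of the condition table can be derived from a predicate condition. The minimal Python sketch below is a simplification under a stated assumption: it counts terms rather than full paths, which matches the cell definitions only when each term stands for a single acyclic full path listing exactly its decisions (true for the loop-free example, not in general). It also encodes the partial order on cell values. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List

NEC, SOME, NONE = "✓", "∗", "✕"  # cell values of the condition table


def row_from_terms(terms: List[FrozenSet[str]], decisions: List[str]) -> Dict[str, str]:
    """Mark a decision ✓ if every term contains it, ✕ if none does, ∗ otherwise.

    Simplification: each term is treated as one acyclic full path listing
    exactly its decisions, which holds for the loop-free example only.
    """
    if not terms:
        raise ValueError("an unsatisfiable condition has no row")
    row = {}
    for d in decisions:
        hits = sum(d in t for t in terms)
        row[d] = NEC if hits == len(terms) else NONE if hits == 0 else SOME
    return row


def leq(a: str, b: str) -> bool:
    """Partial order on cells: ∗ < ✕ and ∗ < ✓; ✓ and ✕ are incomparable."""
    return a == b or a == SOME


DECISIONS = ["1-T", "1-F", "4-T", "4-F"]
row = row_from_terms([frozenset({"1-T", "4-F"}), frozenset({"1-F", "4-F"})], DECISIONS)
print(row)  # {'1-T': '∗', '1-F': '∗', '4-T': '✕', '4-F': '✓'}
```

The printed row is the (1, 6, x) row of Table 1.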

2.3 Necessary and Sufficient Conditions

For the subsumption algorithm to work, we first need to determine the sufficiency of each condition in the condition table. As Figure 1 with the backedge shows, some entities have predicate conditions that are not sufficient, because the nodes that constitute an entity, including its destroying nodes, may be covered in more than one order. This situation occurs when two or more entity nodes are mutually reachable (i.e., they are enclosed in a common cycle). Otherwise, we could collapse all cycles into super nodes without merging entity nodes, and find a unique node ordering by traversing the resulting acyclic graph. In particular, the condition for an atomic entity is always sufficient, because an atomic entity involves only one node.

Consider the sequential entity e = (s_i, s_j), where nodes s_i and s_j are ordered and enclosed in the same cycle. An iteration of the cycle follows any path from the cycle head H (i.e., the sink node of the cycle's backedge) to the cycle tail T (i.e., the source node of the backedge). If there is a path from s_i to T that does not include s_j, and a path from H to s_j that does not include s_i, then each program path of the form F1 = Entry → ... → s_i → ... → T → H → ... → s_j → ... → Exit covers e; hence, the decisions in such a path are part of at least one term in cond(e). However, the same set of decisions also allows paths of the form F2 = Entry → ... → H → ... → s_j → ... → s_i → ... → T → ... → Exit, in which s_i never occurs before s_j. Thus, cond(e) is not a sufficient condition for e (i.e., cond(e) does not imply e). If, in contrast, all paths from s_i to T include s_j (so s_j postdominates s_i), or all paths from H to s_j include s_i (so s_i dominates s_j), then paths of the form F2 are not covered by cond(e). In this case, s_i is covered before s_j, and hence cond(e) ⟹ e.
Consequently, cond(e) is a sufficient condition for e if and only if at least one of the following holds: (1) s_i and s_j are not enclosed in a common cycle; (2) s_i dominates s_j; or (3) s_j postdominates s_i. More generally, the predicate condition for a sequential entity of any size is sufficient if and only if (1), (2), or (3) holds for every ordered node pair (s_i, s_j) in the entity.

Although all paths we consider between two entity nodes are destruction-free, a predicate condition can still be satisfied by destroying paths. Let k denote any destroying node associated with the ordered pair (s_i, s_j). A sequence (k, s_i, s_j) or (s_i, s_j, k) does not affect the coverage of (s_i, s_j). However, as we just saw, if the predicate condition satisfies the pair (k, s_i), then it might also allow (s_i, k) and, in particular, (s_i, k, s_j). Analogously, it might allow (k, s_j) if it covers (s_j, k). For a condition to be sufficient, it must guarantee that (s_i, k, s_j) cannot occur. Hence, the condition is sufficient for (s_i, s_j) if and only if (1), (2), or (3) holds for all pairs of the form (k, s_i) and (s_j, k) allowed by the condition, for all nodes k that destroy (s_i, s_j).
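The pairwise test (1)-(3) can be sketched in Python as follows. This is a minimal illustration on a hypothetical CFG loosely modeled on the loop version of Figure 1 (node 2 and the exact edges are our assumptions). "Enclosed in a common cycle" is approximated as mutual reachability, and postdominators are computed as dominators of the reversed graph. The sketch checks only an ordered pair of entity nodes; the destroying-node extension described above is not implemented.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # node -> successor nodes


def dominators(succ: Graph, entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow dominators; on the reversed graph these are postdominators."""
    nodes = set(succ) | {s for ts in succ.values() for s in ts}
    pred: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, ts in succ.items():
        for s in ts:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def reverse(succ: Graph) -> Graph:
    rev: Graph = {}
    for n, ts in succ.items():
        rev.setdefault(n, [])
        for s in ts:
            rev.setdefault(s, []).append(n)
    return rev


def reachable(succ: Graph, src: str) -> Set[str]:
    """Nodes reachable from src by a non-empty path."""
    seen: Set[str] = set()
    stack = [src]
    while stack:
        for s in succ.get(stack.pop(), []):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def pair_is_sufficient(succ: Graph, entry: str, exit_: str, si: str, sj: str) -> bool:
    """Conditions (1)-(3) for the ordered pair (si, sj); destroying nodes are not checked."""
    common_cycle = sj in reachable(succ, si) and si in reachable(succ, sj)
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_)
    return (not common_cycle) or si in dom[sj] or sj in pdom[si]


# Hypothetical CFG loosely modeled on Figure 1 with the backedge (6, 4).
CFG: Graph = {
    "entry": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"],
    "4": ["5", "6"], "5": ["6"], "6": ["4", "exit"],
}

print(pair_is_sufficient(CFG, "entry", "exit", "5", "6"))  # True: 6 postdominates 5
print(pair_is_sufficient(CFG, "entry", "exit", "6", "5"))  # False
```

The two calls correspond to the node pairs of du-pairs (5, 6, x) and (6, 5, y).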

2.4 Subsumption Procedure

The subsumption relationship between two entities can be computed from the condition table only if the predicate condition for the subsumed entity is a sufficient condition. In other words, if e_1 ⟹ cond(e_1) and e_2 ⟺ cond(e_2), then e_1 subsumes e_2 if and only if cond(e_1) ⟹ cond(e_2). In the condition table, subsumption is computed by a cell-wise comparison of the respective rows: e_1 subsumes e_2 if and only if row_{e_2}[br] ≤ row_{e_1}[br] for every branch br in the graph. In Figure 1 without the backedge, du-pair (3, 5, y) subsumes du-pairs (1, 3, x) and (5, 6, x) and the call sequence ((3, push), (5, pop), (6, pop)). In fact, ((3, push), (5, pop), (6, pop)) also subsumes (3, 5, y), because their rows are identical. Consequently, monitoring the coverage of either of these two entities guarantees the coverage of both, as well as of du-pairs (1, 3, x) and (5, 6, x).

Figure 2: Subsumption hierarchy of entities, without loop.

Figure 2 shows the resulting subsumption hierarchy for all entities in Figure 1 without the backedge. Because subsumption is transitive, edges implied by transitivity are omitted. If we add the backedge, however, no entity except (1, 3, x) and (5, 6, x) has a sufficient condition, so only these two entities can be shown to be subsumed by others.
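The cell-wise rule can be applied directly to the rows of Table 1. In this minimal Python sketch, the entity labels are our own string encoding and the rows are copied from Table 1; the comparison is meaningful here because every condition in the loop-free example is sufficient (criterion (1) of Section 2.3 holds trivially when there are no cycles). The function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

NEC, SOME, NONE = "✓", "∗", "✕"

# Rows of Table 1 (loop-free example); columns are decisions 1-T, 1-F, 4-T, 4-F.
TABLE1: Dict[str, List[str]] = {
    "(1,3,x)": [NONE, NEC, SOME, SOME],
    "(1,6,x)": [SOME, SOME, NONE, NEC],
    "(5,6,x)": [SOME, SOME, NEC, NONE],
    "(3,5,y)": [NONE, NEC, NEC, NONE],
    "((3,push),(6,pop))": [NONE, NEC, NONE, NEC],
    "((3,push),(5,pop),(6,pop))": [NONE, NEC, NEC, NONE],
}


def leq(a: str, b: str) -> bool:
    """∗ < ✕ and ∗ < ✓; ✓ and ✕ are incomparable."""
    return a == b or a == SOME


def subsumes(row1: List[str], row2: List[str]) -> bool:
    """e1 subsumes e2 iff row_e2[br] <= row_e1[br] for every branch br.

    Only meaningful when e2's predicate condition is sufficient.
    """
    return all(leq(c2, c1) for c1, c2 in zip(row1, row2))


def subsumption_pairs(table: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """All ordered pairs (e1, e2) of distinct entities such that e1 subsumes e2."""
    return {(e1, e2) for e1, e2 in permutations(table, 2) if subsumes(table[e1], table[e2])}


for e1, e2 in sorted(subsumption_pairs(TABLE1)):
    print(f"{e1} subsumes {e2}")
```

Besides the relationships named in the text, the same comparison finds that ((3, push), (6, pop)) subsumes (1, 3, x) and (1, 6, x).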

3. APPLICATIONS

The subsumption hierarchy partitions a set of entities into two groups: those that are subsumed by another entity, and those that are not. The latter are called unconstrained entities, following Bertolino and Marré’s terminology [4, 10], and they form a spanning set [10] that is minimal for a given subsumption hierarchy. Covering the spanning set guarantees that all entities are covered. We discuss two major application areas of our technique: monitoring in Section 3.1 and program understanding in Section 3.2.
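A spanning set can be extracted from a subsumption hierarchy. The following minimal Python sketch assumes the hierarchy is given as (subsumer, subsumed) pairs with transitive edges possibly omitted; the pairs listed are our reading of the loop-free example (from comparing the rows of Table 1). It closes the relation transitively, discards entities that are strictly subsumed, and keeps one representative per group of mutually subsuming entities. All names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (a, b) means "a subsumes b"


def transitive_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def spanning_set(entities: List[str], pairs: Iterable[Pair]) -> List[str]:
    """Unconstrained entities, keeping one representative per mutually subsuming group."""
    rel = transitive_closure(pairs)
    chosen: List[str] = []
    for e in entities:
        subsumers = {a for a, b in rel if b == e}
        if any((e, a) not in rel for a in subsumers):
            continue  # strictly subsumed by some other entity: constrained
        if any(c in subsumers for c in chosen):
            continue  # mutually subsuming with an entity already chosen
        chosen.append(e)
    return chosen


ENTITIES = ["(1,3,x)", "(1,6,x)", "(5,6,x)", "(3,5,y)",
            "((3,push),(6,pop))", "((3,push),(5,pop),(6,pop))"]
# Subsumption edges for the loop-free example; edges implied by transitivity are omitted.
PAIRS = [
    ("(3,5,y)", "((3,push),(5,pop),(6,pop))"),
    ("((3,push),(5,pop),(6,pop))", "(3,5,y)"),
    ("(3,5,y)", "(1,3,x)"),
    ("(3,5,y)", "(5,6,x)"),
    ("((3,push),(6,pop))", "(1,3,x)"),
    ("((3,push),(6,pop))", "(1,6,x)"),
]
print(spanning_set(ENTITIES, PAIRS))  # ['(3,5,y)', '((3,push),(6,pop))']
```

Which member of a mutually subsuming group is kept depends only on the order of `ENTITIES`; any member would do.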

3.1 Monitoring

Any runtime monitoring task in which some degree of subsumption exists among entities can benefit from a reduction in the required instrumentation and in the execution overhead it imposes. By reducing the set of entities to monitor, our technique helps to achieve that goal. In this section, we analyze the application of our technique to three specific areas: testing, fault detection, and security.

3.1.1 Testing

Running a program with a test suite, or with a selected subset of it, can be expensive. Thus, it is important to minimize both the number of test cases for the selected criterion and the runtime overhead imposed by instrumentation. It is also desirable to prioritize test cases [13] so that those with the greatest probability of revealing faults are run first. A development team might even establish a testing budget and use this ordering to obtain maximum confidence within that budget; more test cases can be run within the budget if the monitoring overhead has been reduced. Spanning sets guide the tester in creating a small test suite [10]: the tester is concerned only with covering unconstrained entities, and the remaining entities will be covered as a consequence. The tester can prioritize the unconstrained entities by the number of entities each one subsumes. As we outlined in Section 1, a subsumption hierarchy of entities of different types lets the tester determine the incidental coverage of an adequacy criterion that was not targeted originally. The tester can therefore better define her confidence in the test suite, and may eventually choose to cover more entities of other types. The tester can also choose the coverage-adequacy criterion of the test suite beforehand, based on the subsumption knowledge our technique provides, and then prioritize the entities for this criterion according to the entities of other types that they subsume.
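Prioritizing unconstrained entities by the number of entities they subsume can be sketched as a reachability count over the hierarchy. In this minimal Python example, the (subsumer, subsumed) pairs are our reading of the loop-free example, and counting a mutually subsuming partner as "subsumed" is a design choice of the sketch, not something the text prescribes. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def subsumed_counts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of other entities each entity subsumes, following edges transitively."""
    succ: Dict[str, Set[str]] = {}
    for a, b in pairs:
        succ.setdefault(a, set()).add(b)
    counts: Dict[str, int] = {}
    for start in succ:
        seen: Set[str] = set()
        stack = [start]
        while stack:
            for nxt in succ.get(stack.pop(), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(start)  # a mutually subsuming partner can lead back to start
        counts[start] = len(seen)
    return counts


def prioritize(unconstrained: List[str], pairs: Iterable[Tuple[str, str]]) -> List[str]:
    counts = subsumed_counts(pairs)
    # sorted() is stable, so ties keep their input order
    return sorted(unconstrained, key=lambda e: -counts.get(e, 0))


# Subsumption edges for the loop-free example (transitive edges omitted).
PAIRS = [
    ("(3,5,y)", "((3,push),(5,pop),(6,pop))"),
    ("((3,push),(5,pop),(6,pop))", "(3,5,y)"),
    ("(3,5,y)", "(1,3,x)"),
    ("(3,5,y)", "(5,6,x)"),
    ("((3,push),(6,pop))", "(1,3,x)"),
    ("((3,push),(6,pop))", "(1,6,x)"),
]
print(prioritize(["((3,push),(6,pop))", "(3,5,y)"], PAIRS))
# ['(3,5,y)', '((3,push),(6,pop))']
```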

3.1.2 Fault Detection and Security

Software products can monitor violations of rules of correct or valid behavior. Safety-critical software can take action as soon as a violation is detected, and software in general benefits from detecting faults as soon as they occur; fault localization can also be more effective if performed immediately. Behavior rules relate to the correctness of the program or to security violations. Such rules need to be monitored when it is too difficult or impossible to verify them statically. For example, determining whether the predicate condition for a call sequence can be satisfied is undecidable in general. Using our technique, the different kinds of entities related to behavior rules can be monitored with minimal overhead: all sets of mutually subsuming entities can be identified, and one element from each set is chosen for monitoring. Because the developer must balance the extent of monitoring against runtime overhead, our technique lets her maximize the extent of monitoring for a given overhead limit.
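Choosing monitors under an overhead limit can be illustrated with a small greedy heuristic. This Python sketch is only an assumption-laden illustration: the per-entity monitoring costs are invented, the "guaranteed" counts come from our reading of the loop-free example's subsumption relation, and the greedy order (guaranteed entities per unit of cost) is not an optimal selection, since overlapping groups are counted independently. All names are hypothetical.

```python
from typing import Dict, List


def choose_monitors(groups: List[List[str]], cost: Dict[str, float],
                    guaranteed: Dict[str, int], budget: float) -> List[str]:
    """Greedy heuristic (not optimal): take the cheapest member of each mutually
    subsuming group, then add representatives by guaranteed entities per unit of
    cost while the budget allows. Assumes all costs are positive."""
    reps = [min(g, key=lambda e: cost[e]) for g in groups]
    reps.sort(key=lambda e: guaranteed[e] / cost[e], reverse=True)
    chosen: List[str] = []
    spent = 0.0
    for e in reps:
        if spent + cost[e] <= budget:
            chosen.append(e)
            spent += cost[e]
    return chosen


# Hypothetical monitoring costs for the loop-free example.
GROUPS = [["(3,5,y)", "((3,push),(5,pop),(6,pop))"], ["((3,push),(6,pop))"]]
COST = {"(3,5,y)": 2.0, "((3,push),(5,pop),(6,pop))": 3.0, "((3,push),(6,pop))": 2.0}
# Entities whose coverage each candidate guarantees (itself included).
GUARANTEED = {"(3,5,y)": 4, "((3,push),(5,pop),(6,pop))": 4, "((3,push),(6,pop))": 3}

print(choose_monitors(GROUPS, COST, GUARANTEED, budget=2.0))  # ['(3,5,y)']
print(choose_monitors(GROUPS, COST, GUARANTEED, budget=4.0))
```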

3.2 Program Understanding

The software maintainer seeks to understand the relationships among program entities at different levels of abstraction. These entities arise from program rules and styles. At the code level, our technique lets her focus on unconstrained entities, which form the top of the subsumption hierarchy, and on the sets of entities related to each other by subsumption. Our technique can also be applied to higher-level elements, such as methods, classes, and components. We conjecture that program complexity can be measured by the degree of subsumption among entities. A low degree of subsumption indicates that entities are highly independent and require equal attention, whereas a high degree of subsumption may indicate a simple program structure. A high level of complexity hinders understanding, makes program coverage more difficult, and reduces the effectiveness of fault-localization techniques. Our technique could measure complexity for different types of entities in terms of subsumption degree. Using these complexity measures, the maintainer can decide whether to refactor code and which debugging approach is most appropriate, and the tester can decide whether a coverage criterion suffices for a certain component.
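One possible way to quantify the conjectured degree of subsumption, offered only as a hypothetical illustration (no metric is defined here), is the fraction of entities in a set E that need not be covered directly once a spanning set S of E, with one representative per group of mutually subsuming unconstrained entities, is covered. For the loop-free example, our reading of Table 1 gives S = {(3, 5, y), ((3, push), (6, pop))}:

```latex
\mathit{deg}(E) = 1 - \frac{|S|}{|E|},
\qquad |E| = 6,\; |S| = 2 \;\Longrightarrow\; \mathit{deg}(E) = 1 - \tfrac{2}{6} = \tfrac{2}{3}.
```

Under this reading, values near 1 would indicate a high degree of subsumption and values near 0 highly independent entities.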

4. FUTURE WORK

In this section, we discuss the computation of predicate conditions and additional uses of these conditions. We also explain how we expect our technique to evolve, and we describe the tool we are implementing for empirical studies.

We are working on a technique to calculate predicate conditions for entities of any type during initial program analysis; existing path-condition approaches require entities and program graphs to be precomputed. In the future, we also expect to improve the precision of predicate conditions with symbolic evaluation [5].

Predicate conditions can be used to compute possibly subsumed entities e_i when the condition for each e_i is not sufficient. If we add possible-subsumption edges to the subsumption hierarchy, we expect to obtain an improved estimate of the effectiveness of an unconstrained entity for prioritization. We intend to evaluate this estimate empirically.

There are weaker, but still potentially useful, relationships among entities that we can identify through predicate conditions. If there is at least one path covering two entities, we would expect some executions that cover one entity to actually cover both. The likelihood of such coincidental coverage depends on the number of intersecting and non-intersecting paths for the entities.

The applicability of our technique is somewhat limited because some entities are not subsumable in the presence of cycles. How many entities are subsumable is a question we intend to investigate by analyzing different subject programs; the answer also depends on the size of the entities. The precision in subsuming sequential entities can be improved if we use paths instead of unordered decisions; to make this approach practical, we are working on finite subpaths. Another potential improvement to our technique is the incorporation of Agrawal’s super blocks and mega blocks [1, 2]. If two or more nodes from unconstrained entities are located in the same mega block, we need to cover or monitor only one of them to guarantee that all the co-located nodes have been covered.

We are currently developing a tool that implements the subsumption algorithm, which will allow us to perform our studies on Java programs. Initially, the goal of the tool is to identify predicate conditions for branches and data dependences, and to apply the subsumption technique to measure the incidental data-flow coverage achieved by branch-covering test suites. The tool computes interprocedural data dependences and supports aliasing. Aliasing increases the number of possible du-pairs and makes subsumption more difficult, but it allows us to work with real-world programs. Another complicating factor is exceptions, which we expect to incorporate into our technique later. Exceptions make the subsumption analysis more complex, because they can interrupt entities while they are being covered.
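The coincidental-coverage idea in this section can be illustrated on the loop-free example. The minimal Python sketch below assumes (hypothetically) a loop-free program in which predicates 1 and 4 are each evaluated exactly once on every full path, that every full path is equally likely, and that the predicate conditions (our reading of Table 1) are sufficient, which holds without the backedge. It reports the fraction of full paths covering one entity that also cover another; a value of 1.0 corresponds to subsumption. All names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List

Term = FrozenSet[str]


def full_paths(predicates: List[str]) -> List[Term]:
    """Decision sets of all full paths, assuming a loop-free program in which
    every predicate is evaluated exactly once on every path."""
    return [frozenset(f"{p}-{b}" for p, b in zip(predicates, choice))
            for choice in product("TF", repeat=len(predicates))]


def covers(path: Term, cond: List[Term]) -> bool:
    return any(t <= path for t in cond)


def coincidence(cond1: List[Term], cond2: List[Term], paths: List[Term]) -> float:
    """Fraction of the full paths covering entity 1 that also cover entity 2,
    assuming every full path is equally likely."""
    hits1 = [p for p in paths if covers(p, cond1)]
    if not hits1:
        return 0.0
    return sum(covers(p, cond2) for p in hits1) / len(hits1)


PATHS = full_paths(["1", "4"])
COND_1_3_X = [frozenset({"1-F", "4-T"}), frozenset({"1-F", "4-F"})]
COND_1_6_X = [frozenset({"1-T", "4-F"}), frozenset({"1-F", "4-F"})]
COND_3_5_Y = [frozenset({"1-F", "4-T"})]

print(coincidence(COND_1_6_X, COND_1_3_X, PATHS))  # 0.5
print(coincidence(COND_3_5_Y, COND_1_3_X, PATHS))  # 1.0: (3,5,y) subsumes (1,3,x)
```

With real programs, full paths are not equally likely, so an execution profile would be a more realistic weighting than the uniform assumption used here.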

5. REFERENCES

[1] H. Agrawal. Dominators, super blocks, and program coverage. In POPL ’94: Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, Jan. 1994.
[2] H. Agrawal. Efficient coverage testing using global dominator graphs. In PASTE ’99: Proceedings of the 1999 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, Sept. 1999.
[3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. In POPL ’92: Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, Jan. 1992.
[4] A. Bertolino. Unconstrained edges and their application to branch analysis and testing of programs. Journal of Systems and Software, 20(2):125–133, Feb. 1993.
[5] L. A. Clarke and D. J. Richardson. Applications of symbolic evaluation. Journal of Systems and Software, 5(1):15–35, Feb. 1985.
[6] P. Frankl and E. J. Weyuker. An applicable family of data flow criteria. IEEE Transactions on Software Engineering, 14(10):1483–1498, Oct. 1988.
[7] M. Harder, J. Mallen, and M. D. Ernst. Improving test suites via operational abstraction. In ICSE ’03: Proceedings of the 25th IEEE and ACM SIGSOFT International Conference on Software Engineering, May 2003.
[8] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–60, Jan. 1990.
[9] M. Hutchins, H. Foster, T. Goradia, and T. Ostrand. Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In ICSE ’94: Proceedings of the 16th IEEE and ACM SIGSOFT international conference on Software engineering, May 1994.
[10] M. Marré and A. Bertolino. Using spanning sets for coverage testing. IEEE Transactions on Software Engineering, 29(11):974–984, Nov. 2003.
[11] E. M. Merlo and G. Antoniol. A static measure of a subset of intra-procedural data flow testing coverage based on node coverage. In CASCON ’99: Proceedings of the 1999 IBM conference of the Centre for Advanced Studies on Collaborative research, Nov. 1999.
[12] T. Robschink and G. Snelting. Efficient path conditions in dependence graphs. In ICSE ’02: Proceedings of the 24th IEEE and ACM SIGSOFT International Conference on Software Engineering, May 2002.
[13] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold. Prioritizing test cases for regression testing. IEEE Transactions on Software Engineering, 27(10):929–948, Oct. 2001.
[14] S. Sukumaran and A. Sreenivas. Identifying test conditions for software maintenance. In CSMR ’05: Proceedings of the Ninth European Conference on Software Maintenance and Reengineering, IEEE, Mar. 2005.
