ISSN 0361-7688, Programming and Computer Software, 2015, Vol. 41, No. 3, pp. 183–195. © Pleiades Publishing, Ltd., 2015. Original Russian Text © I.S. Zakharov, V.S. Mutilin, A.V. Khoroshilov, 2015, published in Programmirovanie, 2015, Vol. 41, No. 3.
COMPUTER ALGEBRA, APPLIED LOGIC, CIRCUIT SYNTHESIS
Pattern-Based Environment Modeling for Static Verification of Linux Kernel Modules¹
I. S. Zakharov, V. S. Mutilin, and A. V. Khoroshilov
Institute for System Programming of the Russian Academy of Sciences, ul. Alexandera Solzhenitsyna 25, Moscow, 109004 Russia
Received December 13, 2014
Abstract: Linux kernel modules operate in an event-driven environment. Static verification of such modules must take into account all feasible scenarios of interaction between the modules and their environment. The paper presents a new method that automatically generates an environment model for a particular kernel module from an analysis of its source code and a set of specifications describing patterns of interaction scenarios between modules and their environment. The specifications can describe both generic patterns that are widespread in the Linux kernel and detailed patterns specific to a particular subsystem. This drastically reduces the specification size and thus helps to verify more modules with less effort. The proposed method was implemented as a component of Linux Driver Verification Tools and was applied to static verification of modules from almost all Linux kernel subsystems.
DOI: 10.1134/S036176881503007X
1. INTRODUCTION
The Linux kernel is a base component of various operating systems. Depending on their use and on the underlying hardware, each of these operating systems requires the kernel to support a specific set of features. The Linux kernel itself usually provides a few common functions, e.g. memory and process management. To extend kernel functionality with new features, one can dynamically load the corresponding modules into the kernel.

The Linux kernel is shipped with a large set of modules for device drivers, file systems, network protocols, etc. The subset of modules available for loading depends on the system architecture and the kernel configuration. For instance, the latest versions of the Linux kernel contain about 4 thousand modules on architecture x86_64 in configuration allmodconfig.

Linux kernel modules operate in the same address space and have the same level of privileges as the kernel itself, but their quality is much lower than that of the rest of the kernel [1]. About half of all non-functional bugs in modules are caused by incorrect usage of the Linux kernel API [2]. This paper focuses on this sort of bug and proposes to find such bugs with the help of static verification. To detect errors, this technique analyzes all possible execution paths, including even hard-to-reproduce ones.

1 The article was translated by the authors.
2 The research was carried out with funding from the Ministry of Education and Science of Russia (the project unique identifier is RFMEFI60414X0051).
Modern static verification tools [3–7] can already check medium-sized programs, so they can be applied to Linux kernel modules, which usually contain several thousand lines of code. On the one hand, the main purpose of a module is to process events, for example those caused by hardware interrupts or by system calls invoked by a user. That is why a module's execution paths depend heavily on its environment. On the other hand, existing static verification tools cannot verify modules together with the whole kernel because of the enormous size and complexity of the Linux kernel source code. Therefore, to verify a module standalone, the missing source code has to be replaced by an environment model that produces the same execution paths during verification as occur in the real environment.

It was shown that static verification tools need rather accurate environment models to check programs [8]. Without such models the tools can produce a large number of false alarms, because during verification they assume that some infeasible scenarios of interaction between modules and their environment are possible. Moreover, environment models should not omit scenarios that can happen in practice; otherwise static verifiers are likely to miss actual bugs.

Modeling has to take into account the pace of Linux kernel development: new versions appear several times a year. Support for new devices is continuously added to the kernel, new subsystems appear, old and outdated functions are deleted, and the existing parts keep evolving. The part of the Linux kernel that implements its interaction with modules, and the modules themselves, are at the forefront of these modifications, since the extended functionality implemented in the Linux kernel often becomes the target of improvement. Another issue that complicates the development of precise environment models for kernel modules is the variety of interfaces and callback types that modules provide to the kernel. Linux kernel version 3.17-rc1 and its modules contain tens of thousands of callbacks and hundreds of corresponding callback types. Hence, when describing an environment model, it is exceedingly important to minimize the manual work and the size of the descriptions of the kernel interfaces of a particular kernel version.

[Figure 1. Diagram labels: Userspace applications; Hardware; The Kernel Core; Modules; The Linux kernel.]
Fig. 1. Interaction of Linux kernel modules with their environment.

The main contribution of the paper is a new method for modeling the environment of kernel modules, together with its implementation. The method provides the means to define precise, extensible and reusable specifications of an environment model. Specifications are developed in a simplified language that describes patterns of possible interactions between modules and the Linux kernel, which reduces the manual work required to specify an environment model. The parts of the specifications that depend on the evolving Linux kernel interface are separated from the patterns of possible interactions in order to minimize development and maintenance effort. In addition, the final representation of an environment model is generated as C code, which makes it possible to apply any C verification tool.

The rest of the paper is organized as follows. In Section 2 we consider typical scenarios of interaction between Linux kernel modules and their environment. In Section 3 we present a method for modeling an environment. Its implementation is outlined in
Section 4. Section 5 presents experimental results. Related work is considered in Section 6. Section 7 concludes the paper.

2. ENVIRONMENT OF LINUX KERNEL MODULES

Linux kernel modules operate in an event-driven environment, which is shown in Fig. 1. For simplicity, hereinafter we suppose that each module interacts with userspace applications and hardware only through the so-called kernel core. Modules provide the kernel core with callbacks that are invoked to handle events such as interrupts, system calls and internal kernel events. These callbacks can be divided into groups, each of a certain type; a group comprises callbacks that operate on the same resource, e.g. a specific device or data structure. Each callback is characterized by its role. Group types and the corresponding roles are defined in the kernel core, and modules implement groups as instances of these types.

An example of a group type is the set of operations on file descriptors inside the kernel, such as open, read, write and close. Another example of a callback group type is the set of power control operations for a device. The roles of a group of this type include preparation for sleep mode and the actions required to resume a device. There are also group types for interaction with USB or PCI buses. Thus the module of a portable USB modem driver necessarily contains a callback group for controlling the USB interface and callback groups for transmitting data through the network.

During its work each module registers callback groups in the kernel core according to the group type, usually with the help of functions from the kernel core. The module begins its work after it has been loaded into memory and its initialization function has been called; this function registers the first callback groups. Registration involves passing pointers to the functions of a callback group to the kernel core so that it can later process certain events. Most events are processed in parallel, hence many module callbacks can be called in parallel with each other. As soon as event processing by the module is no longer required, the kernel core invokes the module exit function, which deregisters the callback groups; after that the module can be unloaded.

The callbacks of a module are called in conformance with a strict contract between the module and its environment, since during execution callbacks register and deregister new module groups, allocate and deallocate memory, acquire locks, etc. For example, a callback group should not be deregistered before its resources are deallocated, hence a callback that deregisters the group cannot be called before those resources are deallocated. The contract of interaction between a module and its environment specifies the following actions:
1. Calling callbacks by the kernel core, taking into account the restrictions:
1.1. on registration and deregistration;
1.2. on the callback invocation order according to the roles;
1.3. on the initialization of resources that are passed to the callbacks;
1.4. on the calling context.
2. Calling kernel core functions by callbacks.

These specific features make the source code of kernel modules different from that of userspace applications. Figure 2 presents snippets of the USB CDC Phonet module from Linux kernel 3.2 (the original source code was simplified). One can see that this module does not have a function main. Instead it contains initialization and exit functions, usbpn_init (defined at line 26) and usbpn_exit (defined at line 29), which are registered via the macros module_init (line 32) and module_exit (line 33).
On initialization and exit, the callback group for operating with USB interfaces is registered and deregistered, respectively. We denote its group type as usb_driver, after the name of the structure type defined in the kernel core. The module contains the global variable usbpn_driver (line 22) of this type. A pointer to this variable is passed to function usb_register (line 27), which registers the callbacks in the kernel core. Deregistration of the callbacks is performed by the kernel core function usb_deregister (line 30). The callback roles are named after the structure fields probe (line 23) and disconnect (line 24), to which pointers to callbacks usbpn_probe and usbpn_disconnect are assigned. Another callback group, of type net_device_ops (line 3), is the pair of callbacks usbpn_open (line 1) and usbpn_close (line 2) with roles ndo_open (line 4) and ndo_stop (line 5). Its registration and deregistration are performed by kernel core functions register_netdev (line 15) and unregister_netdev (line 20), which are called from callbacks of group type usb_driver.

3. MODELING ENVIRONMENT

An ideal environment model, as required for static verification of Linux kernel modules, should be complete and correct. Completeness means that the model should contain all interaction scenarios that are possible in the real environment. If an environment model is incomplete, static verifiers can miss actual bugs. Correctness means that the environment model should not contain scenarios that are impossible in the real environment. The less correct an environment model is, the more false alarms the tools can report.

Specifying highly accurate environment models requires much development effort because of the large number of callback group types in the Linux kernel. However, in order to eliminate false alarms and to increase the completeness of environment models, the specification method should provide the means to define any of the restrictions mentioned in the previous section. The paper proposes such a method for modeling the environment of Linux kernel modules.

π-calculus [9, 10] fits well for specifying environment models, since it can describe arbitrary highly parallel systems completely and correctly in terms of message passing and parallel composition of processes. Below we propose an environment modeling method based on π-calculus that reduces the effort required to develop environment models.

3.1. Definitions

In π-calculus [9, 10] we have processes, operations of parallel composition, synchronous communication between processes through channels, creation of fresh channels, replication of processes and nondeterminism. Each channel has a name, also called a label, α ∈ A.
Fig. 2. Snippets of the USB CDC Phonet module.

We use the definitions from [9] with the polyadic extension described in [10], where labels have vectors of labels as parameters, denoted by $\vec{x}$. Processes are described using the following syntax:

$$
\begin{aligned}
P ::=\ & P \mid Q && \text{(parallel composition)} \\
 \mid\ & !P && \text{(replication)} \\
 \mid\ & (\nu\alpha)P && \text{(new label creation)} \\
 \mid\ & N(\vec{x})
\end{aligned}
$$

where

$$N(\vec{x}) ::= 0 \;\mid\; K_1(\vec{x}) + \ldots + K_n(\vec{x}) \;\mid\; [\vec{x} = \vec{v}]K(\vec{x})$$

• $0$ is an empty process;
• $K_1(\vec{x}) + \ldots + K_n(\vec{x})$ is a choice, where each $K_i$ is one of
  - $\alpha(\vec{y}_i).\hat{K}_i(\vec{x}, \vec{y}_i)$: receiving a vector of input parameters $\vec{y}_i$ over channel $\alpha$;
  - $\overline{\alpha}(\vec{x}).\hat{K}_i(\vec{x})$: sending vector $\vec{x}$ over channel $\alpha$;
  - $\tau.\hat{K}_i(\vec{x})$: a silent action;
• $[\vec{x} = \vec{v}]K(\vec{x})$ is a match: if $\vec{x}$ and $\vec{v}$ are equal, the process behaves like $K(\vec{x})$.

For convenience we add an operator $[\vec{x} \neq \vec{v}]$, analogous to $[\vec{x} = \vec{v}]$, which tests inequality.
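As a small illustration of how these constructs interact, the standard communication rule of π-calculus can be written in the notation above. This example is not taken from the paper; the processes $P$, $Q$ and the value $v$ are placeholders chosen for illustration. A sender and a receiver on the same channel $\alpha$ synchronize, and the transmitted names replace the input parameters of the receiver:

$$\overline{\alpha}(\vec{x}).P \;\mid\; \alpha(\vec{y}).Q \;\longrightarrow\; P \;\mid\; Q\{\vec{x}/\vec{y}\}$$

For instance, a process that sends a reply channel $ret$ over $init$, composed with a process that receives a channel on $init$ and answers on it, reduces as follows:

$$\overline{init}(ret).\,0 \;\mid\; init(c).\,\overline{c}(v).\,0 \;\longrightarrow\; 0 \;\mid\; \overline{ret}(v).\,0$$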
3.2. π-model for Kernel Module and Environment

A Linux kernel module and its environment can be considered as a parallel composition of processes in π-calculus. Process $P_{fcall}$ has the same behavior as the C implementation of all callbacks of the kernel module. $P_{fcall}$ receives messages $f(ret_i, f_i, ctx_i, params_i)$ requesting callback invocation, where $ret_i$ is the name of a channel for the response, $f_i$ is a callback function, $ctx_i$ is a calling context and $params_i$ are callback parameters. Since callbacks can be executed in parallel, $P_{fcall}$ is replicated for each call. On function return $P_{fcall}$ replies with the return value $\overline{ret_i}(result)$.

Hence the kernel module can be seen as the composition

$$Module_\pi ::= P_{init} \mid P_{exit} \mid\ !P_{fcall}$$

Processes $P_{init}$ and $P_{exit}$ represent the module initialization and exit functions, respectively. They are not replicated, since the environment cannot call these functions in parallel. These processes receive messages $init(ret)$ and $exit(ret)$ and send $\overline{ret}(x)$ and $\overline{ret}()$, where $ret$ is a channel for returning control and the resulting value, if present.

The environment is represented as the composition

$$Env_\pi ::= P_{g_i} \ldots \mid P_{v_i} \ldots \mid P_{module} \mid P_{a_i} \ldots$$

where processes $P_{g_i}$ represent kernel core functions $g_1(ret_1, params_1), \ldots, g_l(ret_l, params_l)$ and processes $P_{v_i}$ represent global variables with $set_{v_i}(x)$ and $get_{v_i}(x)$. These processes model the functions and variables of the environment that the kernel module can use during initialization, execution of callbacks and exit.

The active part of the environment, which models scenarios of interaction with the kernel module, includes the main process $P_{module}$, which calls the initialization and exit functions, and a set of processes $P_{a_i}$, which call callbacks of the module. The environment model is divided into parts according to group types. The first group is a special group that models initialization after the module is loaded and exit before the module is unloaded; it can be considered as the parallel composition $P_{module} \mid P_{trymoduleget}$.

$P_{module}$ starts interaction with the module:

$$
\begin{aligned}
P_{module} &::= (\nu ret)L_0 \\
L_0 &::= \overline{init}(ret).\,ret(r).\,([r = 0]L_1 + [r \neq 0]\,0) \\
L_1 &::= \overline{mstop}.\,\overline{exit}(ret).\,ret.\,0
\end{aligned}
$$

Process $L_0$ sends the init message to the module. The module performs initialization, e.g. it registers callback groups. If initialization is successful, the module sends 0 over channel $ret$ and $L_0$ continues as $L_1$. Otherwise the module sends an error code and execution finishes. The environment interacts with the module until message exit is sent.
On receiving this message the module deregisters callback groups and deallocates resources.

$P_{trymoduleget}$ is an auxiliary process intended to block module unloading, i.e. to forbid the environment to call the exit function. A process that acquires the module sends the message tmg and, in case of success, receives the answer $tmg^{ret}(true)$. The message mstop is required to disable all acquisitions while the module is being unloaded (while the exit function is called).

$$
\begin{aligned}
P_{trymoduleget} &::= M_0 \\
M_0 &::= tmg.\,\overline{tmg^{ret}}(true).\,M_1 + mstop.\,M_D \\
M_D &::= mstop.\,M_D + tmg.\,\overline{tmg^{ret}}(false).\,M_D
\end{aligned}
$$

For each $i \geq 1$ we define the process

$$M_i ::= tmg.\,\overline{tmg^{ret}}(true).\,M_{i+1} + mput.\,\overline{mput^{ret}}.\,M_{i-1}$$

3.3. Group Type Specifications

We define a group type specification as a π-process parametrized with the callback roles of the group type. The process should model the actions performed when registration and deregistration functions are called and should invoke callbacks with certain roles according to the contract. In the example in Fig. 2, function usb_register is called with a pointer to variable usbpn_driver of type usb_driver, which contains pointers to callbacks usbpn_probe and usbpn_disconnect with roles usb_driver.probe and usb_driver.disconnect.

For each callback group, a registration function model creates instances of group processes parametrized by callback roles (Fig. 3). The number of created instances depends on the number of resources that each group instance operates with. For instance, if group file_operations operates with file resources, a separate instance may be created for each file. The usb_driver specification has parameters probe and disconnect for roles usb_driver.probe and usb_driver.disconnect. In the example, for each registered callback group we create one instance of the process and a new resource intf, which is initialized with initialize. This resource is then passed to callbacks usbpn_probe and usbpn_disconnect as a parameter.

$P_{usb\_driver}$ is defined as follows (we use $S_i$ to denote the nodes shown in Fig. 3):

$$
\begin{aligned}
P_{usb\_driver} &::= S_0 \\
S_0 &::=\ !(\nu intf, \nu ret)\,register(deregister, probe, disconnect).\,S_1 \\
S_1 &::= deregister.\,0 + \overline{tmg}.\langle S_2\rangle\,tmg^{ret}(r).\langle S_3\rangle([r = true]S_4 + [r = false]S_1) \\
S_4 &::= \overline{initialize}(intf).\,S_5 \\
S_5 &::= \overline{f}(ret, probe, intf).\langle S_6\rangle\,ret(r).\langle S_7\rangle([r = 0]S_8 + [r \neq 0]S_9) \\
S_8 &::= \overline{f}(ret, disconnect, intf).\langle S_{10}\rangle\,ret.\,S_9 \\
S_9 &::= \overline{mput}.\,S_1
\end{aligned}
$$

[Figure 3. State diagram with nodes S0–S10 and the terminal node 0. Edge labels: !(νintf, νret) register(deregister, probe, disconnect); deregister; tmg; tmgret(r); [r = true]; [r = false]; initialize(intf); f(ret, probe, intf); ret(r); [r = 0]; f(ret, disconnect, intf); ret; mput.]
Fig. 3. Example of Pusb_driver modeling callback group of type usb_driver.

The semantics of π-process interactions makes it possible to define both dependencies between groups and dependencies on the initialization and exit functions. In the example in Fig. 3, the module cannot be unloaded once the process has reached state S4, where it is ready to call callbacks usbpn_probe and then usbpn_disconnect. For the environment this means that the exit function cannot be called in process $P_{module}$. Process $P_{trymoduleget}$ stores the number of module acquisitions. Before calling any callbacks, process $P_{usb\_driver}$ sends $\overline{tmg}$. In case of success it goes to S4. Otherwise, if the module is being unloaded (r = false), it goes to S1 and waits for deregistration.

3.4. Group Pattern Specifications

We analyzed many Linux kernel modules and found that different roles of different group types are subject to similar restrictions. For example, in Fig. 4 we have callback group 2430_driver of type platform_driver. It has two roles, platform_driver.probe and platform_driver.remove. These roles are the same as usb_driver.probe and usb_driver.disconnect of group type usb_driver. The groups operate with different resources, of types usb_interface and platform_device, in the same manner. We say that two group types have the same group pattern specification if their roles and resources correspond, i.e. the contracts for the groups are equivalent.

We define a group pattern specification as a pair consisting of a group pattern and a parametrized π-process. The group pattern is defined by abstract roles and an abstract resource type, which describe a set of concrete roles and a set of concrete resource types. In the example we have two abstract roles: probe, which describes concrete roles platform_driver.probe and usb_driver.probe, and disconnect, which describes concrete roles platform_driver.remove and usb_driver.disconnect. The abstract resource type represents a set of concrete resource types. Abstract resources are usually passed as parameters to callbacks with abstract roles. In the example the abstract resource is passed as the first parameter to functions with abstract roles probe and disconnect.
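The idea of one parametrized process serving several group types can be sketched in ordinary code. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: `ModuleRef`, `ROLE_MAP` and `run_pattern` are hypothetical names, the counter imitates the states M0, M1, ..., MD of the acquisition process, and one sequential pass through nodes S1 to S9 replaces the parallel π-process.

```python
class ModuleRef:
    """Acquisition counter in the spirit of P_trymoduleget (M0, M1, ..., MD)."""

    def __init__(self):
        self.count = 0         # index i of the current state M_i
        self.stopping = False  # True corresponds to state MD

    def try_get(self):
        """tmg: succeeds unless the module is already being unloaded."""
        if self.stopping:
            return False
        self.count += 1
        return True

    def put(self):
        """mput: release one acquisition."""
        assert self.count > 0, "mput without a matching tmg"
        self.count -= 1

    def stop(self):
        """mstop: accepted only with no outstanding acquisitions (M0 or MD)."""
        if self.count > 0:
            return False
        self.stopping = True
        return True


# Abstract role -> concrete role (structure field) for two group types.
ROLE_MAP = {
    "usb_driver": {"probe": "probe", "disconnect": "disconnect"},
    "platform_driver": {"probe": "probe", "disconnect": "remove"},
}


def run_pattern(group_type, callbacks, ref, make_resource=dict):
    """One pass S1 -> S9 of the probe/disconnect pattern; returns the call trace."""
    roles = ROLE_MAP[group_type]
    trace = []
    if not ref.try_get():                  # tmg answered false: stay in S1
        return trace
    resource = make_resource()             # initialize(intf)
    trace.append(roles["probe"])
    if callbacks[roles["probe"]](resource) == 0:    # [r = 0]
        callbacks[roles["disconnect"]](resource)
        trace.append(roles["disconnect"])
    ref.put()                              # mput
    return trace
```

With callbacks supplied as a dictionary keyed by concrete role, the same function drives a `usb_driver` group and a `platform_driver` group; only the entry in `ROLE_MAP` differs, which is the saving that a group pattern is meant to provide.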
For group pattern specifications we have π-processes parametrized by abstract roles, abstract resource types, and registration and deregistration functions. Abstract roles may be marked as mandatory or optional; in the latter case a callback with the concrete role may be absent. In the example, the behavior may be described by a process similar to $P_{usb\_driver}$ with the addition of registration and deregistration functions for groups containing callbacks for abstract roles probe and disconnect. The abstract resource intf is described as the first parameter of the callbacks.

The presented pattern specification describes callback groups of such types as usb_driver, platform_driver, sdio_driver, pcmcia_driver, etc. (see the source code of Linux kernel version 3.13). Note that these group types can have additional roles, e.g. suspend and resume. If a group has callbacks with roles that are not matched by the group pattern, the pattern is not applicable.

3.5. Method for Environment Modeling

With the help of π-processes it is possible to describe precise models of the environment of Linux kernel modules. In practice, developing precise models for all group types takes too much time. Moreover, the majority of bugs in modules can be found with less precise models. That is why we propose a method for modeling the environment of kernel modules that provides the means to specify patterns applicable to many group types. The method still allows a detailed model to be defined for a particular group in order to meet the requirements on completeness and correctness.

An individual environment model is constructed for each kernel module. It is composed of π-processes that model the active part of the environment related to each callback group identified in the module. The method for modeling the environment consists of three steps.

On the first step the environment model developer defines a kernel activity specification (TS, PS, DS, KS), where
1. TS is a map from a group type to a group type specification;
2. PS is a set of group pattern specifications;
3. DS is a default group specification;
4. KS is a set of kernel core specifications.

Group type specifications describe precise models of group types. Where possible, group pattern specifications are used to describe sets of callback groups of corresponding types. The default group specification describes a process that invokes callbacks in an arbitrary order. Kernel core specifications contain descriptions of kernel core functions and of additional processes shared between group specifications, e.g. $P_{trymoduleget}$.

On the second step a π-model of the environment is constructed for a particular kernel module. First of all, the source code of the kernel module is analyzed to identify the callback groups used in it. For each callback group the extracted information includes the callback roles, each with its associated callback function if one is identified, the registration and deregistration functions if present, and the concrete resource types.

For each callback group, either a precise group type specification is found or the most relevant group pattern specification is searched for. To select a group pattern, abstract roles are matched with concrete ones, and abstract resources are matched with the concrete resources passed as parameters to registration functions or to callbacks with abstract roles. Each non-optional abstract role in the specification should have a corresponding concrete role. Among all available patterns, the pattern with the largest set of matched abstract roles is chosen. The pattern is applied to the callback groups of the kernel module by replacing abstract roles and abstract resource types with concrete ones. If no relevant group pattern specification is found, the default group specification is applied.
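The pattern selection rule of the second step can be sketched as follows. This is a simplified illustration under stated assumptions rather than the actual procedure: it covers only the two rules just described (every mandatory abstract role must have a concrete counterpart, and the pattern with the most matched abstract roles wins) and ignores resource matching. The names `select_pattern` and `PATTERNS` and the second pattern `probe_disconnect_pm` are hypothetical.

```python
def select_pattern(concrete_roles, patterns):
    """Return the name of the best applicable pattern, or None for the default."""
    roles = set(concrete_roles)
    best_name, best_size = None, 0
    for name, abstract_roles in patterns.items():
        # Abstract roles that have at least one concrete counterpart in the group.
        matched = {a for a, spec in abstract_roles.items() if roles & spec["names"]}
        mandatory = {a for a, spec in abstract_roles.items() if not spec["optional"]}
        if not mandatory <= matched:
            continue  # a non-optional abstract role has no concrete counterpart
        if len(matched) > best_size:
            best_name, best_size = name, len(matched)
    return best_name


def role(names, optional=False):
    """Helper: an abstract role described by the concrete names it covers."""
    return {"names": set(names), "optional": optional}


PATTERNS = {
    "probe_disconnect": {
        "probe": role(["probe"]),
        "disconnect": role(["disconnect", "remove"]),
    },
    "probe_disconnect_pm": {
        "probe": role(["probe"]),
        "disconnect": role(["disconnect", "remove"]),
        "suspend": role(["suspend"], optional=True),
        "resume": role(["resume"], optional=True),
    },
}
```

A return value of `None` stands for the fallback to the default group specification DS. On a tie, the pattern listed first is kept, which is an arbitrary choice of this sketch.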
Finally, all π-processes of each callback group identified in the module are combined in a parallel composition with the kernel core specification KS into a π-model of the module environment.

On the third step the π-model of the environment is translated into the input format of static verifiers (see the next section for details).

4. DRIVER ENVIRONMENT GENERATOR

The suggested approach was implemented as a component of Linux Driver Verification (LDV) Tools [11, 12]. LDV Tools check that Linux kernel modules do not violate the rules of correct usage of the Linux kernel API.

The method for generating an environment model described in the previous section is implemented in the Driver Environment Generator (DEG) component of LDV Tools. According to the method, the user provides a kernel activity specification on the first step.

Fig. 4. Example of platform_driver group type from Inventra Controller OMAP2430 driver.

The specification is defined in terms of callback groups (i.e. it defines roles, resources, registration and deregistration), yet in practice these groups are not defined by the module explicitly, so DEG has to extract callback groups from the module source code. For that purpose the user defines Modules Interface Queries, which extract information from the source code, and Group Type Descriptions, which contain hints for selecting callback groups on the basis of the extracted information.

The kernel activity specification is provided by the user in the Kernel Activity Specification Language (KASL), which has the means to describe parameterized π-processes. In KASL a group type specification is defined as a pair consisting of a factory process and a set of behavioral processes. The factory process takes care of group registration and deregistration and creates a number of behavioral processes. A behavioral process describes the interaction scenarios of the group type. It sends signals to the $Module_\pi$ processes. Since in practice there are many similarities between group types, KASL allows parameterized processes to be reused between them.

A group pattern specification is also divided into two parts: a factory process and a set of behavioral processes. The behavioral processes are parameterized by abstract roles and abstract resources, which the factory process matches with concrete ones. In KASL the factory process of a group type specification may also create behavioral processes in the same way as for a group pattern specification. Reusing behavioral processes in group type specifications and in group pattern specifications minimizes the specification sizes.

The second and third steps of the method are automated by DEG, see Fig. 5. The second step is divided into three stages, which are performed by the following DEG subcomponents, respectively: CIF, Group Builder and Environment Model Generator. The third step of the method is implemented in Model Translators.
1. CIF extracts data about the C expressions that correspond to callback groups.
2. Group Builder composes lists of callback groups from the extracted data by determining registration and deregistration functions, callback function names, resources and the callback group type.
3. Environment Model Generator matches each extracted callback group with a group type or group pattern specification in KASL and translates it into the Core Notation, the internal representation of the environment model.
4. Finally, DEG employs various Model Translators, which translate the Core Notation to C code in different ways depending on the user's choice.

On the first stage DEG extracts data using the C Instrumentation Framework (CIF) [13], which is also a part of LDV Tools. CIF supports querying of C source code [14]. Currently CIF supports queries for information on function calls, macro expansions, global variable declarations including structure initializations, and parameters passed to registration functions and macros. As input DEG takes a manually prepared Modules Interface Queries file with requests to CIF. The queries in the file are used to extract the module interfaces used for interaction with the Linux kernel, such as callback declarations, calls to registration and deregistration functions, and initializations of global structures whose fields are assigned pointers to module callbacks. The file can be updated with new queries when a new callback group specification is added. CIF analyzes the source code of the module under verification and generates Module Interface Raw Info for further use in DEG.

On the second stage DEG composes the data about C expressions and statements extracted into Module Interface Raw Info into callback groups. DEG generates callback groups with a defined callback group type, roles, callbacks, resources, and registration and deregistration functions.
The DEG configuration contains the Group Type Descriptions, which give instructions for building callback groups from the extracted raw interface data. The descriptions of groups are divided into categories depending on the expressions that were used to extract the data. One of the main supported categories is the structural category, where callback groups are composed from the initializations of global structure variables containing pointers to module callbacks. In this case callback roles are defined as structure fields, callbacks as the values stored in those fields and resources as callback arguments, while registration and deregistration functions can be set by the user or determined automatically. The other categories cover the function-based kernel API, such as interrupt handlers and timer handlers. In this case the user specifies the existing roles and the registration and deregistration functions manually, but callbacks are extracted automatically from invocations of the registration functions. A new callback group type can be added by supplementing a configuration file.
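The structural category can be illustrated with a short sketch. The data layout below is an assumption made purely for illustration and does not reproduce the real Module Interface Raw Info or Group Type Descriptions formats; `build_groups`, `RAW_INITS`, `RAW_CALLS` and `DESCRIPTIONS` are hypothetical names, and the `id_table` entry is included only to show a non-callback field being ignored.

```python
def build_groups(struct_inits, calls, descriptions):
    """Compose callback groups from global structure initializers."""
    groups = []
    for init in struct_inits:
        desc = descriptions.get(init["type"])
        if desc is None:
            continue  # no description for this structure type
        # Roles are structure fields; callbacks are the functions stored there.
        callbacks = {f: fn for f, fn in init["fields"].items() if f in desc["roles"]}
        # Functions that were called with a pointer to this variable.
        used = {name for name, args in calls if "&" + init["var"] in args}
        groups.append({
            "type": init["type"],
            "callbacks": callbacks,
            "register": desc["register"] if desc["register"] in used else None,
            "deregister": desc["deregister"] if desc["deregister"] in used else None,
        })
    return groups


# Hypothetical raw data for the usbpn_driver variable discussed in Section 2.
RAW_INITS = [{
    "var": "usbpn_driver",
    "type": "usb_driver",
    "fields": {"probe": "usbpn_probe", "disconnect": "usbpn_disconnect",
               "id_table": "usbpn_ids"},
}]
RAW_CALLS = [("usb_register", ["&usbpn_driver"]),
             ("usb_deregister", ["&usbpn_driver"])]
DESCRIPTIONS = {
    "usb_driver": {"roles": {"probe", "disconnect"},
                   "register": "usb_register",
                   "deregister": "usb_deregister"},
}
```

Calling `build_groups(RAW_INITS, RAW_CALLS, DESCRIPTIONS)` yields one group of type `usb_driver` with the two callbacks and both registration functions. In this sketch the functions are named by the description and merely confirmed against the extracted calls, which is one possible reading of "set by the user or determined automatically".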
As mentioned before, the extraction of callback groups from the module source code is separated into Modules Interface Queries and Group Type Descriptions in order to simplify updating the specification when the Linux kernel changes. If a change in the Linux kernel affects only group extraction, e.g. a registration function is renamed, the behavioral processes do not need to be changed.

On the third stage DEG translates the kernel activity specification in KASL to the Core Notation using the extracted callback groups. The semantics of the Core Notation is defined in terms of π-calculus with an extension that additionally allows broadcast signals to be sent, where at least one receiver must exist for a send to succeed, and allows some parts of a process to be specified directly in the C language to make the specification easier. A kernel activity specification in KASL is not defined in π-calculus itself, but it describes parametrized processes. Environment Model Generator takes the specification in KASL and, on the basis of it and the provided callback groups, generates parametrized processes for the environment model and then prepares the environment model in the Core Notation by setting the parameters. As mentioned above, a specification in KASL is represented as a pair: a factory process and a set of behavioral processes. For each callback group, Environment Model Generator finds the corresponding factory process according to the group type. If no factory exists for a type, the default factory is used to generate the process. According to the factory, DEG generates the corresponding behavioral processes for the environment model. Moreover, behavioral processes from group pattern specifications can be used for a particular group type specification.
Such an implementation of group type and group pattern specifications reduces the manual specification work wherever reusable behavioral processes can be exploited, while keeping a strict matching of group type specifications with callback groups of a particular type.

On the last stage DEG generates a C program from the model in the Core Notation, because C is the input language of most static verifiers. DEG implements the approach described in detail in [15].

Fig. 5. Stages of Driver Environment Generator. (Diagram labels: Module Source Code, CIF, Module Interface Raw Info, Modules Interface Queries, Group Type Descriptions, Group Builder, Callback Groups, Kernel Activity Specification, Environment Model Generator, Environment Model in Core Notation, Model Translators, Environment Model in C, DEG Configuration, DEG.)

LDV Tools prepare a verification task on the basis of the kernel module source code and apply various static verification tools that solve reachability problems for C programs. Currently LDV Tools support the following static verification tools: CPAchecker [3], BLAST [4, 5], CBMC [6] and UFO [7], which are based on Counterexample-Guided Abstraction Refinement (CEGAR) and Bounded Model Checking (BMC). These tools have proved their efficiency at the annual competitions on software verification [16].

The environment model, which meets the requirements imposed by a particular source code verification tool, is generated by the Driver Environment Generator (DEG) component of LDV Tools. In practice DEG cannot generate precise models for all kernel modules because of restrictions of the preliminary source code
analysis and specific requirements of static verification tools. The most important issues are the following:

1. Not all callback groups can be extracted correctly with CIF from the module source code. For instance, the analysis currently misses dynamically assigned callbacks.

2. If the extracted data does not allow a π-process to be defined accurately on the basis of the specifications and the data obtained from the module source code, DEG adapts the specifications when possible. DEG can heuristically add a default registration or function stubs for missing roles.

3. The majority of verification tools can analyze only sequential C code, yet the original environment models are parallel in the notation of the π-calculus. This restriction is crucial, since it requires translating parallel π-models into sequential source code. To do this DEG uses the method for translating π-models into a sequential C program described in detail in [15]. This method discards simultaneous execution of callbacks and thus reduces the completeness of environment models. As a consequence, sequential models do not allow finding specific bugs such as race conditions.

4. Most static verifiers based on BMC need considerably accurate models with a minimum of uninitialized variables. This obstacle increases the manual work on specifying kernel core functions in DEG specifications and also makes the application of group pattern specifications less effective (some differences in the initialization of resources become important). That is why LDV Tools currently yield better results for CEGAR-based verification tools like CPAchecker or BLAST.

5. Most verification tools have restricted support of function pointers. If DEG extracts a function name for a concrete role instead of a function pointer, it replaces the corresponding parameter of the process with that function name. In this case the generated code becomes much friendlier to verification tools.

6. Static verification tools have restricted support of pointer arithmetic. The issue carries great weight for code generation, since without lists or other dynamically allocated structures it is too hard to describe in C code all feasible scenarios from π-models. DEG does not support replication of processes from π-models, while the approach in [15] suggested storing the state of process instances in a list. Nevertheless, several process instances can be defined manually in specifications.

Although the issues considered above currently do not allow generating precise environment models for Linux kernel modules, experimental results demonstrate that the models are already sufficiently precise.

5. EXPERIMENTAL RESULTS

Table 1. Verification results for drivers from the Linux kernel versions 3.13-rc1 and 3.17-rc1

Results        Mutexes              Spinlocks            Clocks               Atomic allocation
               3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1
Unsupported    343       342        343       342        343       342        343       342
Failed         191       211        195       214        230       248        156       176
Safe           2748      3020       2755      3026       2699      2945       2804      3077
False alarms   29        28         18        19         39        66         8         6
Actual bugs    6         1          2         4          2         1          2         1
Total          3311      3601       3311      3601       3311      3601       3311      3601

In this paper we present results of verification of modules from the drivers subsystem of two Linux kernel versions: 3.13-rc1 and 3.17-rc1. The first one contains 3311 modules in the drivers subsystem and the second contains 3601, both on the x86_64 architecture in the allmodconfig configuration. We checked both versions with a single version of LDV Tools and the same DEG configuration to prove
applicability of the tool to different kernel versions. Each module was checked against 4 rules of correct usage of the kernel API with time and memory limits of 900 seconds and 15 GB. The first two rows of Table 1 give the number of modules that were not checked successfully. The first row shows the number of modules where the tool failed to build a module or to extract callback groups. The second row contains the number of modules where a verdict was not obtained because of errors in the verification tool or the toolset, or because of a lack of memory or time. Successful results are presented in detail in the following subsections.

5.1. Model Correctness

To evaluate the correctness of generated environment models we verified all modules against rules that specify how to correctly use mutexes, spinlocks, clocks and memory allocation in atomic context. Table 1 shows the number of false alarms and actual bugs yielded by the BLAST static verification tool [5]. BLAST can report only one alarm per kernel module and rule, thus the number of yielded alarms coincides with the number of modules. For the lion's share of all modules BLAST produced the Safe verdict, which means that the tool did not find any violation of the rule of correct usage of the kernel API. The other modules, where alarms were produced, can be easily analyzed manually to find out the reasons. Table 1 shows that actual bugs constitute a considerable part of all alarms, yet there are still many more false alarms. More than a half of the false alarms were caused by incorrect environment models; Table 2 explores the reasons.

All problems in generated environment models can be divided into four classes. The first one corresponds to incorrect context modeling, where callbacks should be invoked under a mutex or spinlock, as happens in the kernel code. The second class contains cases where alarms are caused by the absence of a suitable behavioral process for accurate callback invocation. The third class corresponds to an inappropriate initialization of resources, which leads to infeasible scenarios in the model when the resources are passed to control functions (ioctl). The last class
includes modules where DEG failed to model the interaction of several callback groups. This is often caused by incorrectly defined registration methods or insufficient capabilities of CIF.

Table 2. False alarms caused by incorrect environment models

Category                       Mutexes              Spinlocks            Clocks               Atomic allocation
                               3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1   3.13-rc1  3.17-rc1
Not caused by DEG              15        12         9         10         13        29         2         1
Caused by DEG                  14        16         9         9          26        37         6         5
  context modeling             4         5          6         7          0         0          5         4
  order of callback invocation 7         8          1         1          3         3          1         1
  resource initialization      1         2          2         1          2         2          0         0
  registration modeling        2         1          0         0          21        32         0         0
False alarms in total          29        28         18        19         39        66         8         6

Overall, the number of false alarms caused by incorrect models is minuscule in comparison with the number of all verified modules (0.5%). It is worth mentioning that a single version of the DEG configuration allowed the tool to be applied successfully to different versions of the kernel. The toolset was able to find actual bugs in both Linux kernel versions with an acceptable rate of false alarms. The data shows that the correctness of generated environment models is good enough, although the current implementation still misses some features.

5.2. Model Completeness

To evaluate the completeness of generated environment models we checked whether already fixed bugs can be found by LDV Tools. For the benchmark we chose 44 different bugs from a wide range of Linux kernel subsystems. LDV Tools found 13 bugs (30%), while environment model incompleteness caused 8 bugs (18%) to be missed. Of these, 6 bugs (14%) were missed because environment models do not model the interaction between several modules (currently we model only the interaction of modules with the kernel core). Another 2 bugs (4%) were missed because CIF cannot extract information on dynamically assigned callbacks and because a specification was lacking. The remaining 23 bugs (52%) were missed because of issues both in the BLAST tool and in rule specifications, as well as exhaustion of memory or time limits.

5.3. Group Patterns Matching

The kernel activity specification used in the experiments consists of 7 group pattern specifications, 28 group type specifications for callback group types and the default specification.

Table 3 shows how many callback groups and corresponding group types were found in Linux kernel 3.13-rc1 and Linux kernel 3.17-rc1 and how they were matched. Approximately a half of the total number of callback groups were matched by the developed group type and group pattern specifications, in a proportion of about 2:3. The number of matched group types confirms that the application of group pattern specifications drastically reduces the work on specification development.

These results show that DEG generated environment models for callback groups of more than 670 group types, of which only 28 are described explicitly in the configuration, and required only 35 group specifications in total. Thus about twenty times fewer specifications were needed than would be necessary if all group types were described manually. This demonstrates a considerable saving of manual specification effort.

6. RELATED WORK

Existing approaches to environment modeling of kernel modules differ significantly in the effort they require and in their means for specifying precise models. For the Linux kernel there are two approaches, implemented in DDVerify [17] and Avinux [18].

DDVerify requires manual development of environment models in the C language. It allows models to be specified with any precision, since all possible C expressions are available, yet this approach requires a lot of effort. In DDVerify only 4 group types and models for timers and interrupts were specified for Linux kernel 2.6.19. The developed models depended heavily on kernel headers, which complicates migration to later versions of the Linux kernel.

The developers used state variables to define the order and parameters of callback invocations. Models of registration functions transfer pointers to callbacks into state variables. Callback invocations were implemented via function pointers. This makes the environment model code complicated for CEGAR-based static verifiers. The model precision allowed only BMC-based static verifiers like CBMC [6] to be applied effectively. Environment models developed for DDVerify are fully covered by environment models specified in LDV Tools.
Avinux extracts some information on callbacks by analyzing the module source code. It does not impose any restrictions on the invocation order of callbacks and provides only initialization of their parameters. The tool considers each exported function as a separate entry point, which significantly reduces model completeness. Additional entry points can be specified manually. Moreover, the tool is not able to automatically analyze modules that consist of several files.

SDV [19] was developed for verification of Windows device drivers. Windows driver developers who want to check their drivers with SDV have to write annotations for driver callbacks manually. These annotations are used to determine which callbacks the driver under verification provides to the kernel. On the basis of these annotations an environment model is generated using manually prepared models for the corresponding callback group types. The model also includes stubs for kernel functions that are used in drivers. Microsoft developers manually prepared models for the supported callback group types. It is worth mentioning that, in comparison with the Linux kernel, Windows kernel interfaces are stable and the Windows kernel contains far fewer callback group types. Moreover, in contrast to Microsoft, we could not force the community to annotate drivers.

Another approach to environment modeling is implemented in the tool DC2 [20] for scope-bounded software verification. Scope bounding means excluding functions that are deeply nested in the call graph, thereby enhancing scalability. The environment model is generated as function constraints that stand in for the real environment, including global variables and the unknown calling context. Stubs are generated in place of library function calls, missing source code and functions deemed outside the scope of the tool. Function constraints and stubs are generated automatically with the help of a lightweight and scalable whole-program analysis called SPECTACKLE. To improve the model the authors proposed a counterexample-guided environment refinement (CEGER) approach, where alongside a counterexample the user also gets the constraints generated as a part of the environment model.
Table 3. Matching of callback groups and group types with DEG specifications

Matched with                    Callback groups       Group types
                                3.13-rc1  3.17-rc1    3.13-rc1  3.17-rc1
Group type specifications       4284      4784        28        28
Group pattern specifications    6105      6816        261       281
Default group specification     9932      10769       381       438
Total                           20321     22369       670       747
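The summary figures quoted in Section 5.3 can be re-derived from Table 3. The short Python sketch below uses only the numbers from the table and the specification counts given in the text (7 group pattern and 28 group type specifications); the function name `summarize` and the dictionary layout are illustrative choices.

```python
# Table 3 data: (group type specs, group pattern specs, default spec)
table3 = {
    "3.13-rc1": {"groups": (4284, 6105, 9932), "types": (28, 261, 381)},
    "3.17-rc1": {"groups": (4784, 6816, 10769), "types": (28, 281, 438)},
}

# Specifications written by hand: 7 group patterns + 28 group types.
SPECS_WRITTEN = 7 + 28


def summarize(version: str) -> dict:
    """Derive the Section 5.3 summary figures for one kernel version."""
    by_type, by_pattern, by_default = table3[version]["groups"]
    total_groups = by_type + by_pattern + by_default
    total_types = sum(table3[version]["types"])
    return {
        "total_groups": total_groups,
        "total_types": total_types,
        # share of groups matched by type or pattern specifications
        "matched_share": (by_type + by_pattern) / total_groups,
        # ratio of type-matched to pattern-matched groups (2:3 is about 0.67)
        "type_to_pattern": by_type / by_pattern,
        # group types covered per hand-written specification
        "types_per_spec": total_types / SPECS_WRITTEN,
    }


for version in table3:
    s = summarize(version)
    print(version,
          f"matched={s['matched_share']:.1%}",
          f"type:pattern={s['type_to_pattern']:.2f}",
          f"types/spec={s['types_per_spec']:.1f}")
```

For both kernel versions slightly more than half of the callback groups are matched by non-default specifications, and each hand-written specification covers roughly twenty group types, which agrees with the statements in the text.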
The user can fix them and restart verification. This approach looks reasonable and effective for medium-sized programs, while applying it to big systems still requires a lot of effort.

7. CONCLUSION

The paper proposed a new method for generating environment models for Linux kernel modules on the basis of specifications that describe patterns of scenarios of interaction between modules and their environment. The approach made it possible to achieve a sufficient level of model correctness and completeness while reducing the effort of specification development by an order of magnitude. The developed environment models remain applicable as the Linux kernel continuously evolves. As a result, the method helped to perform large-scale verification of Linux kernel modules with static verifiers that require precise environment models.

REFERENCES

1. Palix, N., Thomas, G., Saha, S., et al., Faults in Linux: ten years later, Proc. 16th Int. Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, ACM, 2011, pp. 305–318, URL: http://doi.acm.org/10.1145/1950365.1950401.
2. Mutilin, V.S., Novikov, E.M., and Khoroshilov, A.V., Analiz tipovykh oshibok v drajverakh operatsionnoj sistemy Linux [Analysis of typical faults in Linux operating system drivers], Trudy ISP RAN [Proc. ISP RAS], 2012, vol. 22, pp. 349–374.
3. Beyer, D. and Keremoglu, M.E., CPAchecker: A tool for configurable software verification, Computer Aided Verification, Springer Berlin Heidelberg, 2011, vol. 6806 of Lecture Notes in Computer Science, pp. 184–190, URL: http://dx.doi.org/10.1007/978-3-642-22110-1_16.
4. Beyer, D., Henzinger, T.A., Jhala, R., and Majumdar, R., The software model checker Blast, International Journal on Software Tools for Technology Transfer, 2007, vol. 9, nos. 5–6, pp. 505–525, URL: http://dx.doi.org/10.1007/s10009-007-0044-z.
5. Shved, P., Mandrykin, M., and Mutilin, V., Predicate analysis with BLAST 2.7, Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, 2012, vol. 7214 of Lecture Notes in Computer Science, pp. 525–527, URL: http://dx.doi.org/10.1007/978-3-642-28756-5_39.
6. Clarke, E., Kroening, D., and Lerda, F., A tool for checking ANSI-C programs, Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, 2004, vol. 2988 of Lecture Notes in Computer Science, pp. 168–176, URL: http://dx.doi.org/10.1007/978-3-540-24730-2_15.
7. Albarghouthi, A., Gurfinkel, A., Li, Y., et al., UFO: Verification with interpolants and abstract interpretation, Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, 2013, vol. 7795 of Lecture Notes in Computer Science, pp. 637–640, URL: http://dx.doi.org/10.1007/978-3-642-36742-7_52.
8. Engler, D. and Musuvathi, M., Static analysis versus software model checking for bug finding, Verification, Model Checking, and Abstract Interpretation, Springer Berlin Heidelberg, 2004, vol. 2937 of Lecture Notes in Computer Science, pp. 191–210, URL: http://dx.doi.org/10.1007/978-3-540-24622-0_17.
9. Milner, R., Parrow, J., and Walker, D., A calculus of mobile processes, I, Inf. Comput., 1992, vol. 100, no. 1, pp. 1–40, URL: http://dx.doi.org/10.1016/0890-5401(92)90008-4.
10. Milner, R., The Polyadic π-Calculus: a Tutorial, LFCS, Department of Computer Science, University of Edinburgh, 1991, p. 49.
11. Khoroshilov, A., Mutilin, V., Novikov, E., et al., Towards an open framework for C verification tools benchmarking, Perspectives of Systems Informatics, Springer Berlin Heidelberg, 2012, vol. 7162 of Lecture Notes in Computer Science, pp. 179–192, URL: http://dx.doi.org/10.1007/978-3-642-29709-0_17.
12. Zakharov, I.S., Mandrykin, M.U., Mutilin, V.S., et al., Konfiguriruemaya sistema staticheskoj verifikatsii modulej yadra operatsionnykh sistem [Configurable toolset for static verification of operating systems kernel modules], Trudy ISP RAN [Proc. ISP RAS], 2014, vol. 26, pp. 5–42.
13. Novikov, E.M., An approach to implementation of aspect-oriented programming for C, Programming and Computer Software, 2013, vol. 39, no. 4, pp. 194–206, URL: http://dx.doi.org/10.1134/S0361768813040051.
14. Novikov, E.M. and Khoroshilov, A.V., Ispol'zovanie aspektno-orientirovannogo programmirovaniya dlya vypolneniya zaprosov po iskhodnomu kodu programm [Using Aspect-Oriented Programming for Querying Source Code], Trudy ISP RAN [Proc. ISP RAS], 2012, vol. 23, pp. 371–386.
15. Zakharov, I.S., Mutilin, V.S., Novikov, E.M., and Khoroshilov, A.V., Modelirovanie okruzheniya drajverov ustrojstv operatsionnoj sistemy Linux [Environment modeling of Linux operating system device drivers], Trudy ISP RAN [Proc. ISP RAS], 2013, vol. 25, pp. 85–112.
16. Beyer, D., Second competition on software verification, Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, 2013, vol. 7795 of Lecture Notes in Computer Science, pp. 594–609, URL: http://dx.doi.org/10.1007/978-3-642-36742-7_43.
17. Witkowski, T., Blanc, N., Kroening, D., and Weissenbacher, G., Model checking concurrent Linux device drivers, Proc. 22nd IEEE/ACM Int. Conference on Automated Software Engineering, New York, NY, USA: ACM, 2007, pp. 501–504.
18. Post, H. and Küchlin, W., Integrated static analysis for Linux device driver verification, Integrated Formal Methods, Berlin, Heidelberg: Springer-Verlag, 2007, pp. 518–537, URL: http://portal.acm.org/citation.cfm?id=1770498.1770525.
19. Ball, T., Bounimova, E., Cook, B., et al., Thorough static analysis of device drivers, SIGOPS Oper. Syst. Rev., 2006, vol. 40, no. 4, pp. 73–85.
20. Ivancic, F., Balakrishnan, G., Gupta, A., et al., DC2: A framework for scalable, scope-bounded software verification, Automated Software Engineering (ASE), Proc. 2011 26th IEEE/ACM International Conference on, 2011, pp. 133–142.