Global core, and galaxy structure of networks



SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072101:1–072101:20 doi: 10.1007/s11432-013-4930-6

Global core, and galaxy structure of networks

ZHANG Wei1,2, PAN YiCheng1,4, PENG Pan1,2, LI JianKou1,2, LI XueChen3 & LI AngSheng1*

1 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
2 School of Information Science and Engineering, University of Chinese Academy of Sciences, Beijing 100190, China;
3 Beijing No. 4 High School, Beijing 100034, China;
4 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100039, China

* Corresponding author (email: [email protected])

Received May 23, 2013; accepted September 4, 2013

Abstract We propose a novel approach, local reduction of networks, to extract the global core (GC, for short) from a complex network. The algorithm is built on the small community phenomenon of networks. The global cores found by our local reduction in some classical graphs and benchmarks convince us that the global core of a network is intuitively the supporting graph of the network, which is "similar to" the original graph; that the global core is small and essential to the global properties of the network; and that the global core, together with the small communities, gives rise to a clear picture of the structure of the network, that is, the galaxy structure of networks. We implement the local reduction to extract the global cores of a series of real networks, and execute a number of experiments to analyze the roles of the global cores in these networks. For each of the real networks, our experiments show that the global core found is small; that it is similar to the original network, in the sense that it follows a power law degree distribution with power exponent close to that of the original network; that it is sensitive to errors under both the cascading failure and the physical attack models, in the sense that a small number of random errors in the global core may cause a major failure of the whole network; and that it is a good approximate solution to the r-radius center problem, leading to a galaxy structure of the network.

Keywords complex networks, local reduction, global core, small community phenomenon, social network

Citation Zhang W, Pan Y C, Peng P, et al. Global core, and galaxy structure of networks. Sci China Inf Sci, 2014, 57: 072101(20), doi: 10.1007/s11432-013-4930-6

1 Introduction

To understand the structures of networks is a grand challenge in the study of network science. Real networks are so large that they cannot be analyzed efficiently. A number of studies have been dedicated to extracting a core-like subgraph from a network, based on the intuition that a "core" should be small and preserve the most important nodes together with some structural properties.

Seidman [1] defined the k-core of a graph G = (V, E) to be an induced subgraph H = G_W, for some W ⊂ V, that is maximal with respect to the requirement that every node v ∈ W has degree at least k in H. This gives rise to a decomposition of a graph into k-cores by degrees, which can be computed in time O(m), where m is the number of edges of G. Intuitively, the k-cores for large k are more influential in the graph than the set of the remaining nodes. Borgatti et al. [2] proposed a set of ideal images of core/periphery structures, together with measures of the extent to which real networks approximate these ideal images; the measures are used as the basis for testing a priori hypotheses about core/periphery structures, leaving open a statistical test for the significance of the core/periphery structures found by their algorithms. Alvarez-Hamelin et al. [3] proposed a visualization algorithm based on the k-core decomposition to uncover several topological and hierarchical properties of large scale networks in a two-dimensional layout. Goltsev et al. [4] proposed a theory of k-core percolation on uncorrelated random networks with arbitrary degree distribution. Holme [5] proposed a coefficient that measures whether or not a network has a clear-cut core-periphery dichotomy, and showed that geographically embedded transportation networks have a strong core-periphery structure.

Intuitively speaking, a core/periphery network contains a core of densely connected high-degree nodes, which link small groups of strongly clustered, low-degree nodes at the fringes of the network. This phenomenon has been modeled and examined in some social and economic networks. For instance, Hojman et al. [6] proposed such a model, and Mislove et al. [7] examined data from online social networks.

A classical problem in algorithms is the metric k-center problem: given a complete graph G = (V, E) with edge costs satisfying the triangle inequality, and a positive integer k, for any set S ⊆ V define connect(v, S) to be the cost of the cheapest edge from v to a vertex in S; the problem is to find a set S ⊆ V with |S| = k that minimizes max_v {connect(v, S)}. Hochbaum et al. [8] and Hsu et al. [9] studied the basic algorithms for this problem. The metric k-center problem is different from our intuitive notion of cores of networks, although it sheds some light on our understanding of the global core of a network.

John Hopcroft (private communication, 2009) asked for a mathematical definition of "similarity of networks". This is a fundamental question, playing a role similar to that of the isomorphism problem in graph theory, and it remains a grand challenge. A subproblem is how to extract a small subgraph, H say, from a graph G such that H is "similar to" G, giving us an intuitive understanding of the notion of similarity of networks.

For a real complex network, the small communities, acting as basic units, can be understood as the local structure of the network, and edges within a small community are taken as local edges. This raises the question of what the global structure of a network is. Can we understand the edges other than the local edges as global edges? For example, do the relations (edges) between communities represent the global edges of the network? Consider an example: the graph in Figure 1 consists of three communities and a triangle connecting the three communities. In this graph, we interpret the three communities as the local structures of the graph, and the triangle (colored red) connecting the three communities as the global structure of the graph. Observing Figure 1, we interpret the red triangle to be similar to the original graph G.
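As a concrete aside, Seidman's k-core decomposition discussed at the beginning of this section is easy to experiment with. Below is a minimal sketch using the networkx library (our choice for illustration; the paper itself prescribes no toolkit) on a small built-in graph:

```python
import networkx as nx

# A small built-in social network, used purely as toy input.
G = nx.karate_club_graph()

# Seidman's k-core: the maximal induced subgraph in which every node
# has degree at least k within the subgraph.
H = nx.k_core(G, k=3)
print(H.number_of_nodes(), H.number_of_edges())

# Core numbers: for each node, the largest k such that it lies in the
# k-core. The full decomposition is computable in O(m) time.
print(nx.core_number(G))
```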
The idea of our local reduction is simple. Consider the example in Figure 1. The local reduction works as follows: 1) find all communities, which are the three 6-vertex graphs; 2) delete all local edges, where an edge is called local if the two endpoints of the edge are in the same community (after this step, the graph becomes a triangle together with 15 isolated nodes); and 3) take the largest connected component of the graph obtained after deleting all local edges, which gives the triangle connecting the three communities. This example gives an intuition of how the local reduction proceeds.

Can we understand most real networks in this way? That is, how do we split a real network into local structures and global structures? How do we extract a small subgraph, H say, from a graph G such that H is (intuitively) similar to G? In this paper, we try to answer these questions. Intuitively, the global structure of a network is a connected induced subgraph, H say, of a graph, G say, which preserves the global properties of the whole graph G. The problem is how to define the global structure of a network.

Definition 1 (r-radius center, r-RC). Given a graph G = (V, E) and a radius r, find a set S ⊂ V such that:
1) the induced subgraph H = G_S of S in G is connected;
2) for every node v ∈ V, there is a node u ∈ S such that the distance d(v, u) between v and u in G is at most r;
3) the size |S| of S is minimized.
In this case, we call H = G_S an r-radius center of G, or simply, a global core of G.

In Figure 1, the red triangle connecting the three communities is thus a one-radius center of the graph. Notice that the r-radius center problem is a generalization of the famous dominating set problem, which corresponds to the case r = 1 with the connectivity requirement dropped. Therefore, our problem is hard, and there is no easy way to extend the existing algorithms for the dominating set problem to solve it. Fortunately, finding the global core of a real network does not depend on algorithms for the r-radius center problem; we use the problem only to explain our results. In the next section, we propose a nearly linear algorithm to approximately extract the global cores of networks as defined by Definition 1 above.

The rest of this paper is organized as follows. In Section 2, we formulate our overall algorithm of the local reduction. In Section 3, we implement the algorithm on several collaboration networks to extract the corresponding r-radius center, or global core, of the networks. In Section 4, we investigate the roles of the global cores; we show that, for each of the networks, the global core is sensitive to errors affecting the whole network. In Section 5, we investigate the galaxy structure of the networks. Finally, in Section 6, we conclude our research and discuss the potential of our methods in understanding the structures of networks and in new applications.
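Although computing an r-radius center is hard, checking a candidate set against Definition 1 is easy. Below is a minimal sketch, assuming networkx and an unweighted graph; the function name is ours:

```python
import networkx as nx

def satisfies_r_radius_conditions(G, S, r):
    """Check conditions 1) and 2) of Definition 1 for a nonempty candidate
    set S (condition 3, minimality of |S|, is the hard optimization part)."""
    S = set(S)
    # 1) the induced subgraph G_S must be connected
    if not nx.is_connected(G.subgraph(S)):
        return False
    # 2) every node of G must be within distance r of some node of S;
    # on an unweighted graph, multi-source Dijkstra gives hop distances
    dist = nx.multi_source_dijkstra_path_length(G, S)
    return all(dist.get(v, float("inf")) <= r for v in G)
```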

2 The algorithm

Our fundamental question is how to approximately extract the global core of a network. This seems an intractable problem at first thought: we do not know what the global structure of a network is. However, we do know the local structures of the network, that is, the small communities of the network, which can be found approximately. This suggests the idea of a local reduction that extracts the global structure by working modulo the local structures of the network. For this, we need a searching algorithm for the small (true) communities of a network.

2.1 Community finding algorithm

The basic ingredient of our local reduction is a community searching algorithm; community finding has been extensively studied in a number of papers, such as [10–18]. In particular, spectral algorithms are used to find clusterings [19], and modularity-based methods have been very useful in recent research [20, 21]. Other works first treat communities from some specific perspective and then utilize it to achieve their specific goals; e.g., Derenyi et al. [22] view communities as chains of adjacent cliques, and using this they can find overlapping and/or nested communities. Testing the quality of a community has also been studied [23, 24]. However, most community identification methods are based on traditional graph partitioning; as a result, the communities found in this way are usually large (traditional graph partitioning algorithms find balanced subgraphs) and disjoint. This fails to reflect our observation that true communities are independent of algorithms, small, overlapping and embedded. Li and Peng [25] proposed a mathematical definition of the notion of community, and the new notion of the local dimension of a network:

Definition 2. Given a graph G = (V, E) and α, β > 0, a connected set S ⊂ V with |S| = ω(1) is an (α, β)-community if Φ(S) ≤ α/|S|^β, where Φ(S) = e(S, S̄)/min{vol(S), vol(S̄)} is the conductance of S in G and S̄ = V \ S. Moreover, if |S| = O((ln n)^γ), where n = |V|, then we say that S is an (α, β, γ)-community. If, for some α, β and γ, a significant fraction of the nodes of the network are contained in (α, β, γ)-communities, then we say that the network satisfies the small community phenomenon, in which case the triple (α, β, γ) is called the local dimension of G.


The triple (α, β, γ) of the local dimension is essential to understanding the local structures and information of a network. Intuitively, a set of nodes is a good community if the induced subgraph of the set is connected, the links within the set are strong, and the links from the set to nodes outside it are relatively weak. The parameters α and β together measure the quality of a community, in the sense that the smaller α is, the better the community, and the larger β is, the better the community; there is no way to measure the quality of a community with only one parameter. The parameter γ captures the nature of true communities in most real networks. Using this definition of communities, we have proved mathematically that networks from each of the classical models either satisfy a nice small community phenomenon or satisfy the expanding property, in the sense that their conductance is greater than some constant. This convinces us that the definition is correct. For real network data, however, we have to approximate the local dimension (α, β, γ), which differs from network to network, and based on which we are able to extract the maximal amount of useful local information of the networks. We remark that it is an interesting open question to prove or disprove the small community phenomenon for networks from classical models using other definitions of community. Based on this theory, we proposed a new searching algorithm, the (α, β, γ)-searching, which shows remarkable advantages in finding true communities and in prediction tasks on networks.
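To make Definition 2 concrete, the following minimal sketch computes the conductance of a candidate set and checks the (α, β)-community bound; it assumes networkx, and the helper names are ours:

```python
import networkx as nx

def conductance(G, S):
    """Phi(S) = e(S, S-bar) / min(vol(S), vol(S-bar)) as in Definition 2.
    Assumes S is a nonempty proper subset of the nodes of G."""
    S = set(S)
    cut = nx.cut_size(G, S)                     # e(S, S-bar): edges leaving S
    vol_S = sum(d for _, d in G.degree(S))      # vol(S): sum of degrees in S
    vol_rest = 2 * G.number_of_edges() - vol_S  # vol(S-bar)
    return cut / min(vol_S, vol_rest)

def is_alpha_beta_community(G, S, alpha, beta):
    """An (alpha, beta)-community additionally requires S to be connected."""
    return (nx.is_connected(G.subgraph(S))
            and conductance(G, S) <= alpha / len(S) ** beta)
```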

2.2 Local reduction

We use the (α, β, γ)-searching as the basic module of our local reduction. The local reduction proceeds as follows. For a given network G, take its largest connected component as G1. Suppose that Gi has been defined. Then: (i) find all small communities in Gi; (ii) remove all the local edges, i.e., the edges within the small communities; and (iii) define Gi+1 to be the largest connected component of the resulting graph. If Gl fails to allow the local reduction, in the sense that the graph resulting from Gl after deleting the local edges becomes a set of isolated nodes, then we define Gl to be the global core of G, denoted by GC(G) = Gl. Formally, each round of our local reduction proceeds as follows:

• Local reduction (from Gi to Gi+1):
1) Input: network Gi = (V, E).
2) Find all communities of Gi and compute the local dimension (α, β, γ) of Gi by using the (α, β, γ)-searching. An edge e = (u, v) of Gi is a local edge if both u and v belong to the same community C found in this step.
3) Define G′i to be the graph obtained from Gi by deleting all local edges (edges inside communities).
4) Define Gi+1 to be the largest connected component of G′i.

Recursively perform the local reduction until a graph Gl is found such that Gl fails to allow the local reduction. Suppose that {G1, G2, . . . , Gl} are the graphs obtained by the local reductions from G1. We say that the final graph Gl is the global core of G1.
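The following sketch shows one way the loop above can be coded. The community-searching step is deliberately left as a pluggable function, since the (α, β, γ)-searching itself is not reproduced here; any routine returning a list of node sets can be plugged in for experimentation:

```python
import networkx as nx

def local_reduction_step(G, find_communities):
    """One round: delete the local (intra-community) edges, then keep
    the largest connected component of what remains."""
    H = G.copy()
    for C in find_communities(G):  # stand-in for the (alpha,beta,gamma)-searching
        C = set(C)
        local = [(u, v) for u, v in H.edges(C) if u in C and v in C]
        H.remove_edges_from(local)
    giant = max(nx.connected_components(H), key=len)
    return H.subgraph(giant).copy()

def global_core(G, find_communities):
    """Iterate the local reduction until it fails, i.e., until deleting
    the local edges would leave only isolated nodes or until no further
    progress is made; the last graph is taken as GC(G)."""
    giant = max(nx.connected_components(G), key=len)
    Gi = G.subgraph(giant).copy()
    while True:
        Gnext = local_reduction_step(Gi, find_communities)
        no_progress = (Gnext.number_of_nodes() == Gi.number_of_nodes()
                       and Gnext.number_of_edges() == Gi.number_of_edges())
        if Gnext.number_of_nodes() <= 1 or no_progress:
            return Gi
        Gi = Gnext
```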

2.3 Examples

We remark that Figure 1 is itself the result of our local reduction: the triangle connecting the communities is the global core of the graph, found by one round of the local reduction. In this subsection, we apply the local reduction prescribed in the last subsection to some more representative graphs and benchmark data. We consider the following graphs:
1) A tree model: a binary tree with luxuriant leaves.
2) The planted l-partition (PlP) model [26]: the model partitions a graph with n = g · l nodes into l groups of g nodes each. Nodes of the same group are linked with probability p_in, whereas nodes of different groups are linked with probability p_out. To obtain an ideal community structure, we set n = 1000, l = 10, g = 100, p_in = 0.5, and p_out = 0.01.
3) The benchmark football team graph: a well-known benchmark built by Girvan et al. [14]. The 115 nodes of the network represent American college football teams, and an edge between two of them represents that the corresponding two teams have played against each other.

Figure 1 A small graph with triangle global structure.
Figure 2 The tree model.
Figure 3 The PlP model.
Figure 4 The benchmark of football team graph.
Figure 5 The PlP model (reshaped).

In all the examples above, G1, the largest connected component, is the same as the original graph, and each graph allows only one step of the local reduction, from which G2 is defined. Figures 2–4 show the results of the local reductions, respectively. In these figures, nodes in G2 are colored red, and nodes outside G2 are colored blue.

In Figure 2, we notice that G2 consists of exactly the nodes in the trunk and main branches of the tree, and that the small communities consist of exactly the leaves and small branches of the graph. This captures exactly our intuition that the leaves and small branches are the local structures of the tree, while the trunk and main branches form the global structure of the tree. We notice that the parameters of our local reduction can be chosen so that the splitting of a tree into local and global structures is different. For example: (i) the local structures consist of just the leaves, and the global structure is the tree obtained from the given tree by deleting only the leaves; or (ii) the global structure consists of the root node, and all other nodes form the local structures. This means that the notions of local and global are relative, rather than absolute, and can be adjusted to the needs of practical applications. Our local reduction allows us to choose parameters to extract appropriate local and global structures of a network.

Our intuition is that the global core of a network is more important to the whole network than the rest of the graph. Based on this understanding, one might conjecture that the nodes of high degree are more important, so that they should form the global core of the network. However, in the tree of this example, the nodes of larger degree are at the second lowest level of the tree, that is, the nodes linking to the leaves. Therefore, the set of nodes of larger degree actually lies in the small communities and is not contained in the global core extracted by our local reduction. Thus the intuitively important nodes of large degree fail to capture the important global structure of the graph, which means that our local reduction does extract some non-trivial global structure of networks.

In the graph of Figure 3, the small communities found by our algorithm are exactly the small graphs we planted, and the global structure of the graph consists of nodes connecting different communities. To see this clearly, we rearrange the graph in Figure 5, from which we see that the global core G2 is in the center, that the network has a flower-like shape, and that the small communities are formed as petals surrounding the stamen. This captures again our intuition that the center part, i.e., G2, acts as the global core of the graph G1.

For the benchmark data of football teams, we remark that the community finding algorithm finds most of the true communities. The global core of the graph is the set of nodes colored red in Figure 4, from which we know that the global core consists of nodes connecting different communities of the network.

The examples in this section convince us that our local reduction can be used to understand the structures of networks, and to extract both the local and the global structures of a real network. By observing Figures 1, 2, 3 and 5, we see that the global core of each of the graphs is intuitively similar to the original graph. This experiment shows that it is feasible to extract a subgraph that is much smaller than, and similar to, the original graph.

3 Global core of real networks

In this section, we extract the global cores of some real-world networks by performing the local reductions prescribed in the last section. We use several collaboration networks 1), each of which is built from the e-print arXiv and records scientific collaborations among the authors of submitted papers: if author i co-authors a paper with author j, we add an undirected edge between i and j, so a paper co-authored by k authors induces a clique on k nodes. The data cover papers from January 1993 to April 2003 (124 months). This period begins within a few months of the inception of the arXiv, and thus represents essentially the complete collaboration history.

1) High Energy Physics-Phenomenology (HEP-PH). The network, denoted HEP-PH, has 12008 nodes and 237010 edges; its largest connected component (CC, for short) has 11204 nodes (93.3%) and 235268 edges (99.3%); the average clustering coefficient is 0.6115, the number of triangles is 3358499, and the diameter is 13. We denote by G1 the largest connected component of HEP-PH and apply the local reductions to it. After 6 rounds, we obtain the network sequence {G1, G2, . . . , G7}, described in Table 1. The local dimensions corresponding to these reductions are given in Table 5, where N is the number of nodes and Fraction is the percentage of nodes in communities.

Table 1 describes the numbers of connected components after deleting the local edges in each round of the local reduction. It shows that in each round, the graph decomposes into a giant connected component, a great number of isolated nodes, and a few trivial connected components. In each block of the table, Scc denotes the size of a connected component, and Ncc the number of connected components of the corresponding size. The local reduction stops at the round at which the graph fails to allow one more round, in the sense that one more round would, after deleting the local edges, break the graph into isolated nodes and a few trivial connected components. In Table 1, recall that G′i is the graph obtained from Gi by deleting all the local edges; it consists of a giant connected component, a large number of isolated nodes, and a few trivial connected components. For instance, the first block of Table 1 shows that, after the deletion of the local edges of G1, the network becomes a connected component of 7310 nodes (which become the nodes of G2), 3891 isolated nodes, and 1 component of 3 nodes.

It is a surprising discovery that in each round of our local reduction, in all our experiments so far, the graph obtained after deleting its local edges becomes a giant connected component and a large number of isolated nodes, except for a few trivial connected components. This means that each round of the local reduction indeed removes only local structures of the network, so that the global structure of the network is kept in the remaining giant connected component. This suggests the following principle for our local reductions:

Principle of local reduction: We say that the local reduction is valid if the graph obtained by deleting all local edges consists of a giant connected component and a large number of isolated nodes, with few trivial connected components. (We remark that this property needs some theoretical analysis, which will be a future project.)

1) All the data in this paper can be found at the websites http://snap.stanford.edu and http://www-personal.umich.edu/~mejn/netdata.

Table 1 Local reductions of HEP-PH. Each row lists the component sizes Scc after deleting the local edges, each followed in parentheses by the number Ncc of components of that size.
G1 (11204): 7310 (1), 3 (1), 1 (3891)
G2 (7310):  4668 (1), 1 (2462)
G3 (4668):  3008 (1), 1 (1660)
G4 (3008):  2114 (1), 1 (894)
G5 (2114):  572 (1), 1 (1542)
G6 (572):   339 (1), 1 (133)

Table 2 Local reductions of COND-MAT
G1 (21363): 14289 (1), 20 (1), 12 (1), 1 (7042)
G2 (14289): 8918 (1), 2 (1), 1 (5359)
G3 (8918):  6102 (1), 1 (2816)
G4 (6102):  4340 (1), 1 (1762)
G5 (4340):  2618 (1), 1 (1722)
G6 (2618):  1093 (1), 1 (1525)
G7 (1093):  463 (1), 1 (630)

Table 3 Local reductions of ASTRO-PH
G1 (17903): 13006 (1), 2 (2), 1 (4893)
G2 (13006): 10513 (1), 1 (2493)
G3 (10513): 8055 (1), 1 (2458)
G4 (8055):  6138 (1), 1 (1917)
G5 (6138):  4412 (1), 1 (1726)
G6 (4412):  3089 (1), 2 (1), 1 (1321)
G7 (3089):  985 (1), 2 (1), 1 (2102)
G8 (985):   562 (1), 1 (423)

Table 4 Local reductions of HEP-TH
G1 (8638):  5056 (1), 2 (3), 1 (3576)
G2 (5056):  2936 (1), 2 (2), 1 (2116)
G3 (2936):  1529 (1), 2 (2), 1 (1403)
G4 (1529):  459 (1), 3 (1), 1 (1067)

By observing Table 1, we know that the topologically global structure of the network is preserved in G7. The last two graphs, G6 and G7, are already small and are drawn in Figure 6, with the vertices of G7 placed in the same positions as in G6.

Table 5 Local dimensions of HEP-PH (N: number of nodes; Fraction: percentage of nodes in communities)
     N      α      β     γ     Fraction (%)
G1   11204  0.204  0.05  2.47  54.3
G2   7310   0.37   0.05  2.4   50.3
G3   4668   0.504  0.05  2.72  64.6
G4   3008   0.566  0.05  2.67  55.4
G5   2114   0.89   0.05  2.63  70
G6   572    0.92   0.05  3.44  52.3
G7   339

Table 6 Local dimensions of the COND-MAT network
     N      α     β     γ     Fraction (%)
G1   21363  0.3   0.05  2.38  52.8
G2   14289  0.33  0.05  2.5   50.9
G3   8918   0.4   0.05  2.54  58.5
G4   6102   0.23  0.05  3.51  55.7
G5   4340   0.16  0.05  3.94  50.9
G6   2618   0.2   0.05  3.68  59.9
G7   1093   0.3   0.05  3.72  89.9
G8   463

Table 7 Local dimensions of ASTRO-PH
     N      α     β     γ     Fraction (%)
G1   17903  0.32  0.04  2.45  52.7
G2   13006  0.53  0.05  4.21  54.1
G3   10513  0.61  0.05  4.16  66.5
G4   8055   0.64  0.05  4.09  63.6
G5   6138   0.67  0.05  2.57  60.4
G6   4412   0.69  0.05  2.06  54.7
G7   3089   0.69  0.05  2.61  89.2
G8   985    0.36  0.05  2.94  50.6
G9   562

Table 8 Local dimensions of HEP-TH
     N     α     β     γ     Fraction (%)
G1   8638  0.18  0.05  2.3   50.3
G2   5056  0.25  0.05  2.72  54.1
G3   2936  0.32  0.05  2.69  53.2
G4   1529  0.35  0.05  2.55  50.6
G5   459

2) Condensed Matter (COND-MAT). This network has 23133 nodes and 186936 edges; the largest CC has 21363 nodes (92.3%) and 182628 edges (97.7%); the average clustering coefficient is 0.6334, the number of triangles is 173361, and the diameter is 15. The local reduction and its corresponding local dimensions are shown in Table 2 and Table 6, respectively. Let G1 be the largest connected component of COND-MAT. We start the local reduction from G1. After 7 rounds, we obtain the reduction sequence {G1, G2, . . . , G8}, using the local dimensions given in Table 6. We notice that in each round of the local reduction, we delete only local edges and isolated vertices, together with a few trivial connected components. The last two graphs G7 and G8 are depicted in Figure 7.

3) Astro Physics (ASTRO-PH). The network has 18772 nodes and 396160 edges; the largest CC has 17903 nodes (95.4%) and 394003 edges (99.5%); the average clustering coefficient is 0.6306, the number of triangles is 1351441, and the diameter is 14. The local reduction of the network is described in Table 3, and the local dimensions of the reduction are shown in Table 7. We also choose the largest component of the ASTRO-PH network as G1. After 8 rounds, as shown in Table 7, we obtain the network sequence {G1, G2, . . . , G9}. G8 and G9 are shown in Figure 8.

4) High Energy Physics-Theory (HEP-TH). The network has 9877 nodes and 51971 edges; the largest CC has 8638 nodes (87.5%) and 49633 edges (95.5%); the average clustering coefficient is 0.4714, the number of triangles is 28339, and the diameter is 17. The local reduction is given in Table 4. We choose the largest component of the HEP-TH network as G1, apply the local reduction to it, and obtain the network sequence {G1, G2, . . . , G5}. G4 and G5 are shown in Figure 9, and the local dimensions of the reduction are given in Table 8.

Figure 6 Global core of HEP-PH (G6 and G7). By definition, G7 is the global core of the network. Since G6 is already small, we draw both G6 and G7 to better understand the role of the local reduction.
Figure 7 Network structure of COND-MAT (G7 and G8).
Figure 8 Network structure of ASTRO-PH (G8 and G9).
Figure 9 Network structure of HEP-TH (G4 and G5).

Summary. Let G be one of the collaboration networks above, and let G1 be the largest connected component of G. Suppose that {G1, G2, . . . , Gl} are the graphs obtained from the local reductions above. By observing the reductions, we have the following properties:
(1) For each i, the deletion of the local edges of Gi gives rise to a giant connected component, i.e., Gi+1, and a large number of isolated nodes, together with a few (fewer than 5) trivial connected components of size 2 or 3 (or fewer than 20 in one or two special cases). This ensures that the local reduction simplifies a graph by deleting essentially only the local structures of the graph, so that the global structure of Gi is kept in Gi+1.
(2) Gl preserves essentially the global properties of G1, and of G.
(3) In any case, Gl is small.
We say that Gl is the global core of G (GC, for short).

4 Analysis of the global cores

In the last section, we found a global core for each of the 4 collaboration networks by our local reduction. The global core of a network can be regarded as a syntactic global core of the network, since it is computed solely from the topology of the network. Our motivation for finding the global core is to show that the global properties of the network are primarily determined by the global core; our local reduction is designed to ensure this. In this section, we validate that, for each of the real networks, the global core determines the global properties of the corresponding network. We show that for each collaboration network G, with G1 the largest connected component of G, the following properties hold:
1. Gi+1 is intuitively similar to Gi.
2. Gi+1 follows a power law degree distribution with power exponent the same as that of Gi.
For the graph Gi obtained at the end of round i of the local reduction, we define the transition point of Gi to be the least size l such that, with probability 0.9, a randomly and uniformly chosen set of nodes in Gi of size l will cause a giant cascading failure in G.
3. The transition point of Gi+1 is significantly smaller than that of Gi.
4. Gi+1 has greater average betweenness than Gi.
For Gi, we define the breaking point of Gi to be the least number b such that, with probability 0.9, for a randomly and uniformly chosen set of nodes in Gi of size b, the graph obtained from G by deleting the nodes in the set has no giant connected component.
5. The breaking point of the global core is much smaller than that of the original network.

4.1 Similarity of networks

The examples in Section 2 show that, for each of the graphs there, the global core is intuitively similar to the original graph. The local reductions in Section 3 show that Gi+1 is a connected subgraph obtained from Gi by deleting the local edges together with the resulting isolated nodes and few trivial connected components; this shows that Gi+1 is intuitively similar to Gi. However, since we do not have a mathematical definition of the similarity of networks, we compare the topologies of Gi+1 and Gi by examining the degree distributions of the corresponding graphs. We depict the degree distributions of each sequence of graphs for the networks in Figures 10–13. From the figures, we know that the power-law degree distributions are preserved by the local reduction, and that the slopes of all lines corresponding to the different Gi are very close. This means that Gi+1 preserves the degree distribution of Gi, for all i.
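A minimal sketch of this comparison for networkx graphs, assuming numpy (the crude least-squares slope on the log-log degree histogram stands in for a proper power-law fit):

```python
import collections
import numpy as np

def log_log_slope(G):
    """Slope of log(count) vs. log(degree) over the degree histogram,
    a rough proxy for the power-law exponent of G."""
    hist = collections.Counter(d for _, d in G.degree() if d > 0)
    deg = np.array(sorted(hist), dtype=float)
    cnt = np.array([hist[d] for d in sorted(hist)], dtype=float)
    slope, _ = np.polyfit(np.log(deg), np.log(cnt), 1)
    return slope

# Comparing slopes along a reduction sequence G1, G2, ..., Gl:
# print([log_log_slope(Gi) for Gi in sequence])
```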

4.2 Global cascading failure

In [27], Domingos and Richardson proposed the influence maximization problem, the goal of which is to find a set S of k nodes (individuals) such that, by first convincing them to adopt a new product, the size of the finally generated set T is maximized. More precisely, for a given set S, the procedure to generate T is as follows:

Definition 3. Let G = (V, E) be a graph, φ a constant in (0, 1), and S a subset of V. We define a set T ⊆ V recursively as follows:
i) Set T ← S.
ii) We say that a node v ∈ V is active if v ∈ T.
iii) For every v ∈ V \ T, if at least φ × d_v neighbors of v are active, then v becomes active and enters T, where d_v is the degree of v in G.
iv) Recursively run step iii) until no new node can be added to T.
Let T be the final set defined as above. We say that T is the φ-cascading of S in G, denoted by c_φ^G(S) = T.

Such a definition is similar to the one given by Kempe et al. [28], and the final size of T indicates the diffusion power, or influence, of the nodes in S. In this subsection, we show that nodes in Gi+1 have more influence than those in Gi; in particular, the global core Gl, although much smaller than G1, is the most influential set among all the Gi. To verify this, for every Gi = (Vi, Ei) and a given φ, we implement the following cascading simulation:
1. Fix a set size |S|.
2. Randomly pick a node set S of size |S| in Vi and compute T = c_φ(S) in G1. (Recall Definition 3; the key point is that we select S in Gi but calculate T in G1, since our goal is to verify the importance of the nodes of Gi within G1.)
3. Repeat step 2 sufficiently many times (say, 100) and calculate the average of |T| (see the code sketch below).
The average |T| can be considered as the expected size of T. The relation between |S| and E(|T|) for HEP-PH is shown in Figure 14, where φ is fixed at 0.3. We find several important properties which are consistent with our intuition:
1. There are sudden phase transitions in each cascading simulation, indicating thresholds on the initial size of S that determine the influence scope of the cascading procedure.
2. As the local reduction continues, there is an obvious "left approaching" pattern: the required size of an initial set leading to a giant cascading becomes smaller and smaller, which means that nodes randomly picked from later networks generate a giant cascading more easily than nodes randomly picked from earlier ones. For example, in G7, the global core of HEP-PH, the required size of an initial set generating a giant cascading is only about 100, much smaller than in G1, where it is more than 600. From Figure 14, we observe that G7 has more cascading power in the HEP-PH network.
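A direct, if naive, implementation of Definition 3 and of the simulation above might look as follows (a sketch; the function names are ours and the graphs are assumed to be networkx graphs):

```python
import random
import networkx as nx

def phi_cascading(G, S, phi):
    """T = c_phi^G(S): repeatedly activate every inactive node whose
    active neighbors number at least phi times its degree."""
    T = set(S)
    changed = True
    while changed:
        changed = False
        for v in G:
            if v not in T and sum(u in T for u in G[v]) >= phi * G.degree(v):
                T.add(v)
                changed = True
    return T

def expected_cascade_size(G1, Vi, size, phi, trials=100):
    """Average |T| over random initial sets S of the given size drawn
    from Vi (the nodes of some G_i), with the cascade always run in G1."""
    Vi = list(Vi)
    return sum(len(phi_cascading(G1, random.sample(Vi, size), phi))
               for _ in range(trials)) / trials
```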

Table 9 Average betweenness of the collaboration networks
         HEP-PH     COND-MAT   ASTRO-PH   HEP-TH
AB(G1)   20572.53   46485.35   28589.6    18677.87
AB(G2)   29062.89   65567.39   37151.99   31296.04
AB(G3)   39226.95   92895.36   43687.17   44249.69
AB(G4)   51440.43   119946.3   53787.06   57949.63
AB(G5)   59027.23   148407.5   66178.13   89801.27
AB(G6)   93802.26   185119.4   82512.36
AB(G7)   126327.6   345634.1   103575.8
AB(G8)   171724.7   561301.1   193743
AB(G9)                         217089.8

Moreover, for every cascading threshold φ, we define a phase transition point as follows. For each graph Gi, simulate the cascading procedure 100 times. The phase transition point is determined by binary search to be the minimum size of S which induces a giant cascading (more than half of the nodes in G1 become active) in more than 90 of the 100 runs (see the sketch below). We depict the relation between φ and the corresponding transition point of HEP-PH in Figure 15. From Figure 15, we discover several important properties:
1. There is a "downward approaching" pattern corresponding to the "left approaching" pattern discovered in Figure 14. It means that the "left approaching" behavior seen in the S-T curves for φ = 0.3 holds for every φ between 0.2 and 0.4.
2. The curve of G7 disappears at φ = 0.38, which means that at φ = 0.38, even picking all the nodes of G7 as the initial set in G1 cannot induce a φ-giant cascading. So we consider G6 as the global core in this case.
3. The gaps between the curves are amplified as φ increases. At φ = 0.38, the transition point of G1 is about 1700, while in G7 it is about 200, and the curve for G7 is almost flat. This implies that the robustness of the nodes in later graphs is much stronger than that of the nodes in earlier ones; in other words, φ has less influence on the choice of the initial set in Gi+1 than in Gi.
The S-T curves and φ-transition point curves of the other three networks are shown in Figures 16–21, respectively; all the features stated for HEP-PH hold for the other networks as well. The experiments in this part show that the global core is sensitive to errors, in the sense that a small number of random errors in the global core may cause a global cascading failure of the whole network. This means that finding and preserving the global core is essential for network security against virus spreading.
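The binary search just described can be sketched as follows, reusing phi_cascading from the previous sketch; the monotonicity that binary search relies on is an assumption suggested by the curves:

```python
import random

def transition_point(G1, Vi, phi, trials=100, need=90):
    """Least |S| (S drawn uniformly from Vi) whose phi-cascading in G1
    is giant, i.e., activates more than half of G1, in at least `need`
    of `trials` runs; found by binary search over the set size."""
    Vi, half = list(Vi), G1.number_of_nodes() / 2
    def giant_often(size):
        wins = sum(len(phi_cascading(G1, random.sample(Vi, size), phi)) > half
                   for _ in range(trials))
        return wins >= need
    lo, hi = 1, len(Vi)
    if not giant_often(hi):
        return None  # even S = Vi fails (cf. G7 at phi = 0.38)
    while lo < hi:
        mid = (lo + hi) // 2
        if giant_often(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```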

4.3 Betweenness centrality

Freeman introduced betweenness centrality in [29] to identify important nodes. Given a graph G = (V, E) and a node v ∈ V, the betweenness centrality of v is the sum over all pairs of the fraction of shortest paths that pass through v: BC(v) = Σ_{s≠v≠t∈V} σ(s, t | v)/σ(s, t), where σ(s, t) is the number of shortest (s, t)-paths and σ(s, t | v) is the number of those paths passing through node v. In this subsection, we consider the Average Betweenness (AB, for short) of the nodes in each Gi. We report the average betweenness of all the Gi in Table 9, from which we see that the average betweenness AB(Gi) increases as i grows. This implies that, on average, nodes in Gi+1 have more controlling force than those of Gi.
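For reference, the average betweenness reported in Table 9 can be computed along the following lines (a sketch assuming networkx; whether the raw sums match Table 9 exactly depends on the counting convention, so the normalization flag is an assumption):

```python
import networkx as nx

def average_betweenness(G):
    """Mean of BC(v) = sum_{s != v != t} sigma(s,t|v) / sigma(s,t)
    over all nodes; normalized=False keeps the raw sums."""
    bc = nx.betweenness_centrality(G, normalized=False)
    return sum(bc.values()) / G.number_of_nodes()
```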

Figure 10 Degree distributions of HEP-PH.
Figure 11 Degree distributions of COND-MAT.
Figure 12 Degree distributions of ASTRO-PH.
Figure 13 Degree distributions of HEP-TH.
Figure 14 The S-T curves of the HEP-PH network (φ = 0.3).
Figure 15 The φ-transition point curves of the HEP-PH network.
Figure 16 The cascading curves of COND-MAT (φ = 0.3).
Figure 17 The φ-transition curves of COND-MAT.
Figure 18 The cascading curves of ASTRO-PH (φ = 0.3).
Figure 19 The φ-transition curves of ASTRO-PH.

4.4 Global failure by physical attack

Previous works on the robustness and security of networks focused on physical attacks, which remove a small fraction of nodes to destroy connectivity properties such as the small world property or the giant connected component. Albert [30] suggested that scale-free networks are resistant to random failures, but that such networks are vulnerable to deliberate attacks on the top-degree nodes. These intuitive ideas have been confirmed numerically [30, 31] and analytically [32, 33]. In these studies, the deliberate attack targets the top-degree nodes, in a number as large as a constant fraction of the nodes of the whole network. All these works confirm both the robustness and the fragility of scale-free networks. In this part, we examine the global failure caused by physical attacks on the global cores of networks. We discover that the structures of networks are key to both the robustness and the security of networks, and that simple properties such as the power law are insufficient to characterize criteria for robustness and security.

Suppose that G is a network and C is the global core of G. We examine how the average distances and the largest connected components respond to errors in, and attacks on, the nodes of the global core, and compare the results with errors in, and attacks on, the same number of nodes of the whole graph (a simulation sketch is given below). Figures 22 and 23 describe the change of the size of the largest connected component in response to both random errors and attacks on the top-degree nodes, in both the original network and the global core, for the networks HEP-PH and HEP-TH.

From Figures 22 and 23, we observe that deleting even the whole global core fails to break the whole network into pieces; that deleting the same number of top-degree nodes in the original network also fails to break the whole network into pieces; that random attacks and top-degree attacks within the global core show the same effect in reducing the size of the largest connected component; and that the top-degree attack on the original network performs better than the attack on the global core.
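The attack experiments of this subsection can be reproduced along the following lines (a sketch with our own helper names, assuming networkx and that "average distance" is measured within the largest remaining component):

```python
import random
import networkx as nx

def attack(G, removed):
    """Delete the given nodes; report the size of the largest remaining
    connected component and the average distance within it."""
    H = G.copy()
    H.remove_nodes_from(removed)
    giant = max(nx.connected_components(H), key=len)
    return len(giant), nx.average_shortest_path_length(H.subgraph(giant))

def random_error(pool, k):
    """Random errors: k nodes chosen uniformly from the pool
    (either the whole network or the global core)."""
    return random.sample(list(pool), k)

def top_degree_attack(G, pool, k):
    """Deliberate attack: the k highest-degree nodes of the pool."""
    return sorted(pool, key=G.degree, reverse=True)[:k]
```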

Figure 20 The cascading curves of HEP-TH (φ = 0.3).
Figure 21 The φ-transition curves of HEP-TH.
Figure 22 Attack of HEP-PH (largest component size).
Figure 23 Attack of HEP-TH (largest component size).
Figure 24 Attack of HEP-PH (average distance).
Figure 25 Attack of HEP-TH (average distance).

Figures 24 and 25 show the response of the average distance to attacks and random errors in the networks HEP-PH and HEP-TH. From Figures 24 and 25, we see that random errors in the global core perform the same as top-degree attacks in the global core, meaning that the global core is sensitive to random errors.

Summary. From Subsections 4.2 and 4.4, we draw the following conclusions:


1. The global core of a network is sensitive to random errors in both virus spreading and physical malfunction: a small number of random errors in the global core may cause a major failure of the whole network. To guarantee the robustness and security of a network, it is therefore essential to protect its global core.

2. Global collapse under the physical attack of removing nodes is harder to achieve than global cascading failure: a small number of initial nodes in the global core may trigger a giant cascade across the whole network, yet the network is hard to break into small pieces even if all nodes of the global core are deleted. It is an interesting discovery that the physical attack (regarded as a hard attack) is weaker than the cascading failure model (regarded as a soft attack).

3. In some cases, attacking the top-degree nodes of the original network performs better at reducing the largest connected component in the physical attack model. In theory this may be true; in practice it is unlikely to happen, because any attack has costs and consequences, and the usual strategy is to attack the vulnerable nodes or edges rather than the strongest ones. This poses a new challenge: to measure the costs, benefits, and consequences of network attacks, and to propose strategies that achieve optimal attack efficiency.

4. In real applications a network is evolving, so perhaps the best strategy concerning the robustness and security of a network would be to attack the rules that generate the network.

5. The experiments suggest that, to attack an evolving network, the first strategy is to change the rules that generate the network, the second is to attack the network with the cascading failure model, and the third is to physically delete some nodes or edges. This exactly validates the conclusions of the Offensive Strategy chapter of the Art of War: the highest strategy is to use non-military forces, such as diplomatic and economic forces, to change the military policies and rules of the enemy's country; the second is to use non-military forces to undermine the enemy's fighting spirit and to disintegrate its ability to form military forces; the third is to send an army to fight the enemy in the field; and the worst is to attack and destroy the enemy's cities or country. The same ordering may be useful for managing and controlling other complex systems in economics, the social sciences, and elsewhere.

6. The experiments in this section suggest that special measures must be taken to protect the global core of a network to ensure the robustness and security of the network.

5 Galaxy structure of networks

We have shown that networks are rich in small communities, the reason being homophyly, as stated in a Chinese saying: people sharing the same interests come together, and materials are grouped by category. We have also shown that real networks have global cores, the key structures determining the global properties of the networks. What properties, characteristics, and roles may the remaining nodes of a network have? What global structures may networks have? In many applications, the demands on a new theory of complexity are clear: we need to be able to predict how the Internet responds to attacks and traffic jams, how the cell reacts to changes in its environment, or how the globalized economy responds to the current financial crisis. In fact, a global understanding of networks is essential even to understanding just the local structures and information of networks. This was realized by the ancient Chinese: one who fails to plan globally cannot even plan a local area. To make progress in this direction, we need to tackle the next frontier: understanding the dynamics of various networks. A first step is to understand the global structures of networks, which calls for a new direction of a global theory of networks. In this section, we propose the notion of the galaxy structure of networks, and carry out experiments using our global cores and the small communities of real networks. The experiments validate the galaxy structures of real networks, predicting the birth of a global theory of networks.





Table 10  Radius of local reductions of collaboration networks

      HEP-PH   COND-MAT   ASTRO-PH   HEP-TH
G2    5        5          6          7
G3    6        6          7          8
G4    6        6          7          9
G5    7        6          7          9
G6    7        6          7          –
G7    7        7          7          –
G8    8        7          7          –
G9    –        –          7          –

5.1 r-radius center

In graph theory, the k-center problem is to find a set S of k nodes for which radius(S) is minimized. The connected k-center (CkC, for short) problem additionally requires that the subgraph induced by the set S of k nodes is connected. Our problem is related to, but different from, the connected k-center problem. Given G = (V, E) and a set S ⊂ V, we define the radius of S in G to be the longest distance to S among all nodes v ∉ S, that is,

radius(S) = max_{v ∉ S} min_{u ∈ S} d(v, u),

where d(v, u) is the distance between v and u. The r-radius center problem is to find a set S such that (1) the induced subgraph G_S is connected, (2) radius(S) ≤ r, and (3) the size |S| of S is minimized. For each of the collaboration networks, G say, we compute the radius of G_i in G for all possible i, and report the results in Table 10. From Table 10, we see that for each real network G, each round of the local reduction, from G_i to G_{i+1} say, reduces the size of the bounded-radius center significantly while keeping the radii stable. Therefore, the local reduction finds a small r-radius center of each network G for a small constant r, which is very useful for understanding the structure of large networks. Let G be one of the real networks, C the global core of G, and r the radius of C in G. Then: 1) the induced subgraph G_C of C in G is small and connected; 2) a large fraction of the nodes of G are contained in small communities of G; 3) each node v that is in no small community of G is either in the global core C or linked to some node u in C within r steps. These properties give us an intuitive picture of the network G.
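For illustration, radius(S) can be computed with a single multi-source shortest-path pass. The following is a minimal sketch assuming networkx, not the paper's implementation.

import networkx as nx

def radius_of_set(G, S):
    # radius(S) = max over nodes v outside S of min_{u in S} d(v, u);
    # a multi-source search from S computes all these minima at once.
    dist = nx.multi_source_dijkstra_path_length(G, S)  # unit edge weights
    return max((d for v, d in dist.items() if v not in S), default=0)

# Toy usage: on the path a-b-c-d with S = {a}, the radius is 3.
G = nx.path_graph(["a", "b", "c", "d"])
print(radius_of_set(G, {"a"}))  # 3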

5.2 Galaxy structure

Let G = (V, E) be a graph, and S, T ⊂ V. We define the distance d(S, T) to be the minimum of d(s, t) over all s ∈ S and all t ∈ T. To better understand the structure of a real-world network, for each small community found in the first round of the local reduction, we compute the minimum distance between the small community and the global core found by the local reduction. Table 11 summarizes the number of communities whose representatives connect to the global core in k steps, for k = 0, 1, 2, . . . , 6, from which we know that most of the small communities are within distance 2 of the global core. This means that for most of the small communities, there is a node and a path of length ≤ 2 linking it to a node in the global core. From Tables 10, 11 and 12, we know that each of the collaboration networks, G say, has the following properties:

1. (Global core) G has a small global core, written GC.

2. A significant fraction of the nodes of G are contained in small communities of G.


Table 11  Community distances of collaboration networks, where D means the distance between a community and GC, and NoC means the number of communities at distance D

D                 0     1     2     3    4    5   6
NoC (HEP-PH)      3     793   1026  418  173  13  4
NoC (COND-MAT)    307   777   41    –    –    –   –
NoC (ASTRO-PH)    300   2257  1693  281  63   21  5
NoC (HEP-TH)      232   827   870   397  129  8   28

Table 12  Nodes (outside of communities) distances of collaboration networks, where D means the distance between a node and GC, and NoN means the number of nodes at distance D

D                 1     2     3     4    5
NoN (HEP-PH)      1940  2748  1571  204  14
NoN (COND-MAT)    4801  7460  2463  510  62
NoN (ASTRO-PH)    7562  3821  342   9    –
NoN (HEP-TH)      1879  2350  962   158  –

Figure 26  Galaxy structure of HEP-PH.

Figure 27  Galaxy structure of COND-MAT.

Figure 28  Galaxy structure of ASTRO-PH.

Figure 29  Galaxy structure of HEP-TH.

3. (Small community) For most of the small communities, C say, C links to the global core GC within 2 steps.

4. (Near nodes) There are nodes in neither GC nor any small community, most of which have paths to GC of length ≤ 3. We use NN to denote the set of these nodes.

5. (Distant nodes) We call a node a distant node of G if it is not in the global core, not in any small community of G, and not in the set NN of near nodes. A distant node, u say, links to the global core GC within 9 steps.

This classification of vertices allows us to understand the galaxy structure of the four networks, depicted in Figures 26–29 respectively.
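A minimal sketch of this four-way classification, assuming networkx and assuming that the global core and the community membership have already been computed by the local reduction:

import networkx as nx

def classify(G, core, community_nodes, near_cutoff=3):
    # Split V into GC, SC, NN (within near_cutoff steps of the core)
    # and DN (all remaining nodes).
    dist = nx.multi_source_dijkstra_path_length(G, core)
    gc, sc, nn, dn = set(core), set(), set(), set()
    for v in G.nodes:
        if v in gc:
            continue
        if v in community_nodes:
            sc.add(v)
        elif dist.get(v, float("inf")) <= near_cutoff:
            nn.add(v)
        else:
            dn.add(v)
    return gc, sc, nn, dn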

5.3 Applications of the galaxy structures

In many cases, new applications of networks can be developed only if we understand their global structures. An immediate application of the galaxy structure is to recommend a small set of important nodes for every node of a large network, much like taking an X-ray in a medical examination. Given a network G = (V, E) and a node v ∈ V: if v is in the global core, we can recommend the (important) neighbors of v; if v is in a small community, we can recommend a small set using the community; if v is a near node, we find the node, u say, in the global core that is closest to v and recommend a small set using u; and if v is a distant node, we can recommend a path from v to the global core, which is of course short. A second application is to develop a new ranking system using the local reductions, since they find more and more important nodes. (Both application systems are ongoing projects of the Institute of Software, Chinese Academy of Sciences.) A sketch of the recommendation rule appears at the end of this subsection.

Understanding the mechanisms and roles of the four classes of nodes is extremely important. In this regard, we proposed a homophyly model of networks, introducing a color for each node in the preferential attachment model, and showed that networks generated from the homophyly model simultaneously satisfy: 1) the power law degree distribution; 2) the small world phenomenon; 3) nodes of the same color naturally form a small community; 4) a 1 − o(1) fraction of nodes are in small communities, i.e., the small community phenomenon; and 5) the induced subgraph of a small community follows a power law degree distribution. This theorem implies the homophyly law: homophyly is the mechanism of the small community phenomenon, meaning that real networks are rich in small communities, that nodes within a small community share remarkable common features (providing a basis for prediction in networks), that every community has a small set of nodes accounting for the major internal links, and that every community has a small set accounting for the major external links. The homophyly law is validated on a number of real networks.

Suppose that a network G satisfies the small community phenomenon, in the sense that most nodes are in small communities. By definition, a community induces a connected graph that is densely connected internally and sparsely connected externally. Suppose we randomly attack a small number of nodes in G. Then with high probability we target only a few small communities. By definition, nodes within a small community have influence mainly inside the community, and the community protects its members; even if a node is attacked, it is unlikely to injure nodes outside its community. Consequently, a small number of random attacks on G is unlikely to cause a global failure of the whole network. This means that networks with the small community phenomenon are robust. We have proved this intuition mathematically; in fact, we are able to prove security as well.

The cascading behavior in this paper is a basic model of information or virus spreading, in which we did not consider the role of the structure of the neighbors of a node, v say, in injuring (affecting) v, or the sign of influence, which may be positive or negative. We explain the problem as follows. Suppose that u1, u2, u3 are three neighbors of v, and that u1, u2, u3 are in the initial set S of attacked nodes.
Then the influence of S on node v could depend on the structure of u1, u2, u3, for example: i) u1, u2, u3 are isolated nodes; ii) two of u1, u2, u3 are connected; iii) u1, u2, u3 form a triangle, etc. This happens naturally in economic networks reflecting interactions among allies. In addition, edges may have different properties: for an edge (u, v), u may collaborate with v (+) or be against v (−). It is important to develop a theory of cascading influence for this richer model, which would be more suitable for applications such as economic networks or network games.
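Returning to the recommendation application described at the start of this subsection, the rule can be sketched as follows. The helper inputs sc_of, dist_to_core and path_to_core are hypothetical precomputed structures introduced only for illustration.

def recommend(v, G, gc, sc_of, dist_to_core, path_to_core, k=5):
    # gc: the global core; sc_of: node -> members of its small community;
    # dist_to_core / path_to_core: node -> distance / shortest path to the
    # nearest core node (hypothetical precomputed helpers).
    if v in gc:                          # core node: important core neighbors
        return [u for u in G[v] if u in gc][:k]
    if v in sc_of:                       # community node: community peers
        return [u for u in sc_of[v] if u != v][:k]
    if dist_to_core[v] <= 3:             # near node: the closest core node
        return path_to_core[v][-1:]
    return path_to_core[v]               # distant node: a short path to GC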

6 Conclusion

We propose the method of local reduction to extract the global core (GC, for short) of large networks, and a new algorithmic problem, the r-radius center, to understand the notion of the global core theoretically. Our method builds on a new understanding of complex networks from both local and global perspectives. The local reduction runs in nearly linear time and extracts an approximate solution of the global core of large networks. The results of the local reduction on classic examples of graphs and on benchmark data show that: 1) the global core found by the local reduction is almost the same as the optimal solution of the r-radius center problem for small constant r; 2) the global core found by our local reduction agrees with our intuition of the important global structure of a network; and 3) the global core found by our local reduction differs from those found by existing algorithms such as the maximum degree strategy and the k-core of graphs. This implies that the local reduction finds a good approximate solution to the r-radius center problem, which helps greatly in understanding the structure of large networks. This conjecture is validated by a series of experiments of the local reduction on typical collaboration networks. In fact, we discover the following:

1. The global core of a network is small, and preserves the global structure and degree distribution of the network.

2. The global core GC of a network G is intuitively similar to the original graph G.

3. Nodes in the global core are much more influential than the rest of the nodes in G1. Under the scenario of information diffusion, compared with G1, a small number of randomly chosen nodes (individuals) in the GC induce a giant cascade over the whole network.

4. The global core is sensitive to random errors, with respect to global failure of the whole network.

5. The average betweenness of nodes in Gi increases as i grows, meaning that nodes in Gi+1 have, on average, more controlling force than nodes in Gi.

6. For most small communities, C say, there is a node v ∈ C and a path of length ≤ 2 from v to a node u in the global core; see Table 11.

7. The global core GC is a good approximate solution of the r-radius center of the whole network, because GC is small and the radius r of GC in G is a small constant, ≤ 9 in our experiments, as shown in Table 10.

8. There are nodes that are in neither the GC nor any small community, and that link to the GC within 3 steps; these are called near nodes, written NN.

9. For a real network G, the vertices of G can be categorized into four classes: the global core, the small communities, the near nodes, and the distant nodes, denoted GC, SC, NN and DN respectively. These four classes must play different roles in the network, which leads to new research problems on the different roles of the GC, SC, NN and DN.

The classification of vertices seems a common property of most real networks. The research in this paper raises fundamental open questions, such as how to understand the galaxy structure of networks, and how to investigate the different roles of the four classes of nodes, i.e., the GC, the SC, the NN, and the DN of a network.

Acknowledgements  All authors were partially supported by the Grand Project "Network Algorithms and Digital Information" of the Institute of Software, Chinese Academy of Sciences, by the National Key Basic Research Project of China (Grant No. 2011CB302400), and by the "Strategic Priority Research Program" of the Chinese Academy of Sciences (Grant No. XDA06010701). The last author was partially supported by the Hundred-Talent Program of the Chinese Academy of Sciences. During the preparation of this paper, the last author was visiting the Isaac Newton Institute for Mathematical Sciences, Cambridge University, as a visiting fellow, and gratefully acknowledges the support of the Institute.

References

1 Seidman S. Network structure and minimum degree. Soc Netw, 1983, 5: 269–287
2 Borgatti S, Everett M. Models of core/periphery structures. Soc Netw, 2000, 21: 375–395
3 Alvarez-Hamelin J, Dall'Asta L, Barrat A, et al. Large scale networks fingerprinting and visualization using the k-core decomposition. Adv Neural Inf Process Syst, 2006, 18: 41
4 Goltsev A, Dorogovtsev S, Mendes J. k-core (bootstrap) percolation on complex networks: critical phenomena and nonlocal effects. Phys Rev E, 2006, 73: 056101
5 Holme P. Core-periphery organization of complex networks. Phys Rev E, 2005, 72: 046111
6 Hojman D, Szeidl A. Core and periphery in networks. J Econ Theory, 2008, 139: 295–309


7 Mislove A, Marcon M, Gummadi K, et al. Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. New York: ACM, 2007. 29–42
8 Hochbaum D, Shmoys D. Using dual approximation algorithms for scheduling problems: theoretical and practical results. J ACM, 1987, 34: 144–162
9 Hsu W, Nemhauser G. Easy and hard bottleneck location problems. Discrete Appl Math, 1979, 1: 209–215
10 Andersen R, Chung F, Lang K. Local graph partitioning using pagerank vectors. In: 47th Annual IEEE Symposium on Foundations of Computer Science. Washington D. C.: IEEE, 2006. 475–486
11 Andersen R, Peres Y. Finding sparse cuts locally using evolving sets. In: Proceedings of the 41st Annual ACM Symposium on Theory of Computing. New York: ACM, 2009. 235–244
12 Clauset A, Newman M, Moore C. Finding community structure in very large networks. Phys Rev E, 2004, 70: 066111
13 Fortunato S. Community detection in graphs. Phys Rep, 2010, 486: 75–174
14 Girvan M, Newman M. Community structure in social and biological networks. Proc Natl Acad Sci, 2002, 99: 7821
15 Kannan R, Vempala S, Vetta A. On clusterings: good, bad and spectral. J ACM, 2004, 51: 497–515
16 Leskovec J, Lang K, Dasgupta A, et al. Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math, 2009, 6: 29–123
17 Leskovec J, Lang K, Mahoney M. Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, Raleigh, 2010. 631–640
18 Hopcroft J, Khan O, Kulis B, et al. Natural communities in large linked networks. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2003. 541–546
19 von Luxburg U. A tutorial on spectral clustering. Stat Comput, 2007, 17: 395–416
20 Newman M, Girvan M. Finding and evaluating community structure in networks. Phys Rev E, 2004, 69: 026113
21 Danon L, Diaz-Guilera A, Duch J, et al. Comparing community structure identification. J Stat Mech-Theory Exp, 2005, 2005: P09008
22 Palla G, Derényi I, Farkas I, et al. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814–818
23 Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Phys Rev E, 2008, 78: 046110
24 Lancichinetti A, Fortunato S. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E, 2009, 80: 016118
25 Li A S, Peng P. The small-community phenomenon in networks. Math Struct Comput Sci, 2012, 22: 373–407
26 Condon A, Karp R. Algorithms for graph partitioning on the planted partition model. Random Struct Algorithms, 2001, 18: 116–140
27 Domingos P, Richardson M. Mining the network value of customers. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2001. 57–66
28 Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2003. 137–146
29 Freeman L. A set of measures of centrality based on betweenness. Sociometry, 1977, 40: 35–41
30 Albert R, Jeong H, Barabási A. Error and attack tolerance of complex networks. Nature, 2000, 406: 378–382
31 Broder A, et al. Graph structure in the web. Comput Netw, 2000, 33: 309–320
32 Cohen R, Erez K, ben Avraham D, et al. Resilience of the internet to random breakdowns. Phys Rev Lett, 2000, 85: 4626–4628
33 Cohen R, Erez K, ben Avraham D, et al. Network robustness and fragility: percolation on random graphs. Phys Rev Lett, 2000, 85: 5468–5471

SCIENCE CHINA Information Sciences

· RESEARCH PAPER ·

July 2014, Vol. 57 072102:1–072102:12  doi: 10.1007/s11432-013-5047-7

Correlations between characteristics of maximum influence and degree distributions in software networks

GU Qing1*, XIONG ShiJie2 & CHEN DaoXu1

1 National Key Lab of Novel Software Technology and Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China;
2 National Laboratory of Solid State Microstructures and Department of Physics, Nanjing University, Nanjing 210023, China

Received July 1, 2013; accepted December 23, 2013; published online March 11, 2014

Abstract  Software systems can be represented as complex networks, and their artificial nature can be investigated with approaches developed in network analysis. Influence maximization has been successfully applied to software networks to identify the important nodes that have the maximum influence on the other parts. However, the effects of the network fabric on the influence behavior of the highly influential nodes remain an open research question. In this paper, we construct class dependence graph (CDG) networks from eight practical Java software systems, and apply the procedure of influence maximization to study empirically the correlations between the characteristics of maximum influence and the degree distributions in the software networks. We demonstrate that the artificial nature of CDG networks is reflected partly in their scale free behavior: the in-degree distribution follows a power law, and the out-degree distribution is lognormal. As for the influence behavior, the expected influence spread of the maximum influence set identified by the greedy method correlates significantly with the degree distributions. In addition, the identified influence set contains influential classes that are complex in both the number of methods and the lines of code (LOC). For applications in software engineering, the results open possibilities of new approaches to designing optimization procedures for software systems.

Keywords  software network, scale free, influence maximization, power law, complex network

Citation Gu Q, Xiong S J, Chen D X. Correlations between characteristics of maximum influence and degree distributions in software networks. Sci China Inf Sci, 2014, 57: 072102(12), doi: 10.1007/s11432-013-5047-7

1 Introduction

From the viewpoint of structure, a software system can be expressed as a software network [1–11] representing collaborations or dependencies among the modules or entities that build up the system. Different types of software networks can be built from different levels of software entities. Being complex human-made artifacts and integral parts of daily life [12–14], software systems deserve a clear understanding of both their fabric and their physical behavior, and for this purpose software networks are a good starting point. Many studies have already investigated whether scale free or small world properties hold in software networks [5–9,11,15–21].

* Corresponding author (email: [email protected])



The method of influence maximization aims to identify the set of nodes that play influential or important roles in a network [22–35], where influence propagates through the links. Finding influential nodes is important in a variety of complex systems represented as networks, such as viral marketing and the Internet [22,30,34]. For software, it reveals the role each node (and the corresponding software entity) plays in spreading contagions within a software network. For example, the obliged code change caused by a bug fix or a function update is an instance of contagion spreading within a software system. Recent research has provided ways of identifying high-rank entities in a software system represented as a network, which have proved useful in real-world software engineering, especially for software maintenance and reuse [8,9,35].

Although finding highly influential nodes in a network is necessary, for software another question is also important: what are the effects of the network fabric on the influence behavior of the nodes? In a software network, highly influential nodes are those that stay at the center [22,30] and give services to other software entities [9,35]. Empirical studies have shown that such entities are more subject to change than other parts of the software [21,36]. If the relationship between the network fabric and the influence behavior is made clear, developers can find ways of optimizing the software structure so that the influence spread of the highly influential nodes is limited, hence improving maintainability. To the best of our knowledge, this question has not yet been fully studied.

In this paper, we apply influence maximization to class dependence graph (CDG) networks representing object-oriented (OO) software systems, and study the correlations between the characteristics of maximum influence and the degree distributions in the networks. The motivations of this paper are twofold. The first is to investigate the scale free behavior in software networks, which are treated as directed networks, studying the in-degree and the out-degree distributions separately. The other is to investigate the effects of the degree distributions, including the scale free behavior, on the characteristics of maximum influence in software networks. Through experimental studies, we demonstrate that the artificial nature of software networks is reflected partly in their scale free behavior: the in-degree distribution follows a power law, while the out-degree distribution is lognormal. For the influence behavior, the attainable influence spread of the maximum influence set identified by the greedy method correlates significantly with the degree distributions. To the best of our knowledge, this finding is new, useful in software engineering, and deserves thorough study. In addition, the highly influential classes identified are complex in both the number of methods and the lines of code (LOC), which increases the probability of change. This suggests possible ways of design optimization for software systems.

The rest of the paper is organized as follows. Section 2 briefly introduces related work on software networks and influence maximization. Section 3 describes our approach to applying influence maximization on CDG networks representing OO software. Section 4 presents our experimental studies evaluating the results of applying influence maximization on CDG networks. Finally, Section 5 concludes the paper with some future work.

2 Basic procedures for investigations of software networks

Here we briefly introduce two groups of relevant studies: one on representing software systems as complex software networks, the other on applying influence maximization to large networks.

2.1 Software systems as networks

Plenty of work has been done to represent software systems as different types of software networks and to analyze network properties in order to gain a deep understanding of the nature of complex software. Examples include package dependence networks built from the Linux/BSD operating systems [3,7,11], variant types of class diagrams and call graphs from Java/Smalltalk/C++ programs [1,2,6,15,19,21], type collaboration networks [5], module collaboration networks [20] and procedure call networks [9] from C/C++ software, and software mirror graphs [4] from software running traces.


Power laws are commonly found in these artificial networks. Some researchers have tried to explain the scale free behavior through the design and evolution of software systems [3,7,8,15,19], while others have discussed possible applications of the scale free behavior, for example in software reuse and quality assurance [2,20]. There are also studies of the "important" nodes within a software network: for example, the page rank algorithm has been applied to find high-rank Java packages [8] or classes [35], and popular C/C++ procedures [9]. Small world behavior is also discussed in software networks, but there are research results against the small world behavior in directed software networks, for example in [8].

2.2 Brief introduction of influence maximization

Given a network G = (V, E) and an initial set of active nodes S (S ⊆ V), the expected influence spread of S, denoted σ(S), is defined as the number of nodes anticipated to be affected (influenced) by contagions started from the nodes in S [26]. The process of influence propagation is formulated by diffusion models [22,25–27], including the widely used independent cascade (IC) model. On this basis, the influence maximization problem can be defined as: find the maximum influence set S (S ⊆ V) of m nodes that has the maximum σ(S), i.e., for any set A with A ⊆ V and |A| = m, σ(A) ≤ σ(S). Solving the influence maximization problem is proven to be NP-hard [24,25]. Current techniques for influence maximization can roughly be grouped into three categories: heuristic methods, greedy methods, and search based methods. The heuristic methods determine the maximum influence set S based on node centrality measures [22,30,31,34,35,37,38], including degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and the k-shell index; the page rank algorithm belongs to the eigenvector centrality family. Under the greedy methods, the currently most influential node (i.e., the node with the maximum expected influence gain) is selected one at a time until enough nodes are selected to build the set S. The expected influence can be computed by Monte Carlo simulation [25], by applying the bond percolation process [23,31], or by the expectation method [39]. The search based methods try to find the set S iteratively using meta-heuristic optimizers such as simulated annealing or particle swarm optimization [32].
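The objective itself can be stated directly in code. The brute-force sketch below is feasible only on tiny graphs, which is exactly why the heuristic, greedy, and search based methods exist; sigma stands for any influence-spread estimator, such as an IC-model simulation.

from itertools import combinations

def max_influence_brute_force(nodes, m, sigma):
    # Exhaustive search over all m-subsets for the set maximizing sigma(S).
    return max(combinations(nodes, m), key=lambda A: sigma(set(A)))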

3 Applying influence maximization on software networks

Given a software system represented as a network, the objectives of applying influence maximization are twofold: one is to identify highly influential or important nodes; the other is to estimate the expected influence spread attained by these nodes. In the following, we first give a model for transforming OO software into software networks, then introduce the approach of applying influence maximization on the networks, and finally discuss possible applications of our methods in software engineering.

3.1 Class dependence graph

A class dependence graph (CDG) is a method of transforming an object-oriented (OO) software system into a software network, called a CDG network. A CDG network is a directed network G = (V, E), in which a node v ∈ V is a class or an interface defined in the OO software. A directed link e12 = ⟨v1, v2⟩ (e12 ∈ E) represents any of the following three relationships between two class (or interface) nodes v1 and v2:
• Inheritance: v1 inherits or implements v2;
• Aggregate: v1 has a field (or member datum) whose declared type is v2;
• Parameter: a method (or member function) of v1 has a parameter whose declared type is v2, or the method's return type is v2.
CDG resembles the class dependency network defined in [2]. The difference is that we treat it as a directed network, and only one link is allowed between any pair of starting and ending nodes, i.e., it is a simple graph. We define the CDG based on class dependencies amenable to influence propagation.
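A minimal sketch of CDG construction from per-class header summaries; the record format here is hypothetical, whereas the paper builds its networks with the BCEL facility (Section 4).

def build_cdg(class_headers):
    # Return the CDG as a set of directed links (v1, v2): v1 depends on v2.
    declared = set(class_headers)            # only classes of the system
    edges = set()
    for cls, info in class_headers.items():
        targets = (info.get("supertypes", [])       # Inheritance
                   + info.get("field_types", [])    # Aggregate
                   + info.get("method_types", []))  # Parameter / return
        for t in targets:
            if t in declared and t != cls:
                edges.add((cls, t))                 # simple graph: one link per pair
    return edges

# Toy usage with hypothetical header records:
headers = {
    "A": {"supertypes": ["B"], "field_types": ["C"], "method_types": ["B"]},
    "B": {"supertypes": [], "field_types": [], "method_types": ["C"]},
    "C": {"supertypes": [], "field_types": [], "method_types": []},
}
print(build_cdg(headers))  # {('A', 'B'), ('A', 'C'), ('B', 'C')}, in some order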


CDG networks are built solely from the header information of the classes, ignoring the details of class implementation by particular developers. The header information is available at the design stage of software development and is sufficient to represent the structure of the software. Other possible dependencies hidden within method bodies also deserve further investigation, which will be part of our future work. Figure 1 shows two CDG networks built from the Java software systems (packages) "Javax" and "Tools". Walrus (http://www.caida.org/tools/visualization/walrus/) is used to depict the networks in 3D space. In these networks, hubs can clearly be seen, which implies scale free behavior.

3.2 Methods for influence maximization

We use the greedy method called cost-effective lazy forward (CELF) selection [28] to identify the set of nodes with maximum influence in software networks. Given a network G = (V, E), CELF repeats the following two steps until enough nodes are selected into the maximum influence set S:
1. Select the node v with the maximum marginal influence gain δ_S(v) = σ(S ∪ {v}) − σ(S);
2. If δ_S(v) was computed against an older S′, re-compute it with the current S; otherwise add v to S, i.e., S = S ∪ {v}.
CELF is an efficient variant of the simple greedy method introduced in [25]. Since the influence function σ(S) is submodular, CELF gives a solution with a quantitatively bounded approximation to the optimum [25,28], and for influence maximization it outperforms other methods, such as page rank.

We use the independent cascade (IC) model to formulate the influence process in software networks. The IC model is one of the most widely used diffusion models for influence maximization [23,25,27,28,31]. Under the IC model, any node v in a network G takes one of two states, active or inactive, where active means affected or influenced. A node v can only switch from inactive to active, not vice versa. The diffusion process takes the form of a series of discrete steps t0, t1, t2, . . .; at step t0, only nodes in the initial set S are active. Each link e_ij = ⟨v_i, v_j⟩ in G is assigned an influence probability p_ij (0 ≤ p_ij ≤ 1) expressing the chance of influence along the link. The influence along e_ij is directional, i.e., from v_i to v_j, not vice versa. Once v_i becomes active in a step t_l, it has one opportunity to affect each of its inactive direct (child) neighbors in the next step t_{l+1}; after that, v_i stops trying to affect its neighbors in later steps. The diffusion process stops at step t_n if no new node switches from inactive to active at t_n.

For software networks, note that by definition a link in a CDG network implies "trust" from the starting node to the ending node: e_ij implies that v_i trusts v_j, which means that v_j can affect v_i along e_ij, but not vice versa. Hence, the links in a CDG network are all trust links, and for influence propagation the trust links must be reversed to obtain the influence links.

To compute the influence spread (i.e., the influence function σ(S)) in a network, Monte Carlo simulation is adopted. Kempe et al. [25] suggested that 10000 simulation runs are enough to ensure the quality of the approximation, and we use this setting in our experiments.
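The following is a minimal sketch of both pieces: the IC-model Monte Carlo estimate of σ(S) and the CELF lazy-greedy selection. It assumes the graph is a dictionary mapping each node to its influence successors (i.e., with trust links already reversed) and is not the authors' implementation.

import heapq
import random

def simulate_ic(graph, seeds, p, runs=1000):
    # Estimate sigma(S): average number of nodes activated from the seeds
    # under the IC model with uniform influence probability p.
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, ()):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def celf(graph, k, p, runs=1000):
    # Lazy-greedy (CELF) selection of a k-node maximum influence set.
    heap = [(-simulate_ic(graph, {v}, p, runs), v, 0) for v in graph]
    heapq.heapify(heap)
    seeds, sigma, iteration = [], 0.0, 0
    while len(seeds) < k and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == iteration:
            seeds.append(v)          # gain is current: select the node
            sigma += -neg_gain
            iteration += 1
        else:                        # stale gain: recompute and push back
            gain = simulate_ic(graph, set(seeds) | {v}, p, runs) - sigma
            heapq.heappush(heap, (-gain, v, iteration))
    return seeds, sigma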

3.3 Applications in software engineering

By applying influence maximization on software networks, the highly influential nodes can be identified, and the maximum attainable influence spreads can be estimated. Developers should keep the highly influential classes simple and less fault-prone, and put complex classes at the peripheries of a software network (i.e., linking to other nodes instead of being linked to by others). If an influential class is too complex, developers could simply split it into multiple classes that are more concise and concentrated on singular functions. Given a set of classes, the computed influence spread can be used to measure the relative importance or criticality of these classes in the software system. Moreover, during software evolution or renovation, the influence spread could be a valuable measure for predicting the effort required, or the software parts affected, by updating or fixing these classes. Finally, by limiting the possible influence spreads, the maintainability of a software system can be improved.

4 Empirical evaluation

In this section, we first introduce the Java software systems selected for the empirical study. Second, we demonstrate the scale free behavior in the in-degree distribution of the CDG networks built. Third, we analyze the correlation of the maximum influence spread with the degree distributions of the networks. Finally, we analyze the highly influential class nodes identified, and the implications for software design optimization. We select eight Java software systems (packages) and build CDG networks from them to explore the effects of applying influence maximization. The Apache Commons BCEL facility is used to build the networks (http://commons.apache.org/bcel/). Table 1 lists details of the eight Java systems and the corresponding CDG networks. In each network, standalone class nodes are removed, and only one link exists between each pair of starting and ending nodes, which may stand for multiple inter-class relationships. The Java systems are selected from a variety of application areas, from Java language facilities and runtime environments to Eclipse plugins and stand-alone libraries. The size of the networks ranges from hundreds of nodes (and links) to tens of thousands of nodes (links). The purpose is to increase the generality and soundness of the experimental results.

4.1 Scale free behavior in the degree distributions

Concas et al. [6,19] indicated that besides the power law, the lognormal distribution also prevails in software networks. Kohring [8] showed that lognormal was obeyed by the out-degree distributions in the studied software networks. For CDG networks, we treat the in-degrees and the out-degrees separately, and take into consideration both power law and lognormal distributions. The in-degree and out-degree distributions of these CDG networks are depicted in Figures 2 and 3 respectively. In each figure, the percentage of nodes with a given in-degree or out-degree is shown in logarithmic scale (base 2). It can be seen that in each network the distribution of in-degrees nearly forms a straight line in logarithmic scale, which demonstrates a power law, while the distribution of out-degrees may conform to a lognormal. For the in-degree distribution, the basic power law formula is p(x) = cx^{−α} [40], which specifies the probability of a node possessing in-degree x. Taking into account the zero in-degrees, which constitute a significant portion of nodes in CDG networks and cannot be ignored when studying influence behavior, we adapt the formula as

p(x) = c(x + x0)^{−α}, where c > 0, α > 0, x ≥ 0 and x0 is a positive constant.    (1)

Figure 2 shows the percentage of nodes with each in-degree value in the eight CDG networks (in logarithmic scale). For each individual network, we treat the proportion of nodes as p(x), and apply curve fitting [41] based on formula (1). For simplicity, we fix the constant at x0 = 1 for all the networks. Table 2 lists the fitted α and the corresponding coefficient of determination (R²) for each network. It can be seen that the power law fits all the in-degree distributions in these networks very well; the worst R² is 0.964. The fitted α ranges from 1.31 to 1.87. For the out-degree distributions, we also adapt the probability density function of the lognormal distribution to take into account the zero out-degrees, where p(x) denotes the probability of a node possessing out-degree x:

p(x) = (1 / (√(2π) σ (x + x0))) e^{−[ln(x+x0)−μ]² / (2σ²)}, where x0 is a positive constant.    (2)

Figure 3 shows the percentage of nodes with each out-degree value in the eight CDG networks (also in logarithmic scale). For each individual network, we again treat the proportion of nodes as p(x) and apply curve fitting [41] based on formula (2). As above, we fix the constant at x0 = 1. Table 3 lists the fitted parameters μ and σ for the out-degrees in each CDG network along with R².
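Fitting formulas (1) and (2) to the empirical degree proportions can be sketched as follows, assuming numpy/scipy and fixing x0 = 1 as in the paper; the function names and initial guesses are illustrative.

import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, alpha):           # formula (1) with x0 = 1
    return c * (x + 1.0) ** (-alpha)

def lognormal(x, mu, sigma):          # formula (2) with x0 = 1
    return np.exp(-(np.log(x + 1.0) - mu) ** 2 / (2 * sigma ** 2)) / (
        np.sqrt(2 * np.pi) * sigma * (x + 1.0))

def fit_degree_distribution(degrees, model, p0):
    # Turn a degree sequence into (x, p(x)) proportions, then least-squares
    # fit the chosen model; returns the parameters and R^2.
    values, counts = np.unique(np.asarray(degrees), return_counts=True)
    props = counts / counts.sum()
    params, _ = curve_fit(model, values, props, p0=p0, maxfev=10000)
    ss_res = np.sum((props - model(values, *params)) ** 2)
    ss_tot = np.sum((props - props.mean()) ** 2)
    return params, 1 - ss_res / ss_tot

# Example: params, r2 = fit_degree_distribution(in_degrees, power_law, (0.5, 1.5))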

Figure 1  CDG networks depicted by Walrus. (a) "javax" namespace of "rt.jar" in JRE6; (b) "tools.jar" in JDK1.6.

Table 1  The CDG networks of different Java systems

Network   Description                                            |V|    |E|
Junit     junit.jar v4.8.2, http://junit.org/                    171    423
Ant       ant.jar v1.8.2, http://ant.apache.org/                 1037   2744
Jface     org.eclipse.jface v3.7.0, http://www.eclipse.org/      715    1449
Jdt       org.eclipse.jdt.core v3.7.3, http://www.eclipse.org/   1415   8077
Java      java namespace in JRE6, http://www.java.com/           2433   9798
Javax     javax namespace in JRE6, http://www.java.com/          2954   8055
Tools     tools.jar in JDK1.6, http://www.java.com/              3323   14620
Jung      jung2 v2.0.1 1), http://jung.sourceforge.net/          2677   7285

Figure 2  Distribution of the in-degrees in each CDG network.

Figure 3  Distribution of the out-degrees in each CDG network.

The results imply that the lognormal fits the out-degrees of these networks very well. The fitted μ ranges from 0.85 to 1.50, while σ ranges from 0.40 to 0.84. Mitzenmacher [42] indicated that the lognormal and power law distributions are intrinsically connected, and one can become the other through minor variations in the generation (evolution) process [8,42]. Hence it is not surprising to find the two distributions co-existing in software networks. It should be noted that a power law distribution can have infinite mean and variance [40,42]. As the in-degree represents the number of times a class node is referenced by other classes, it can grow without bound if the software system keeps growing. On the other hand, as the size of a class node is limited, the out-degree tends to be finite. This may explain why the in-degrees of a software network follow a power law, while the out-degrees follow a lognormal.

1) All the ".jar" files in "jung2" are counted in.

Table 2  Fitted exponents of power law by the in-degree distributions

       Junit   Ant     Jface   Jdt     Java    Javax   Tools   Jung
α      1.55    1.34    1.69    1.31    1.58    1.46    1.33    1.87
R²     0.993   0.964   0.989   0.994   0.999   0.993   0.992   0.998

Table 3  Fitted parameters of lognormal by the out-degree distributions

       Junit   Ant     Jface   Jdt     Java    Javax   Tools   Jung
μ      1.12    1.06    0.85    1.50    1.38    1.03    1.36    1.06
σ      0.56    0.54    0.40    0.84    0.52    0.53    0.66    0.48
R²     0.988   0.993   0.984   0.994   0.969   0.978   0.989   0.994

Table 4  Size of the maximum influence set S in each network

                   Junit   Ant    Jface   Jdt    Java   Javax   Tools   Jung
|S|                2       11     8       15     25     30      34      27
% of total nodes   1.17    1.06   1.12    1.06   1.03   1.02    1.02    1.01

4.2 Correlation between the maximum influence spread and the degree distributions

As CDG networks possess scale free behavior in the degree distributions, we investigate the correlation between the degree distributions and the maximum influence spread in these networks. To fully study the effects of the degree distributions on influence spread, for each CDG network we generate configuration models [22,43], which preserve the degree distributions. Given a network G = (V, E), its configuration model G′ = (V, E′) is built by repeatedly running the following two steps until enough edges are generated, i.e., |E′| = |E| (E′ is initially empty, and the degree quanta of each node are initialized to its degrees in G); a sketch of this procedure appears below:
1. Randomly select a node v_i whose out-degree quantum has not been exhausted and a node v_j whose in-degree quantum has not been exhausted, ensuring that e_ij = ⟨v_i, v_j⟩ ∉ E′;
2. Generate the edge e_ij and add it to E′; decrease both the out-degree quantum of v_i and the in-degree quantum of v_j by 1.
Given a network G, CELF is used to identify the maximum influence set S, and the corresponding σ(S) is computed by Monte Carlo simulation under the IC model. We randomly generate 10 configuration models for each of the eight CDG networks, so in total we have 88 networks. Since a configuration model preserves both the in-degree and the out-degree distributions of the corresponding CDG network, it also possesses the scale free behavior. As the sizes of these networks differ, we set |S| (i.e., the size of S) accordingly to make the percentages of initial active nodes comparable; Table 4 lists the settings of |S| for each CDG network. For the IC model, the influence probability on each link is uniformly assigned p = 0.05 and p = 0.07, respectively, in the different sets of experiments.

We use the exponent α of formula (1) to represent the in-degree distributions, and the logarithmic mean μ of formula (2) to represent the out-degree distributions. Figure 4 shows the correlation between α and σ(S), and between μ and σ(S), in scatter plots, where "Config" stands for the configuration models and "CDG" for the CDG networks. The upper two panels, (a) and (b), correspond to p = 0.05, while the lower two, (c) and (d), correspond to p = 0.07. To facilitate comparison, σ(S) is presented as a percentage of the total nodes in each network (denoted σ(S)%). Note that for a CDG network, its configuration models preserve the degree distributions, i.e., they have the same α and μ values. From Figure 4, it is evident that in these networks the degree distributions contribute much to the maximum influence spreads, since the scatter points tend to group around the eight distinct α and μ values corresponding to the eight CDG networks.
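A minimal sketch of this two-step generation, with hypothetical input dictionaries for the degree sequences; awkward sequences may stall on the simple-graph constraint, which the max_tries guard approximates.

import random

def configuration_model(nodes, out_deg, in_deg, max_tries=10**6):
    # Build a simple directed edge set realizing the given degree quanta.
    out_q = dict(out_deg)                 # remaining out-degree quanta
    in_q = dict(in_deg)                   # remaining in-degree quanta
    edges, need = set(), sum(out_deg.values())
    for _ in range(max_tries):
        if len(edges) == need:
            break
        vi = random.choice([v for v in nodes if out_q[v] > 0])
        vj = random.choice([v for v in nodes if in_q[v] > 0])
        if vi != vj and (vi, vj) not in edges:   # keep the graph simple
            edges.add((vi, vj))
            out_q[vi] -= 1
            in_q[vj] -= 1
    return edges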

Figure 4  Correlations between the maximum influence spread and the degree distributions.

Table 5  The Pearson's correlation to σ(S)% by the degree distribution parameters

                 p = 0.05                    p = 0.07
                 α          μ                α          μ
Pearson's r      0.287      0.753            0.297      0.773
p-value          0.0067     2.57 × 10^-17    0.0049     1.05 × 10^-18

This means the maximum influence spreads are nearly the same in configuration models with similar degree distributions. Different settings of p in the IC model do not change the influence behavior, since the relative positions of the scatter points in both pairs of panels (i.e., (a) vs. (c), and (b) vs. (d) in Figure 4) are almost identical. In some networks the difference in σ(S)% between the CDG and its configuration models is obvious (e.g., in "Jdt", which has the smallest α and the biggest μ of the eight networks); this suggests that other network features also affect the influence behavior. In Figure 4, the correlation between σ(S)% and μ is more evident than that between σ(S)% and α. In panels (a) and (c), the peak point of "Java" (containing the "super" class "java.lang.Object") in the middle, and the turn-up of "Jung" (containing multiple ".jar" files contributed by different groups of developers) at the end, both suggest that the correlation is not linear. To confirm this formally, we compute the Pearson correlation coefficients [44] between σ(S)% and the two distribution parameters. Table 5 lists the results of the correlation computation, with data from the CDG networks and the configuration models combined. It confirms our observation in Figure 4: both the in-degree and the out-degree distributions correlate significantly with the maximum influence spreads in these networks (p-value < 0.01). The correlation with the out-degrees is more significant than with the in-degrees, since the absolute value of Pearson's r is bigger and the p-value much smaller. For the IC model used, the value of p alters r proportionally but does not change the significance of the correlations. During our experiments, the correlations between the degree distributions and the maximum influence spread remain significant as long as the influence probability p falls within [0.01, 0.1] and the size of the maximum influence set S is less than 5% of the total nodes in each network, the same setting as in [25]. This result is far from complete, since CELF is very time consuming and we could not try every possible combination of parameter settings. Further study of the correlations, for example theoretically rather than empirically, is essential and will be part of our future work.
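The correlation test itself is a one-liner with scipy.stats. In this sketch the μ values are taken from Table 3, while the σ(S)% values are illustrative placeholders, not measured data.

from scipy.stats import pearsonr

mu = [1.12, 1.06, 0.85, 1.50, 1.38, 1.03, 1.36, 1.06]        # Table 3
sigma_percent = [5.1, 4.2, 2.9, 9.3, 8.0, 4.0, 7.6, 4.5]     # placeholders only

r, p_value = pearsonr(mu, sigma_percent)
print(f"Pearson's r = {r:.3f}, p-value = {p_value:.2g}")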

Table 6  The contents of the maximum influence set identified in "Javax"

Class name                              Fields   Methods   LOC 2)   In-degree   Out-degree
javax.swing.JComponent                  73       189       8445     247         19
javax.accessibility.Accessible (I)      0        1         0        116         1
javax.swing.plaf.ComponentUI            0        14        138      172         3
javax.swing.text.AttributeSet (I)       2        10        0        101         1
javax.swing.Icon (I)                    0        3         0        110         0
javax.swing.text.Element (I)            0        10        0        112         3
javax.swing.plaf.UIResource (I)         0        0         0        114         0
javax.accessibility.AccessibleContext   27       25        209      76          12
javax.swing.event.ChangeListener (I)    0        1         0        74          1
javax.swing.Action (I)                  11       6         0        53          0

4.3 The maximum influence set identified in CDG networks

Applying influence maximization on CDG networks, the identified maximum influence sets are valuable in both software development and maintenance. Highly influential classes should be kept simple and put under risk management, undergoing thorough testing and review. Table 6 lists the contents of the maximum influence set (|S| = 10) identified in "Javax" by CELF running under the IC model with p = 0.07. Both code measures, including the number of fields, methods, and lines of code (LOC), and network measures, including in- and out-degrees, are listed for each class node. Of the 10 class nodes listed in Table 6, seven are interfaces with 0 LOC, and the others are solid classes. Considering the network measures, the in-degree (of trust links) is important since it directly determines the number of influence trials from a node. Not all the influential nodes have the largest in-degrees, since they may be linked to (i.e., trusted) by other highly influential nodes. Considering the code measures of the influential classes, studies [45,46] have stated that the number of methods and the LOC are strongly and positively correlated with the fault-proneness of software entities. Some classes listed in Table 6 do have a relatively large number of both methods and LOC. For example, the class "javax.swing.JComponent" has 189 methods and 8445 LOC, and the corresponding class node has in-degree 247 and out-degree 19. In fact, the maximum influence sets (|S| = 10) of all the Java systems studied contain highly influential class nodes with hundreds of methods and with LOC in the thousands or even tens of thousands (e.g., "org.eclipse.jdt.internal.compiler.lookup.Scope" in "Jdt"). The maintainability of a software system would be improved by making these highly influential classes simpler and less fault-prone.

4.4 Discussion

During our experimental study, we first investigate the scale free behavior in CDG networks, which are simple directed graphs, studying the in-degree and the out-degree distributions separately. The experimental results confirm scale free behavior in the in-degree distributions, and show that the out-degrees follow a lognormal. Although numerous papers have studied scale free behavior in software networks, few have investigated its possible implications for real-world applications. We make such an attempt here and, to the best of our knowledge, are the first to study the effects of degree distributions, including the scale free behavior, on the maximum influence spreads in software networks.

2) The value is obtained by counting lines of code in the byte code, via the BCEL facility. We consider this a more accurate LOC measure than counting lines of code in the source code.


To make clear if the other network features, other than the degree distributions, make any significant difference on the influence behavior, we use configuration models for comparison. The experimental results suggest that the degree distributions, especially the out-degrees, largely determine the maximum influence in a software network. The differences among a software network and its corresponding configuration models are small. Above finding provides possibilities of approaches in design optimization to improve software maintainability. In order to study the effects of degree distributions on the influence behavior in software networks, we apply influence maximization on the CDG networks. For each link in a network, we do not make any difference among either the type or the number of inter-class relationships it stands for. The particular influence probabilities uniformly assigned during our experiments are estimations of the possible influences along the inter-class dependencies. Due to class encapsulation and foreseeable design optimization, this probability ought to be small. By the way, we do not try to calculate the exact number of possible affections by any singular class node, but to analyze the effect of network features on the influence behavior. During the experiments, our findings are valid (but not limited to) when the influence probability p falls between [0.01, 0.1] and the size of the maximum influence set S is less than 5% of the total nodes in a network, a setting commonly used in influence maximization [25]. We think that the 5% influence set is reasonable in real-world applications in software engineering, since during software maintenance, it is seldom that more than 5% of the software parts are obliged to change at the same time in a single revision, which can be validated by available data on open source development (http://www.sourceforge.net). Figure 4 and Table 5 show that the maximum influence directly correlates to the degree distributions in a CDG network. Other than the degree distributions, other network features contributes little in altering the correlation, since according to Figure 4, the diversity in the maximum influence spreads is little among a software network and its configuration models, which are random except that the node degrees are preserved. Based on Table 5, the correlations between the maximum influence and both the in-degree and out-degree distributions are relatively strong and sound, almost regardless of the influence probability assigned on the links. According to (a) and (c) of Figure 4, a bigger α in the power law by the in-degrees is better to limit the maximum influence, which means a steeper line of the in-degree distribution in logarithmic scale. This suggests that the proportion of nodes holding large in-degrees should be small. For the out-degrees, according to (b) and (d) of Figure 4, a smaller μ in the lognormal function is better, which means the mean logarithms of the out-degrees should be small. This again suggests smaller proportion of nodes holding large out-degrees. Although these findings conform to the common knowledge in software engineering that less coupling between software modules is preferred during software design, our results suggest a quantitative instrument to rectify the designed software structure. Our study makes a step forward for the real-world application of the scale free behavior commonly found in software networks. 
One possible application is to use the degree distributions to judge whether a designed software structure is suitable for maintenance, since the distributions directly correlate with the maximum influence. During software evolution or renovation, the maximum influence could be a valuable measure to predict the effort required, or the software parts affected, by the classes mandated to update. Hence, if the possible influence spreads are limited, the maintainability of the software system is improved. Another application is to keep the detected influential classes simple, so that both the probability of change and the fault proneness stay small and large influence spreads become unlikely, which also improves maintainability. During our experiments, although the selected Java systems have already undergone many revisions and optimizations, we can still find opportunities for improvement by locating complex classes in the highly influential class sets.

5  Conclusion

In this paper, we build CDG networks from object-oriented software systems and apply the procedure of influence maximization to investigate the correlations between the characteristics of the maximum influence and the degree distributions in the networks. To the best of our knowledge, this is the first work that studies the


effects of network fabric on the influence behavior of the highly influential nodes. By empirical studies on the networks constructed from eight Java software systems, the results demonstrate that a CDG network possesses scale-free behavior in its degree distributions: the in-degrees follow a power law, while the out-degrees follow a lognormal distribution. Correlations of the degree distributions with the maximum influence spread are studied. The valuable finding is that the degree distributions, especially the out-degree distribution, strongly determine the maximum influence spread in a CDG network. All this suggests possibilities for optimizing the structure of a software system so that its maintainability can be improved. By applying CELF, we have found highly influential class nodes with excessive complexity in number of methods and LOC, which also calls for design optimization. Our experimental studies could be extended to use more software systems to improve the soundness of the results, and to involve additional network features, such as community structure and small-world behavior, to investigate their effects on influence propagation. The effects of degree distributions on influence behavior, for example the influence spreads of nodes other than the maximum influence set, still need further study.

Acknowledgements This work was supported by National Basic Research Program of China (Grant No. 2009CB320705), National Natural Science Foundation of China (Grant Nos. 61373012, 91218302, 60873027, 61021062, 61076094), and National High-Tech Research & Development Program of China (Grant No. 2006AA01Z177).

References

1  Myers C R. Software systems as complex networks: structure, function, and evolvability of software collaboration graphs. Phys Rev E, 2003, 68: 046116
2  Jenkins S, Kirk S R. Software architecture graphs as complex networks: a novel partitioning scheme to measure stability and evolution. Inform Sciences, 2007, 177: 2587–2601
3  Zheng X, Zeng D, Li H, et al. Analyzing open-source software systems as complex networks. Physica A, 2008, 387: 6190–6200
4  Cai K Y, Yin B B. Software execution processes as an evolving complex network. Inform Sciences, 2009, 179: 1903–1928
5  De Moura A P S, Lai Y C, Motter A E. Signatures of small-world and scale-free properties in large computer programs. Phys Rev E, 2003, 68: 017102
6  Concas G, Marchesi M, Pinna S, et al. Power-laws in a large object-oriented software system. IEEE Trans Softw Eng, 2007, 33: 687–708
7  Maillart T, Sornette D, Spaeth S, et al. Empirical tests of Zipf's law mechanism in open source Linux distribution. Phys Rev Lett, 2008, 101: 218701
8  Kohring G A. Complex dependencies in large software systems. Adv Complex Syst, 2009, 12: 565–581
9  Chepelianskii A D. Towards physical laws for software architecture. arXiv: 1003.5455, 2010
10 Šubelj L, Bajec M. Community structure of complex software systems: analysis and applications. Physica A, 2011, 390: 2968–2975
11 LaBelle N, Wallingford E. Inter-package dependency networks in open-source software. arXiv: 0411096, 2004
12 Yang F, Lv J, Mei H. Technical framework for Internetware: an architecture centric approach. Sci China Ser F-Inf Sci, 2008, 51: 610–622
13 Mei H, Huang G, Lan L, et al. A software architecture centric self-adaptation approach for Internetware. Sci China Ser F-Inf Sci, 2008, 51: 722–742
14 Lv J, Ma X, Tao X P, et al. On environment-driven software model for Internetware. Sci China Ser F-Inf Sci, 2008, 51: 683–721
15 Valverde S, Solé R V. Hierarchical small-worlds in software architecture. arXiv: 0307278, 2007
16 Strogatz S H. Exploring complex networks. Nature, 2001, 410: 268–276
17 Albert R, Barabási A L. Statistical mechanics of complex networks. Rev Mod Phys, 2002, 74: 47–97
18 Bhattacharya P, Iliofotou M, Neamtiu I, et al. Graph-based analysis and prediction for software evolution. In: Proceedings of the International Conference on Software Engineering, Zurich, 2012. 419–429
19 Newman M E J. The structure and function of complex networks. SIAM Rev, 2003, 45: 167–256
20 Concas G, Marchesi M, Pinna S, et al. On the suitability of Yule process to stochastically model some properties of object-oriented systems. Physica A, 2006, 370: 817–831
21 Louridas P, Spinellis D, Vlachos V. Power laws in software. ACM Trans Softw Eng Meth, 2008, 18: 2


22 Kitsak M, Gallos L K, Havlin S, et al. Identification of influential spreaders in complex networks. Nat Phys, 2010, 6: 888–893
23 Kimura M, Saito K, Nakano R, et al. Extracting influential nodes on a social network for information diffusion. Data Min Knowl Disc, 2010, 20: 70–97
24 Lu Z, Zhang W, Wu W, et al. The complexity of influence maximization problem in the deterministic linear threshold model. J Comb Optim, 2012, 24: 374–378
25 Kempe D, Kleinberg J, Tardos E. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, 2003. 137–146
26 Richardson M, Domingos P. Mining knowledge-sharing sites for viral marketing. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002. 61–70
27 Cosley D, Huttenlocher D P, Kleinberg J M, et al. Sequential influence models in social networks. In: Proceedings of AAAI ICWSM, Washington, 2010. 26–33
28 Leskovec J, Krause A, Guestrin C, et al. Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, 2007. 420–429
29 Watts D J. A simple model of global cascades on random networks. Proc Natl Acad Sci, 2002, 99: 5766–5771
30 Miorandi D, De Pellegrini F. K-shell decomposition for dynamic complex networks. In: Proceedings of the 8th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), Avignon, 2010. 488–496
31 Chen W, Wang Y, Yang S. Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 2009. 199–208
32 Jiang Q, Song G, Cong G, et al. Simulated annealing based influence maximization in social networks. In: Proceedings of AAAI, San Francisco, 2011. 127–132
33 Wang Y, Cong G, Song G, et al. Community based greedy algorithm for mining top-k influential nodes in mobile social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, 2010. 1039–1048
34 Canright G S, Engø-Monsen K. Spreading on networks: a topographic view. Complexus, 2006, 3: 131–146
35 Inoue K, Yokomori R, Yamamoto T, et al. Ranking significance of software components based on use relations. IEEE Trans Softw Eng, 2005, 31: 213–225
36 Vasa R, Schneider J G, Nierstrasz O. The inevitable stability of software change. In: Proceedings of IEEE International Conference on Software Maintenance, Paris, 2007. 413–422
37 Martín González A M, Dalsgaard B, Olesen J M. Centrality measures and the importance of generalist species in pollination networks. Ecol Complex, 2010, 7: 36–43
38 Li C T, Shan M K, Lin S D. Dynamic selection of activation targets to boost the influence spread in social networks. In: Proceedings of the 21st International Conference Companion on World Wide Web, Lyon, 2012. 561–562
39 Zhang Y, Gu Q, Zheng J, et al. Estimate on expectation for influence maximization in social networks. In: Proceedings of the 14th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Hyderabad, 2010. 99–106
40 Newman M E J. Power laws, Pareto distributions and Zipf's law. Contemp Phys, 2005, 46: 323–351
41 Marquardt D W. An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math, 1963, 11: 431–441
42 Mitzenmacher M. A brief history of generative models for power law and lognormal distributions. Internet Math, 2004, 1: 226–251
43 Molloy M, Reed B. A critical point for random graphs with a given degree sequence. Random Struct Algor, 1995, 6: 161–180
44 Myers J L, Well A D. Research Design and Statistical Analysis. Lawrence Erlbaum Associates, 2002
45 Hall T, Beecham S, Bowes D, et al. A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng, 2012, 38: 1276–1304
46 Zhou Y M, Xu B W, Leung H. On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. J Syst Software, 2010, 83: 660–674

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072103:1–072103:18 doi: 10.1007/s11432-013-5024-1

Identifying extract class refactoring opportunities for internetware

CHEN Lin1,2, QIAN Ju3, ZHOU YuMing1,2, WANG Peng4 & XU BaoWen1,2*

1State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;
2Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China;
3College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
4School of Computer Science and Engineering, Southeast University, Nanjing 210096, China

*Corresponding author (email: [email protected])

Received May 7, 2013; accepted July 6, 2013; published online May 14, 2014

Abstract  The quality of internetware software is significantly associated with class structure. As software evolves, changes often introduce many unrelated responsibilities to the same classes or distribute tightly-related methods across different classes. These changes make the classes difficult to understand and maintain. Extract Class refactoring is an effective technique to improve the quality of software structure by decomposing unrelated methods in one class to create new classes or by extracting tightly-related methods from different classes. In this paper, we propose a novel approach for class extraction from internetware source code. This approach leverages a community structure detection technique to partition software into clusters and extracts classes from the resulting clusters. Our experimental results, obtained on the well-known internetware PKUAS, indicate that: (1) the proposed approach is much faster than existing search-based clustering approaches (Hill-climbing and Genetic algorithm) and is thus applicable to large-scale internetware; (2) the proposed approach can identify meaningful class extractions for internetware; and (3) Extract Class refactoring candidates identified by the proposed approach significantly improve class cohesion of internetware.

Keywords  refactoring, extract class, community structure, software modularity, internetware

Citation Chen L, Qian J, Zhou Y M, et al. Identifying extract class refactoring opportunities for internetware. Sci China Inf Sci, 2014, 57: 072103(18), doi: 10.1007/s11432-013-5024-1

1  Introduction

Internetware plays an important role in software development for the Internet computing environment [1,2]. The quality of internetware software is significantly associated with class structure. As software evolves, changes often introduce many unrelated responsibilities to the same classes or distribute tightly-related methods across different classes. For example, some methods cooperate to perform the same (or similar) responsibility, but the methods are scattered in different classes. Alternatively, a class grows larger and larger, and undertakes too many responsibilities. Such classes have been referred to as God Classes [3]. As an effective technique to improve the quality of class structure, refactoring has become an important practice in software development, especially in agile development [3,4]. Extract Class refactoring is a commonly used refactoring that creates a new class and moves the relevant entities from




the old class into the new class [3]. It can be used to address the two design flaws mentioned above by decomposing unrelated methods and extracting tightly connected methods.

Over the last decade, several approaches have been proposed to identify entities for Extract Class refactoring [5–10]. Most existing work focuses on the local structure and takes local entities as candidates for extraction. For example, Bavota et al. propose a two-step technique for class extraction [9]. This approach first takes a class identified by a software engineer as input, and then decomposes the class into parts, thus extracting a new class. For a large-scale system, as it evolves, related entities may be distributed in different classes; approaches that focus on local structure are less successful at extracting scattered methods from different classes.

Software modularity is usually used to decompose a software system for comprehension [11]. It also clusters related entities into different clusters, and is thus suitable for identifying entities in Extract Class refactoring. In the literature, search-based approaches have been proposed for software clustering and for identifying refactoring opportunities [12,13]. However, these search-based approaches are not lightweight and require complex configurations, which limits their usability.

In this paper, we propose a community structure detection-based approach to identify Extract Class refactoring opportunities and thereby improve the quality of class structure. Community structure detection has been studied in many areas and usually uses networks as input [14–16]. Software modularity can be considered a community structure detection problem. In this context, the elements considered in modularity are usually classes in object-oriented software and files in structural programs. Here, we consider the method, a lower-level entity, as the atomic element in modularity. We first construct a graph, called the method dependence graph (MDG), as the input of modularity [17]. Then we apply community structure detection to the MDG to obtain clusters. We next employ the following two simple heuristics to identify Extract Class refactoring opportunities from the clusters: (1) if a cluster contains scattered methods from different classes, then these methods can be extracted to create a new class; and (2) if a class is decomposed into different clusters, then the class should be decomposed into several new classes. These two heuristics reflect class cohesion and coupling of a software system at a high level [18], providing good refactoring clues for software developers and maintainers.

Our main contributions are summarized as follows:
1. We propose a novel approach to identify Extract Class refactoring opportunities to improve the class structure of internetware. The proposed approach decomposes internetware into clusters by a community structure detection technique that is efficient for large-scale systems. The experimental results show that our approach not only runs much faster than typical search-based approaches, but also identifies meaningful Extract Class refactoring opportunities.
2. We perform an in-depth investigation of the effects of software entity relationships on class extraction identification. Our study investigates two types of software entity relationships: method-method-invocation and method-attribute-reference. We also investigate how the transitive effect of method invocation affects identification results.
3.
We conduct an empirical study on a well-known internetware (PKUAS) to evaluate the refactoring candidates suggested by the proposed approach. To the best of our knowledge, this is the only empirical study on Extract Class refactoring identification for internetware. Our results indicate that this approach is efficient and effective for large-scale internetware.

2  The proposed approach

2.1  An illustrative example

To illustrate our approach, we discuss a simple example. Consider three classes: C1 with four methods m11, m12, m13, m14; C2 with four methods m21, m22, m23, m24; and C3 with nine methods m31, m32, ..., m39. We first construct an MDG of these classes, and then cluster the methods, as shown in Figure 1. After clustering, the methods are grouped into three clusters: M1, M2 and M3. As previously discussed, the clustering results reflect two design flaws: (1) the methods m11, m21, and m31


Figure 1  An illustrative example.

are scattered methods and, after clustering, they are grouped into the same cluster M1; and (2) the class C3, like a God Class, performs more than one responsibility, so its methods are grouped into three different clusters: M1, M2 and M3.

To refactor this example and improve class structure quality, a general approach is to extract three classes that exactly correspond to the three clusters. The three candidate classes are: C1′: m11, m21, m31; C2′: m32, m33, m34, m35; C3′: m12, m13, m14, m22, m23, m36, m37, m38, m39. However, several observations can be made here. First, after extracting the cluster M3 to create a new class C3′, the new class performs too many responsibilities and is still a God Class. Thus, it is unreasonable to extract the cluster M3 to create a new class. In this example, the aim is to decompose the class C3. The class C3 is decomposed into three parts: {m31}, {m32, m33, m34, m35} and {m36, m37, m38, m39}, with the latter two parts constituting the main structure of the class. Therefore, we can extract two new classes based on the latter two parts: one class containing {m32, m33, m34, m35}, and the other containing {m36, m37, m38, m39}. We call these two parts cores of the class C3. The final issue regards the extraction of the last method, m31. Two strategies are possible for method m31:
1. If the cluster M1 is extracted to create a new class, m31 should be extracted into that new class;
2. If the cluster M1 is not extracted to create a new class, the placement of method m31 is decided by the tightness between m31 and the two new classes extracted: {m32, m33, m34, m35} and {m36, m37, m38, m39}. Subsection 2.3 describes the measurement of tightness.

Thus, the final Extract Class suggestion for this example is as follows: C1′: m11, m21, m31; C2′: m32, m33, m34, m35; C3′: m36, m37, m38, m39. The classes C1 and C2 remain, but the class C3 is eliminated. In summary, our approach first constructs an MDG from source code and clusters the methods; the cores of each class are then searched for, and classes are extracted. More details of the process are described in Subsection 2.3.

2.2  Method clustering

In this paper, we restrict the subjects of our approach to object-oriented systems. Since the objective of our study is to use modularity to identify Extract Class refactoring opportunities, we consider methods as atomic elements in modularity and a cluster as a set of methods.


The proposed approach consists of the following three steps. In the first step, an MDG is constructed to represent the object-oriented program system under investigation. In the second step, a community structure detection technique is applied to decompose the MDG into clusters. In the third step, Extract Class refactoring opportunities are identified by comparing the modularity result and the real structure of the software.

2.2.1  Method dependency graph

The MDG is a language-independent representation of the structure of an object-oriented system. This representation includes all the methods of interest in the system and the dependencies between these methods.

Definition 1. The MDG is an undirected graph extracted from the source of an object-oriented system, in which nodes represent the methods and edges between two nodes represent the relationships between methods.

In the MDG, each method is represented by exactly one node, and each edge represents the relationship between two methods. Different relationships between methods may have different impacts on software modularity. To evaluate whether different relationships between methods affect our approach, we use two relationships: method-method-invocation and method-attribute-reference. We use W(m_i, m_j) to denote the weight of an edge in an MDG between a pair of methods, where i and j are the indexes of the methods in the MDG. In the method-method-invocation measurement, the values are computed directly as

    W(m_i, m_j) = \begin{cases} 1, & \text{if there is an invocation between the methods } m_i \text{ and } m_j, \\ 0, & \text{otherwise}. \end{cases}    (1)

If the effect of transitive invocation is considered, then W(m_i, m_j) between methods m_i and m_j is calculated as

    W(m_i, m_j) = \begin{cases} 1, & \text{if there is a direct or transitive invocation between the methods } m_i \text{ and } m_j, \\ 0, & \text{otherwise}. \end{cases}    (2)

The second metric we evaluate is related to method-attribute-reference. We use a matrix called the method-attribute reference (MAR) matrix to assist the weight computation of the edges.

Definition 2. The MAR matrix is a binary k × l matrix, in which rows are indexed by the methods and columns by the attributes; k is the number of methods and l is the number of attributes in the system of interest. For 1 ≤ i ≤ k, 1 ≤ j ≤ l, the element e_{ij} of the matrix is computed as

    e_{ij} = \begin{cases} 1, & \text{if the } i\text{th method references the } j\text{th attribute}, \\ 0, & \text{otherwise}. \end{cases}    (3)

In this paper, we use a metric based on the similarity-based class cohesion metric (LSCC) [19] to measure method relations. LSCC is a method-method-interaction (MMI) metric that accounts for the degree of interaction between each pair of methods. It precisely measures the degree of interaction between each pair of methods and does not violate important mathematical properties [19]. With this metric, W(m_i, m_j) measures the tightness of a pair of methods, and is defined as

    W(m_i, m_j) = \sum_{x=1}^{Y} (e_{ix} \wedge e_{jx}),    (4)

where ∧ is the logical AND relation and Y is the number of entities of the row (i.e., the number of attribute columns). The MAR matrix can also be adapted to account for transitive interactions caused by method invocations; in this case, the binary value 1 in the matrix indicates that the attribute is directly or transitively referenced by the method. In Section 3, we will discuss the impact of method invocation transitivity on Extract Class identification.
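To make the weight computation concrete, the following minimal sketch (our illustration in Python, not the paper's implementation) builds the MAR matrix of Definition 2 and evaluates the MMI weight of Eq. (4):

    import numpy as np

    def mar_matrix(k, l, references):
        """Build the binary k x l method-attribute reference matrix.

        `references` is an iterable of (method_index, attribute_index)
        pairs, one per direct attribute reference (Eq. (3)).
        """
        e = np.zeros((k, l), dtype=int)
        for i, j in references:
            e[i, j] = 1
        return e

    def mmi_weight(e, i, j):
        """Eq. (4): count attributes referenced by both methods i and j."""
        return int(np.sum(e[i] & e[j]))

    # Example: three methods, two attributes; methods 0 and 1 share attribute 0.
    e = mar_matrix(3, 2, [(0, 0), (1, 0), (2, 1)])
    print(mmi_weight(e, 0, 1))  # -> 1
    print(mmi_weight(e, 0, 2))  # -> 0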

2.2.2  Method clustering

After MDG construction, the graph is decomposed into clusters. To make our approach practicable for large-scale internetware, we use the community structure detection technique Infomap [16], first introduced in the complex network area. This technique is an information-theoretic approach that reveals community structure in weighted networks. Community structure detection problems have been studied in many areas, including the Internet, the World Wide Web, and citation networks. The problems involve vertices in networks that cluster into groups with a high density of intra-group edges and a lower density of inter-group edges [14,15]. For a given software system, after constructing an MDG from source code, the modularity problem can easily be transformed into a community structure detection problem: the MDG is the weighted network being analyzed; methods are treated as vertices in the weighted network; and the metric value W(m_i, m_j) is the weight of the edge between the two methods m_i and m_j of the network. The Infomap technique uses the probability flow of random walks on a network as a proxy for information flows and decomposes the network into modules by compressing a description of the probability flow. The basic idea of this technique is that a group of nodes among which information flows quickly and easily can be clustered into a single well-connected module; the links between modules capture the avenues of information flow between those modules. More details on the technique can be found in the literature [5].
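For illustration, community detection on a weighted MDG can be run with an off-the-shelf Infomap implementation; the following minimal sketch assumes the python-igraph package and toy data, and is not the authors' tool chain:

    import igraph as ig

    # Nodes are methods; edges carry the W(mi, mj) weights of Eqs. (1)-(4).
    g = ig.Graph(n=5, edges=[(0, 1), (1, 2), (3, 4)])
    g.es["weight"] = [2, 1, 3]

    clusters = g.community_infomap(edge_weights="weight")
    print(list(clusters))  # e.g. [[0, 1, 2], [3, 4]] -- one list per cluster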

2.3  Refactoring identification

After modularity, if the real structure of the software and the modularity results are not consistent, the modularity results provide very useful hints for software refactoring. A straightforward way to use modularity results for Extract Class identification is to extract each cluster as a class. However, as discussed in Subsection 2.1, a cluster does not always serve fully as a class in the system. Before proposing how to identify Extract Class refactoring opportunities based on modularity results, we outline several notations and definitions. In the following, m, m_i denote methods, C, C_i denote classes, M, M_i denote clusters, |C| denotes the number of methods in the class C, and |C ∩ M| denotes the number of methods in both the class C and the cluster M.

Definition 3. For a method m and the class C containing m, where after modularity m is grouped into a cluster M_i, m is an outlier method if there is another cluster M_j (j ≠ i) such that |C ∩ M_i| < |C ∩ M_j|.

Definition 4. For a class C whose methods are grouped into clusters M_1, M_2, ..., M_k, if there is a cluster M_i (1 ≤ i ≤ k) such that for any other cluster M_j, where 1 ≤ j ≤ k and j ≠ i, |C ∩ M_i| ≥ |C ∩ M_j|, then the methods of C in the cluster M_i constitute a core of the class C.

Definition 3 describes methods that are scattered into clusters; outlier methods are usually considered to be components in class extraction. Definition 4 describes methods that perform the main responsibilities of a class and often constitute its main structure. If a class has more than one core, the class is likely a God Class and should be decomposed. We propose the following strategies to identify Extract Class refactoring opportunities:
1. If a cluster contains no core of any class, then it is a candidate for a class;
2. If a class contains more than one core, all of the cores are candidates.

These two strategies handle two different scenarios: the former extracts scattered methods from classes to construct a new class, and the latter decomposes an old class to create new classes. These extractions address the two design flaws in internetware previously described, in which (1) some methods cooperate to fulfil the same (or similar) responsibility but are scattered in different classes; or (2) a class grows larger and larger and takes on too many responsibilities. If a class has more than one core and the cores are suggested as candidates for extraction, then to determine where the outlier methods of the class should go, we use D(m_j, c_k) to calculate the similarity between the method m_j and the candidate c_k, defined as

    D(m_j, c_k) = \frac{1}{l} \sum_{i=1}^{l} W(m_j, m_i),    (5)

where m_i ranges over the methods in c_k and l is the number of methods in c_k.
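Eq. (5) is simply the average MDG weight between the outlier method and the candidate's methods; a minimal sketch (our illustration, assuming a weight function W as in Eqs. (1)-(4)):

    # Average weight between outlier method m_j and the methods of candidate c_k.
    def similarity(W, m_j, candidate):
        return sum(W(m_j, m_i) for m_i in candidate) / len(candidate)

    # The outlier is then placed into the candidate with maximal similarity:
    # target = max(candidates, key=lambda c: similarity(W, m_j, c))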


Algorithm 1: Identify Extract Class refactorings from MDG
Input: a method dependence graph
Output: a sequence Sec of Extract Class refactoring candidates

 1  identifyClassExtraction (MDG) {
 2      partition methods in the MDG to clusters M1, M2, ..., Mn;
 3      for each class, calculate its cores and the corresponding host clusters;
 4      for each cluster Mi {
 5          num = number of cores in Mi;
 6          if (num == 0) {
 7              Candidate c = extractMethods(all methods in Mi);
 8              computeRank(c);
 9              add c to Sec;
10          }
11      }
12      for each class Ci {
13          num = number of cores of Ci;
14          if (num > 1) then {
15              for each outlier method mj in Ci {
16                  if (mj is not in any Candidate) {
17                      for each core ck {
18                          D(mj, ck) = distance between mj and ck;
19                          Candidate c = extractMethods(method set of the core);
20                      }
21                      put mj to the candidate ck with the max D(mj, ck);
22                  }
23                  for each candidate ck {
24                      computeRank(ck);
25                      add c to Sec;
26                  }
27              }
28          }
29      return Sec;
30  }

Figure 2  Identify Extract Class refactorings from MDG.

After identifying these candidates, we use the LSCC metric described in [19] to compute rank values, and then give a sequence of refactoring suggestions. Figure 2 shows the details of the process. In Algorithm 1, extractMethods(method set) takes a method set as a class extraction candidate, and computeRank(candidate) computes the rank value of a candidate; the computation of the rank value is given in Subsubsection 3.2.3. Lines 4 to 11 extract methods scattered in different classes to create a new class; lines 12 to 28 decompose an old class to create new classes. Besides Extract Class refactoring, Move Method refactoring opportunities can also be identified by our approach, in two steps: (1) identify outlier methods and take them as candidates; and (2) identify the target class into which each candidate method should be moved.
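For concreteness, the following minimal sketch (our reading of Algorithm 1 in Python, with hypothetical container shapes; outlier placement via Eq. (5) and ILSCC ranking are omitted) computes the cores of Definition 4 and the candidates of the two strategies:

    from collections import Counter

    # `clusters` maps cluster id -> set of methods;
    # `classes` maps class name -> set of methods.
    def cores(cls_methods, clusters):
        """Clusters holding a maximal share of the class (Definition 4)."""
        share = Counter({cid: len(cls_methods & members)
                         for cid, members in clusters.items()})
        best = max(share.values())
        return [cid for cid, n in share.items() if n == best and n > 0]

    def identify_candidates(classes, clusters):
        candidates = []
        core_clusters = set()
        for cls in classes.values():
            core_clusters.update(cores(cls, clusters))
        # Strategy 1: a cluster containing no core of any class.
        for cid, members in clusters.items():
            if cid not in core_clusters:
                candidates.append(set(members))
        # Strategy 2: each core of a multi-core class.
        for cls in classes.values():
            cs = cores(cls, clusters)
            if len(cs) > 1:
                for cid in cs:
                    candidates.append(cls & clusters[cid])
        return candidates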

3  Experiment

3.1  Research questions

We investigate the following research questions for the proposed approach:
• RQ1: Is the proposed approach efficient enough for large-scale internetware? Refactoring is frequently conducted in software development and maintenance, and efficiency plays an important role in the selection of refactoring tools, so it is necessary to use a lightweight technique to develop a refactoring tool. Here we examine whether the proposed approach is efficient; timings should not vary widely across the modularity runs.
• RQ2: Does the proposed approach identify meaningful Extract Class refactoring opportunities for internetware to improve structure quality? In this paper, we use LSCC, a class cohesion metric, to evaluate class structure quality.


• RQ3: Do different relationships between methods affect refactoring identification results? Two relationships are evaluated: method-method-invocation and method-attribute-reference. We also investigate the transitive effect of method invocation on these two relationships.

3.2  Evaluation measurements

To evaluate our approach, we use two dependent variables: efficiency and effectiveness. To evaluate the efficiency of the approach, we record each running time of the approach. To evaluate its effectiveness, we calculate the LSCC class cohesion values of the program, in terms of how the technique improves software structural quality.

3.2.1  Efficiency measurement

The timings are gathered only for modularity, because modularity is the most time-consuming step of the approach. After modularity, the timings for refactoring identification are mostly related to the numbers of clusters and classes, as described in Algorithm 1; the differences are minor because the numbers of clusters and classes do not vary greatly. We ran each approach many times, and recorded the minimum, maximum and average values.

3.2.2  Relationship measurement

We evaluate two relationships: method-method-invocation and method-attribute-reference. Considering the transitive effect of method invocation, we use four measurements to evaluate the impacts on class extraction identification:
1. Method-method-invocation without transitive effect (MM). The metric value used to evaluate this relationship is defined in (1).
2. Method-method-invocation with transitive effect (MMT). The metric value used to evaluate this relationship is defined in (2); a sketch of the transitive computation follows this list.
3. Method-attribute-reference without transitive effect (MA). This metric value is defined in (3). The formula takes the MAR matrix as input; the matrix does not account for transitive interactions caused by method invocations.
4. Method-attribute-reference with transitive effect (MAT). This metric value is also defined in (3), but the MAR matrix used in the formula accounts for transitive interactions caused by method invocations.
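The transitive relation used by MMT and MAT can be obtained with a simple fixed-point closure over the call graph; a minimal sketch (our illustration, not the prototype's code):

    # `calls` maps each method to the methods it invokes directly.
    def transitive_weights(calls):
        reach = {m: set(direct) for m, direct in calls.items()}
        changed = True
        while changed:  # fixed-point transitive closure
            changed = False
            for m, r in reach.items():
                new = set()
                for t in r:
                    new |= reach.get(t, set())
                if not new <= r:
                    r |= new
                    changed = True
        def w(mi, mj):
            # Eq. (2): 1 if one method reaches the other, else 0.
            return 1 if mj in reach.get(mi, set()) or mi in reach.get(mj, set()) else 0
        return w

    w = transitive_weights({"a": ["b"], "b": ["c"], "c": []})
    print(w("a", "c"))  # -> 1: a invokes c transitively through b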

3.2.3  Cohesion metric

To evaluate whether our approach can identify Extract Class refactoring opportunities that improve class cohesion, we use LSCC to measure class cohesion. The LSCC value of a class C is defined as

    LSCC(C) = \begin{cases} 0, & l = 0 \text{ and } k > 1, \\ 1, & (l > 0 \text{ and } k = 0) \text{ or } k = 1, \\ \dfrac{\sum_{i=1}^{l} x_i (x_i - 1)}{l\,k(k-1)}, & \text{otherwise}, \end{cases}    (6)

where x_i is the number of 1s in the ith column of the MAR matrix, l denotes the number of attributes (columns) of the matrix, and k denotes the number of methods (rows).

We use a metric to rank the candidates, denoted ILSCC, which is defined as

    ILSCC = \frac{2 \left( \sum LSCC(C_{new}) - \sum LSCC(C_{old}) \right)}{\sum LSCC(C_{new}) + \sum LSCC(C_{old})},    (7)

where C_new ranges over the new classes after the refactoring is applied, i.e., the class extracted and the classes that contained the extracted methods before refactoring, and C_old ranges over the classes involved before refactoring.
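A minimal sketch (our illustration) of the LSCC computation of Eq. (6) on a MAR matrix with k methods (rows) and l attributes (columns):

    import numpy as np

    def lscc(e):
        k, l = e.shape  # k methods, l attributes
        if l == 0 and k > 1:
            return 0.0
        if (l > 0 and k == 0) or k == 1:
            return 1.0
        x = e.sum(axis=0)  # x[i]: number of 1s in the ith column
        return float(np.sum(x * (x - 1))) / (l * k * (k - 1))

    # Example: two methods sharing one attribute out of two.
    print(lscc(np.array([[1, 0], [1, 1]])))  # -> (2*1 + 1*0) / (2*2*1) = 0.5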

Table 1  Basic information of the subject program

Program    KLOC    No. of packages    No. of classes    No. of methods    No. of attributes
PKUAS      136     166                1442              11918             3600

3.3  Experiment setup

The subject chosen for analysis, shown in Table 1, is PKUAS, an internetware developed by Peking University. PKUAS is a typical internetware that can be used as a J2EE-compliant application server, and it reflects both the underlying platform and EJB components [20]. We use an Eclipse Metrics plugin to collect the basic metrics of the subject program, including lines of code (in thousands) and the numbers of packages, classes, methods and attributes. We use svn to check out the source code from the PKUAS official subversion repository. In addition to the PKUAS source code, the repository also includes some library source code; since it is not clear which source code belongs to PKUAS, all code is assumed to belong to PKUAS. We implemented a prototype of our approach to conduct the experiment. The platform we used is a machine with a 24-core Intel Xeon 1.87 GHz CPU and 16 GB of memory, and the operating system is Red Hat Enterprise Linux Server 6.1.

3.4  A case study

Several examples from PKUAS are included here to illustrate the ability of our method. As Figure 3 shows, pku.as.datasvc.PoolManager is a class for pool management. The class has seven methods, four of which are non-trivial (with more than one statement): startService, stopService, getDatabase and getTable. The class has two non-static attributes: serviceName and dt. Both methods getDatabase and getTable access the attribute dt, while startService and stopService invoke the methods start and stop in the class PoolDataSource, respectively. After clustering using our approach, the four methods are grouped into two clusters: startService and stopService in one cluster, and getDatabase and getTable in a second cluster. The proposed approach suggests that these four methods can be divided into two classes; see Figure 4. As the experimental results in the next section indicate, this division can improve the LSCC class cohesion.

We checked the revision information of the source code file and discovered several interesting things. The methods getDatabase and getTable and the attribute dt were added by zhaojy11, not the original author; the purpose of this revision was to add the data source management tool. Thus, the methods getDatabase and getTable are assigned a different responsibility from the methods startService and stopService. Besides improving class cohesion, the clustering results also imply useful information for software comprehension and maintenance, so the class extraction suggestion is meaningful. This example illustrates two abilities of the proposed approach: (1) the ability to identify opportunities to improve the class cohesion metric; and (2) the ability to identify meaningful suggestions, which may reflect useful information for software comprehension and maintenance.

3.5  Results and analysis

To evaluate the efficiency of the approach, we run the modularity process several times and compare it with the Bunch tool [21]. Bunch is a search-based software modularization tool; we compare the proposed approach with two typical search algorithms implemented in Bunch: Hill-climbing and Genetic algorithm. Search-based methods usually require complex configurations, and here we use the default configuration of the Bunch tool. First, we calculated the minimum, maximum, average and standard deviation values for the Infomap method. Table 2 details the timing results: Infomap is much faster than the other two approaches. For one run, Hill-climbing takes minutes, whereas Infomap takes less than one second. This suggests that Infomap can be applied to large-scale internetware to identify refactoring opportunities. The Genetic algorithm requires too much time and sometimes fails to produce results, so no details are included here for this method.

    //some import information
    public class PoolManager extends ServiceBase implements PoolManagerMBean {
        private static final Log logger = LogFactory.getLog(PoolManager.class);
        public static List poolList = new LinkedList();
        private String serviceName;
        //add by zhaojy11
        private Data dt = new Data();

        public PoolManager() throws Exception {
            serviceName = "Pool Manager Service";
        }

        public void addChild(ManageableBean pds) {
            poolList.add(pds);
        }

        public String getServiceName() {
            return serviceName;
        }

        public synchronized void startService() throws Exception {
            logger.debug("Pool Manager is starting...");
            for (Iterator it = poolList.iterator(); it.hasNext();) {
                PoolDataSource pds = (PoolDataSource) it.next();
                pds.start();
            }
            logger.debug("Pool Manager is started...");
        }

        public synchronized void stopService() throws Exception {
            logger.debug("Pool Manager is closing...");
            for (Iterator it = poolList.iterator(); it.hasNext();) {
                PoolDataSource pds = (PoolDataSource) it.next();
                pds.stop();
            }
            poolList.clear();
            logger.debug("Pool Manager is closed...");
        }

        //add by zhaojy11
        public ArrayList getDatabase(String DBName) throws SQLException {
            //show all the tables in a database
            dt.setAppBDName(DBName);
            dt.print(0);
            return dt.getTableNameList();
        }

        //add by zhaojy11
        public ArrayList getTable(String tableName) throws SQLException {
            //show the table
            dt.setTableName(tableName);
            dt.print(1);
            return dt.getTableColumnList();
        }

        //add by zhaojy11
        public void closeConnection() throws SQLException {
            dt.closeConnection();
        }
    }

Figure 3  An example class from PKUAS.

Table 3 shows the percentage of candidates that increase class cohesion identified by our approach and by Hill-climbing. Both approaches show that using the method-attribute-reference relationship affects class extraction to a greater extent than using the method-method-invocation relationship. The MM, MA and MAT metrics indicate that our approach performs better than Hill-climbing.

Figure 4  Clustering result for the example in Figure 3.

Table 2  Timings of the modularity methods

(a) Results of Infomap (time in ms)

Metric    Min     Max     Average    Standard deviation
MM        530     537     533        1.6
MMT       483     487     484        1.2
MA        368     370     369        0.7
MAT       1783    1797    1787       3.6

(b) Results of Bunch (time in ms)

          Hill-climbing                              Genetic algorithm
Metric    Min           Max           Average        Average
MM        218 x 10^3    655 x 10^3    408 x 10^3     > 10^7
MMT       199 x 10^3    572 x 10^3    351 x 10^3     > 10^7
MA        202 x 10^3    2041 x 10^3   699 x 10^3     > 10^7
MAT       1272 x 10^3   3930 x 10^3   2425 x 10^3    > 10^7

Table 3  Percentage of candidates that increase class cohesion

Metric    Hill-climbing (%)    Our approach (%)
MM        56                   67
MMT       62                   55
MA        81                   96
MAT       73                   96

Tables 4–7 give the top 10 candidate results that can increase the LSCC metric using the different metrics. Some suggestions given by the different metrics are the same. For example, methods from the class pku.as.metadata.impl.ApplicationInfoImpl appear in the suggestion lists of using MM, MMT, and MAT metrics. Three results suggest that this class should be refactored. In the results, although most of the candidates are suggestions that decompose a class to create several new classes, one suggestion that clusters scattered methods from different classes to create a new class is given in Table 7. This suggestion is that methods from the class pku.as.metadata.impl.BaseBeanMetaData and pku.as.metadata.impl.MessageDrivenMetaDataImpl should be extracted to form a new class.

Table 4  Top 10 candidates of the identified refactorings using MM metric

Candidate 1 (ILSCC 1.87):
    pku.as.ejb.im.BaseEJBContext.lifeCycleCall(InterceptorType)
    pku.as.ejb.im.BaseEJBContext.initializeBeanInstanceInterceptor()
Candidate 2 (ILSCC 1.82):
    pku.as.remoting.jrmp.JRMPServerInvoker.doInvoke(InvocationRequest)
    pku.as.remoting.jrmp.JRMPServerInvoker.invoke(InvocationRequest)
Candidate 3 (ILSCC 1.75):
    pku.as.metadata.impl.ApplicationInfoImpl.getEjbJarCount()
    pku.as.metadata.impl.ApplicationInfoImpl.assertEqual()
Candidate 4 (ILSCC 1.72):
    pku.as.timer.TimerImpl.cancel()
    pku.as.timer.TimerImpl.enlistTxIfAny()
Candidate 5 (ILSCC 1.60):
    pku.as.persistence.metadata.callback.CallbackMetaData.getPostLoadCallbacks()
    pku.as.persistence.metadata.callback.CallbackMetaData.getPostLoadMethodCallbacks()
    pku.as.persistence.metadata.callback.CallbackMetaData.getPostLoadListenerCallbacks()
Candidate 6 (ILSCC 1.49):
    pku.as.web.tomcat5.cache.CacheResponseWrapper.getWriter()
    pku.as.web.tomcat5.cache.CacheResponseWrapper.getOutputStream()
Candidate 7 (ILSCC 1.43):
    pku.as.datasvc.PoolManager.stopService()
    pku.as.datasvc.PoolManager.startService()
Candidate 8 (ILSCC 1.43):
    pku.as.datasvc.PoolManager.getTable(String)
    pku.as.datasvc.PoolManager.getDatabase(String)
Candidate 9 (ILSCC 1.33):
    pku.as.metadata.impl.BaseBeanMetaData.buildReferenceMapping()
    pku.as.metadata.impl.BaseBeanMetaData.getEjbRefs()
    pku.as.metadata.impl.BaseBeanMetaData.getEjbLocalRefs()
    pku.as.metadata.impl.BaseBeanMetaData.getEnvEntrys()
    pku.as.metadata.impl.BaseBeanMetaData.createEnvEntryMapping()
    pku.as.metadata.impl.BaseBeanMetaData.createEjbRefMapping()
    pku.as.metadata.impl.BaseBeanMetaData.getResourceEnvRefs()
    pku.as.metadata.impl.BaseBeanMetaData.createEjbLocalRefMapping()
    pku.as.metadata.impl.BaseBeanMetaData.getResourceRefs()
    pku.as.metadata.impl.BaseBeanMetaData.getJndi4ejbLink(String,boolean)
    pku.as.metadata.impl.BaseBeanMetaData.getEnvironmentValue(String,String)
Candidate 10 (ILSCC 1.32):
    pku.as.persistence.metadata.object.ClassMetaData.getFieldMetaData(String)
    pku.as.persistence.metadata.object.ClassMetaData.getPrimaryTableName()
    pku.as.persistence.metadata.object.ClassMetaData.getAllFields()

Table 8 shows the frequency of occurrence of the extracted methods under the different metrics. This table reflects the differences among the results obtained using different metrics. Although only three methods appear in all the results, there is some overlap between different results. For example, 331 methods appear in both the results acquired using the MM and MMT metrics; this shows that the transitive effect of method invocation has only a slight impact on the identification results. Similarly, 76 methods appear in the results acquired using the MA and MAT metrics. However, only 57 methods appear in the results using the MM and MA metrics, which shows that the method-method-invocation and method-attribute-reference relationships have different impacts on class extraction identification.

After refactoring, the number of classes increases. The change in the number of classes depends on how many of the suggestions given by the proposed approach are accepted. Table 9 shows the numbers of suggestions that will increase the LSCC class cohesion. In practice, it is rare that all suggestions are accepted. Even if all suggestions were accepted, since the total number of classes of PKUAS is 1442, accepting them would not greatly change the class number. For example, the maximum number of suggestions in Table 9 is 65; accepting all 65 suggestions does not have a great influence on the whole system,

Table 5  Top 10 candidates of the identified refactorings using MMT metric

Candidate 1 (ILSCC 1.82):
    pku.as.remoting.jrmp.JRMPServerInvoker.doInvoke(InvocationRequest)
    pku.as.remoting.jrmp.JRMPServerInvoker.invoke(InvocationRequest)
Candidate 2 (ILSCC 1.75):
    pku.as.metadata.impl.ApplicationInfoImpl.getEjbJarCount()
    pku.as.metadata.impl.ApplicationInfoImpl.assertEqual()
Candidate 3 (ILSCC 1.72):
    pku.as.timer.TimerImpl.cancel()
    pku.as.timer.TimerImpl.enlistTxIfAny()
Candidate 4 (ILSCC 1.52):
    pku.as.web.tomcat5.JACCRealm.hasResourcePermission(Request,Response,SecurityConstraint[],Context)
    pku.as.web.tomcat5.JACCRealm.hasUserDataPermission(Request,Response,SecurityConstraint[])
Candidate 5 (ILSCC 1.49):
    pku.as.web.tomcat5.cache.CacheResponseWrapper.getWriter()
    pku.as.web.tomcat5.cache.CacheResponseWrapper.getOutputStream()
Candidate 6 (ILSCC 1.43):
    pku.as.datasvc.PoolManager.stopService()
    pku.as.datasvc.PoolManager.startService()
Candidate 7 (ILSCC 1.43):
    pku.as.datasvc.PoolManager.getTable(String)
    pku.as.datasvc.PoolManager.getDatabase(String)
Candidate 8 (ILSCC 1.14):
    pku.as.web.tomcat5.JACCRealm.hasRole(Principal,String)
    pku.as.web.tomcat5.JACCRealm.findServletName(Request)
Candidate 9 (ILSCC 1.04):
    pku.as.timer.TimerImpl.getNextTimeout()
    pku.as.timer.TimerImpl.getTimeRemaining()
Candidate 10 (ILSCC 1.00):
    pku.as.invocation.config.ServerInterceptor.getInterceptorCount()
    pku.as.invocation.config.ServerInterceptor.getInterceptor(int)

and the system will not become incomprehensible. These results lead to the following observations.
1. The proposed approach is much faster than the existing search-based approaches investigated in the experiment, indicating that our approach is applicable to large-scale internetware.
2. On average, the effectiveness of using Infomap is higher than that of using Hill-climbing or Genetic algorithm. Compared with Hill-climbing or Genetic algorithm, Infomap identifies a higher percentage of candidates that increase class cohesion.
3. Most of the candidates (more than 55%) increase the LSCC value regardless of the metric used. This shows that most of the candidates given by our approach can improve class cohesion, which is helpful for improving software structure quality.
4. The proposed approach can identify the two kinds of class extractions discussed in Section 2. Although most candidates belong to the first type of class extraction, some do belong to the second type. For example, in Table 7, the second suggestion given by the proposed approach contains scattered methods from two classes: BaseBeanMetaData and MessageDrivenMetaDataImpl.
5. Different relationships between methods do affect class extraction identification results, as can be concluded from the results shown in Tables 5–9.
6. The transitive effect of method invocation has a greater impact on the results acquired using method-attribute-reference (MA vs. MAT) than on those acquired using method-method-invocation (MM vs. MMT). The results in Table 8(d) show that more than 70% (331/418) of the candidates acquired using the MMT metric also appear in the results of the MM metric, but only 44% (76/173) of the candidates acquired using the MAT metric appear in the results of the MA metric.

Overall, these results address the three research questions. First, the proposed approach is efficient enough for large-scale internetware: for internetware with more than ten thousand lines of code, the proposed approach can accomplish the modularity job within one second. Second, the approach is able

Table 6  Top 10 candidates of the identified refactorings using MA metric

Candidate 1 (ILSCC 1.98):
    pku.as.metadata.ApplicationInfo.getComponent(String,String)
    pku.as.metadata.ApplicationInfo.addComponent(BeanDeploymentInfo)
Candidate 2 (ILSCC 1.98):
    pku.as.metadata.ApplicationInfo.addEjbCL(String,ClassLoader)
    pku.as.metadata.ApplicationInfo.getEjbCL(String)
Candidate 3 (ILSCC 1.98):
    pku.as.metadata.ApplicationInfo.addEjbClassMap(String,List)
    pku.as.metadata.ApplicationInfo.getEjbClassMap(String)
Candidate 4 (ILSCC 1.98):
    pku.as.metadata.ApplicationInfo.addEJBVersion(String,EJBVersion)
    pku.as.metadata.ApplicationInfo.getEJBVersion(String)
Candidate 5 (ILSCC 1.98):
    pku.as.metadata.ApplicationInfo.addPersistenceUnitMap(String,Map)
    pku.as.metadata.ApplicationInfo.getPersistenceUnitMap(String)
Candidate 6 (ILSCC 1.98):
    pku.as.db.PKUASDerbyDBController.shutdown()
    pku.as.db.PKUASDerbyDBController.start()
Candidate 7 (ILSCC 1.98):
    pku.as.ejb.message.MessageDrivenInstanceContext.initializeBeanInstanceInterceptor()
    pku.as.ejb.message.MessageDrivenInstanceContext.lifeCycleCall(pku.as.aop.InterceptorType)
Candidate 8 (ILSCC 1.98):
    pku.as.cmp.EJBQLParser.getSelectMyNode()
    pku.as.cmp.EJBQLParser.peekSelectMyNode()
    pku.as.cmp.EJBQLParser.ignoreSelectMyNode()
    pku.as.cmp.EJBQLParser.backSelectMyNode()
Candidate 9 (ILSCC 1.98):
    pku.as.cmp.EJBQLParser.getFromMyNode()
    pku.as.cmp.EJBQLParser.peekFromMyNode()
    pku.as.cmp.EJBQLParser.ignoreFromMyNode()
    pku.as.cmp.EJBQLParser.backFromMyNode()
Candidate 10 (ILSCC 1.98):
    pku.as.cmp.EJBQLParser.backOrderMyNode()
    pku.as.cmp.EJBQLParser.ignoreOrderMyNode()
    pku.as.cmp.EJBQLParser.peekOrderMyNode()
    pku.as.cmp.EJBQLParser.getOrderMyNode()

to identify Extract Class refactoring opportunities that improve class cohesion. Regardless of the metric used, more than 55% of the suggestions increase the LSCC metric, and most of the suggestions (over 95%) recognized by the method-attribute-reference relationship increase class cohesion. Third, different relationships between methods do affect refactoring identification results. In the experiment, metrics using the method-attribute-reference relationship are more effective in identifying candidates that increase class cohesion than those using the method-method-invocation relationship. The transitive effect of method invocation has a slight impact on the method-method-invocation relationship, but a greater impact on the method-attribute-reference relationship.

3.6  Threats to validity

Several factors may restrict the generality and limit the interpretation of our results; they are discussed in the next two subsections.

3.6.1  External validity

The first factor is that only one internetware system is evaluated in the experiment. Although the selected system is a typical and well-known internetware system, there are many other internetware systems with different properties. Whether these different properties affect our approach will not be clear until more systems are investigated. The second consideration is that the size of the selected system is approximately

Table 7  Top 10 candidates of the identified refactorings using MAT metric

Candidate 1 (ILSCC 1.98):
    pku.as.ejb.message.MessageDrivenInstanceContext.initializeBeanInstanceInterceptor()
    pku.as.ejb.message.MessageDrivenInstanceContext.lifeCycleCall(InterceptorType)
Candidate 2 (ILSCC 1.94):
    pku.as.metadata.impl.BaseBeanMetaData.getClientInterceptor()
    pku.as.metadata.impl.BaseBeanMetaData.getServerInterceptor()
    pku.as.metadata.impl.MessageDrivenMetaDataImpl.getDestinationJndiName()
    pku.as.metadata.impl.MessageDrivenMetaDataImpl.getConnectionFactoryJNDIName()
    pku.as.metadata.impl.BaseBeanMetaData.getJndiName()
    pku.as.metadata.impl.BaseBeanMetaData.getLocalJndiName()
    pku.as.metadata.impl.MessageDrivenMetaDataImpl.getMaxMessages()
Candidate 3 (ILSCC 1.93):
    pku.as.persistence.metadata.relational.RdbTable.getIndex(String)
    pku.as.persistence.metadata.relational.RdbTable.addIndex(RdbIndex)
Candidate 4 (ILSCC 1.92):
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.getBdiOfEjb(String)
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.addBdiOfEjb(String,BeanDeploymentInfo)
Candidate 5 (ILSCC 1.92):
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.getWsdmOfEjb(String)
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.addWsdmOfEjb(String,WebserviceDescriptionMetaData)
Candidate 6 (ILSCC 1.92):
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.getWsOfEjb(String)
    pku.as.webservices.metadata.deploy.services.ServiceXmlGenerator.addWsOfEjb(String,Webservice)
Candidate 7 (ILSCC 1.91):
    pku.as.ejb.message.MessageDrivenInstanceContext.clean()
    pku.as.ejb.message.MessageDrivenInstanceContext.bindEJBContext()
Candidate 8 (ILSCC 1.87):
    pku.as.metadata.impl.ApplicationInfoImpl.hasWebJar()
    pku.as.metadata.impl.ApplicationInfoImpl.getWebJarCount()
Candidate 9 (ILSCC 1.87):
    pku.as.mail.JavaMailService.messageDelivered(TransportEvent)
    pku.as.mail.JavaMailService.messageNotDelivered(TransportEvent)
    pku.as.mail.JavaMailService.messagePartiallyDelivered(TransportEvent)
Candidate 10 (ILSCC 1.83):
    pku.as.launcher.felix.PKUAS.checkRootDir(File)
    pku.as.launcher.felix.PKUAS.init()

one million lines. In industrial practice, there are systems with many millions of lines; the efficiency of our approach for such systems still needs to be investigated. The third factor is that although some results were analyzed manually, we cannot guarantee that they are free of error.

3.6.2  Internal validity

One factor is that although the LSCC metric has been verified as a cohesion metric with good mathematical properties and good suitability for refactoring, it cannot reflect all software cohesion properties. Many other metrics play an important role in reflecting class structure quality, and using other metrics for ranking we might obtain different refactoring candidate suggestions. However, as the purpose is to identify refactorings that can increase class cohesion, the LSCC metric satisfies this requirement; if users choose to identify refactoring opportunities that satisfy different requirements, other metrics are available.

The second factor is that our approach uses the community detection method Infomap for modularity, and we do not investigate the modularity results produced by Infomap in isolation. If Infomap could not decompose software into good clusters, its use for class extraction identification would not be suitable. However, Infomap has shown great efficiency, which makes class extraction identification applicable to large-scale systems, and it has also shown good ability for modularity in complex network areas. We believe that it is applicable

Table 8  Frequency of occurrence of the methods suggested to be extracted

(a) Methods that emerge in all four results

Metric            MM/MMT/MA/MAT
No. of methods    3

(b) Methods that emerge in only one of the results

Metric            MM     MMT    MA     MAT
No. of methods    617    418    457    173

(c) Methods that emerge in two of the results

Metric            MM/MA    MMT/MAT
No. of methods    57       23

(d) Methods that emerge in two of the results

Metric            MM/MMT    MA/MAT
No. of methods    331       76

(e) Summary of the frequency of occurrence of the extracted methods

Frequency of occurrence    1      2      3     4
All candidates             815    380    44    3

Table 9  Numbers of suggestions

Metric                MM    MMT    MA    MAT
No. of suggestions    65    35     58    22

for our objective. In the experiment, Infomap also demonstrates its good ability for class extraction identification. The third factor is that the proposed approach does not consider extracting attributes. In general, methods represent the responsibilities of a class, and attributes represent its data. In this paper, we focus on restructuring the responsibilities of software and ignore restructuring attributes; refactoring attributes will be the subject of future work. Finally, the proposed approach gives refactoring suggestions, but does not guarantee the correctness of the refactoring. In the traditional view, refactoring is a behavior-preserving program transformation. Our approach takes the entire system as input and attempts to provide refactoring suggestions in the context of the entire system. If a programmer accepts a refactoring suggestion, he or she should apply it and guarantee the correctness of the refactoring.

4 Related work

Over recent years, several approaches have been proposed for code refactoring to improve software structure quality. Early work on code refactoring by Opdyke [22] describes many types of refactorings and gives a refactoring application framework. Although he did not discuss a refactoring named Extract Class, he analyzed a type of refactoring for capturing aggregations and reusable components, which was a prototype of Extract Class. A well-known work on refactoring was conducted by Fowler [3], in which he collected different types of refactorings and systemized them into a list of categories; the list is still referenced today. As a commonly used refactoring, Extract Class is the subject of much study. De Lucia et al. propose an approach to identify Extract Class refactoring opportunities using structural and semantic cohesion


measures [7,8]. The objective of their approach is similar to ours: it aims to improve class cohesion measures such as LCOM2, C3 and CBO. Besides the structural relationships discussed in this paper, e.g., attribute references, the authors use semantic relationships between methods to help identify refactoring opportunities. The semantic information used is based on the conceptual similarity between methods (CSM). For a given class, they use a MaxFlow-MinCut algorithm to split the class. They also conducted an empirical evaluation to validate the benefits provided by the combination of semantic and structural measures. Their method needs configuration of some weights and thresholds, and these settings may influence the results; suitable configurations can occasionally be difficult to determine. Tsantalis et al. present a fully automated method for identifying Extract Class opportunities [6], implemented as an Eclipse plug-in. Their method performs the same three steps as ours: (a) clustering entities to be extracted; (b) ranking the identified opportunities; and (c) applying the selected candidate. However, the input for their method is one class, not the entire system. The authors use an adapted hierarchical agglomerative algorithm for method clustering, and the distance metric used by their algorithm is the Jaccard distance. To define the Jaccard distance between two class members, they use the notion of entity sets borrowed from a previous study [5]. One of the most important advantages of their method is that it is fully automatic. However, like the method proposed by De Lucia et al., it deals only with the God Class and cannot identify scattered methods from different classes. Another interesting work on identifying Extract Class opportunities is presented by Bavota et al. [10], who use game theory to help identify Extract Class opportunities. The approach is based on the concept that solutions to problems are found by balancing competing goals, such as quality versus cost, and game theory can help make decisions involving such contrasting goals. The authors modeled the extraction problem as a game involving two players, in which each player is in charge of creating a new class by selecting methods from the original class. In particular, they use game theory to recommend Extract Class refactoring opportunities, finding a balance between class cohesion and coupling when splitting a class into several parts. The refactoring operation sequence is computed using a refactoring game, in which the Nash equilibrium defines the compromise between cohesion and coupling. A common point of all the work on Extract Class refactoring identification discussed above is that it identifies opportunities for a God Class: the input for these methods is one class, not the entire system; they take a God Class as input and split it into different parts to create new classes. Notably, our approach can identify Extract Class opportunities not only for a God Class, but also for scattered methods from different classes. Simon et al. presented a metric-based refactoring identification and also use a modularity-like method to identify refactoring opportunities [23]. There are some differences between their work and ours. First, the metric used by Simon et al. for clustering differs from those explored in this paper; the distance between two methods is similar to the distance used previously [6].
The authors also use entity sets and the Jaccard distance to compute distances. Second, they use a spring-embedder program to cluster methods and produce 3D models; the advantage is that it provides visualized results for users, but the process is slow. Third, they did not provide a concrete algorithm for identifying refactoring opportunities, but only described some strategies, and therefore their method cannot be automated directly. Fourth, the authors only provided illustrative examples; whether the method is applicable to large-scale software is not known. In contrast, we conducted a quantitative experimental study and an in-depth investigation of the impacts of different relationships between methods. A work similar to the approach described in this paper was conducted by Pan et al. [24]. They also use a community structure detection method for software refactoring identification and propose a method to recondition the class structure of object-oriented software systems. The authors use the method presented by Newman for modularity detection and identify Move Method refactorings; however, they do not discuss how to identify Extract Class refactorings.

5 Conclusion

This paper presents a novel approach to identifying Extract Class refactoring opportunities for internetware systems. The approach takes class cohesion optimization of the software structure as its objective and avoids some drawbacks of existing methods. Given an internetware system, the approach constructs an MDG from the source code of the system and clusters the entities in the MDG; after comparing the modularity results with the real structure of the system, it produces a set of ranked Extract Class refactoring candidates. To evaluate the efficiency of the proposed approach, we conducted an empirical study on PKUAS. The results are valuable for helping developers choose different measurements to identify refactoring opportunities that satisfy their specific requirements. In the future, we will explore more clustering methods and gather more measurements to evaluate method and attribute relationships, and more empirical studies should be conducted to evaluate the approach. The proposed approach focuses on extracting methods for Extract Class; in addition to methods, attributes also play an important role in performing the responsibilities of a class, therefore we also plan to consider attributes as entities to be extracted. This may improve the identification results.

Acknowledgements This work was partially supported by National Natural Science Foundation of China (Grant Nos. 61003020, 61170071, 61003156, 61073029, 61321491), and Jiangsu Natural Science Foundation (Grant No. BK2011190).

References
1 Mei H, Huang G, Zhao H Y, et al. A software architecture centric engineering approach for internetware. Sci China Ser-F Inf Sci, 2006, 49: 702–730
2 Lv J, Ma X X, Tao X P, et al. Explicit environmental constructs for Internetware. Sci Sin Inform, 2013, 43: 1–23
3 Fowler M. Refactoring: Improving the Design of Existing Code. Indianapolis: Addison-Wesley, 1999
4 Mens T, Tourwe T. A survey of software refactoring. IEEE Trans Softw Eng, 2004, 30: 126–139
5 Tsantalis N, Chatzigeorgiou A. Identification of move method refactoring opportunities. IEEE Trans Softw Eng, 2009, 35: 347–367
6 Fokaefs M, Tsantalis N, Stroulia E, et al. Identification and application of extract class refactorings in object-oriented systems. J Syst Software, 2012, 85: 2241–2260
7 De Lucia A, Oliveto R, Vorraro L. Using structural and semantic metrics to improve class cohesion. In: Proceedings of IEEE International Conference on Software Maintenance, Beijing, 2008. 27–36
8 Bavota G, De Lucia A, Oliveto R. Identifying extract class refactoring opportunities using structural and semantic cohesion measures. J Syst Software, 2011, 84: 397–414
9 Bavota G, De Lucia A, Marcus A, et al. A two-step technique for extract class refactoring. In: Proceedings of IEEE/ACM International Conference on Automated Software Engineering, New York, 2010. 151–154
10 Bavota G, Oliveto R, De Lucia A, et al. Playing with refactoring: identifying extract class opportunities through game theory. In: Proceedings of IEEE International Conference on Software Maintenance (ICSM), Timisoara, 2010. 1–5
11 Brito e Abreu F, Poels G, Sahraoui H A, et al. Quantitative Approaches in Object-Oriented Software Engineering. London: Hermes Penton Science, 2003
12 O'Keeffe M, Ó Cinnéide M. Search-based refactoring for software maintenance. J Syst Software, 2008, 81: 502–516
13 Seng O, Stammel J, Burkhart D. Search-based determination of refactorings for improving the class structure of object-oriented systems. In: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO'06), New York, 2006. 1909–1916
14 Newman M E J. Modularity and community structure in networks. Proc Natl Acad Sci, 2006, 103: 8577–8582
15 Rosvall M, Bergstrom C T. An information-theoretic framework for resolving community structure in complex networks. Proc Natl Acad Sci, 2007, 104: 7327–7331
16 Rosvall M, Bergstrom C T. Maps of information flow reveal community structure in complex networks. Proc Natl Acad Sci, 2008, 105: 1118–1123
17 Maruyama K, Shima K. Automatic method refactoring using weighted dependence graphs. In: Proceedings of 21st International Conference on Software Engineering, Los Angeles, 1999. 236–245
18 Chidamber S R, Kemerer C F. A metrics suite for object oriented design. IEEE Trans Softw Eng, 1994, 20: 476–493
19 Al Dallal J, Briand L C. A precise method-method interaction based cohesion metric for object-oriented classes. ACM


Trans Softw Eng Meth, 2012, 21: 8–34
20 Huang G, Sun L S. An access control framework for reflective middleware. J Comput Sci Technol, 2008, 23: 895–904
21 Mitchell B S, Mancoridis S. On the automatic modularization of software systems using the bunch tool. IEEE Trans Softw Eng, 2006, 32: 193–208
22 Opdyke W F. Refactoring Object-Oriented Frameworks. PhD Thesis. University of Illinois at Urbana-Champaign, 1992
23 Simon F, Steinbruckner F, Lewerentz C. Metrics based refactoring. In: Proceedings of 5th European Conference on Software Maintenance and Reengineering, Lisbon, 2001. 30–38
24 Pan W F, Li B, Ma Y T, et al. Class structure refactoring of object-oriented softwares using community detection in dependency networks. Front Comput Sci China, 2009, 3: 396–404

SCIENCE CHINA Information Sciences

. RESEARCH PAPER .

July 2014, Vol. 57 072104:1–072104:19  doi: 10.1007/s11432-014-5117-5

Implementation decision making for internetware driven by quality requirements

WEI Bo1, JIN Zhi2∗, ZOWGHI Didar3 & YIN Bin1

1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;
2 Key Lab. of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing 100190, China;
3 Faculty of Engineering and IT, University of Technology, Sydney 2042, Australia

Received February 22, 2014; accepted April 27, 2014

Abstract  Internetware is an emerging software paradigm in the open, dynamic and ever-changing Internet environment. A successful internetware must demonstrate an acceptable degree of quality when carrying out its functionality. Hence, when internetware is being dynamically constructed, making implementation decisions that satisfice the quality requirements becomes a critical issue. In traditional software engineering, quality requirements are usually refined stepwise into sub-requirements from a goal-modeling perspective, until some potential functional design alternatives are identified. The goal-oriented paradigms have adopted graphical goal models to reason about quality requirements and have proposed qualitative or quantitative reasoning schemas. However, these techniques may become unviable due to the ever-changing operating environment and the demands of run-time decision making. In this paper, we propose an approach to implementation decision making driven by quality requirements for internetware. It centers on a symbolic formula representation of requirements goal models with a tree structure, which has a well-defined syntax and clear traceability. Furthermore, we explore reasoning rules which effectively automate each reasoning action on the formulae; this supports multiple-factor decision making. A case study is provided to illustrate our proposed approach, and we also present a supporting tool developed on the basis of our theoretical approach.

Keywords  design alternatives, decision making, implementation decision, internetware, quality requirements

Citation Wei B, Jin Z, Zowghi D, et al. Implementation decision making for internetware driven by quality requirements. Sci China Inf Sci, 2014, 57: 072104(19), doi: 10.1007/s11432-014-5117-5

1 Introduction

Internetware is an emerging software paradigm in the open, dynamic and ever-changing Internet environment. It is constructed from a set of autonomous software entities (agents, resources, etc.) distributed over the Internet, and a set of connectors enabling collaboration among these entities in various ways [1]. A successful internetware must demonstrate an acceptable degree of quality when carrying out its functionality. As the environment changes, the internetware should evolve by dynamically making implementation decisions about unviable distributed software entities, so that proper entities accommodating the quality requirements can be selected automatically and a new architecture can be identified. Requirements provide opportunities for bridging the gap between internetware quality

∗ Corresponding author (email: [email protected])

© Science China Press and Springer-Verlag Berlin Heidelberg 2014


and internetware architecture, by providing generic guidelines for evaluating software quality and design decisions [2]. As is widely acknowledged in traditional software engineering, quality requirements and their implementations influence software design decisions and the software architecture. Quality requirements are also vital criteria for selecting the final software product among alternatives. Hence, when internetware is being dynamically constructed, making implementation decisions that satisfice1) [3] the quality requirements becomes a critical issue for the construction and evolution of internetware. The issue of quality requirements in internetware is intrinsically a software quality requirements issue. In traditional software engineering, the goal-oriented approach has proven successful in assisting requirements and software engineers by enabling novel types of reasoning and analysis over quality requirements [4,5]. Quality requirements are often hard to quantify and evaluate, so researchers working on goal-oriented approaches have proposed the concept of softgoal to model them [6]. Goal-oriented approaches support graphical modeling of quality requirements and node-oriented analytical activities on the models. The most influential reasoning and analysis algorithm, the label propagation process, was proposed in 1992 [6]; with this process, an implementation decision can be qualitatively evaluated. Label propagation is also adopted in the strategic rationale model of i∗/TROPOS [7,8] and in the qualitative/quantitative/hybrid analysis of GRL [9]. Internetware quality requirements need run-time implementation decision making over all candidates, which necessitates automatic exploration of the design alternatives for quality requirements. However, the corresponding goal-oriented paradigms have some deficiencies that hinder their contribution to internetware research. The first is that the reasoning mechanisms for evaluating implementation decisions in the NFR Framework [10,11] and i∗ [12,13] are interactive or semi-automated. Both usually need extra interactions with stakeholders when performing reasoning; the more nodes are created, the more time-consuming, and eventually infeasible, the interactions become. So the reasoning may be very difficult if the model involves many nodes in the Internet environment. The second is that the SAT solvers employed by TROPOS support only a single quantitative decision-making factor for implementing software quality requirements. However, the Internet may provide a large number of candidate solutions, for which a single quantitative decision factor is insufficient. Besides, TROPOS provides quantitative and qualitative analysis algorithms, but lacks the concept of a strategy (implementation decision) [4]. The third is that GRL, which does not use SAT solvers, proposes algorithms for automatically evaluating nodes and actors, but does not discuss how to select the currently best implementation for software quality requirements. Since GRL does not provide a candidate set for decision making, it can tell us how well the quality requirements of an internetware are satisficed given a specific implementation decision, but not whether a better implementation decision to satisfice the quality requirements exists.
Additionally, all the existing work mentioned above addresses the reasoning issue only from the algorithmic perspective, and does not tackle the decision-making issue for implementing quality requirements systematically. The state of the art does not present the kind of process that should be followed to make an implementation decision, especially for an internetware at run-time. As stated in [14], the emergence of internetware requires innovations to traditional software development methods and techniques. In this paper, we draw on contributions from traditional software engineering and propose a process for making implementation decisions for internetware quality requirements. The process supports automated reasoning on candidate implementation decisions and multiple-factor decision making; it thereby facilitates the dynamic construction of internetware in the ever-changing environment through real-time trade-offs. First, quality requirements models are generated with a tree structure, with the quality requirement as the root node, detailed quality requirements as subnodes, and design alternatives as leaf nodes with their impacts on their parents. These tree-structured models are then transformed into a formal language to facilitate the subsequent reasoning.

1) The term "satisfice" was defined by Herbert Simon in [3] and is commonly used in software quality requirements research. Here it means that quality requirements are sufficiently addressed and a good enough solution is sought. The term "satisfy" usually means solutions are perfect with nothing sacrificed, which is not suitable for software quality requirements.


Then the formula models are reasoned over in a two-step process, which can be grounded as a qualitative process, a quantitative process or an integrated process. Different reasoning schemas have their own algorithms and rules to identify the final feedback of different implementation decisions. Finally, a decision-making process allowing for more factors delivers the best implementation decision from all candidates. Our contributions are multifold. First, we propose a generic treatment for constructing an internetware from the viewpoint of quality requirements; this is a relatively new topic compared with existing internetware research. Second, we propose a symbolic modeling language Σ based on tree models, rather than the graphical models adopted in conventional work, to represent quality requirements models. This language provides mapping rules from a goal tree model to the corresponding formula model. Third, we devise reasoning rules which realize the reasoning actions. These rules employ the automated reasoning algorithms from our previous work [15] and constitute the core of the whole reasoning process. Fourth, we present a process model that provides a systematic solution for selecting the final implementation decision for internetware quality requirements. Last but not least, we have developed a tool to automatically evaluate and explore different implementation decisions for internetware quality requirements; it also supports the formalization of goal tree models. The paper is structured as follows. Section 2 briefly introduces the core ontology used in our modeling and reasoning work. Section 3 presents the implementation decision making process, in order to give an overall understanding of the paper. Section 4 gives the formula model generation process with the Σ language. Section 5 illustrates the formula model reasoning mechanism, including three types of reasoning schemas. Section 6 applies our process to a real case. Section 7 presents the supporting tool rΣ. Section 8 compares our work with related works. Section 9 envisages some future work and concludes the paper.

2 Core ontology

This section gives a brief introduction to the core ontology used during goal-oriented modeling of software quality requirements. The core ontology defines the key concepts captured during the modeling process and the mutual relationships between these concepts. It will also be used in the implementation decision making process for internetware. The key concepts are:

NFR softgoal captures software quality requirements such as reliability, security, accuracy and performance from the goal's perspective. They are not about what the software will do, but how well the software will carry out its functions. Usually, quality requirements are cross-cutting concerns; they can be system-level, component-level or more detailed. In our work, we use this concept to model quality requirements, where each quality requirement is interpreted as a corresponding NFR softgoal, to indicate that each quality requirement is envisioned or expected by the related stakeholders. In the graphical modeling process, a cloud sign is used for NFR softgoals.

Operationalization softgoal captures knowledge about requirements implementation. As introduced above, an NFR softgoal is always cross-cutting; hence, it is usually difficult to assign an NFR softgoal to one or several specific entities, components or sub-systems. Through the refinement process, operationalization softgoals can be elicited for the acceptable implementation decision(s). Note that softgoal is a relative concept: an NFR softgoal can be changed to an operationalization softgoal if an agreement is reached. In the graphical modeling process, a bold cloud sign is used for operationalization softgoals.

Contribution is a concept that reflects the impacts of operationalization softgoals. An operationalization softgoal is not necessarily sufficient for implementing its related quality requirements: it can help to fully or partially implement them, and similarly it can contribute to the full or partial failure of the related quality requirements. We call this impact from a single operationalization softgoal on an NFR softgoal a Contribution. Contributions can be qualitative or quantitative. In our work, there are four types of qualitative contributions: MAKE, BREAK, HELP, and HURT, denoted by ++, −−, + and − respectively. Regarding the quantitative

[Figure 1  Meta model for quality requirements modeling.]

contributions, the classification relies on the selected measuring scale.

Satisficing status is a concept used for characterizing the implementation state of each softgoal. The fundamental difference between a softgoal and a "hard" goal is that softgoals are never fully satisfied; rather, they are better termed "satisficed". So, to represent the degree of "satisficed"-ness, the satisficing status is introduced. Satisficing statuses can be qualitative or quantitative. In our work, there are six types of qualitative satisficing statuses: fully satisficed, weakly satisficed, fully denied, weakly denied, unknown and conflicting, denoted by ✓, w+, ×, w−, ? and ⚡ respectively. The unknown status means the satisficing status remains unclear, and the conflicting status means that the positive and negative contributions from the offspring are equally competing. Quantitative satisficing statuses likewise rely on the selected measuring scale. We use S to denote the set of satisficing statuses. Three types of relationships are defined as follows:

AND/OR-decomposition is a relationship between NFR softgoals at two adjacent levels. AND (OR)-decomposition means the implementation (denial) of the parent requires the implementation (denial) of all its children, whereas OR (AND)-decomposition means that the implementation (denial) of the parent requires the implementation (denial) of just one arbitrary child. In the graphical modeling process, AND-decomposition and OR-decomposition are represented by a line with one vertical dash and a line with two vertical dashes respectively.

Operationalization decomposition is a relationship between operationalization softgoals and NFR softgoals. In an operationalization decomposition, each NFR softgoal is refined by its corresponding operationalization softgoals with positive contributions. Each operationalization softgoal provides one solution for its parent node, and hence operationalization decomposition can be viewed as a special type of OR-decomposition. Note that the operationalization decomposition process decides whether an acceptable implementation decision can be sought: it is impossible to identify an implementation decision for a quality requirement without any operationalization softgoals. In the graphical modeling process, operationalization decomposition is represented by a line with a positive contribution sign ("++", "+").

Side effect is another relationship between operationalization softgoals and NFR softgoals. An operationalization softgoal is only responsible for the softgoals where it is modeled for the first time; in reality, however, operationalization softgoals may have implicit impacts on other NFR softgoals. Through the side effect process, an operationalization softgoal can be explicitly linked to other NFR softgoals, with positive or negative contributions. The side effect process is also critical for identifying the final acceptable implementation decision, as it provides extra information for decision making. In the graphical modeling process, a side effect is represented by a line with the possible contribution signs. Figure 1 depicts the meta model for quality requirements modeling with the core ontology.
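To make the ontology concrete, here is a minimal sketch of these concepts as Python data types; the class and field names are our own illustration rather than the paper's notation.

```python
# A minimal sketch of the core ontology. All names are illustrative.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union

class Contribution(Enum):      # qualitative contributions
    MAKE = "++"
    HELP = "+"
    HURT = "-"
    BREAK = "--"

class Status(Enum):            # qualitative satisficing statuses
    FULLY_SATISFICED = "S"
    WEAKLY_SATISFICED = "w+"
    WEAKLY_DENIED = "w-"
    FULLY_DENIED = "D"
    UNKNOWN = "?"
    CONFLICTING = "C"

@dataclass
class Softgoal:
    type: str                               # e.g. "Confidentiality"
    topic: str                              # e.g. "System"
    operationalization: bool = False        # NSG vs. OSG
    decomposition: str = "OR"               # "AND" or "OR"
    # a contribution is set on OSG leaves (design alternatives);
    # it may be qualitative or a numeric value in [-1.0, 1.0]
    contribution: Optional[Union[Contribution, float]] = None
    status: Status = Status.UNKNOWN
    children: List["Softgoal"] = field(default_factory=list)
```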


[Figure 2  (Color online) Implementation decision process for quality requirements.]

3 Implementation decision making process

Figure 2 presents the implementation decision process for quality requirements. It is assumed that the stakeholder is capable of eliciting quality requirements and modeling them with goal tree models; hence the initial input to the process is the quality requirements goal models represented in tree structure. To obtain the final implementation, the process includes four steps. Σ-transformation is the first step of the whole implementation decision making process. Once the software quality requirements have been modeled in the tree structure, they can be rewritten into Σ-formulae based on the semantics and syntax of the Σ language. The formulae of each quality requirements model can be obtained manually or by our tool, detailed later. If stakeholders are accustomed to a formal modeling style, such as the KAOS method [16], this step can be skipped by directly inputting the formula model. Further details are presented in Section 4. Satisficing status assignment designates a quantitative or qualitative value to each design alternative in a Σ-formula. In principle, a design alternative can be either satisficed or denied, so many assignment options may exist and reasoning may be complex. One possible heuristic for decreasing the complexity is to "satisfice the design alternatives with positive impact and deny the design alternatives with negative impact". Finally, Σ-formulae with satisficing status semantics are obtained. Model reasoning is the critical step in the whole process. If the model's contributions and satisficing statuses are fully quantified, the quantitative process is suitable; if they are partially quantified, the integrated process is appropriate; and if they are not quantified at all, the qualitative process should be used. For qualitative reasoning, implementation decisions that make the quality requirements fully satisficed or weakly satisficed are usually acceptable. Similarly, for quantitative reasoning, implementation decisions that bring the quality requirements to a specific satisficing status value or above are worth considering; this value should be agreed upon based on the problem domain. Because different types of quality requirements are usually considered at the same time, it is recommended that


the final implementation decision satisfy the constraints from other quality requirements and support all quality requirements as far as possible. Hence, we borrow the reasoning idea from [17]: preferentially perform reasoning on the model with the fewest nodes, then apply the results to the remaining models. Further details are presented in Section 5. Decision making is the last step, which obtains the final implementation decision. After model reasoning, all candidate implementation decisions have been obtained. To make the final decision, more subjective information, such as preference (attitude) or priority on specific nodes, can be used as further decision factors. As pointed out in [18], "human factors have more significant impacts on the internetware development process, e.g., the user-generated contents and applications, social-networking, and collective intelligence from open source communities". This subjective information can be captured during quality requirements modeling, as proposed in [19,20], or after the decision making process is launched. These criteria can be quantitative or qualitative, and weight-based or rank-based. We observe that some researchers have applied mature decision theories to quality requirements implementation selection (though not specifically in the internetware domain) when qualitative models exist; for example, Elahi et al. use the even swaps method when a numeric measurement for quality requirements models is missing [21,22]. This is a topic of multiple-factor (attribute) decision making, which is outside the scope of our main efforts. Any available technique from decision theory, such as linear cumulative scoring, the analytic hierarchy process or the even swaps method, is welcome.
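As an illustration of the linear cumulative scoring mentioned above, here is a minimal sketch; the factor weights echo the case study in Section 6, while the candidate decisions and their per-factor scores are hypothetical.

```python
# A minimal sketch of linear cumulative scoring over decision factors.
# Weights echo Section 6's case study; the candidates are hypothetical.
weights = {"development_budget": 0.3, "system_reliability": 0.7}

candidates = {
    "decision A": {"development_budget": 0.8, "system_reliability": 0.5},
    "decision B": {"development_budget": 0.4, "system_reliability": 0.9},
}

def linear_score(scores, weights):
    # Weighted sum of the factor scores of one candidate decision.
    return sum(weights[f] * scores[f] for f in weights)

best = max(candidates, key=lambda d: linear_score(candidates[d], weights))
print(best)  # decision B: 0.3*0.4 + 0.7*0.9 = 0.75 beats 0.59
```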

4 Formula model generation

The formula of a goal tree model is generated by the Σ-transformation process, which is based on the semantics and syntax of the language Σ [23]. Provided a quality requirement is modeled with the goal tree structure, it can be transformed to a formula in Σ, and vice versa. Σ's semantics is reflected by its modeling primitives, as shown below:

Σ = ⟨NSG, OSG, CNT1, CNT2⟩, where

(1) NSG is the set of all NFR softgoals, which are identified by stakeholders and cannot be implemented directly.
(2) OSG is the set of all operationalization softgoals, which can be assigned to certain entities, resources, components or sub-systems.
(3) CNT1 = {++, −−, +, −} ∪ [−1.0, 1.0] is a set of connectives which can be attached to elements of OSG to denote the contribution relationship. A contribution can be qualitative, with four discrete values; quantitative, with a continuous interval where −1.0 means a fully negative contribution and 1.0 a fully positive one; or mixed.
(4) CNT2 = {∧, ∨, (, )} is a set of connectives which denote the decomposition relationships among elements of NSG and OSG, where ∧ is for AND-decomposition and ∨ is for OR-decomposition and operationalization decomposition. The brackets encapsulate the same-level refinement result for a node and support nesting.

For NSG and OSG, a structured terminology is adopted: each softgoal is termed by a combination of "type" and "topic", where type means which kind of quality is being discussed and topic means where, or whose, quality is being concerned. For more detail, please see [11] and [23]. The formation rules characterizing the syntax are listed below, where Atom(Σ) = NSG ∪ OSG and Form(Σ) denotes all the formulae in Σ:

1) Atom(Σ) ⊆ Form(Σ);
2) nsg1(⊙ nsgi), nsg2(⊙ ∗j osgj) ∈ Form(Σ), where i = 2, . . . , n, j = 1, . . . , n, nsg1, nsg2, nsgi ∈ NSG, osgj ∈ OSG, ∗j ∈ CNT1 and ⊙ denotes a ∨ or ∧ operation on finitely many elements of NSG or OSG;
3) all formulae generated by finite iteration from 1) and 2) belong to Form(Σ).

Note that formulae in Σ carry no semantics about the satisficing status. For reasoning purposes, we add braces {} after each element of Atom(Σ); they contain the input of the specific satisficing status of each softgoal. A reasoning target of this kind is called a Σ-formula with satisficing status semantics. Hereinafter, a0, a1, an ∈ Atom(Σ), s1, sn ∈ S, ∗1, ∗n ∈ CNT1, and · · · denotes the

[Figure 3  (Color online) Confidentiality requirements model.]

remaining parts which are not our concern. The form of the reasoning target is: a0{s0}(· · · ai{si} · · · ). If no specific satisficing statuses are assigned, we assume the satisficing status of each node is unknown (?). Example 1. Assume we have a scenario in which confidentiality requirements are to be modeled and transformed to a Σ-formula. The goal tree model is shown in Figure 3. This model can be written in the Σ language:

Confidentiality[System] (postLoginControl[System] (StorageControl[System] (++PhysicalIsolation[Data])∧ TransmissionControl[System] (++ApplyingSSL[Data]∧+Signature[Data]))∨ preLoginControl[System] (Authentication[Account] (+Multi-Facts[Account]∨++Biometrics[Account])))

Noticeably, the graphical model has been transformed to a structure-preserving2) formula by using indentation, which makes formulae more readable for humans. We can always locate the design alternatives by finding the atoms with contribution signs. In the following examples, the formula representation will be generated directly without per-line indentation, since our reasoning rules can recognize the representation.
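To illustrate the Σ-transformation, the following is a minimal sketch that renders a goal tree as a Σ-formula string; the GoalNode type and the traversal are our own illustration, with the output format following Example 1.

```python
# A minimal sketch of tree-to-formula rendering; names are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GoalNode:
    name: str                           # "type[topic]" term
    decomposition: str = "OR"           # "AND" or "OR"
    contribution: Optional[str] = None  # "++", "+", "-", "--" on leaves
    children: List["GoalNode"] = field(default_factory=list)

def to_sigma(g: GoalNode) -> str:
    label = (g.contribution or "") + g.name
    if not g.children:
        return label
    op = "\u2227" if g.decomposition == "AND" else "\u2228"  # ∧ or ∨
    return label + "(" + op.join(to_sigma(c) for c in g.children) + ")"

auth = GoalNode("Authentication[Account]", children=[
    GoalNode("Multi-Facts[Account]", contribution="+"),
    GoalNode("Biometrics[Account]", contribution="++"),
])
print(to_sigma(GoalNode("preLoginControl[System]", children=[auth])))
# preLoginControl[System](Authentication[Account](+Multi-Facts[Account]∨++Biometrics[Account]))
```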

5 Formula model reasoning

The formula model reasoning stage includes a qualitative schema, a quantitative schema and an integrated schema. All reasoning schemas share the same two-step process [15]. The first step transforms a satisficing status into the corresponding effect, implemented by the Status Transformation Function, denoted by f_st. The second step infers the satisficing status from all the collected effects, implemented by the Effect Inference Function, denoted by f_ei. These two steps can be repeated up to the root node. Each schema has its own reasoning mechanism and reasoning rules. The meta-level rule is: the satisficing statuses of children imply the satisficing status of their parent. On the one hand, this reflects the basic idea of the label propagation process: stepwise, the satisficing status of the root node can finally be obtained from the satisficing statuses of its offspring. On the other hand, it facilitates the adoption of the automatic algorithms from our previous work. Besides, the form of the meta-level rule guarantees the multiple iterations

2) Each subnode can be identified from the formula's appearance, using the indentation of each line. Hence the structures of the graphical models are preserved.


of itself. That is, in a goal tree model, all nodes except the root and the leaves are simultaneously children and parents; hence the reasoning result of the current round can be used as the input of the next round. All reasoning rules are grounded in this meta-level rule. Whether the reasoning result is the valid satisficing status of the parent node depends on the embedded functions detailed in [15]. Our reasoning rules cover all decomposition types in each schema, with one reasoning rule per decomposition type, so reasoning actions can be carried out automatically.

5.1 Qualitative schema

This type of reasoning has received much attention in graphical goal models using the interactive way [6,8]. This paper puts the emphasis on goal tree models using the automated way, which is based on the formula representation.

5.1.1 Reasoning mechanism

We declare the closed world assumption (CWA) before reasoning to achieve the automatic reasoning objective. CWA is an important concept in knowledge engineering and database research [24]. We use this assumption to avoid implicit knowledge during reasoning and interaction with stakeholders, so that the automated reasoning process on each selected implementation decision can be guaranteed3). In the qualitative schema, the status transformation function is grounded by the function under CWA: f_st^closed. The effect inference function is grounded by the functions under AND-decomposition and OR-decomposition: f_ei,∧^qual and f_ei,∨^qual. These functions provide the mapping algorithm for qualitative satisficing statuses from the child nodes to the parent node. Correspondingly, the entailment relationship in the meta-level rules is grounded as the entailment relationship under CWA: ⊨CWA.

5.1.2 Qualitative reasoning rules

Based on the assumption and the two types of decompositions, the meta-level rule can be instantiated as two types of reasoning rules in Σ: rules for ∨-decomposition under CWA, and for ∧-decomposition under CWA.

Rule 1a (∨, OSG, CWA). a0{?}(∗1 a1{s1} ∨ · · · ∨ ∗n an{sn}) ⊨CWA a0{f_ei,∨^qual(f_st^closed(s1, ∗1), . . . , f_st^closed(sn, ∗n))}.

Rule 1a captures the reasoning action when a node (a0) is ∨-decomposed into subnodes (a1, . . . , an) under the closed world assumption. That is, the satisficing status of a0 can be calculated by the operation on the right side of the entailment sign. Note that Rule 1a is only suitable when the refinement result consists of operationalization softgoals. For NFR softgoals without contribution signs, the reasoning rule is specialized as below:

Rule 1b (∨, NSG, CWA). a0{?}(a1{s1} ∨ · · · ∨ an{sn}) ⊨CWA a0{f_ei,∨^qual(f_st^closed(s1, ++), . . . , f_st^closed(sn, ++))}.

Rule 1b can be viewed as the special case of Rule 1a in which ∗1, . . . , ∗n are substituted by ++4).

Rule 2 (∧, CWA). a0{?}(a1{s1} ∧ · · · ∧ an{sn}) ⊨CWA a0{f_ei,∧^qual(f_st^closed(s1, ++), . . . , f_st^closed(sn, ++))}.

Rule 2 captures the reasoning action when a node (a0) is ∧-decomposed into subnodes (a1, . . . , an) under the closed world assumption. Because of the syntax of Σ, operationalization softgoals cannot be

3) From the theoretical perspective, the closed world assumption implies that the implicit representation of negative facts presumes total knowledge about the domain being represented. In requirements engineering, the closed world assumption usually serves in the evaluation and verification stage, when the domain is assumed to be closed and analyzed based on the current information. In particular, this assumption suits predefined problem domains, such as security requirements, which focus on specific types of threats and vulnerabilities even though uncertainty exists.
4) It is true that if AND (OR)-decomposed children are jointly satisficed (denied), their parent is fully satisficed (denied), and if one arbitrary AND (OR)-decomposed child is denied (satisficed), their parent is fully denied (satisficed). So we can assume that each child in an AND/OR-decomposition has a potentially fully positive contribution (++) to its parent (a partial contribution + cannot guarantee the two assertions above). The same understanding also holds in the remaining rules.

Table 1  Qualitative reasoning process of confidentiality requirements

Confidentiality[System]{?}(postLoginControl[System]{?}(StorageControl[System]{?}(++PhysicalIsolation[Data]{✓})∧TransmissionControl[System]{?}(++ApplyingSSL[Data]{×}∨+Signature[Data]{✓}))∨preLoginControl[System]{?}(Authentication[Account]{?}(+Multi-Facts[Account]{✓}∨++Biometrics[Account]{×})))

⊨CWA Confidentiality[System]{?}(postLoginControl[System]{?}(StorageControl[System]{✓}(++PhysicalIsolation[Data]{✓})∧TransmissionControl[System]{w+}(++ApplyingSSL[Data]{×}∨+Signature[Data]{✓}))∨preLoginControl[System]{?}(Authentication[Account]{w+}(+Multi-Facts[Account]{✓}∨++Biometrics[Account]{×})))

⊨CWA Confidentiality[System]{?}(postLoginControl[System]{w+}(StorageControl[System]{✓}(++PhysicalIsolation[Data]{✓})∧TransmissionControl[System]{w+}(++ApplyingSSL[Data]{×}∨+Signature[Data]{✓}))∨preLoginControl[System]{w+}(Authentication[Account]{w+}(+Multi-Facts[Account]{✓}∨++Biometrics[Account]{×})))

⊨CWA Confidentiality[System]{w+}(postLoginControl[System]{w+}(StorageControl[System]{✓}(++PhysicalIsolation[Data]{✓})∧TransmissionControl[System]{w+}(++ApplyingSSL[Data]{×}∨+Signature[Data]{✓}))∨preLoginControl[System]{w+}(Authentication[Account]{w+}(+Multi-Facts[Account]{✓}∨++Biometrics[Account]{×})))

AND-decomposed; hence this rule is only suitable for NFR softgoals. Similarly, a1, . . . , an are interpreted with the "++" contribution type, and the satisficing status of a0 can be calculated by the operation on the right side of the entailment sign. Example 2. Following Example 1, we need to know whether the confidentiality requirements could be satisficed if a specific implementation decision is taken. One implementation decision could be that PhysicalIsolation[Data], Signature[Data] and Multi-Facts[Account] are fully satisficed, while ApplyingSSL[Data] and Biometrics[Account] are fully denied. The qualitative reasoning process is shown in Table 1; we can trace the satisficing status signs in the braces of each node. Obviously, the confidentiality requirements are weakly satisficed (w+).
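To illustrate how Rules 1a-2 can be executed, here is a minimal sketch of qualitative label propagation, assuming simple min/max-style inference functions over an ordered status scale; the actual f_st and f_ei functions are defined in [15], so this is only a plausible reading, although it does reproduce Table 1.

```python
# A minimal sketch of qualitative propagation; the min/max inference
# functions are our assumption, not the paper's exact definitions.
ORDER = {"fully_denied": -2, "weakly_denied": -1,
         "weakly_satisficed": 1, "fully_satisficed": 2}
INV = {v: k for k, v in ORDER.items()}

def effect(status: str, contribution: str) -> int:
    """Status transformation: ++ passes the status through, + weakens
    it to at most a weak effect, and -/-- negate the (weakened) effect."""
    e = ORDER[status]
    weak = max(min(e, 1), -1)
    if contribution == "++":
        return e
    if contribution == "+":
        return weak
    if contribution == "--":
        return -e
    return -weak                      # contribution == "-"

def infer(effects: list, decomposition: str) -> str:
    """Effect inference: OR keeps the best effect, AND the worst."""
    agg = max(effects) if decomposition == "OR" else min(effects)
    return INV[agg]

# TransmissionControl in Example 2: ++ApplyingSSL is fully denied,
# +Signature is fully satisficed; OR-inference gives weakly satisficed.
es = [effect("fully_denied", "++"), effect("fully_satisficed", "+")]
print(infer(es, "OR"))                # weakly_satisficed
```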

5.2 Quantitative schema

The quantitative process mainly focuses on reasoning steps where the modeling concepts are numeric. For example, the satisficing status and the contribution of each softgoal may be quantified if specific metrics can be assigned and precise values on a predefined scale can be added to the models. For the quantitative process, a commonly used method is the linear scoring approach, where a fitness function calculates a cumulative score for each candidate decision based on the weights of the criteria and the degree to which each alternative matches the criteria [4,25–27]. The decision with the highest score is considered the most desirable.

5.2.1 Reasoning mechanism

Similarly, the quantitative reasoning mechanism calculates the numeric satisficing status of the parent from the numeric satisficing statuses of its children, based on the numeric contributions. Note that the greatest difference from the qualitative schema is that CWA is not necessary, because the numeric values are clear-cut and involve no implicit knowledge. Consequently, the two-step reasoning mechanism can be implemented by a specific function which calculates the result linearly, without employing CWA for mapping among different satisficing statuses. In the quantitative process, the status transformation function is defined as the product of the satisficing status and the node's contribution, and the effect inference function is grounded as two quantitative functions for AND-decomposition and OR-decomposition respectively. The entailment relationship in the meta-level rule is grounded as ⊨QUAN.

Table 2  Quantitative reasoning process of confidentiality requirements

Confidentiality[System]{?}(postLoginControl[System]{?}(StorageControl[System]{?}(1.0PhysicalIsolation[Data]{1.0})∧TransmissionControl[System]{?}(1.0ApplyingSSL[Data]{−1.0}∨0.5Signature[Data]{1.0}))∨preLoginControl[System]{?}(Authentication[Account]{?}(0.5Multi-Facts[Account]{1.0}∨1.0Biometrics[Account]{−1.0})))

⊨QUAN Confidentiality[System]{?}(postLoginControl[System]{?}(StorageControl[System]{1.0}(1.0PhysicalIsolation[Data]{1.0})∧TransmissionControl[System]{0.5}(1.0ApplyingSSL[Data]{−1.0}∨0.5Signature[Data]{1.0}))∨preLoginControl[System]{?}(Authentication[Account]{0.5}(0.5Multi-Facts[Account]{1.0}∨1.0Biometrics[Account]{−1.0})))

⊨QUAN Confidentiality[System]{?}(postLoginControl[System]{0.5}(StorageControl[System]{1.0}(1.0PhysicalIsolation[Data]{1.0})∧TransmissionControl[System]{0.5}(1.0ApplyingSSL[Data]{−1.0}∨0.5Signature[Data]{1.0}))∨preLoginControl[System]{0.5}(Authentication[Account]{0.5}(0.5Multi-Facts[Account]{1.0}∨1.0Biometrics[Account]{−1.0})))

⊨QUAN Confidentiality[System]{0.5}(postLoginControl[System]{0.5}(StorageControl[System]{1.0}(1.0PhysicalIsolation[Data]{1.0})∧TransmissionControl[System]{0.5}(1.0ApplyingSSL[Data]{−1.0}∨0.5Signature[Data]{1.0}))∨preLoginControl[System]{0.5}(Authentication[Account]{0.5}(0.5Multi-Facts[Account]{1.0}∨1.0Biometrics[Account]{−1.0})))

5.2.2 Quantitative reasoning rules

Our quantitative reasoning rules specify only ∨-decomposition and ∧-decomposition. Assume that satisficing status values and contribution values belong to the interval [−1.0, 1.0]. For each leaf node, the satisficing status value can only be assigned −1.0 or 1.0, indicating fully denied or fully satisficed.

Rule 3a (∨, OSG). a0{?}(∗1 a1{s1} ∨ · · · ∨ ∗n an{sn}) ⊨QUAN a0{f_ei,∨^quan(∗i · si, . . . , ∗k · sk)}, where si, . . . , sk ∈ {s1, . . . , sn} ∩ [0, 1.0].

It is obvious that subnodes with negative satisficing status values are not taken into consideration; the reason is that for ∨-decomposition, denied nodes have no impact on the satisficing status of the parent node. Furthermore, Rule 3a can be specialized for OR-decomposition among NFR softgoals. All contributions are then substituted with 1, and the reasoning rule is as follows:

Rule 3b (∨, NSG). a0{?}(a1{s1} ∨ · · · ∨ an{sn}) ⊨QUAN a0{f_ei,∨^quan(si, . . . , sk)}, where si, . . . , sk ∈ {s1, . . . , sn} ∩ [0, 1.0].

For ∧-decomposition among NFR softgoals, both positive and negative implementation knowledge are worth considering: semantically, a positive implementation can improve the denied (negative) status of the parent, and a negative implementation can hurt the satisficed (positive) status of the parent. All contributions are substituted with 1. We have the following quantitative reasoning rule:

Rule 4 (∧, NSG). a0{?}(a1{s1} ∧ · · · ∧ an{sn}) ⊨QUAN a0{f_ei,∧^quan(s1, . . . , sn)}.

Example 3. Consider the quantitative version of Example 2. To put the emphasis on the reasoning mechanism, instead of specifying the numeric contribution for each design alternative one by one, we simply let all MAKE and HELP contributions be 1.0 and 0.5 respectively on the numeric contribution scale, and let the fully satisficed, weakly satisficed and fully denied statuses be 1.0, 0.5 and −1.0 respectively on the numeric status scale. The quantitative reasoning process is shown in Table 2; we can trace the satisficing status values in the braces of each node. Obviously, the satisficing status of the confidentiality requirements is 0.5. This result is also in accordance with the previous qualitative result under this quantification setting.
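Analogously, here is a minimal sketch of the quantitative rules, assuming max as the OR-inference and min as the AND-inference; these assumed functions reproduce the values in Table 2, but the actual f_ei functions are those of [15].

```python
# A minimal sketch of quantitative propagation; max/min inference is
# our assumption and is checked only against Table 2's values.
def or_infer(weighted_statuses):
    # Rules 3a/3b: denied (negative) children are ignored; the parent
    # takes the best remaining contribution-weighted status.
    positive = [w for w in weighted_statuses if w >= 0.0]
    return max(positive) if positive else -1.0

def and_infer(statuses):
    # Rule 4: the parent is only as satisficed as its worst child.
    return min(statuses)

# Example 3's postLoginControl subtree:
storage = or_infer([1.0 * 1.0])                    # PhysicalIsolation
transmission = or_infer([1.0 * -1.0, 0.5 * 1.0])   # SSL denied, Signature
print(and_infer([storage, transmission]))          # 0.5, as in Table 2
```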

5.3 Integrated schema

The qualitative and quantitative schemas are available only when satisficing statuses and contributions

Table 3  Qualification and quantification of quality requirements models

Satisficing status/effect:
    Value range:     −1.0           (−1.0, 0)       0             (0, 1.0)            1.0
    Qualification:   Fully denied   Weakly denied   Conflicting   Weakly satisficed   Fully satisficed
    Quantification:  −1.0           −0.2            0             0.2                 1.0

Contribution:
    Value range:     −1.0    (−1.0, 0)   (0, 1.0)   1.0
    Qualification:   BREAK   HURT        HELP       MAKE
    Quantification:  −1.0    −0.5        0.5        1.0

can be completely qualified or quantified. In some cases, however, only part of the satisficing statuses and contributions can be quantified; hence, an integrated schema combining qualitative and quantitative values is desirable. It is natural to transform this issue into a simple qualitative situation, because the current knowledge is sufficient for the qualitative schema but insufficient for the quantitative schema. The contrary way is to transform the qualitative values into quantitative values. We may still lack a specific measurement for the quantification job; hence the expected quantification is done by reaching agreement among stakeholders. It is not a practical measurement result, but a reference result based on experience and expertise. Table 3 presents the qualification and quantification of quality requirements models. Note that quantitative values of satisficing statuses (effects) cannot be qualified as the Unknown status, because the satisficing status from the quantitative point of view is quite clear-cut. Meanwhile, the quantification here does not specify the detailed nodes. Similarly, the Unknown status cannot be quantified, because each numeric value is "known" in terms of the satisficing status. This "missing" mapping does not affect the subsequent reasoning, because the Unknown status is not considered during the qualitative reasoning process [15]. After finishing the transformation job, the problem is completely converted into a new one for which the quantitative or qualitative schema is suitable; the reasoning can then be done with the previous techniques.
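Here is a minimal sketch of the quantification direction of the integrated schema, encoding Table 3's label-to-number mappings as plain dictionaries; the function name is our own illustration.

```python
# A minimal sketch of Table 3's reference mappings; names illustrative.
CONTRIB_TO_NUM = {"MAKE": 1.0, "HELP": 0.5, "HURT": -0.5, "BREAK": -1.0}
STATUS_TO_NUM = {"fully_satisficed": 1.0, "weakly_satisficed": 0.2,
                 "conflicting": 0.0, "weakly_denied": -0.2,
                 "fully_denied": -1.0}   # Unknown has no numeric image

def quantify(status: str, contribution: str):
    """Map qualitative labels to Table 3's reference values, after which
    the purely quantitative schema of Subsection 5.2 applies."""
    return STATUS_TO_NUM[status], CONTRIB_TO_NUM[contribution]

print(quantify("weakly_satisficed", "HELP"))  # (0.2, 0.5)
```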

6 Case study

Our approach has been applied to analyze the confidentiality requirements of an enterprise financial service system supporting equity trading in the USA, Europe and Australia, developed by Zhejiang University, China. Requested by different trade analysts, this system dynamically calls the corresponding functional components distributed over multiple servers in different countries, and finally constructs the desired architecture. However, different servers may share the same functional components and support different workloads at run-time. Moreover, different inter/intra-component constraints after component selection may affect the resulting architecture. Hence, there exist trade-offs about which implementation decision should be dynamically selected via the Internet at run-time, with quality requirements such as availability, security and performance guaranteed. Figure 4 presents the modeling result for the confidentiality requirements. At the same time, we also model the cost requirements and the performance requirements, which share many design alternatives with the confidentiality requirements; Figure 5 depicts the modeling results for both of them. In the first step, the tree models are transformed into Σ-formulae, as shown in Table 4. These formulae include all the knowledge captured by the graphical models in Figures 4 and 5, but in symbolic format. Each term with a CNT1 element (++, −−, +, −) indicates an operationalization softgoal (design alternative). In the second step, satisficing status assignment is done for each leaf node. Theoretically, each leaf node can be assigned either the fully satisficed or the fully denied status; but to minimize the computational complexity, we apply a specific heuristic that fully satisfices at most one design alternative per parent node.

[Figure 4  (Color online) Confidentiality requirements model for the financial service system.]

[Figure 5  (Color online) Performance and cost requirements models for the financial service system.]

In the third step, we choose the qualitative reasoning schema and begin with the cost requirements model. Following the reasoning process in Subsection 5.1, four implementation decisions that make the cost requirements weakly satisficed (in fact, no fully satisficed situation exists) are identified, as listed in Table 5. The performance requirements model also shares the Firewall[System], Routers[System] and Patching[System] nodes; hence, we apply the results to the performance requirements model and find two implementation decisions, as listed in Table 6. Similarly, we apply the heuristic and the reasoning results from the two models above to the confidentiality requirements model. Consequently, some design alternatives of the confidentiality requirements model are constrained with specific satisficing statuses, as shown in Figure 6. As can easily be noticed, Algorithm[Account], Signature[System], IDS/IDP/IPS[System] and Routers[System] are the four unconstrained nodes, of which the latter two share one parent. Considering the heuristic applied previously, these four nodes yield twelve candidate implementation decisions (2 ∗ 2 ∗ 3), as illustrated in the sketch below.
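A small sketch of how the twelve candidates arise, under the heuristic of fully satisficing at most one alternative per parent: the two anti-DOS alternatives share one parent, giving three options (one, the other, or neither), while Algorithm and Signature each independently give two options.

```python
# Enumerate the 2 * 2 * 3 = 12 candidate implementation decisions.
from itertools import product

algorithm = [True, False]                    # satisfice Algorithm[Account]?
signature = [True, False]                    # satisfice Signature[System]?
anti_dos = ["IDS/IDP/IPS", "Routers", None]  # shared parent: pick at most one

candidates = list(product(algorithm, signature, anti_dos))
print(len(candidates))  # 12
```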

Table 4  Σ-formulae for confidentiality, cost and performance requirements

Confidentiality[System]{}(Authentication[Account]{}(Login[Account]{}(ProcessProtection[Account]{}(+AuthenticationIsolation[Account]{}))∨InformationProtection[Account]{}(+Algorithm[Account]{}∨++Multi-factsAuthentication[Account]{}∨+PasswordComplexityGuaranteed[Account]{}))∧AccessControl[System]{}(PrivilegeCrossPrevention[System]{}(+LeastPrivilege[Account]{}∨+AccessPrioritization[System]{})∨AntiDOSAttacking[System]{}(+Firewall[System]{}∨+Patching[System]{}∨+IDS/IDP/IPS[System]{}∨+Routers[System]{}))∧DataProtection[System]{}(StorageProtection[System]{}(DataLeakPrevention[System]{}(+LocationIdentification[System]{}∨+CriticalInfoIdentification[System]{}∨+Encryption[System]{})∨AccountStolenPrevention[System]{}(+InformationIdentification[System]{}))∨TransmissionProtection[System]{}(+Encryption[System]{}∨+Signature[System]{}∨+ApplySSL[System]{})))

CostControl[System]{}(NoExtraDevices[System]{}(−−Firewall[System]{}∨−−Routers[System]{})∨NoExtraLabour[System]{}(−−Patching[System]{}))

Performance[System]{}(ResponseTime[System]{}(−PasswordComplexityGuaranteed[Account]{}∨+LeastPrivilege[Account]{}∨−Encryption[System]{}∨−AuthenticationIsolation[System]{}∨−ApplySSL[System]{}∨−Patching[System]{})∧ResourceUtilization[System]{}(+AccessPrioritization[System]{})∧Throughput[System]{}(−Multi-factsAuthentication[Account]{}∨−CriticalInfoIdentification[System]{}∨+AccessPrioritization[System]{}∨+Firewall[System]{}∨−LocationIdentification[System]{}∨−InformationIdentification[System]{}))

Table 5 Implementation decisions for cost requirements

Decision  Firewall[System]  Routers[System]  Patching[System]
1         ×                 ×                ×
2         √                 ×                ×
3         ×                 √                ×
4         ×                 ×                √

Table 6 Implementation decisions for performance requirements

Decision  AccessPrioritization[System]  LeastPrivilege[Account]  Others
1         √                             ×                        ×
2         √                             √                        ×

Finally, three candidate implementation decisions are identified, as listed in Table 7. They can make the confidentiality requirements at least weakly satisficed. Up to now, the final decision is still undecided. At the fourth step, we use the linear scoring method for decision making, for which more decision factors are needed. For the analyst, the development budget and the system reliability are set as two factors, with weights 0.3 and 0.7 respectively. Each implementation decision should be evaluated on both factors. Assuming that 0 denotes the worst situation and 1 the best situation with regard to both factors, all design alternatives can be scored within the interval [0, 1]. Since the three implementation decisions differ only in the satisficing statuses of IDS/IDP/IPS[System] and Routers[System], it is sufficient for decision making to evaluate these two design alternatives only. Table 8 presents the independent scoring results, which come from stakeholders' experience and expertise. Correspondingly, the linear scores of all implementation decisions can be obtained, as listed in Table 9. Finally, we choose the one with the highest score as the final decision. That is, we fully satisfice the design alternatives Algorithm[Account], AccessPrioritization[System], Signature[System] and Routers[System], and fully deny the other design alternatives. We also applied the quantitative and integrated reasoning schemas in other cases. As introduced before, quantitative reasoning has the same two-step process as the qualitative one. The only difference is that the quantitative reasoning schema employs quantitative functions, which are simpler than those in the


Figure 6 (Color online) Confidentiality requirements model with constrained design alternatives. [Figure content omitted: the goal tree rooted at Confidentiality[System], decomposed into Authentication[Account], AccessControl[System] and DataProtection[System], with √/× satisficing statuses constraining the design alternatives.]

Table 7 Implementation decisions for confidentiality requirements

Decision  Algorithm[Account]  IDS/IDP/IPS[System]  Routers[System]  Signature[System]  AccessPrioritization[System]  Others
1         √                   √                    ×                √                  √                             ×
2         √                   ×                    √                √                  √                             ×
3         √                   ×                    ×                √                  √                             ×

Table 8 Independent scoring results for IDS/IDP/IPS[System] and Routers[System]

Decision factor and its weight  IDS/IDP/IPS[System] √  IDS/IDP/IPS[System] ×  Routers[System] √  Routers[System] ×
Development budget (0.3)        0.3                    1.0                    0.8                1.0
System reliability (0.7)        0.5                    0.0                    0.5                0.0

Table 9 Linear scores for implementation decisions of confidentiality requirements

IDS/IDP/IPS[System]  Routers[System]  Linear score
√                    ×                0.3*0.3 + 0.5*0.7 + 1.0*0.3 + 0.0*0.7 = 0.74
×                    √                1.0*0.3 + 0.0*0.7 + 0.8*0.3 + 0.5*0.7 = 0.89
×                    ×                1.0*0.3 + 0.0*0.7 + 1.0*0.3 + 0.0*0.7 = 0.60

qualitative one. The integrated reasoning schema includes a quantification or qualification phase, which is highly related to requirements negotiation, but the following phase remains the same. Both are therefore omitted here, to keep the focus on the implementation decision process itself rather than on how different functions work.
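To make the arithmetic of the linear scoring step concrete, the following sketch recomputes the score of decision 2 from Tables 8 and 9. It is illustrative only: the class and method names are hypothetical and not part of rΣ.

import java.util.Map;

// Hypothetical sketch of the linear scoring method used at the fourth step.
// Weights and scores are taken from Tables 8 and 9; all names are illustrative.
public class LinearScoring {

    // factorWeights: decision factor -> weight (weights sum to 1.0)
    // alternativeScores: design alternative -> (decision factor -> score in [0,1])
    static double linearScore(Map<String, Double> factorWeights,
                              Map<String, Map<String, Double>> alternativeScores) {
        double total = 0.0;
        for (Map<String, Double> scores : alternativeScores.values()) {
            for (Map.Entry<String, Double> f : factorWeights.entrySet()) {
                total += f.getValue() * scores.get(f.getKey());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Decision 2 of Table 7: IDS/IDP/IPS denied (×), Routers satisficed (√).
        Map<String, Double> weights = Map.of("budget", 0.3, "reliability", 0.7);
        Map<String, Map<String, Double>> decision2 = Map.of(
                "IDS/IDP/IPS[System] ×", Map.of("budget", 1.0, "reliability", 0.0),
                "Routers[System] √",     Map.of("budget", 0.8, "reliability", 0.5));
        System.out.println(linearScore(weights, decision2)); // ≈ 0.89
    }
}

The scores of the other two decisions in Table 9 follow from the same weighted sum applied to the corresponding columns of Table 8.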

7 Supporting tool

The supporting tool, called rΣ (pronounced ['a: 'sigma]), is a lightweight analysis tool for quality requirements goal tree models. It is implemented in Java (JDK 1.6) in the Eclipse 3.2 environment, and supports graphical modeling, Σ-transformation, strategy evaluation, and strategy exploration for the work presented in this paper. Figure 7 shows its interface.

Figure 7 (Color online) rΣ interface.

7.1 Graphical modeling

rΣ provides four types of modeling elements: 1) softgoals, including NFR softgoals, operationalization softgoals and claim softgoals; 2) decomposition relationships, including AND-decomposition and OR-decomposition; 3) contribution relationships, including MAKE, BREAK, HELP, and HURT; 4) satisficing statuses, including fully satisficed, fully denied, weakly satisficed, weakly denied, conflicting and unknown. These elements are provided in the cascading menu of each node. rΣ accepts any goal tree model constructed from these elements. After the modeling process, models can be saved so that they can be opened and referred to later. A multiple-tab view is supported for easy navigation among projects, by clicking the tab name underneath the modeling primitive buttons.

7.2 Σ-Transformation

rΣ provides a function that maps graphical models to symbolic formula models. The transformation rules are based on the syntax of Σ [23]. Every graphical model in tree structure can be written as a corresponding formula. In rΣ, automated reasoning is performed on the Σ-formulae according to the reasoning rules introduced before; the formula representation is better suited to the matching algorithms that execute our reasoning rules.
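The recursion behind this mapping can be sketched as follows. The Goal class and its fields are simplified assumptions for illustration; the actual transformation follows the full Σ syntax of [23].

import java.util.List;
import java.util.stream.Collectors;

// Simplified sketch of mapping a goal tree to a nested formula string.
class Goal {
    String label;          // e.g. "CostControl[System]"
    String contribution;   // e.g. "+", "--", or "" for decomposed goals
    boolean andDecomposed; // AND vs OR decomposition of the children
    List<Goal> children;

    Goal(String label, String contribution, boolean and, List<Goal> children) {
        this.label = label; this.contribution = contribution;
        this.andDecomposed = and; this.children = children;
    }

    String toFormula() {
        if (children == null || children.isEmpty()) {
            return contribution + label + "{}";          // leaf: a design alternative
        }
        String op = andDecomposed ? "\u2227" : "\u2228"; // ∧ or ∨
        return label + "{}(" + children.stream()
                .map(Goal::toFormula)
                .collect(Collectors.joining(op)) + ")";
    }
}

public class SigmaTransform {
    public static void main(String[] args) {
        // Rebuilds the CostControl[System] formula of Table 4 ("--" printed as ASCII).
        Goal cost = new Goal("CostControl[System]", "", false, List.of(
            new Goal("NoExtraDevices[System]", "", false, List.of(
                new Goal("Firewall[System]", "--", false, List.of()),
                new Goal("Routers[System]", "--", false, List.of()))),
            new Goal("NoExtraLabour[System]", "", false, List.of(
                new Goal("Patching[System]", "--", false, List.of())))));
        System.out.println(cost.toFormula());
    }
}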

7.3 Strategy evaluation

Strategy evaluation is the analysis activity for implementation decisions after satisficing statuses have been assigned to the leaf nodes. rΣ adopts the closed world assumption and the reasoning approaches presented in [15,28] to perform automated reasoning. To make the reasoning process explicit, rΣ also pops up a new window showing how each node (highlighted in red) is used in the stepwise reasoning, which is textual and easy to document.

7.4 Strategy exploration

When no manual assignment of satisficing statuses for leaf nodes is involved, rΣ automatically assigns fully satisficed or fully denied statuses to all leaf nodes, and searches all combinations to calculate the final status of the root node. Finally, a complete report on the satisficing statuses of the quality requirements inferred by all possible implementation decisions is generated. Meanwhile, rΣ employs the analytical idea


from [17] for the mutual relationship between implemented quality requirements and unimplemented quality requirements. It provides probabilistic information on whether a quality requirement is satisficed or denied.
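A minimal sketch of this exploration loop is given below. The evaluate function stands in for the CWA-based propagation of [15,28] and is supplied here as a toy OR-decomposition; everything else (names, reporting format) is hypothetical.

import java.util.function.Function;

// Sketch of strategy exploration: enumerate every fully-satisficed/denied
// assignment over n leaf nodes and record the inferred root status.
public class StrategyExploration {

    static void explore(int leafCount, Function<boolean[], String> evaluate) {
        boolean[] assignment = new boolean[leafCount];
        int satisficedRoots = 0, total = 1 << leafCount;
        for (int mask = 0; mask < total; mask++) {
            for (int i = 0; i < leafCount; i++) {
                assignment[i] = ((mask >> i) & 1) == 1; // true = fully satisficed
            }
            if ("satisficed".equals(evaluate.apply(assignment))) satisficedRoots++;
        }
        // Probabilistic summary in the spirit of [17]:
        System.out.printf("root satisficed in %d of %d decisions (%.2f)%n",
                satisficedRoots, total, (double) satisficedRoots / total);
    }

    public static void main(String[] args) {
        // Toy model: the root is OR-decomposed over two leaves.
        explore(2, a -> (a[0] || a[1]) ? "satisficed" : "denied");
    }
}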

8 Related work

We have reviewed the prevailing studies on quality requirements implementation reasoning in traditional software research, and compared them with ours.

The NFR Framework is the first to propose the concept of softgoal in the RE context and offers a process for dealing with software quality requirements (termed "non-functional requirements" in that framework) [6]. The NFR Framework emphasizes goal contribution and correlation [10,29]. Using label propagation, one can make implementation decisions based on goal satisfaction in a qualitative manner [30]. However, the NFR Framework does not provide any formalism for NFR models, and the reasoning process is still interactive: when partial satisficing statuses occur, clarification is necessary for further reasoning. This is not suitable for run-time internetware construction. Our work, in contrast, offers a well-defined formalism with which an automated reasoning process is guaranteed.

i*, which uses a set of notations different from the NFR Framework's, captures the interaction between the software entity and environmental entities. In i*, goals are correlated with other entities by tasks and resources, which should be implemented by real agents [31–33]. With the Strategic Dependency Model and the Strategic Rationale Model, i* supports goal modeling and reasoning [16]. For the i* framework, an interactive forward propagation algorithm with qualitative values is provided in [13], and a backward propagation algorithm with qualitative values has also been explored recently [12]. The i* framework presents a formalism using conjunctive normal form (CNF) for each piece of model knowledge, while we prefer one complete formula for a goal tree model, which is structure-preserving and decreases input redundancy. Besides, the i* framework employs a SAT solver to compute all implementation decisions for the root node, whereas we only apply the reasoning rules to achieve the same goal.

TROPOS is another goal-oriented software development methodology that includes the concepts of agents and goals [7]. In the early requirements analysis phase, TROPOS adopts i*'s modeling concepts and diagrams. TROPOS uses a manual axiomatization approach to formalize goal models, and can also perform forward and backward propagation [25]. TROPOS differentiates the impacts of positive evidence (satisficed status) and negative evidence (denied status) in different modeling situations. For example, G2 −S→ G1 means that if G2 is satisficed, then there is some evidence that G1 is denied, but if G2 is denied, then nothing is said about the satisfaction of G1; G2 −D→ G1 means that if G2 is denied, then there is some evidence that G1 is satisficed, but if G2 is satisficed, then nothing is said about the denial of G1; and G2 −→ G1 means that if G2 is satisficed (denied), then there is some evidence that G1 is denied (satisficed). The first two are asymmetric contributions, and the last is the symmetric contribution. But which type of relationship should be chosen in real practice is not clear. We instead propose the CWA as a guideline to facilitate the application of positive/negative evidence, which is feasible in internetware engineering practice as long as we only focus on the limited internetware entities and resources. Besides, TROPOS employs the SAT solver CHAFF to identify the final decision; it needs customizing input before reasoning and only accepts a single quantitative decision factor. This is not enough, because many candidate entities or resources are available over the Internet and more decision factors are needed. Our work removes the need for SAT solvers and supports both qualitative and quantitative decision factors.

GRL, namely the Goal-oriented Requirement Language, which is part of the User Requirements Notation (URN), is also equipped with a bottom-up evaluation algorithm for goal models [4]. GRL shares similar notations with i*. The emphasis of GRL is not on the formalism of the model, but on the reasoning machinery over decomposition links, contribution links and actors. When calculating the satisficing statuses of parents from their children, it borrows the traditional ordering relationship among satisficing statuses in the NFR Framework, such as Denied ≺ Weakly Denied ≺ None(Unknown) ≺ Weakly Satisficed ≺ (Conflicting = Undecided) ≺ Satisficed in OR-decomposition. We argue that an ordering-based algorithm


does not always provide the same result under CWA and OWA (the open world assumption), but GRL does not distinguish between them. For example, under CWA the parent can be said to be denied if all of its OR-decomposed children are denied, but under OWA the parent should then be unknown. Our work adopts the two-step process precisely to give a clear view of how the satisficing status of a parent is calculated under CWA. Further, GRL does not propagate the Conflicting status, whereas our algorithm, based on two sets of functions, uses the conflict information for conflict detection. Simply speaking, our algorithm is conflict-sensitive.

Another related piece of research is the KAOS modeling language pioneered by van Lamsweerde et al. [34,35]. Their work extends the KAOS goal modeling language [36] with a probabilistic layer that allows one to specify and reason about measurable quantitative requirements. However, KAOS does not distinguish between functional requirements and quality requirements. Quantitative reasoning is executable as long as a precise quantitative goal model (usually a tree structure, as we choose) is obtained. In [26], an automated process is provided by simulation and multi-objective optimization. Our work focuses on quality requirements and takes both qualitative and quantitative reasoning as basic reasoning schemas.

Research on quality requirements in the implementation of internetware has also attracted some effort recently. For instance, the conceptualization between requirements and ontologies for dynamic services has been discussed in [19] and [37]. This may be taken as a reference for requirements modeling in internetware, but their work provides no reasoning mechanisms. Ma et al. [2] propose a requirements-driven method for internetware services evaluation. This work borrows the linear scoring method for evaluating each service with respect to functionality and risk; but like GRL, it does not provide the candidate set for decision making. Compared with these existing works, our effort focuses on the decision-making issue for internetware implementation, for which we devised an automated reasoning mechanism and a systematic process model.

9 Conclusion and future work

This paper proposes an approach to the implementation decision-making process for internetware quality requirements. It uses a goal tree model that can be transformed into a symbolic formula model in the Σ language, which is easy to edit and document. The reasoning process is treated as a two-step process that can be grounded as a qualitative, quantitative or integrated process, and each reasoning schema has its own reasoning rules. All reasoning work can be embedded into a systematic implementation decision-making process to support more decision factors. Our theory has been applied to real cases and proven efficient for quality requirements implementation for internetware. A supporting tool has also been constructed.

Future work includes three lines. Regarding the reasoning process, a more expressive and powerful quantitative reasoning schema should be devised. As introduced, the current quantitative process only accepts a fixed contribution value if agreement can be reached, for example +0.5 for weakly satisficed. Although assigning different values to the same type of contribution requires much effort in problem domains, it is indeed necessary when precise feedback about quality requirements is expected; especially for similar Internet entities and resources, different implementations for internetwares should be precisely discriminated. Our quantitative rules are just indicative, and the scoring functions involved can be substituted by other decision functions as well. We also observe that none of the related prior work mentioned above has solved the conflicts yielded by the refinement structure itself; they focus only on the conflicting status of each node after reasoning. We have argued that structure-related conflicts cannot be neglected when decentralized development of quality requirements is adopted, as in internetware construction: different stakeholders may assign different contributions to the same child under the same parent, and if so the subsequent reasoning is meaningless. Hence future work will also include thorough conflict management over distributed goal tree models. Finally, to enhance our reasoning work, the preference and priority of each quality requirement and design alternative will be considered as variables of the automated reasoning process. We intend to interpret this problem as a multi-objective optimization problem where some existing techniques may be employable.


Acknowledgements This research was supported by the National Natural Science Foundation of China (Grant Nos. 61232015, 91318301).

References

1 Mei H. Internetware: challenges and future direction of software paradigm for internet as a computer. In: Proceedings of the 34th Annual Computer Software and Applications Conference, Seoul, 2010. 14–16
2 Ma W, Liu L, Ye X, et al. Requirements-driven internetware services evaluation. In: Proceedings of the 1st Asia-Pacific Symposium on Internetware, New York, 2009
3 Simon H A. Rational choice and the structure of the environment. Psychol Rev, 1956, 63: 129
4 Amyot D, Ghanavati S, Horkoff J, et al. Evaluating goal models within the goal-oriented requirement language. Int J Intell Syst, 2010, 25: 841–877
5 van Lamsweerde A. Goal-oriented requirements engineering: a guided tour. In: Proceedings of the 5th IEEE International Symposium on Requirements Engineering, Toronto, 2001. 249–262
6 Mylopoulos J, Chung L, Nixon B. Representing and using nonfunctional requirements: a process-oriented approach. IEEE Trans Software Eng, 1992, 18: 483–497
7 Giorgini P, Mylopoulos J, Sebastiani R. Goal-oriented requirements analysis and reasoning in the Tropos methodology. Eng Appl Artif Intel, 2005, 18: 159–171
8 Yu E S K. Towards modeling and reasoning support for early-phase requirements engineering. In: Proceedings of the 3rd IEEE International Symposium on Requirements Engineering, Annapolis, 1997. 226–235
9 Weiss M, Amyot D. Business process modeling with URN. Int J E-Bus Res, 2005, 1: 63–90
10 Chung L, Nixon B A. Dealing with non-functional requirements: three experimental studies of a process-oriented approach. In: Proceedings of the 17th International Conference on Software Engineering, Seattle, 1995. 24–28
11 Chung L, Nixon B A, Yu E, et al. Non-Functional Requirements in Software Engineering. Berlin: Springer, 2000
12 Horkoff J, Yu E. Finding solutions in goal models: an iterative backward reasoning approach. In: Proceedings of the 29th International Conference on Conceptual Modeling. Berlin: Springer, 2010. 59–75
13 Horkoff J, Yu E, Liu L. Analyzing trust in technology strategies. In: Proceedings of the International Conference on Privacy, Security and Trust, New York, 2006. 21–32
14 Mei H, Huang G, Zhao H, et al. A software architecture centric engineering approach for internetware. Sci China Ser F-Inf Sci, 2006, 49: 702–730
15 Wei B, Jin Z, Zowghi D. An automatic reasoning mechanism for NFR goal models. In: Proceedings of the 5th IEEE International Symposium on Theoretical Aspects of Software Engineering, Xi'an, 2011. 52–59
16 van Lamsweerde A, Darimont R, Letier E. Managing conflicts in goal-driven requirements engineering. IEEE Trans Software Eng, 1998, 24: 908–926
17 Wei B, Jin Z. Characterizing the implementation of software non-functional requirements from probabilistic perspective. In: Proceedings of the 35th IEEE Signature Conference on Computer Software and Applications, Munich, 2011. 608–609
18 Mei H, Liu X. Internetware: an emerging software paradigm for internet computing. J Comput Sci Technol, 2011, 26: 588–599
19 Jureta I J, Faulkner S, Thiran P. Dynamic requirements specification for adaptable and open service-oriented systems. In: Proceedings of ICSOC. Berlin: Springer, 2007. 270–282
20 Jureta I J, Faulkner S, Schobbens P Y. A more expressive softgoal conceptualization for quality requirements analysis. In: Proceedings of the 25th International Conference on Conceptual Modeling. Berlin: Springer, 2006. 281–295
21 Elahi G, Yu E. A semi-automated decision support tool for requirements trade-off analysis. In: Proceedings of COMPSAC, 2011. 466–475
22 Elahi G, Yu E. Comparing alternatives for analyzing requirements trade-offs in the absence of numerical data. Inform Software Tech, 2012, 54: 517–530
23 Wei B, Jin Z, Liu L. A formalism for extending the NFR Framework to support the composition of the goal trees. In: Proceedings of the 17th Asia Pacific Software Engineering Conference, Sydney, 2010. 23–32
24 Reiter R. On Closed World Data Bases. In: Logic and Data Bases. US: Springer, 1978. 55–76
25 Giorgini P, Mylopoulos J, Nicchiarelli E, et al. Formal reasoning techniques for goal models. In: Journal of Data Semantics. Berlin: Springer, 2003. 1–20
26 Heaven W, Letier E. Simulating and optimising design decisions in quantitative goal models. In: Proceedings of the 19th IEEE International Requirements Engineering Conference, Trento, 2011. 79–88
27 Supakkul S, Hill T, Chung L, et al. An NFR pattern approach to dealing with NFRs. In: Proceedings of the 18th IEEE International Requirements Engineering Conference, Sydney, 2010. 179–188
28 Wei B, Yin B, Jin Z, et al. rΣ: Automated reasoning tool for non-functional requirement goal models. In: Proceedings of the 19th IEEE International Requirements Engineering Conference, Trento, 2011. 337–338
29 Chung L, do Prado Leite J C S. On non-functional requirements in software engineering. In: Conceptual Modeling: Foundations and Applications. Berlin: Springer, 2009. 363–379
30 Giorgini P, Mylopoulos J, Nicchiarelli E, et al. Reasoning with goal models. In: Conceptual Modeling - ER 2002. Berlin: Springer, 2003. 167–181
31 Oliveira A P A, Cysneiros L M, do Prado Leite J C S, et al. Integrating scenarios, i*, and aspects in the context of multi-agent systems. In: Proceedings of the Conference of the Center For Advanced Studies on Collaborative Research (CASCON), 2006. 204–218
32 Fuxman A, Liu L, Mylopoulos J, et al. Specifying and analyzing early requirements in Tropos. Requir Eng, 2004, 9: 132–150
33 Yu E, Mylopoulos J. Enterprise modeling for business redesign: the i* framework. SIGGROUP Bull, 1997, 18: 59–63
34 Letier E, van Lamsweerde A. Reasoning about partial goal satisfaction for requirements and design engineering. In: Proceedings of ACM SIGSOFT Software Engineering Notes, New York, 2004. 53–62
35 van Lamsweerde A. Reasoning about alternative requirements options. In: Conceptual Modeling: Foundations and Applications. Berlin: Springer, 2009. 380–397
36 van Lamsweerde A. Requirements Engineering: From System Goals to UML Models to Software Specifications. Hoboken: John Wiley & Sons, 2009
37 Verlaine B, Dubois Y, Jureta I J, et al. Towards conceptual foundations for service-oriented requirements engineering: bridging requirements and services ontologies. IET Softw, 2012, 6: 85–102

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072105:1–072105:15 doi: 10.1007/s11432-014-5109-5

Profiling selected paths with loops

LI BiXin1*, WANG LuLu1 & LEUNG Hareton2

1School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;
2Department of Computing, Hong Kong Polytechnic University, Hong Kong 999077, China

Received August 8, 2013; accepted November 23, 2013

Abstract Path profiling records the frequency of each path in an executed program. To accomplish profiling, probes are instrumented in the program and executed as the program runs, so the number of probes has a significant impact on the efficiency of a profiling technique. By profiling only the interesting paths, existing techniques try to improve profiling efficiency by reducing the number of probes and by optimizing path encodings for efficient storage. However, they lack accuracy, waste time on running uninteresting paths, and can mainly deal with acyclic paths. In this article, a novel technique called Profiling Selected Paths (PSP) is introduced to profile selected paths. PSP enables custom selection of both acyclic and cyclic paths and increases execution efficiency by early termination on uninteresting paths. Theoretical analysis and experimental evaluation indicate that PSP performs better than existing techniques.

Keywords path profiling, interesting paths, dynamic analysis

Citation Li B X, Wang L L, Leung H. Profiling selected paths with loops. Sci China Inf Sci, 2014, 57: 072105(15), doi: 10.1007/s11432-014-5109-5

1 Introduction

Path profiles provide an accurate characterization of a program's dynamic behavior and are valuable in a wide variety of areas such as computer architecture, code optimization, and debugging. Selective profiling [1,2] is a technique to obtain profiles of given paths instead of all paths, at as little cost as possible. There are two main issues in selective profiling: (1) how to identify PI paths (the paths to profile) and improve profiling efficiency by not profiling other paths; (2) how to handle cyclic paths, that is, how to ensure a unique pathid for every (acyclic and cyclic) path. On the first issue, there are mainly two ways to improve efficiency: use fewer probes to save running cost (SPP [1]) or use compact pathids to save storage cost (PrePP [2]). However, neither [1] nor [2] can deal with cyclic paths or give accurate results. In this article, we propose another method to save running cost, based on early termination of PN paths. On the second issue, we adopt the method used by Profiling All Paths (PAP) for pathid calculation (using multiplication and addition), which works well for cyclic paths [3]. The main contributions of this article are as follows:

∗ Corresponding author (email: [email protected])


1. A profiling technique for acyclic paths called Modified SPP (MSPP), designed by improving SPP. The significance of MSPP is that it is totally accurate, while existing selective profiling methods are not.

2. A new approach to profiling selected paths based on early termination of uninteresting paths. This gives another way to reduce profiling cost.

3. A new technique (named PSP) for profiling selected paths. PSP is the first selective profiling technique for cyclic paths: it can effectively deal with cyclic paths and provides accurate profiles. PSP can be implemented in four ways, which provide different benefit-cost trade-offs.

The rest of this article is organized as follows: Section 2 covers the terminologies; Section 3 first discusses how to modify SPP into MSPP so that it can work with acyclic subgraphs, then illustrates the techniques to identify selected paths, after which four PSP implementations are presented; Section 4 gives the experimental evaluation and comparisons; Section 5 summarizes related work and Section 6 concludes the article.

2 Terminologies

Acyclic path profiling algorithms mainly work with the Directed Acyclic Graph (DAG) of a program. In this article, since we focus on the problem of selective path profiling based on Control Flow Graphs (CFGs), some existing terminologies need to be modified and new terminologies need to be introduced.

Paths of Interest (PI): the set of interesting paths [1].
Edges of Interest (EI): the set of edges in PI [1].
Nodes of Interest (NI): the set of nodes (or basic blocks) in PI.
Paths Not of interest (PN): the set of paths not in PI.
Nodes Not of interest (NN): the set of nodes not in PI.
Boundary Edges (BE): the set of edges which are not in EI and have a source node in NI.
iNvalid Edges (NE): the set of edges which are not in EI and have a source node in NN.
Pathid: the probe value at the end of a path. (This generalizes the concept of pathid in [2], where it is defined as the sum of edge weights along the path.)
Local Pathid: in a subgraph, if a local probe variable is used, the value of the local probe at the exit of the subgraph.
Temp Pathid: the probe value at a certain position during execution.
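For illustration, the BE and NE sets can be derived from EI and NI as in the following sketch; the Edge record and node naming are assumptions, not taken from the paper's implementation.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: classify CFG edges into BE and NE given the edges of interest (EI)
// and nodes of interest (NI).
public class EdgeSets {
    record Edge(String source, String target) {}

    static Set<Edge> boundaryEdges(List<Edge> cfgEdges, Set<Edge> ei, Set<String> ni) {
        Set<Edge> be = new HashSet<>();
        for (Edge e : cfgEdges) {
            // BE: not an EI edge, but its source node is of interest
            if (!ei.contains(e) && ni.contains(e.source())) be.add(e);
        }
        return be;
    }

    static Set<Edge> invalidEdges(List<Edge> cfgEdges, Set<Edge> ei, Set<String> ni) {
        Set<Edge> ne = new HashSet<>();
        for (Edge e : cfgEdges) {
            // NE: not an EI edge, and its source node is not of interest
            if (!ei.contains(e) && !ni.contains(e.source())) ne.add(e);
        }
        return ne;
    }
}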

3 Selective profiling

This section first gives an enhanced version of SPP, named Modified SPP (MSPP), then describes the details of the four implementations of PSP (PSP0–PSP3). Finally, we compare the different implementations of PSP.

3.1 MSPP

SPP may be inaccurate when some necessary probes are removed, causing the pathids of different paths to conflict with each other. In this section, SPP is enhanced according to the following steps: before deleting probes on non-EI edges, check whether each path in PI still has a unique value after such deletion; iteratively remove non-EI edges until no edge value is removable. Since each PI path is checked to confirm a unique value at every deletion step, this algorithm ensures accurate selective profiling for acyclic paths, and no more probes can be deleted. The complete MSPP algorithm is shown in Algorithm 1 (only phase 3 differs from the SPP algorithm in [1]). In fact, MSPP is a greedy algorithm: since it checks each probe only once to determine whether it is removable, it cannot ensure an optimal solution with the fewest probes. If we could check all possible solutions exhaustively, we could obtain the optimal solution, but that entails a high cost. The time cost of MSPP is O(E ∗ I), because there are O(E) probes resulting from the EPP algorithm, where E is the number of CFG edges and I is the number of interesting paths given by the users.

Algorithm 1 MSPP(G, EI)

Input: G: the DAG to be profiled; EI: edges of interest
Output: Val(e): value of edge e in G

/* phase 1: compute edge values */
1  NumPaths(EXIT) = 1
2  foreach non-exit node v in reverse topological order do
3    NumPaths(v) = 0
4    foreach outedge e not in EI: v → w do
5      Val(e) = NumPaths(v)
6      NumPaths(v) = NumPaths(v) + NumPaths(w)
7    end
8    foreach outedge e in EI: v → w do
9      Val(e) = NumPaths(v)
10     NumPaths(v) = NumPaths(v) + NumPaths(w)
11   end
12 end
/* phase 2: optimize the placement of probes */
13 foreach node v in topological order do
14   if (v has only one inedge ei) and (Val(ei) > 0) then
15     foreach outedge eo: v → w do
16       Val(eo) = Val(eo) + Val(ei)
17     end
18     Val(ei) = 0
19   end
20 end
/* phase 3: remove probes on non-EI edges */
21 while true do
22   foreach edge e not in EI and Val(e) != 0 do
23     if each PI path has a unique pathid after Val(e) is set to 0 then
24       Val(e) = 0
25       continue while
26     end
27   end
28   break while
29 end
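The test in line 23 can be implemented as a pairwise-distinctness check over recomputed pathids, as in the following sketch (assumed types; the paper does not prescribe this particular implementation).

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the phase-3 check of MSPP: after tentatively setting Val(e) = 0,
// every PI path must still receive a unique pathid (sum of edge values).
public class UniquenessCheck {

    // Each PI path is given as its list of edge ids; val maps edge id -> value.
    static boolean pathidsStillUnique(List<List<String>> piPaths,
                                      Map<String, Integer> val,
                                      String zeroedEdge) {
        Set<Integer> seen = new HashSet<>();
        for (List<String> path : piPaths) {
            int pathid = 0;
            for (String edge : path) {
                if (!edge.equals(zeroedEdge)) {
                    pathid += val.getOrDefault(edge, 0);
                }
            }
            if (!seen.add(pathid)) return false; // two PI paths would collide
        }
        return true;
    }
}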

3.2 Techniques to identify PN paths

In selective profiling, we only focus on the frequencies of interesting paths. However, uninteresting paths may also be executed, which wastes resources. This observation suggests that if some executions of uninteresting paths can be terminated before they complete, time cost can be saved. For this, we need to check whether an executing path belongs to the PN set; if it is found to be an uninteresting path, it should be ended as soon as possible. If the time saved by early termination exceeds the identification cost, profiling efficiency is improved. To terminate PN paths early, it is required to determine whether the currently executing path is of interest. The perfect situation occurs when all PN paths can be identified as early as possible. Since such a goal is hard to reach, we have to accept a lower identification ability to achieve reasonable performance. There are different techniques for identifying PN paths. In this study, three techniques are proposed as follows:

1. Technique T0. To identify all PN paths as early as possible, check pathids at each necessary position

Figure 1 Necessary positions for identification. (a) Node B in sequence; (b) node B after merging; (c) node B after branching.

during execution: we enumerate all temp pathids of PI paths at these positions, and during execution the current temp pathid is checked against them; if it belongs to some PI path, the current execution continues, otherwise an early termination is performed.

2. Technique T1. To further reduce the identification cost, check temp pathids only at certain key positions in the CFG. T1 performs identification at the exits of acyclic subgraphs, where MSPP is used to compute the local pathid. If an executed path fragment does not belong to any path of interest, an early termination is performed.

3. Technique T2. Different from T0 and T1, T2 checks edges: if a BE edge is executed, an early termination can be performed. This is valid for two reasons: first, since PI paths cover EI edges, any path that contains a BE edge must be a PN path; second, if no BE edge is executed, no NE edge can be executed either.

Clearly, the above techniques involve different identification costs and achieve different degrees of early detection of PN paths. Based on these techniques, four PSP implementations, PSP0–PSP3, are given in this article, where PSP0 uses T0, PSP1 uses T1, PSP2 uses T2, and PSP3 uses both T1 and T2. PSP3 integrates T1 and T2 to obtain a better ability to identify PN paths (with an extra cost, of course): since T1 performs identification within subgraphs and T2 on edges, we can apply T1 inside subgraphs and T2 outside. As T0 has a stronger identification ability than the others, integrating T0 with other techniques brings no benefit. These PSP implementations also make use of MSPP and PAP.

3.3 PSP0

PSP0 applies T0. It is crucial to identify the necessary positions (CFG nodes) at which to perform PN identification. There are three main cases for consideration, as shown in Figure 1. Consider the first case, shown in Figure 1(a): B is unnecessary for identification, because if the currently executed path belongs to PI, it would pass any identification at B, and if not, the path should already have been terminated at A according to T0. For the case shown in Figure 1(b), B is also unnecessary for identification, because if a PI path contains A or C, it has to contain B, so any identification at B serves no purpose. For the case shown in Figure 1(c), B is necessary for identification, because when a PN path contains edge AB while a PI path contains edge AC, it is not clear at node A whether the currently executed path will follow AB or AC. From this analysis we conclude that, for a CFG node N, if all its inedges are the only outedge of their source nodes, then N is unnecessary for identification; otherwise, N is a necessary position. Such necessary nodes are the positions where temp pathids are checked to see whether they belong to PI paths. PSP0 is presented in Algorithm 2, whose time cost is O(E + N ∗ I ∗ L), where E is the number of CFG edges, N is the number of CFG nodes, I is the number of interesting paths, and L is the number of nodes of the longest PI path. Note that an interesting path has O(L) nodes. (Phase 1 costs O(E), phase 2 costs O(N ∗ I ∗ L), and phase 3 costs O(N).)
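This rule amounts to a small predicate over a node's predecessors; the sketch below uses an assumed Node interface for illustration.

import java.util.List;

// Sketch: a node needs an identification probe unless every inedge is the
// only outedge of its source node (cases (a) and (b) in Figure 1).
public class NecessaryPosition {
    interface Node {
        List<Node> predecessors();
        int fanOut(); // number of outedges
    }

    static boolean necessaryForIdentification(Node n) {
        for (Node pred : n.predecessors()) {
            if (pred.fanOut() > 1) return true; // some inedge follows a branch
        }
        return false;
    }
}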

Algorithm 2 PSP0(G, PI)

Input: G: the CFG to be profiled; PI: the set of paths of interest
Output: ps: the probe set

/* phase 1: generate PAP probes */
1  foreach node n in G do
2    int s = n.fanIn(); // s is the number of inedges of node n
3    if n is unnecessary for identification then
4      continue;
5    end
6    int i = 0; foreach inedge e of n do
7      ps.addPAPProbe(e, s, i++);
8      // instrument probe "s(i)" on edge e
9    end
10 end
/* phase 2: compute temp pathids for PI paths at each node */
11 Map<Node, Set> tp; // tp records necessary temp pathids
12 foreach node n in G do
13   initialize an Integer-Set, named temp_pathids;
14   foreach path p in PI do
15     for int i = 0; i < p.length()-1; i++ do
16       if p.getNode(i) != n then
17         continue;
18       end
19       int temp_pathid = getTempPathid(p, i);
20       // calculate the temp pathid of p
21       if !temp_pathids.contains(temp_pathid) then
22         temp_pathids.add(temp_pathid);
23       end
24     end
25   end
26   tp.put(n, temp_pathids);
27 end
/* phase 3: calculate probes */
28 foreach node n in G do
29   Set temp_pathids = tp.get(n);
30   ps.addIdentifyProbe(n, temp_pathids);
31   /* if the temp pathid at n is not contained in temp_pathids, the execution is early terminated */
32 end

Figure 2 gives an example with two paths of interest. The temp pathids along the PI paths are 0 and 3 at both nodes B and D; those at node E are 1 and 7 (but E is not necessary for identification, since its inedges are all only-outedges). There are no temp pathids at node C, since it is not contained in any PI path. At each necessary node, if the current temp pathid does not match any PI path, the executing path must be a PN path and can be terminated early.
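At run time, the probe that PSP0 instruments at a necessary node then reduces to a set-membership test, as in the following sketch. The probe variable, the exception-based termination, and the temp pathid set {0, 3} (from nodes B and D above) are illustrative assumptions.

import java.util.Set;

// Sketch of the identification probe PSP0 instruments at a necessary node:
// if the running temp pathid matches no PI path, terminate the path early.
public class IdentifyProbe {
    static int r = 0; // the running probe value (temp pathid)

    static void identify(Set<Integer> validTempPathids) {
        if (!validTempPathids.contains(r)) {
            // the executing path is a PN path: stop profiling it here
            throw new IllegalStateException("early termination of PN path");
        }
    }

    public static void main(String[] args) {
        r = 3;
        identify(Set.of(0, 3)); // passes: r may still belong to a PI path
        r = 5;
        try {
            identify(Set.of(0, 3)); // r matches no PI path
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}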

3.4 PSP1

PSP1 applies T1. Since MSPP ensures accurate profiling for DAGs, we can integrate it with PAP to obtain the PSP1 implementation. We first define the reducible acyclic subgraph (RAS), on which we can apply MSPP, and then show how PSP1 applies PAP to the other parts of the CFG.

Figure 2 An example for PSP0.

Figure 3 An example for PSP1.

3.4.1 Integration with RASs

First, we define the subgraphs of a CFG on which MSPP is performed. An RAS [3] is a subgraph S of the CFG which satisfies the following three conditions: (1) S is a one-entry and one-exit DAG (there can be a backedge from node Sexit to node Sentry); (2) Sentry pre-dominates Sexit; (3) Sexit post-dominates Sentry. In [3], an algorithm is given to identify the RASs of a reducible CFG (with N nodes and E edges) at a cost of O(N^2 ∗ E). Within RASs, we can apply MSPP to selectively profile acyclic path fragments, and then use PAP probes for the rest of the CFG. An extra probe is needed on the outedge of each RAS to combine the pathid calculations inside and outside the RAS. In the example given in Figure 3, we assume that the CFG has the RAS shown in the dotted box, and PI consists of two interesting paths, "ABDEMJG" and "ABEMJGBDEMJG". To perform PSP1, PAP probes are used outside the RAS, and MSPP is used inside the RAS with the local probe variable r′, where the probe on edge IJ is removed. After MSPP instrumentation, the local pathids of the PI paths inside the RAS are 5 and 8. So, at the RAS exit node G, PSP1 integrates the value of r′ into r and continues execution.

Algorithm 3 PSP1(G, PI, rass)

Input: G: the CFG to be profiled; PI: the set of paths of interest; rass: the set of RASs where MSPP should be used
Output: ps: the probe set

/* phase 1: MSPP on RASs */
1  List handled_nodes; // record the nodes handled by MSPP
2  foreach RAS r in rass do
3    add all nodes of r into handled_nodes;
4    handled_nodes.remove(r.entry);
5    perform the MSPP algorithm on r and add its probes into ps;
6    get local_pathids of PI paths inside r;
7    // calculate local pathids
8    ps.addIdentifyProbe(r.exit, local_pathids);
9    /* if the local pathid of r is not contained in local_pathids, the execution is early terminated */
10 end
/* phase 2: PAP on other parts */
11 foreach node n in G do
12   if handled_nodes.contains(n) then
13     continue;
14   end
15   int s = n.fanIn(); // s is the number of inedges of node n

16   if s ...

[Figure 2 panels of the following paper: (a) the IR basic block LDR r0,<r5,#-16>; LDR r10,<r5,#-20>; MUL r0,r0,r10; LDR r10,<r5,#-24>; ADD r0,r0,r10; STR r0,<r5,#-16>; (b) its DFG; (c) the DFG after subgraph covering, with MUL and ADD replaced by MLA; (d) the regenerated code LDR r11,<r5,#-16>; LDR r12,<r5,#-20>; LDR r10,<r5,#-24>; MLA r0,r11,r12,r10; STR r0,<r5,#-16>.]

Figure 2 An example of each phase of GSM. (a) is an IR basic block, and (b) is the DFG of (a). Given an IR basic block, the DFG is constructed according to the data dependencies among instructions. Each node in the DFG corresponds to an IR element: a leaf node corresponds to an operand, and an internal node corresponds to an instruction, with the destination operand of an internal node appended to it. Each edge corresponds to the data dependency between the two connected nodes. Since the destination operand of the STR instruction is a memory unit, it is not marked here. Conversion from (a) to (b) is DFG Construction; conversion from (b) to (c) is Subgraph Covering; and conversion from (c) to (d) is Code Regeneration.

To solve this problem, the main challenge is how to simplify the tree pattern matching algorithm to make it suitable for DBT. Unlike the heuristics used in [19], we choose a different fusing algorithm which works better for the ARM ISA; meanwhile, it needs fewer passes to perform instruction fusion. This algorithm is inspired by the graph mapping techniques proposed in [20]. Although we use a simpler algorithm, it still obtains near-optimal results for most cases. This straightforward design also makes the implementation very easy, which reduces the translation overhead at the same time. Given an IR subgraph, we have to perform an equivalence test, which determines whether there is a target instruction functionally equivalent to it; in effect, the equivalence test checks whether the subgraph can be translated. First we should determine the mapping between IR subgraphs and target instructions and define the mapping rules, which is a pattern recognition problem. This problem usually has two different solutions. A traditional way is to develop mapping rules manually according to experts' experience and insight; this is common in compilers for generating target code. Another way is to generate the mapping rules automatically through machine learning, but the design and implementation of such a self-learning optimizer is complicated [18]. Assume that all translatable source instructions can be represented by IR. To make sure that each translatable source basic block can be correctly translated, we must define a functionally equivalent


Table 2 Pattern Mapping Table (PMT) from IR to ARM assembly. Here rs1, rs2, rs3 represent source operands, rt represents a temporary register, and rd represents the destination operand. In Pattern-III, Pattern-IV and Pattern-V, SHIFT indicates a shift instruction such as LSL, LSR, or ASR. In Pattern-III, ALU indicates an ALU instruction including ADD, SUB, AND, OR, XOR, CMP, TST, MUL, UMULL, SMULL, etc., while ALU* indicates an enhanced ALU instruction with a micro operation of shift. More instruction details can be found in [10]

Pattern  IR                                                     ARM                                            Description
I        MUL rs1, rs2, rt; ADD rt, rs3, rd                      MLA rs1, rs2, rs3, rd (Multiply Accumulate)    Fusing MUL and ADD into an MLA
II       MUL rs1, rs2, rt; SUB rt, rs3, rd                      MLS rs1, rs2, rs3, rd (Multiply and Subtract)  Fusing MUL and SUB into an MLS
III      SHIFT rs1, shift, rt; ALU rt, rs2, rd                  ALU* rs1, rs2, shift, rd (Shift and ALU)       Fusing SHIFT and ALU into an ALU*
IV       SHIFT rs1, shift, rt; SUB rs2, rt, rd                  RSB rs2, rs1, shift, rd (Reverse Subtract)     Fusing SHIFT and SUB into a RSB
V        SHIFT rs1, shift, rt1; NOT rt1, rt2; AND rs2, rt2, rd  BIC rs1, rs2, shift, rd (Bitwise Bit Clear)    Fusing SHIFT, NOT and AND into a BIC

target instruction sequence for each IR. Then we define the mapping between an IR subgraph containing n IRs and a target instruction sequence containing m target instructions. When n = 1, it is a one-to-many mapping; when m = 1, the algorithm degrades to a many-to-one mapping. Table 2 lists five sample mapping rules (patterns) from IR to ARM assembly code. These target instructions (MLA, MLS, ALU*, BIC, RSB) are not very complex, and they have been available since ARM v3. There are usually data dependencies among the IRs in a pattern. For example, Pattern-I maps MUL and ADD instructions to an MLA (Multiply Accumulate) instruction, where MUL and ADD are read-after-write (RAW) data dependent. After defining the mapping rules, it is easy to conduct the equivalence test. In fact, there are other powerful instructions which can be used as patterns, such as CMN (Compare Negative), SMLAL (Signed Multiply Accumulate Long), UMLAL (Unsigned Multiply Accumulate Long), SWP (Swap), SWPB (Swap Byte), LDM (Load Multiple), etc.; these are all available since ARM v3. More recent versions of the ARM instruction set, such as ARM v7, include 8-bit and 16-bit data processing instructions, and even vector instructions for SIMD extensions, which are also potential patterns.

Next we should find a subgraph set S which covers the DFG with minimal cost (minimal number of subgraphs). We do not produce optimal solutions for all DFGs, because the problem is NP-complete [5], and trying to find optimal solutions would be time-consuming. We introduce a greedy strategy which significantly simplifies the algorithm and produces near-optimal solutions. First, choose a seed node v0 in G and bring it into S. Then start from v0 and let S grow along data flow edges. During growth, look up the Pattern Mapping Table (PMT) to check whether the subgraph is translatable: if yes, continue to let it grow; else stop, and choose the subgraph as a member of S. The iteration repeats until G is covered by S.

We propose Greedy Subgraph Mapping (GSM) as the kernel of our dynamic code generation algorithm. Figure 3 is the pseudo-code of GSM. The algorithm input S is a DFG, while V and E are its node set and edge set. The main procedure is a loop of seed selection and subgraph growth (lines 2–5); the loop ends when no suitable seed can be selected. In the loop body, it first calls the Seed-Selection function (line 3) to select a seed from S; if no seed is found (line 4), the loop ends, else it calls the Subgraph-Growth function (line 6) to try to grow a subgraph along data flow edges from the seed. Seeds are selected in topological order (lines 7–10). There are many other seed selection strategies, but the choice only slightly affects efficiency, so we choose a simple one. In order to reduce the search space, we can restrict the type of seed nodes, and even specify it to be a logical or arithmetic shift instruction. During subgraph growth, the algorithm first identifies the opcode of the seed node (line 12), and determines whether a seed with such an opcode can grow into any pattern. For example, if the opcode is a shift, it may try to match Pattern-III, Pattern-IV or Pattern-V (lines 13–19); if the opcode is a multiply, it tries to match Pattern-I or Pattern-II (lines 20–24). If a pattern is matched, the corresponding

1  Input: S = (V, E)
2  while (1) do
3    seed = SeedSelection(S);
4    if seed is NOT FOUND then
5      return;
     else
6      SubgraphGrowth(seed);
     end
   end

Procedure SeedSelection(S)
7  foreach node[i] do
8    if node[i] is not a leaf then
9      if node[i] hasn't been enumerated then
10       return i;
       end
     end
   end
11 return NOT FOUND;

Procedure SubgraphGrowth(seed)
12 switch seed.opcode
13   case SHIFT:
14     if pattern ShiftALU matched then
15       call Replace_ShiftALU();
16     else if pattern RSB matched then
17       call Replace_RSB();
18     else if pattern BIC matched then
19       call Replace_BIC();
       end
       break;
20   case MUL:
21     if pattern MLA matched then
22       call Replace_MLA();
       end
23     if pattern MLS matched then
24       call Replace_MLS();
       end
       break;
25   case ...
26   default: break;
   end
27 return;

Figure 3 Pseudo code for GSM.

target node is used to replace the subgraph in the DFG. Every seed node results in no more than one pattern; once a pattern is identified and replaced, GSM selects another seed to continue. Patterns are matched in descending order of pattern weight, which is decided by the pattern's appearance frequency in the program and the size of its corresponding subgraph. The weight of a pattern intuitively reflects its importance: the more frequently a pattern appears in the program, or the more nodes its subgraph contains (the bigger the subgraph is), the greater its weight. The probability of identifying a pattern in an application can be obtained statistically by profiling. This is a simple way to implement our greedy strategy, and it reflects the design objective (shortest length of target code). Note that GSM is greedy and may stop at a local minimum. Moreover, although memory dependence analysis in dynamic environments has been well studied [30], for simplicity our method for identifying data dependencies does not consider LOAD/STORE dependencies, i.e., GSM does not try to fuse instructions with memory dependencies, which may leave room for further improvement. Despite these potential weaknesses, our experiments show that the proposed approach is lightweight enough and that, for the benchmarks evaluated, the code generation results outperform existing one-to-many mapping schemes.

• GSM example. For the IR code and DFG in Figure 2(a) and (b), GSM first selects MUL as the seed, and then grows from this node. Along the data flow edge, the corresponding ADD is found and Pattern-I is matched, so MUL and ADD are replaced by an MLA. The replaced DFG is shown in Figure 2(c) and the translated code is listed in Figure 2(d).
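The covering loop itself can be sketched as follows. This is a deliberately simplified rendering of GSM: it ignores pattern weights and multi-branch growth, and the DfgNode interface and the translatable predicate (standing in for the PMT lookup) are assumptions.

import java.util.List;
import java.util.function.Predicate;

// Sketch of GSM's greedy covering: grow a subgraph from each seed along
// data-flow edges while the PMT still offers an equivalent target instruction.
public class GreedyCover {
    interface DfgNode {
        boolean isLeaf();
        boolean enumerated();
        void markEnumerated();
        List<DfgNode> dataFlowSuccessors();
    }

    // translatable: PMT lookup; true while the grown subgraph maps to a pattern
    static void cover(List<DfgNode> topoOrder, Predicate<List<DfgNode>> translatable) {
        for (DfgNode seed : topoOrder) {            // seed selection in topological order
            if (seed.isLeaf() || seed.enumerated()) continue;
            List<DfgNode> subgraph = new java.util.ArrayList<>(List.of(seed));
            seed.markEnumerated();
            for (DfgNode next : seed.dataFlowSuccessors()) {
                subgraph.add(next);                 // tentative growth step
                if (!translatable.test(subgraph)) {
                    subgraph.remove(subgraph.size() - 1); // stop: keep last match
                    break;
                }
                next.markEnumerated();
            }
            // emit the target instruction covering 'subgraph' here
        }
    }
}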

3.3 ISA-dependent optimization

When building a real DBT for commodity ARM processors, some ISA-dependent optimizations should be implemented to further improve performance. Although these optimizations depend on the specific ISA, the idea is generic for all kinds of hardware; only the implementation varies with the hardware.

• 8-bit instruction problem. The IA-32 ISA includes many 8-bit operations. For example, incb al increments the lower 8 bits (bits 0–7) of the eax register. However, most RISC ISAs have no corresponding operations on 8-bit registers, so a target instruction sequence must be found to replace an 8-bit source instruction, and the sequence must be as short as possible. In the ARM v7 ISA, BFI [10] copies any number of low order bits from a register into the same number of adjacent bits at any position in the

Figure 4 (a) The infrastructure of TransARM; (b) control flow of TransARM.

destination register, so 8-bit operations can be converted into the following sequence: first perform the operation in a temporary register, then copy the result into the 8-bit destination register using BFI.

• 32-bit immediate operand problem. Since IA-32 is a variable-length ISA, IA-32 instructions may carry a 32-bit immediate operand, but ARM instructions can encode at most a 12-bit immediate operand. Therefore, if a source instruction has a 32-bit immediate operand, an instruction sequence is needed to move the operand into a temporary register before the subsequent operations. One possible solution is to use MOVW rd, #imm16 and MOVT rd, #imm16 [10]: MOVW moves the lower 16 bits of the immediate operand into the lower 16 bits of the destination register, and MOVT writes an immediate value into the top half-word of the destination register.
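As an illustration of the operand split only (instruction selection and emission are not shown), the two 16-bit halves can be computed as follows.

// Sketch: split a 32-bit immediate into the MOVW/MOVT halves used to
// materialize it in a register on ARM v7.
public class ImmediateSplit {
    public static void main(String[] args) {
        int imm32 = 0xDEADBEEF;
        int low16 = imm32 & 0xFFFF;           // operand of MOVW (lower half-word)
        int high16 = (imm32 >>> 16) & 0xFFFF; // operand of MOVT (upper half-word)
        System.out.printf("MOVW rd, #0x%04X%n", low16);  // rd = 0x0000BEEF
        System.out.printf("MOVT rd, #0x%04X%n", high16); // rd = 0xDEADBEEF
    }
}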

4 Prototype system

To validate the feasibility and effectiveness of our approach, we design and implement a prototype system, TransARM [11], as the experimental platform. TransARM is a user-level DBT that translates IA-32 integer executable programs into ARM binary code so that they can run on the ARM platform.

4.1 Infrastructure and execution flow

As Figure 4(a) shows, TransARM consists of seven modules: controller, interpreter, translator, translated code cache (TCC), command line shell, ELF analyzer, and file loader. The controller is a manager which controls the initialization and execution of the other modules. The interpreter interprets IA-32 instructions and collects profile information during interpretation; each source instruction is handled by an interpretation routine if it has not been translated. Each interpretation routine imitates the function of the corresponding instruction, and all these routines are linked as a runtime support library during execution. The translator extracts superblocks from hot code (hot code is identified by the profile information collected by the interpreter), then translates and optimizes them into ARM binary code. The translated target code is cached in a private memory space, the TCC, which is responsible for allocating, scheduling, and releasing the cache space of translated code. More details can be found in [11].


Table 3 Register mapping. The first eight rows map x86 registers EAX–EDI to ARM registers R0–R7. The next six pseudo registers (ADDR0, ADDR1, TMP0, TMP1, TMP2, TMP3) map to ARM registers R8–R13 and are for temporary use. The last two, IR LR and IR PC, map to ARM registers R14 and R15, which are the Link Register and the Program Counter

Source register (IA-32 INT)  Pseudo register (IR)  Target register (ARM)
EAX                          IR EAX                R0
ECX                          IR ECX                R1
EDX                          IR EDX                R2
EBX                          IR EBX                R3
ESP                          IR ESP                R4
EBP                          IR EBP                R5
ESI                          IR ESI                R6
EDI                          IR EDI                R7
–                            ADDR0                 R8
–                            ADDR1                 R9
–                            TMP0                  R10
–                            TMP1                  R11
–                            TMP2                  R12
–                            TMP3                  R13
–                            IR LR                 R14
–                            IR PC                 R15

Figure 4(b) illustrates the control flow of TransARM. First, it checks the ISA type of the executable file; if the type is IA-32, it loads the code into a specified memory space. The source instructions are then interpreted one by one, and profile information is collected simultaneously. When a hot spot is found, TransARM constructs a superblock from the head of the hot spot using NET [9], and then translates and optimizes the superblock; the basic blocks in a superblock are translated one by one using GSM. Translated code segments (a segment corresponds to a superblock), which can run on the ARM processor, are cached in the TCC. If a translated code segment is evicted from the TCC for lack of space, re-translation and re-optimization must be performed before it is executed again.

4.2 Register allocation

To reduce overhead, register allocation is performed along with code generation. Since the ARM processor has 16 general purpose registers, we set up 16 pseudo registers in the IR (Table 3). When converting source instructions into IRs, source registers EAX–EDI are replaced by IR registers IR EAX–IR EDI, so register allocation (along with code generation) is just register name replacement. Since one source instruction may be split into two or more IRs (Table 1), temporary registers are needed to store intermediate results. In Table 1, the x86 register eax is used to specify the memory address of the source operand, so we need a temporary register to hold the source operand temporarily; in the target instructions we use ARM register R10, which maps to IR register TMP0, to store the intermediate result. ADDR0 and ADDR1 are used to represent memory addresses when resolving memory access modes, while IR LR and IR PC are referenced only in program control instructions.
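Since allocation reduces to name replacement, it can be sketched as a static table lookup mirroring Table 3; the identifier spellings (IR_EAX and so on) are illustrative.

import java.util.Map;

// Sketch: register allocation as pure name replacement, mirroring Table 3.
public class RegisterMap {
    static final Map<String, String> IR_TO_ARM = Map.ofEntries(
            Map.entry("IR_EAX", "R0"), Map.entry("IR_ECX", "R1"),
            Map.entry("IR_EDX", "R2"), Map.entry("IR_EBX", "R3"),
            Map.entry("IR_ESP", "R4"), Map.entry("IR_EBP", "R5"),
            Map.entry("IR_ESI", "R6"), Map.entry("IR_EDI", "R7"),
            Map.entry("ADDR0", "R8"),  Map.entry("ADDR1", "R9"),
            Map.entry("TMP0", "R10"),  Map.entry("TMP1", "R11"),
            Map.entry("TMP2", "R12"),  Map.entry("TMP3", "R13"),
            Map.entry("IR_LR", "R14"), Map.entry("IR_PC", "R15"));

    static String allocate(String irRegister) {
        return IR_TO_ARM.get(irRegister); // allocation is a table lookup
    }

    public static void main(String[] args) {
        System.out.println(allocate("TMP0")); // R10
    }
}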

5 Experimental methodology

• Schemes for comparison. We want to compare GSM with QEMU TCG, but the comparison is not fair if the other parts of the two DBTs are not the same: QEMU is a full-system simulator with all of its components well developed, while TransARM is simplified and not comparable in some of its components. Hence we design the DM (Direct Mapping) and ODM (Optimized Direct Mapping) algorithms. DM is a typical one-to-many mapping algorithm without any optimization. ODM is an optimized DM, developed following the code generation scheme of QEMU TCG: for a given basic block, ODM directly conducts one-to-many mapping and then performs DCE and CSE optimizations to remove redundant code. For simplicity, GSM includes the five patterns listed in Table 2. These five patterns are enough to demonstrate the feasibility of our approach, but for further performance improvement more patterns should be found and applied. In this section, the results will show that, although only five patterns are applied, GSM has a significant effect on reducing target code. DM, ODM and GSM are all implemented in TransARM to conduct the comparison; we mainly compare GSM with ODM to show the superiority of GSM.

Table 4 Description of the selected SPEC 2000 and MiBench benchmarks

Benchmark     Suite      Description
164.gzip      SPEC 2000  Data compression utility
197.parser    SPEC 2000  Natural language processing
254.gap       SPEC 2000  Computational group theory
255.vortex    SPEC 2000  Object oriented database
256.bzip      SPEC 2000  Data compression utility
adpcm         MiBench    ADPCM coder/decoder
blowfish      MiBench    Encryption algorithm
CRC32         MiBench    Computes the 32-bit CRC
dijkstra      MiBench    Dijkstra's algorithm
sha           MiBench    NIST secure hash algorithm
stringsearch  MiBench    PBM string search

• Application benchmarks. Since TransARM neither translates nor interprets floating-point instructions for simplicity, we select the SPEC CPU2000 INT benchmarks bzip2, parser, gzip, gap and vortex, and the MiBench [31] benchmarks adpcm, blowfish, CRC32, dijkstra, sha, and stringsearch for evaluation. They are listed in Table 4. Note that none of these benchmarks includes floating-point instructions; all other benchmarks in SPEC 2000 and MiBench were omitted from this experiment because they include floating-point instructions.

• Execution platform. The source platform is IA-32 INT, and the target platform is a Cortex-A8 (ARM v7 ISA) processor running Angstrom with Linux kernel 2.6.29.

• Evaluation indicators. In order to evaluate the performance of the two algorithms, we introduce Code Expansion Rate, Execution Speedup, Translation Slowdown, and Pattern Significance as performance metrics. Code Expansion Rate is the ratio of code size after translation to code size before translation, calculated by (1). Note that the expanded code in this paper refers to runtime executed instructions, not compiled static instructions.

Code Expansion Rate = Number of target instructions / Number of source instructions.  (1)

In TransARM, code expansion occurs in the following cases: (a) ALU instructions with a memory access operation; (b) POP instructions whose destination operand is a memory unit; (c) PUSH instructions whose source operand is a memory unit; (d) 8-bit data-processing instructions, such as incb in x86; (e) data-processing instructions with a 32-bit immediate operand; (f) other cases. We calculate the code expansion rate of each basic block whenever the basic block is translated. We expect the mean code expansion rate per basic block to be low, which means the translated code is compact. We pay particular attention to hot basic blocks and large basic blocks. Hot basic blocks are executed many times (e.g., basic blocks in loops); if such a block is translated well, the effect is amplified and becomes remarkable. A large basic block is one that includes no less than a specific number of instructions; the more instructions it includes, the more chances that patterns will be found, while basic blocks with only one or two instructions are unlikely to include patterns. Basic blocks which are both hot and large are called key basic blocks, because these basic blocks occupy most of the program execution and thus determine the program performance. Key basic blocks are the focus of the following discussion. To determine the number of dynamically executed target instructions, we instrument each basic block with a counter, count the number of instructions, and finally multiply the per-basic-block expansion factor by the number of executions of the basic block.

Execution Speedup refers to the dynamic execution speedup of translated code. Here we profile the execution time of the target code only, i.e., with dynamic compilation time factored out. It is calculated by (2), which shows the performance improvement that the code generation algorithm contributes to the entire dynamic execution.

Execution Speedup = Target code execution time using ODM / Target code execution time using GSM.  (2)

Figure 5  (a) Mean code expansion rate in key basic blocks; (b) execution speedup of GSM over ODM; (c) translation slowdown of GSM over ODM; (d) mean pattern significance in key basic blocks.

Translation Slowdown measures the increase in total time spent on translation (including optimization when it is performed). GSM seems slightly more complex than ODM, so a comparison of the overhead of the two algorithms is necessary. This indicator is calculated by (3). If its value is close to 1, the complexity of GSM is lightweight enough to be acceptable in practice.

Translation Slowdown = Translation overhead of GSM / Translation overhead of ODM.  (3)

Pattern Significance represents the significance of subgraph patterns for code generation. We define it as the fraction of DFG nodes in a basic block that are covered by identified patterns. For every basic block we profile, it is calculated by (4). Here we again focus mainly on patterns in the key basic blocks.

Pattern Significance = Number of nodes covered by selected patterns / Number of nodes in DFG.  (4)
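As a small illustration, the computation behind (4) for one basic block might look as follows; the input encoding (node-id sets for the DFG and for each matched pattern) is an assumption made for illustration.

```python
def pattern_significance(dfg_nodes, pattern_matches):
    """dfg_nodes: iterable of node ids in the block's DFG;
    pattern_matches: list of node-id sets, one per matched pattern."""
    nodes = set(dfg_nodes)
    covered = set().union(*pattern_matches) if pattern_matches else set()
    return len(covered & nodes) / len(nodes)

# Example: a 10-node DFG where one matched pattern covers nodes {2, 3}.
print(pattern_significance(range(10), [{2, 3}]))  # 0.2
```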

6 Results and analysis

6.1 Code expansion

In this experiment, we calculate the values of the four indicators and compare the results of the three schemes, as shown in Figure 5. We define a basic block executed more than 16 times as a hot basic block, and a basic block including more than 5 source instructions as a large basic block. These two parameters are called Hot Threshold and Large Threshold in the following. Their initial values (16 and 5) are determined according to conclusions in [9] and observations in our experiment. The input data set is the test scale for SPEC and the small scale for MiBench. To demonstrate that GSM is stable under different conditions, we vary the threshold values and the input data set scale to test the algorithm's sensitivity. The results show the same trend under different threshold configurations and program input scales.

In Figure 5(a), we find that there is a remarkable code reduction from DM to ODM (the average code expansion rate decreases from 2.16 to 1.42), which means that the conversion from source instructions to IR produces a lot of redundant code; the removal of this redundant code is entirely due to DCE and CSE. After such optimization, further code compaction can only be achieved by fusing instructions. Clearly GSM takes effect, although with only 5 patterns: GSM further decreases the code expansion rate compared to ODM for every benchmark (the average code expansion rate decreases from 1.42 to 1.3, about an 8% reduction in code size for GSM vs. ODM). This demonstrates that GSM is better at leveraging the target ISA and generates more compact target code. Meanwhile, this trend is stable for every benchmark, because patterns exist pervasively in all benchmarks. This means that GSM can be applied in most applications and is guaranteed to take positive effect.

Note that, on average for all three schemes, the code expansion rate of the five SPEC test applications is clearly higher than that of the six MiBench ones (the average difference is 0.512). This may be due to program characteristics. Different types of programs have different proportions of ALU instructions, memory access instructions, I/O instructions, etc. A computing-intensive program may have a lot of ALU instructions, while an I/O-intensive program may consist mainly of I/O-related instructions. Since different types of instructions produce different amounts of redundant code, the program type can strongly influence the code expansion rate. The two benchmark suites have different characteristics: SPEC CPU2000 mainly targets computing-intensive applications, while MiBench consists of typical embedded applications. So the difference in code expansion rate between the test applications is understandable.

On the other hand, for SPEC and MiBench, the code shrinkage from ODM to GSM also differs to a certain degree (0.147 for the SPEC applications and 0.085 for the MiBench ones on average). An intuitive speculation is that GSM would perform better for MiBench than for SPEC, because ARM is aimed at the embedded field. But, on the contrary, we observe that code shrinks more for SPEC. There are three possible reasons. First, the code expansion of SPEC is originally higher than that of MiBench, so there are more potential patterns; second, the mean size of basic blocks differs across program types and scales, and larger basic blocks may include more patterns; third, we use only five patterns in this experiment, and many more patterns in embedded applications remain unexploited. In fact, we will find in subsection 6.4 that the first reason is not valid. We believe that if more patterns common in embedded applications were applied, the advantage of GSM for MiBench would be enhanced.

We here advocate a usage scheme for GSM. Each instruction set has its advantages and applicable field. For example, the ARM ISA is aimed at the field of embedded systems.
To improve performance, ARM keeps upgrading and expanding the instruction set for embedded applications. This is why ARM has many powerful instructions which are not common in other ISAs, such as vector instructions. For embedded applications, GSM targeting ARM can identify more patterns than for other types of applications; in fact, there are more ARM patterns in embedded applications. So if GSM is applied to a specific type of application, and the target ISA is aimed at just this type of application, GSM can take full advantage of the target ISA and produce nearly optimal, compact target code. We observe an uncommon phenomenon: the code expansion rate of adpcm is below 1 (about 0.95) using GSM. This means that although the target platform is RISC, code expansion does not happen in the key basic blocks. Of course, we only present here the mean code expansion rate of key basic blocks; over all basic blocks, the value is about 1.48.

6.2 Execution speedup

Figure 5(b) shows that GSM significantly and stably reduces target code execution time compared to ODM, with an average speedup of about 1.1. This 10% improvement may be due to the repeated execution of the key basic blocks. Although the average code reduction is only 8%, some key basic blocks with a code reduction of more than 10% take up the main part of the whole program because of their repeated execution, making the execution of the whole target code about 10% faster. In other words, the speedup mainly comes from executing powerful instructions instead of chains of simple instructions; note that the target ISA is RISC, whose execution time is proportional to the number of instructions in the ideal case. Indeed, since powerful instructions have replaced chains of simple instructions, the number of dynamically executed instructions is reduced. For the SPEC benchmarks, the speedup is more stable than for MiBench. This is perhaps due to the difference in program scale (SPEC is larger in scale); another possible reason is that the SPEC applications are all computing intensive, mainly requiring CPU resources, while the MiBench ones have varied characteristics. We observe that sha captures the maximal speedup, which is mainly due to its program characteristics. An ideal scenario is that GSM identifies a large number of patterns in key basic blocks, and these key basic blocks are executed a huge number of times. If few patterns are found, or the key basic blocks are executed a smaller number of times (even just 16 times), the best situation cannot be achieved. For sha, both conditions are likely satisfied: sha is an implementation of the NIST Secure Hash Algorithm, which is mainly a regular (structured) loop. A basic block extracted from sha is likely to be executed many times. In such a case, compacting basic blocks is equivalent to optimizing a loop body, and the optimization effect is sufficiently amplified by the repeated execution of the loops.

6.3 Translation slowdown

Translation slowdown of GSM over ODM is shown in Figure 5(c). The statistical information on translation time is collected by software instrumentation, which may not be very accurate but gives the correct relative values. For all of the applications, the translation overhead is well controlled: the average slowdown is under 1.03, and even the biggest value is under 1.05. This indicates that GSM does not greatly increase translation overhead. gzip and vortex perform not as well, probably because these programs are less structured, there are too many basic blocks to be translated, and the code cache for translated basic blocks is not big enough, which leads to frequent swapping. Furthermore, among the 11 selected test applications, the biggest translation time is 0.31 seconds, which is negligible compared to the total execution time. Hence an average translation slowdown of 1.03 is acceptable. When the program scale increases, translation slowdown may slightly weaken the effect of execution speedup. But overall, GSM undoubtedly speeds up the entire system.

6.4 Pattern significance

We now examine the significance of subgraph patterns for code generation. Figure 5(a) indicates that although ODM removes redundant code by DCE and CSE, GSM still generates less target code than ODM. This is entirely due to pattern mapping: after removing redundant code, the only remaining optimization potential is to fuse instructions, and an efficient fusion can only be performed by subgraph mapping. Figure 5(d) shows the statistical value of pattern significance for the 11 test applications. For most of them, it is around 5%. For adpcm, blowfish and sha, it is above 8%. Especially for adpcm, it is up to 16.25%, which means about one sixth of the code in key basic blocks is related to patterns. Since the program scale of adpcm is small, the number of key basic blocks is less than 10, and there are indeed many Pattern-IIIs in these key basic blocks. The results indicate that subgraph patterns are of great importance, and it is really worthwhile to perform subgraph mapping in dynamic code generation if the translation overhead is well controlled. Since we choose only five patterns, the average value of this indicator is not very high (about 6.42%). Moreover, some applications do not fit the ARM ISA well, especially the SPEC ones; the characteristics of these applications determine that there is little potential to find patterns, no matter how much effort is made to find them. We observe that the MiBench applications perform better than the SPEC ones precisely because the MiBench applications are oriented to embedded systems. In fact, although we applied only 5 patterns, they have captured a remarkable performance improvement. This result tells us that if we define more patterns according to the target ISA, and select applications which better match the target ISA, a further efficiency improvement can be captured.


Figure 6 (a) Comparison of mean code expansion rate of GSM in key basic blocks between input data sets of test/small and train/large; (b) comparison of execution speedup between input data sets of test/small and train/large; (c) comparison of translation slowdown between input data sets of test/small and train/large; (d) comparison of pattern significance between input data sets of test/small and train/large.

Overall, GSM generates more compact code than ODM, further relieving code expansion, while only slightly increasing algorithm complexity. It improves the efficiency of the entire DBT, and proves feasible and practical for a wide range of applications.

6.5 Sensitivity analysis

To find the performance trend when the input scale increases, we change the input data set (test, train and ref for SPEC; small and large for MiBench) and repeat the aforementioned experiment for each situation. Figure 6(a) compares the code expansion rate of GSM between the smaller input data set (test for SPEC and small for MiBench) and the bigger input data set (train for SPEC and large for MiBench). It shows almost the same trend for the different input scales. Similar results are obtained for execution speedup, translation slowdown and pattern significance, as Figures 6(b) and (c) show. This observation implies that our algorithm is stable across input scales. Note that the pattern significance of several benchmarks in Figure 6(d) slightly decreases compared to that in Figure 5(d), perhaps because some parts of the code that process input data become hot as the input scale increases, while patterns in these basic blocks may be relatively scarce. We also test the algorithm's sensitivity to the two aforementioned parameters, Hot Threshold and Large Threshold. We initialize them to 16 and 5, and then vary them to find their influence on the algorithm's performance. We choose gzip as a sample because both ODM and GSM have a moderate and stable effect on reducing its target code and speeding up its execution. In fact, the other applications in our experiment show the same trend as gzip does, although we do not list them all in this paper. Figure 7(a) demonstrates that the code expansion rate remains almost invariant as Hot Threshold increases in gzip. The implication behind this trend is that the proportion of code which can be removed


by using many-to-one mapping in a basic block is not much related to how hot the basic block is. But in Figure 7(b) we observe an obvious decline in the code expansion rate as Large Threshold increases in gzip. This is partly because the number of large basic blocks in the whole program becomes smaller as Large Threshold increases, while larger basic blocks have more opportunities to be compacted. Meanwhile, the decrease for GSM and ODM is faster than that for DM, which also implies that code removal (either by optimization in ODM or by subgraph mapping in GSM) has a more significant effect in larger basic blocks. Intuitively, a larger basic block has more optimization potential than a smaller one.

Figure 7  (a) Comparison of mean code expansion rate between different threshold values of hot basic block in gzip; (b) comparison of mean code expansion rate between different threshold values of large basic block in gzip.

7 Conclusion

Binary translation enables transparent migration of legacy applications and is important for the pervasiveness of embedded systems. Improving code generation efficiency in DBTs is an urgent problem. In this paper, we identify the reason why code generation is not satisfactory in DBTs by rethinking the strengths and weaknesses of conventional code generation algorithms, and we try to improve them. We propose GSM, a lightweight code generation algorithm that employs a DFG to identify data dependencies between instructions and finds subgraph patterns to reduce target code. A many-to-one mapping scheme is applied to generate target code. The results show that, compared to the existing algorithm, GSM significantly improves the quality of translated code with negligible extra overhead, and overall it speeds up the execution of target code. This demonstrates that for code generators in DBTs, if we take full advantage of powerful instructions, code expansion can be well controlled and DBT performance can be significantly improved.

Acknowledgements  This work was supported by National High-tech R&D Program of China (863 Program) (Grant No. 2012AA010905), National Natural Science Foundation of China (Grant Nos. 61202121, 61272143, 61272144, 61070037), and Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20114307120013). Chen XuHao is supported by the NUDT Graduate Innovation Fund. We thank the anonymous reviewers for their valuable suggestions.

References
1 Sites R L, Chernoff A, Kirk M B, et al. Binary translation. Commun ACM, 1993, 36: 69–81
2 Lü Y, Shen L, Wang Z, et al. Dynamically utilizing computation accelerators for extensible processors in a software approach. In: Proceedings of IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, New York, 2009. 51–60
3 Kim H, Smith J E. Dynamic binary translation for accumulator-oriented architectures. In: Proceedings of International Symposium on Code Generation and Optimization, Washington, 2003. 25–35
4 Ebcioglu K, Altman E, Gschwind M, et al. Dynamic binary translation and optimization. IEEE Trans Comput, 2001, 50: 529–548
5 Ertl M A. Optimal code selection in DAGs. In: Proceedings of Symposium on Principles of Programming Languages, New York, 1999. 242–249
6 Hwu W W, Mahlke S A, Chen W Y. The superblock: an effective technique for VLIW and superscalar compilation. J Supercomput, 1993, 7: 229–248
7 Ball T, Larus J R. Efficient path profiling. In: Proceedings of International Symposium on Microarchitecture, Paris, 1996. 1–12
8 Bala V, Duesterwald E, Banerjia S. Dynamo: a transparent dynamic optimization system. In: Proceedings of ACM Conference on Programming Language Design and Implementation, New York, 2000. 1–12
9 Duesterwald E, Bala V. Software profiling for hot path prediction: less is more. In: Proceedings of 12th International Conference on Architectural Support for Programming Languages and Operating Systems, New York, 2000. 202–211
10 ARM Corporation. ARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition, 2009
11 Chen W, Wang Z, Zheng Z, et al. TransARM: an efficient instruction set architecture emulator. Chin J Electron, 2010, 20: 6–10
12 Dehnert J C, Grant B K, Banning J P, et al. The Transmeta Code Morphing Software: using speculation, recovery, and adaptive retranslation to address real-life challenges. In: Proceedings of IEEE/ACM International Symposium on Code Generation and Optimization, San Francisco, 2003. 15–24
13 Ebcioglu K, Altman E R. DAISY: dynamic compilation for 100% architectural compatibility. In: Proceedings of 24th International Symposium on Computer Architecture, New York, 1997. 26–37
14 Altman E R, Gschwind M, Sathaye S, et al. BOA: the architecture of a binary translation processor. IBM Research Report, 2000
15 Baraz L, Devor T, Etzion O, et al. IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems. In: Proceedings of 36th International Symposium on Microarchitecture, Washington, 2003. 191–201
16 Bellard F. QEMU: a fast and portable dynamic translator. In: Proceedings of USENIX Annual Technical Conference, Berkeley, 2005. 41–46
17 Mihocka D, Shwartsman S. Virtualization without direct execution or jitting: designing a portable virtual machine infrastructure. In: Proceedings of 1st Workshop on Architectural and Microarchitectural Support for Binary Translation, Beijing, 2008. 1–16
18 Bansal S, Aiken A. Binary translation using peephole superoptimizers. In: Proceedings of 8th USENIX Symposium on Operating Systems Design and Implementation, Berkeley, 2006. 177–192
19 Hu S, Smith J E. Using dynamic binary translation to fuse dependent instructions. In: Proceedings of International Symposium on Code Generation and Optimization, Washington, 2004. 213–224
20 Clark N, Hormati A, Mahlke S, et al. Scalable subgraph mapping for acyclic computation accelerators. In: Proceedings of International Conference on Compilers, Architectures and Synthesis of Embedded Systems, New York, 2006. 147–157
21 Ishizaki K, Kawahito M, Yasue T, et al. Design, implementation, and evaluation of optimizations in a just-in-time compiler. In: Proceedings of Conference on Java Grande, New York, 1999. 119–128
22 Krall A. Efficient JavaVM just-in-time compilation. In: Proceedings of International Conference on Parallel Architectures and Compilation Techniques, Washington, 1998. 205–212
23 Suganuma T, Ogasawara T, Takeuchi M, et al. Overview of the IBM Java just-in-time compiler. IBM Syst J, 2000, 39: 175–193
24 Böhm I, Franke B, Topham N. Cycle-accurate performance modeling in an ultra-fast just-in-time dynamic binary translation instruction set simulator. In: Proceedings of International Symposium on Systems, Architectures, Modeling, and Simulation, Samos, 2010. 1–10
25 Adl-Tabatabai A, Cierniak M, Lueh G, et al. Fast, effective code generation in a just-in-time Java compiler. In: Proceedings of Conference on Programming Language Design and Implementation, New York, 1998. 280–290
26 Fraser C W, Hanson D R, Proebsting T A. Engineering a simple, efficient code-generator generator. ACM Lett Program Lang Syst, 1992, 1: 213–226
27 Eckstein E, Konig O, Scholz B. Code instruction selection based on SSA-graphs. In: Proceedings of International Workshop on Software and Compilers for Embedded Systems, Vienna, 2003. 49–65
28 Ebner D, Brandner F, Scholz B, et al. Generalized instruction selection using SSA-graphs. In: Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, New York, 2008. 31–40
29 Lattner C, Adve V. LLVM: a compilation framework for lifelong program analysis and transformation. In: Proceedings of International Symposium on Code Generation and Optimization, Washington, 2004. 75–86
30 Hirzel M, Dincklage D V, Diwan A, et al. Fast online pointer analysis. ACM Trans Program Lang Syst, 2007, 29: 11
31 Guthaus M R, Ringenberg J S, Ernst D, et al. MiBench: a free, commercially representative embedded benchmark suite. In: Proceedings of IEEE 4th Annual Workshop on Workload Characterization, Washington, 2001. 3–14

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072107:1–072107:12 doi: 10.1007/s11432-013-4845-2

On the parameterized vertex cover problem for graphs with perfect matching

WANG JianXin1*, LI WenJun1, LI ShaoHua1 & CHEN JianEr1,2

1 School of Information Science and Engineering, Central South University, Changsha 410083, China;
2 Department of Computer Science and Engineering, Texas A&M University, College Station, Texas 77843-3112, USA

* Corresponding author (email: [email protected])

Received April 20, 2013; accepted August 8, 2013; published online January 6, 2014

Abstract  A vertex cover of an n-vertex graph with perfect matching contains at least n/2 vertices. In this paper, we study the parameterized complexity of the problem vc-pm*, which decides whether a given graph with perfect matching has a vertex cover of size bounded by n/2 + k. We first present an algorithm of running time O*(4^k) for a variation of the vertex cover problem on König graphs with perfect matching. This algorithm, combined with the iterative compression technique, leads to an algorithm of running time O*(9^k) for the problem vc-pm*. Our result improves the previous best algorithm of running time O*(15^k) for the vc-pm* problem, which reduces the problem to the almost 2-sat problem and solves the latter by Razgon and O'Sullivan's recent algorithm.

Keywords  NP-complete, parameterized algorithm, vertex cover, iterative compression

Citation  Wang J X, Li W J, Li S H, et al. On the parameterized vertex cover problem for graphs with perfect matching. Sci China Inf Sci, 2014, 57: 072107(12), doi: 10.1007/s11432-013-4845-2

1 Introduction

The vertex cover problem is a classical NP-complete problem [1], with important applications in fields such as computational biochemistry [2,3]. Various computational approaches to the problem, including approximation algorithms (see the survey article [4]) and parameterized algorithms [5–8], have been extensively studied. An approximation algorithm of ratio 2 for the problem can be easily achieved based on graph maximal matching [4], while it is now known that the problem is unlikely to have a polynomial-time approximation algorithm of ratio bounded by a constant smaller than 2 [9]. The best parameterized algorithm for the problem has a running time O*(1.2738^k) [10] (following the recent convention, we use the notation O*(f(k)) to denote the bound O(f(k) n^{O(1)})), and it has been proved that the problem has no subexponential-time parameterized algorithm unless the Exponential Time Hypothesis (ETH) fails [11]. There has also been research on heuristic algorithms (e.g. [12]) for solving the minimum vertex cover problem; the performance of these algorithms is usually evaluated on the well-established DIMACS [13] and BHOSLIB [14] benchmarks. The vertex cover problem on graphs with perfect matching has been studied more recently. The problem is NP-hard [15]. Approximation algorithms for the problem have been studied [15,16]. Moreover, it is known that the problem and the vertex cover problem on general graphs have the same approximability threshold [15,17]. Combining this result with the inapproximability results in [9], we derive that the problem is unlikely to have a polynomial-time approximation algorithm of ratio bounded by a constant
smaller than 2. On the other hand, since a vertex cover of an n-vertex graph with perfect matching contains at least n/2 vertices, the set of all vertices of the graph makes a trivial vertex cover of ratio bounded by 2 for the graph. Therefore, the tight approximation ratio 2 for the vertex cover problem on graphs with perfect matching is closely related to the complexity of deciding whether such a graph has a minimum vertex cover of size larger than n/2. In particular, studying whether a given graph with perfect matching has a minimum vertex cover of size "slightly" larger than n/2 is interesting.

In terms of parameterized complexity, since a minimum vertex cover of an n-vertex graph with perfect matching contains at least n/2 vertices, it seems more reasonable to parameterize the problem by studying whether such a graph has a vertex cover of size bounded by n/2 + k, where k is the parameter [18–20]. This way of problem parameterization has been named "parameterizing above or below guaranteed values" in [20], and has received increasing attention recently. In particular, the development of parameterized algorithms for the vertex cover problem on graphs with perfect matching that look for a vertex cover of size bounded by n/2 + k is of independent interest. Recent research has shown close algorithmic relationships between this problem and a number of other important parameterized problems, including the problems above guar vertex cover, perfect vertex deletion, König vertex deletion set, and almost 2-sat [20,21].

Motivated by the above observations, in this paper we focus on the parameterized problem formally defined as follows.

• Vertex cover on graphs with perfect matching (vc-pm*). Given an n-vertex graph G with perfect matching and a parameter k, is there a vertex cover of size n/2 + k for the graph G?

The almost 2-sat problem, on an instance F of 2-sat, looks for k clauses to delete from F so that the resulting formula becomes satisfiable. Razgon and O'Sullivan [21] recently proposed an algorithm that solves the almost 2-sat problem in time O*(15^k). On the other hand, Chen and Kanj [15] presented a polynomial-time reduction that, on an n-vertex graph G with perfect matching, constructs an instance FG of 2-sat such that the graph G has a vertex cover of size n/2 + k if and only if there are k clauses in FG whose removal makes FG satisfiable. Combining these two results gives an algorithm of running time O*(15^k) for the problem vc-pm*.

In this paper, we study more efficient algorithms for the problem vc-pm*. We start our study with a variation of the problem on König graphs (a graph is König if the size of its maximum matching equals that of its minimum vertex cover), formulated as follows.

• Vertex cover on König graphs with perfect matching (vc-kpm*). Given an n-vertex König graph G with perfect matching, a parameter k, and a set U of vertices in G, is there a vertex cover X of size n/2 + k such that U ⊆ X?

We remark that algorithms for some parameterized problems on König graphs (though not including vc-kpm*) have been studied recently [22]. We first study the structures of König graphs and of the vc-kpm* problem, which enables us to develop an algorithm of running time O*(4^k) for the vc-kpm* problem. Our algorithm has a style similar to that of Razgon-O'Sullivan's algorithm for the annotated almost 2-sat problem (2-aslasat) [21]. However, our algorithm is more efficient (Razgon-O'Sullivan's algorithm for the 2-aslasat problem runs in time O*(5^k)). We then show that the algorithm for the vc-kpm* problem combined with the iterative compression technique [23] gives an algorithm of running time O*(9^k) for the vc-pm* problem, improving the previous best algorithm, as described above, for the vc-pm* problem.

As a byproduct, the analysis techniques employed in our discussion enable us to derive that Razgon-O'Sullivan's algorithm for the almost 2-sat problem actually runs in time O*(11^k), which is better than the bound O*(15^k) originally claimed in [21].

2 On König graphs with perfect matching

In this section, we introduce some related terminology for general graphs, and study the properties of König graphs.


Let G = (V, E) be a simple undirected graph. For two vertices u, v ∈ V, denote by [u, v] an edge between u and v. For a vertex v ∈ V, let N(v) denote the set of all neighbors of v in G, i.e., N(v) = {u | u ∈ V, [u, v] ∈ E}. For a subset V′ ⊆ V, let G[V′] be the subgraph induced by V′. If a matching M in the graph G saturates all vertices of G, M is called a perfect matching of G, and we use |M| to denote the number of edges in M. We say e is a matched edge if e ∈ M; otherwise, e is an unmatched edge. For an edge subset E′ ⊆ E, let V(E′) be the set of endpoints of edges in E′. A graph G is König if the size of a minimum vertex cover of G equals the number of edges in a maximum matching of G. There is a linear-time algorithm [24] that, when a maximum matching of a graph G is given, tests whether G is König, and in case G is König, constructs a minimum vertex cover for G. Combining this with the well-known graph matching algorithm [25], we have

Proposition 1 ([24,25]). There is an O(m√n)-time algorithm that tests whether a given graph G is König, and if G is König then the algorithm constructs a minimum vertex cover for the graph.

When a minimum vertex cover Vvc of a König graph G is given, we also have an independent set Vis = V \ Vvc of the graph G. Moreover, it is not difficult to verify that a matching M of G is maximum if and only if M saturates all vertices in Vvc and every edge in M has one end in Vis and the other end in Vvc. In fact, this gives a well-known characterization of König graphs, stated as follows.

Proposition 2 ([22,26]). A graph G = (V, E) is König if and only if its vertex set V can be partitioned into two sets Vis and Vvc, where Vis is an independent set and there is a matching M saturating Vvc such that each edge in M has one end in Vis and one end in Vvc.

By Propositions 1 and 2 and with a polynomial-time pre-processing, we can assume that an instance of vc-kpm* is given as a tuple (G, M, Uvc, Uis, k), where G = (V, E) is an n-vertex König graph whose vertex set V is decomposed into V = Vis ∪ Vvc with Vis a maximum independent set and Vvc a minimum vertex cover of G, Uvc ⊆ Vvc, Uis ⊆ Vis, and M is a perfect matching in which each edge has one end in Vis and the other end in Vvc. The instance (G, M, Uvc, Uis, k) looks for a vertex cover X of size n/2 + k such that U = Uvc ∪ Uis ⊆ X. Furthermore, corresponding to (G, M, Uvc, Uis, k), we assume that (G, M, Uvc, Uis) is an instance of the optimization version of the vc-kpm* problem, which looks for a minimum vertex cover X′ of G such that U = Uvc ∪ Uis ⊆ X′.

Let (G, M, Uvc, Uis, k) be an instance of vc-kpm*, where G = (Vis ∪ Vvc, E) and Uis ∪ Uvc = U. We first study the conditions under which G has a vertex cover X of size exactly n/2 with U ⊆ X. An M-alternating walk from a vertex v1 in the graph G is a walk {v1, v2, ..., vh} in G such that [v1, v2] ∈ M, and for any two consecutive edges in the walk, exactly one is in M. Note that we allow vertices and edges to repeat in an M-alternating walk. A U-walk in G (with respect to M) is an M-alternating walk whose both ends are in U and whose last edge is also in M. Note that by definition, a U-walk is always of odd length.

Lemma 1. Let U be a vertex subset in an n-vertex König graph G with a perfect matching M. Then G has a vertex cover X of size n/2 such that U ⊆ X if and only if there is no U-walk in G with respect to M.

Proof. Suppose that the graph G has a vertex cover X of size n/2 such that U ⊆ X.
Then each edge in the matching M has exactly one end in X. Assume W = {v1, v2, ..., v2h} is a U-walk, which means {v1, v2h} ⊆ U and all edges [v2i−1, v2i] on W, i = 1, ..., h, are in the matching M, so each of them has exactly one end in X. Since v1 ∈ U ⊆ X, it is easy to verify by induction that for all i, the vertex v2i−1 is in X and the vertex v2i is not in X, contradicting v2h ∈ U. This proves that if G has a vertex cover X of size n/2 with U ⊆ X, then there is no U-walk. For the other direction, suppose that there is no U-walk. Define a vertex subset X as follows: 1) for any M-alternating walk W = {v1, v2, ..., v2h−1, v2h} such that v1 ∈ U, include v2i−1 in the subset X, for all i = 1, ..., h; 2) if no end of an edge e in the matching M is included in X by rule 1), then include the end of e that belongs to Vvc in the subset X. We verify that the subset X is a vertex cover of size n/2 for the graph G and that U ⊆ X.


Since M is a perfect matching, each vertex v in U plus the edge in M that is incident to v makes an M-alternating walk that, by rule 1), includes v in X. This proves U ⊆ X. We now prove that each edge in the matching M has exactly one end in X. By rule 2), each edge in M has at least one end in X. Suppose that an edge [v, w] in M has both its ends in X, where v ∈ Vis and w ∈ Vvc. The vertex v cannot be included in X by rule 2); thus it must satisfy the condition in rule 1). This implies that the vertex w should also be included in X by rule 1). Therefore, there exist an M-alternating walk W1 = {v1, v2, ..., v2i−1, v2i}, where v1 ∈ U and v2i−1 = v, and an M-alternating walk W2 = {w1, w2, ..., w2j−1, w2j}, where w1 ∈ U and w2j−1 = w. Note that we must also have v2i = w and w2j = v, because each vertex is the end of exactly one edge in the matching M. Therefore, the concatenation of the walk W1 and the reversed walk of W2, namely {v1, v2, ..., v2i−2, v, w, w2j−2, ..., w2, w1}, is a U-walk. But this contradicts the assumption that there is no U-walk. This contradiction shows that each edge in M has exactly one end in the subset X. As a consequence, |X| = n/2. What remains to prove is that X is a vertex cover for the graph G. By rule 2), each edge in the matching M has at least one end in X. Thus, if X is not a vertex cover for G, then there is an edge [v, w] not in M such that neither v nor w is in X. The vertices v and w cannot both be in Vis because Vis is an independent set. Thus, we can assume without loss of generality that v ∈ Vvc. Let [v, v′] and [w, w′] be the edges in the matching M. Then v′ ∈ Vis. As proved above, each edge in M has exactly one end in X; thus v′ ∈ X. Because v′ ∈ Vis, v′ must be included in X by rule 1), i.e., there is an M-alternating walk W3 = {v1, v2, ..., v2i−1, v2i}, where v1 ∈ U and v2i−1 = v′. This immediately gives v2i = v. Now, consider the M-alternating walk {v1, v2, ..., v2i−1, v2i, w, w′}, which, by rule 1), would have put w in the set X, contradicting our assumption that w ∉ X. This contradiction shows that the edge [v, w] cannot exist. As a consequence, the subset X must be a vertex cover for the graph G.

Let (G, M, Uvc, Uis, k) be an instance of vc-kpm*, where G = (V, E). Let M′ be a set of edges in M and let V′ = V(M′). Denote by G′ = G(V, E \ M′) the graph G with the edges in M′ removed. We say that a walk p in G′ is a U-walk in G′ if p is a U-walk in G. Moreover, note that the graph G[V \ V′], i.e., the graph G with the vertices in V′ removed, is a König graph in which the edge set M′′ = M \ M′ is a perfect matching. Our algorithm is based on the following theorem.

Theorem 1. (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm* if and only if there is a set M′ of k edges in M such that the König graph G[V \ V′] with the perfect matching M′′ = M \ M′ has no U′-walk with respect to M′′, where V′ = V(M′) and U′ = U \ V′; this is true if and only if the graph G′ = G(V, E \ M′) contains no U-walks.

Proof. Assume that (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm*, so the n-vertex graph G has a vertex cover X of size n/2 + k with U = Uvc ∪ Uis ⊆ X. Since M is a perfect matching in G, there are exactly k edges in M whose both ends are in X. Let M′ be the set of these k edges in M, and let V′ = V(M′) and U′ = U \ V′. Then the (n − 2k)-vertex König graph G[V \ V′] has a vertex cover X′ = X \ V′ of size (n/2 + k) − 2k = (n − 2k)/2 with U′ ⊆ X′.
By Lemma 1, the graph G[V \ V′] has no U′-walk with respect to the perfect matching M′′ = M \ M′. Thus, if (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm*, then every U-walk in G contains some vertex in V′ = V(M′). Since a U-walk W in G must start and end with edges in M, if W contains an endpoint of an edge e in M, then W must also contain the edge e. Therefore, if (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm*, then every U-walk in G contains some edge in M′. As a consequence, the graph G′ = (V, E \ M′) contains no U-walks. For the other direction, suppose the graph G′ = (V, E \ M′) contains no U-walks for a set M′ of k edges in M. Then every U-walk in G must contain at least one endpoint of some edge in M′. Let V′ = V(M′) and U′ = U \ V′. Then the (n − 2k)-vertex König graph G[V \ V′] has no U′-walk with respect to M′′ = M \ M′. By Lemma 1, the graph G[V \ V′] has a vertex cover X′ of size (n − 2k)/2 with U′ ⊆ X′. Obviously, the set X = X′ ∪ V′ is a vertex cover for the graph G, U ⊆ U′ ∪ V′ ⊆ X′ ∪ V′ = X, and the set X contains |X′| + |V′| = (n − 2k)/2 + 2k = n/2 + k vertices. Therefore, (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm*.
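As an illustration of how Theorem 1 can be read algorithmically, the following exponential brute-force sketch simply enumerates candidate sets M′; it is for intuition only and is not the algorithm developed in Section 3. has_u_walk is an assumed helper testing the condition of Lemma 1, and edges are encoded as frozensets of endpoints.

```python
from itertools import combinations

def vc_kpm_bruteforce(V, E, M, U, k, has_u_walk):
    # Theorem 1: yes-instance iff deleting some k matched edges M' leaves
    # G(V, E \ M') with no U-walk.
    for M_prime in combinations(M, k):
        removed = set(M_prime)
        E_reduced = [e for e in E if e not in removed]
        if not has_u_walk(V, E_reduced, M, U):   # Lemma 1 on the reduced graph
            return True    # a vertex cover of size n/2 + k exists
    return False
```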


By Theorem 1, to decide whether (G, M, Uvc, Uis, k) is a yes-instance of vc-kpm*, it suffices to determine whether there is a set M′ of k edges in M that satisfies the conditions of the theorem. Let G = (Vvc ∪ Vis, E), where in the vertex partition Vvc ∪ Vis, every edge in the perfect matching M has one end in Vvc and the other end in Vis, Uvc ⊆ Vvc, and Uis ⊆ Vis. Also let U = Uvc ∪ Uis. We start with the following simple observation.

Lemma 2. Let W = {v1, v2, ..., v2h} be a U-walk with v1 ∈ Uvc. Then for all i, v2i−1 ∈ Vvc and v2i ∈ Vis. In particular, no U-walk can have both its end-vertices in the set Uvc.

Proof. Consider the U-walk W = {v1, v2, ..., v2h}. If v1 ∈ Uvc ⊆ Vvc, then v2 must be in Vis because, by the definition of a U-walk, [v1, v2] is an edge in M, in which every edge has one end in Vvc and the other end in Vis. If 2h ≥ 4, we also derive that v3 is in Vvc because [v2, v3] is an edge in G and Vis is an independent set. Now a simple induction shows that for all i, v2i−1 ∈ Vvc and v2i ∈ Vis. This immediately implies that the end-vertex v2h of W is in Vis, and thus cannot be in Uvc, i.e., the U-walk W cannot have both its end-vertices in Uvc.

Lemma 2 indicates that a U-walk in the graph G either 1) has one end-vertex in Uvc and the other end-vertex in Uis, in which case we call it a Uvc-Uis walk, or 2) has both end-vertices in Uis, in which case we call it a Uis-Uis walk.

Lemma 3. There is a linear-time algorithm that, on a given instance (G, M, Uvc, Uis, k) of vc-kpm*, constructs a Uvc-Uis walk with respect to M in the König graph G if such a U-walk exists.

Proof. Consider a Uvc-Uis walk W = {v1, v2, ..., v2h} in G, where v1 ∈ Uvc. By Lemma 2, each edge [v2i−1, v2i] in W ∩ M has its first vertex v2i−1 in Vvc and its second vertex v2i in Vis, and each edge [v2i, v2i+1] in W \ M has its first vertex v2i in Vis and its second vertex v2i+1 in Vvc. In particular, the Uvc-Uis walk W does not contain an edge whose both endpoints are in Vvc. We construct a bipartite directed graph DG from the graph G as follows: 1) remove all edges whose both endpoints are in Vvc; 2) for each edge [v, w] in M where v ∈ Vvc and w ∈ Vis, assign a direction from v to w to the edge; and 3) for each edge [v, w] not in M where v ∈ Vvc and w ∈ Vis, assign a direction from w to v to the edge. By the above analysis, there is a one-to-one correspondence between Uvc-Uis walks in the graph G and directed walks in DG that start at a vertex in Uvc and end at a vertex in Uis (without any confusion, this kind of directed walk in DG will also be called a Uvc-Uis walk in DG). Uvc-Uis walks in the directed graph DG can be tested and constructed following the idea of the well-known linear-time algorithm that constructs augmenting paths in a bipartite graph: we start with all the vertices in the set Uvc (level-0 vertices), and then apply a Breadth-First-Search process on the directed graph DG. The graph DG has a Uvc-Uis walk if and only if the Breadth-First-Search process encounters a vertex in Uis. Since there is a one-to-one correspondence between Uvc-Uis walks in DG and Uvc-Uis walks in the König graph G, this algorithm can be used to construct Uvc-Uis walks in G when such walks exist.

A U-walk separator for an instance (G, M, Uvc, Uis) of vc-kpm* is a set E′ of edges in G such that every U-walk in G contains at least one edge in E′.
Similarly, a Uvc-Uis separator for an instance (G, M, Uvc, Uis) of vc-kpm* is a set E′ of edges in G such that every Uvc-Uis walk in G contains at least one edge in E′. A minimum Uvc-Uis separator is a Uvc-Uis separator that contains the fewest edges over all Uvc-Uis separators. The number of edges in a minimum Uvc-Uis separator is denoted by min(Uvc-Uis).

Lemma 4. A minimum Uvc-Uis separator for an instance (G, M, Uvc, Uis) of vc-kpm* can be constructed in time O(n^3).

Proof. As in Lemma 3, from the instance (G, M, Uvc, Uis) of vc-kpm*, we construct a bipartite directed graph DG by 1) removing all edges whose both end-vertices are in Vvc; 2) for each edge [v, w] in M where v ∈ Vvc and w ∈ Vis, assigning a direction to the edge [v, w] from v to w; and 3) for each edge
[v, w] not in M where v ∈ Vvc and w ∈ Vis, assigning a direction to the edge [v, w] from w to v. The directed graph DG has the property that there is a one-to-one correspondence between the Uvc-Uis walks in the graph G and the directed walks in DG from vertices in Uvc to vertices in Uis. Therefore, constructing a minimum Uvc-Uis separator for the instance (G, M, Uvc, Uis) of vc-kpm* can be implemented by constructing a minimum arc separator in the directed graph DG from the vertex set Uvc to the vertex set Uis, and the latter can be solved in time O(n^3) by reducing the problem to the maximum flow problem [27].

The following lemma allows us to focus on minimum Uvc-Uis separators whose edges are all in the perfect matching M.

Lemma 5. For each instance (G, M, Uvc, Uis, k) of vc-kpm*, there is a minimum Uvc-Uis separator E0 such that E0 ⊆ M, and E0 can be constructed in time O(n^3).

Proof. Let E′ be a minimum Uvc-Uis separator of (G, M, Uvc, Uis) that contains an edge e = [u, v] ∉ M. Let e′ = [u, u′] be the edge in M that has the vertex u as an endpoint. Let E′′ = (E′ \ {e}) ∪ {e′}. We show that the edge set E′′ is also a minimum Uvc-Uis separator. Every Uvc-Uis walk W contains at least one edge in E′. If W does not contain the edge e, then W contains an edge in E′ \ {e} ⊆ E′′. On the other hand, if W contains the edge e, then W contains the vertex u. By the definition of U-walks, and since W must start and end with edges in M, the presence of the vertex u in W implies that the edge in M that contains u, i.e., the edge e′ = [u, u′], must also be in W. Therefore, in this case, W also contains an edge in E′′ = (E′ \ {e}) ∪ {e′}. This proves that E′′ is a Uvc-Uis separator, which is minimum because |E′′| ≤ |E′|. Therefore, from a minimum Uvc-Uis separator that contains unmatched edges, we can always construct a minimum Uvc-Uis separator that contains one less unmatched edge and one more matched edge. Iterating this process gives a minimum Uvc-Uis separator E0 whose edges are all matched edges. It is easy to see that E0 can be constructed from any minimum Uvc-Uis separator in linear time. Now the lemma follows from Lemma 4.

Finally, we observe the following property of Uis-Uis walks.

Lemma 6. For an instance (G, M, Uvc, Uis, k) of vc-kpm*, where G = (Vvc ∪ Vis, E), every Uis-Uis walk in G contains an edge whose both end-vertices are in Vvc.

Proof. First note that Vis is an independent set in the graph G. Therefore, every edge in G either has both end-vertices in Vvc, or has one end-vertex in Vvc and the other end-vertex in Vis. Therefore, for a Uis-Uis walk W = {v1, v2, ..., v2h} in G, where v1 ∈ Uis ⊆ Vis, if W contains no edge with both end-vertices in Vvc, then it is easy to see that for all i, 1 ≤ i ≤ h, the vertex v2i−1 is in Vis and the vertex v2i is in Vvc, contradicting the fact that v2h ∈ Uis ⊆ Vis because W is a Uis-Uis walk. This completes the proof.

We say that a Uvc-Uis walk W = {v1, v2, ..., v2h} is a short Uvc-Uis walk if v1 is the only vertex in W that is in Uvc. Note that once we have a Uvc-Uis walk W = {v1, v2, ..., v2h}, a short Uvc-Uis walk can be easily constructed: let i be the largest integer such that the vertex v2i−1 in W is in Uvc; then {v2i−1, v2i, ..., v2h} is a short Uvc-Uis walk. By Lemma 3, if the graph G has Uvc-Uis walks, then a short Uvc-Uis walk in G can be constructed in linear time.
It is also easy to see that there is no short Uvc-Uis walk if and only if there is no Uvc-Uis walk.
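The search of Lemma 3 can be sketched directly from the construction of DG. The fragment below is a minimal sketch under stated assumptions (edges given as vertex pairs, the matching as a set of frozensets); it is not the paper's implementation.

```python
from collections import deque

def has_uvc_uis_walk(edges, matching, Vvc, Uvc, Uis):
    # Build the bipartite digraph D_G: drop Vvc-Vvc edges; orient matched
    # edges Vvc -> Vis and unmatched edges Vis -> Vvc (Vis is independent,
    # so every remaining edge has exactly one end in Vvc).
    succ = {}
    for (u, v) in edges:
        if u in Vvc and v in Vvc:
            continue
        a, b = (u, v) if u in Vvc else (v, u)      # a in Vvc, b in Vis
        if frozenset((u, v)) in matching:
            succ.setdefault(a, []).append(b)       # matched edge: Vvc -> Vis
        else:
            succ.setdefault(b, []).append(a)       # unmatched edge: Vis -> Vvc
    seen, queue = set(Uvc), deque(Uvc)             # level-0 vertices: Uvc
    while queue:                                   # BFS over D_G
        x = queue.popleft()
        if x in Uis:
            return True                            # a Uvc-Uis walk exists
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

# Tiny example: matched edge [a, b] with a in Uvc and b in Uis is itself a walk.
print(has_uvc_uis_walk([("a", "b")], {frozenset(("a", "b"))},
                       Vvc={"a"}, Uvc={"a"}, Uis={"b"}))   # True
```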

3 Solving the vc-kpm* problem

Now we are ready to present a parameterized algorithm for vc-kpm*. According to Theorem 1, the task is actually to find a minimum number of matched edges whose removal breaks all U-walks. The outline of our algorithm, described in Table 1, will be analyzed by measure and bound in Theorem 2.

Table 1  The algorithm KPM

KPM(G, M, Uvc, Uis, k)
Input: an instance (G, M, Uvc, Uis, k) of vc-kpm*, where G = (Vvc ∪ Vis, E), and k ≥ 0.
Output: determine whether a vertex cover X of size n/2 + k for G exists with Uvc ∪ Uis ⊆ X.
1. if there is no U-walk in G, then return 'yes';
2. if k = 0 then return 'no';
3. if G has Uvc-Uis walks then choose an edge e = [v1, v2] on a short Uvc-Uis walk
   3.1 if v2 ∈ Uis then return KPM(G \ {v1, v2}, M \ [v1, v2], Uvc \ {v1}, Uis \ {v2}, k − 1)
   3.2 else if min(Uvc-Uis) of (G, M, Uvc, Uis) = min(U′vc-Uis) of (G, M, U′vc = Uvc ∪ N(v2), Uis)
       then return KPM(G, M, Uvc ∪ N(v2), Uis, k)
   3.3 else
       3.3.1 R1 = KPM(G \ {v1, v2}, M \ [v1, v2], Uvc \ {v1}, Uis, k − 1)
       3.3.2 if R1 is not 'no' then return 'yes'
       3.3.3 R2 = KPM(G, M, Uvc ∪ N(v2), Uis, k)
       3.3.4 if R2 is not 'no' then return 'yes'
   3.4 return 'no'
4. if G has no Uvc-Uis walk then choose an edge e = [v1, v2] on a Uis-Uis walk such that {v1, v2} ⊆ Vvc
   4.1 R3 = KPM(G, M, Uvc ∪ {v1}, Uis, k)
   4.2 if R3 is not 'no' then return 'yes'
   4.3 R4 = KPM(G, M, Uvc ∪ {v2}, Uis, k)
   4.4 if R4 is not 'no' then return 'yes'
   4.5 return 'no'
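For readability, the branching structure of KPM can also be rendered in Python. The helpers (has_u_walk, short_uvc_uis_walk, min_sep, remove_pair, N, uis_uis_edge_in_vvc) are assumed rather than implemented, so this is only a structural sketch of Table 1, not a complete solver.

```python
def KPM(G, M, Uvc, Uis, k):
    if not has_u_walk(G, M, Uvc | Uis):                # step 1
        return True
    if k == 0:                                         # step 2
        return False
    walk = short_uvc_uis_walk(G, M, Uvc, Uis)
    if walk is not None:                               # step 3
        v1, v2 = walk[0], walk[1]
        if v2 in Uis:                                  # step 3.1: e must be removed
            G2, M2 = remove_pair(G, M, v1, v2)
            return KPM(G2, M2, Uvc - {v1}, Uis - {v2}, k - 1)
        if min_sep(G, M, Uvc, Uis) == min_sep(G, M, Uvc | N(G, v2), Uis):
            return KPM(G, M, Uvc | N(G, v2), Uis, k)   # step 3.2: keep e
        G2, M2 = remove_pair(G, M, v1, v2)             # step 3.3: branch
        return (KPM(G2, M2, Uvc - {v1}, Uis, k - 1)
                or KPM(G, M, Uvc | N(G, v2), Uis, k))
    v1, v2 = uis_uis_edge_in_vvc(G, M, Uis)            # step 4
    return KPM(G, M, Uvc | {v1}, Uis, k) or KPM(G, M, Uvc | {v2}, Uis, k)
```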

At the beginning of every recursive invocation, the algorithm checks whether there is no U-walk in G and, if so, returns 'yes'; otherwise, if k = 0, it returns 'no'. If k > 0 and G has Uvc-Uis walks, the algorithm chooses an edge e = [v1, v2] on a short Uvc-Uis walk. In this case, if v2 ∈ Uis, e is removed. Else, if min(Uvc-Uis) of (G, M, Uvc, Uis) equals min(U′vc-Uis) of (G, M, U′vc = Uvc ∪ N(v2), Uis), then e does not need to be removed and the algorithm recurses on (G, M, Uvc ∪ N(v2), Uis, k). Otherwise, the algorithm branches on whether or not to remove e. If there is no Uvc-Uis walk in G, there must be Uis-Uis walks in G. The algorithm then chooses an edge e = [v1, v2] on a Uis-Uis walk such that {v1, v2} ⊆ Vvc. In this case, e does not need to be removed, and the algorithm calls itself recursively on (G, M, Uvc ∪ {v1}, Uis, k) and (G, M, Uvc ∪ {v2}, Uis, k). Before proving the correctness of the algorithm, we prove a critical observation, formulated as the following lemma.

Lemma 7. Let {v1, v2, ...} be a short Uvc-Uis walk of length at least 3. If min(Uvc-Uis) of (G, M, Uvc, Uis) equals min(U′vc-Uis) of (G, M, U′vc, Uis), where U′vc = Uvc ∪ N(v2), then the instance (G, M, Uvc, Uis) has a U-walk separator of size bounded by k if and only if the instance (G, M, U′vc, Uis) has a U′-walk separator of size bounded by k, where U′ = U ∪ N(v2).

Proof. For the sufficiency direction, let S′ be a U′-walk separator for the instance (G, M, U′vc, Uis) with size bounded by k. Then S′ breaks all the U′-walks in G, which means S′ breaks all the Uvc-Uis walks and all the Uis-Uis walks in G, and S′ must also be a U-walk separator for the instance (G, M, Uvc, Uis).

For the necessity direction, since min(Uvc-Uis) of (G, M, Uvc, Uis) equals min(U′vc-Uis) of (G, M, U′vc, Uis), there exists a minimum Uvc-Uis separator C for (G, M, Uvc, Uis) avoiding all the edges adjacent to v2. In other words, C is also a minimum U′vc-Uis separator for (G, M, U′vc, Uis). Let S be a U-walk separator for (G, M, Uvc, Uis) with size bounded by k. In the following, we construct a U′-walk separator S′ for (G, M, U′vc, Uis) with size no larger than |S|. Obviously, we can assume that S and C consist of matched edges; otherwise, we can construct such edge sets in linear time (see the proof of Lemma 5).

If [v1, v2] ∉ S, then S′ = S is a U′-walk separator for (G, M, U′vc, Uis). Indeed, let W be a U′-walk of (G, M, U′vc, Uis) but not a U-walk of (G, M, Uvc, Uis). Then W must be a U′-walk from a vertex in N(v2) to a vertex in Uis. Assume W = {v3, ..., v2h}. Then there is a corresponding U-walk W′ =
{v1, v2, v3, ..., v2h} of (G, M, Uvc, Uis). Since S is a U-walk separator for (G, M, Uvc, Uis), S breaks W′. Furthermore, S consists of matched edges and [v1, v2] ∉ S. Then S must break W. Therefore, S′ = S is also a U′-walk separator for (G, M, U′vc, Uis).

Now, we just consider the case where [v1, v2] ∈ S. As in Lemma 4, we construct a bipartite directed graph DG = (V, A) from G. Let Cd = {(i, j) | (i, j) ∈ A, [i, j] ∈ C} and Sd = {(i, j) | (i, j) ∈ A, [i, j] ∈ S} be the arc sets in DG which consist of the arcs corresponding to the edges in C and S, respectively. We know that there is a one-to-one correspondence between the Uvc-Uis walks in the graph G and the directed walks in DG from vertices in Uvc to vertices in Uis, which means that S is a Uvc-Uis separator for (G, M, Uvc, Uis) if and only if Sd is an arc set that breaks all the directed walks from Uvc to Uis in DG. A well-known fact is that arc separator functions in directed graphs are submodular1), which is critical to the construction of S′ and can be shown briefly as follows.

Let DG = (V, A) be a directed graph. For a vertex subset V′ ⊆ V, δ+(V′) = {(i, j) | (i, j) ∈ A, i ∈ V′, j ∈ V\V′} is the arc set of DG consisting of all arcs from V′ to V\V′ in DG, i.e., the tail and the head of each arc in δ+(V′) are in V′ and V\V′ respectively, and f(V′) = |δ+(V′)|. Similarly, for any two vertex subsets V1, V2 ⊆ V, δ+(V1 : V2) = {(i, j) | (i, j) ∈ A, i ∈ V1, j ∈ V2} is the arc set of DG consisting of the arcs from V1 to V2, and f(V1 : V2) = |δ+(V1 : V2)|. Writing W for V\(V1 ∪ V2), for the arc separator function we have

f(V1) + f(V2) − f(V1 ∪ V2) − f(V1 ∩ V2)
= [f(V1∩V2 : V2\V1) + f(V1∩V2 : W) + f(V1\V2 : V2\V1) + f(V1\V2 : W)]
+ [f(V1∩V2 : V1\V2) + f(V1∩V2 : W) + f(V2\V1 : V1\V2) + f(V2\V1 : W)]
− [f(V1\V2 : W) + f(V2\V1 : W) + f(V1∩V2 : W)]
− [f(V1∩V2 : V1\V2) + f(V1∩V2 : V2\V1) + f(V1∩V2 : W)]
= f(V1\V2 : V2\V1) + f(V2\V1 : V1\V2) ≥ 0.    (1)
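The submodular inequality (1) can also be confirmed mechanically; the following brute-force check is a small illustrative sketch of ours (not part of the paper), using an arbitrary toy digraph.

```python
# Brute-force check (illustration only) that the directed cut function
# f(V') = |delta_plus(V')| is submodular on a small digraph.
from itertools import combinations

def delta_plus(arcs, part):
    """Arcs with tail inside `part` and head outside it."""
    return {(i, j) for (i, j) in arcs if i in part and j not in part}

arcs = {(0, 1), (1, 2), (2, 0), (1, 3), (3, 2), (0, 3)}
f = lambda part: len(delta_plus(arcs, part))

subsets = [set(c) for r in range(5) for c in combinations(range(4), r)]
assert all(f(a) + f(b) >= f(a | b) + f(a & b) for a in subsets for b in subsets)
print("f(V1) + f(V2) >= f(V1 u V2) + f(V1 n V2) holds on this digraph")
```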

For any arc set X of DG = (V, A), let R(X) be the set of vertices reachable from Uvc in DG\X, i.e., if there is a directed walk from a vertex in Uvc to a vertex v in DG\X, then v ∈ R(X). It is worth emphasizing that Uvc ⊆ R(X). Obviously, there are no directed walks from Uvc to Uis in DG\δ+(R(Cd)) or DG\δ+(R(Sd)), for C is a minimum Uvc-Uis separator for (G, M, Uvc, Uis) and S is a U-walk separator for (G, M, Uvc, Uis). It is also clear that both R(Cd) and R(Sd) contain Uvc and are disjoint from Uis, which means there is no directed walk from Uvc to Uis in DG\δ+(R(Cd) ∩ R(Sd)) or DG\δ+(R(Cd) ∪ R(Sd)); i.e., δ+(R(Cd) ∩ R(Sd)) and δ+(R(Cd) ∪ R(Sd)) are arc separators from Uvc to Uis in DG. If V1 and V2 in inequality (1) are replaced by R(Cd) and R(Sd) respectively, we get

f(R(Cd)) + f(R(Sd)) ≥ f(R(Cd) ∪ R(Sd)) + f(R(Cd) ∩ R(Sd)).    (2)

Before the construction of S′, we show that δ+(R(Sd)) ⊆ Sd and δ+(R(Cd)) = Cd. First, we prove that, for any arc (i, j) in δ+(R(Sd)) or δ+(R(Cd)), i ∈ Vvc and j ∈ Vis. Assume that there exists an arc (i, j) in δ+(R(Sd)) such that i ∈ Vis and j ∈ Vvc. From the definition of δ+(R(Sd)), we know that i ∈ R(Sd) and j ∈ V\R(Sd), so the arc (i, j) must belong to Sd. But since i ∈ Vis and j ∈ Vvc, [i, j] is an unmatched edge, which contradicts the fact that S consists of matched edges. Now consider any arc (i, j) ∈ δ+(R(Sd)). Because (i, j) is the only arc whose tail is i, i ∈ R(Sd), and j ∉ R(Sd), we can conclude that (i, j) ∈ Sd (otherwise j would be reachable from Uvc in DG\Sd), which means δ+(R(Sd)) ⊆ Sd. Similarly, we can prove that δ+(R(Cd)) ⊆ Cd. Now we are going to prove that Cd ⊆ δ+(R(Cd)). Assume there exists an arc (i, j) in Cd but not in δ+(R(Cd)), where i ∈ Vvc and j ∈ Vis. Then (i, j) is in DG(R(Cd)) or in DG(V\R(Cd)). If (i, j) is in DG(R(Cd)), there is a directed walk from a vertex in Uvc to j in DG\Cd; but (i, j) ∈ Cd, i.e., there should be no directed walk from Uvc to j in DG\Cd, which contradicts the definition of R(Cd). If (i, j) is in DG(V\R(Cd)), then, obviously, Cd\{(i, j)} is also a Uvc-Uis separator for (G, M, Uvc, Uis), which contradicts the fact that C is a minimum Uvc-Uis separator for (G, M, Uvc, Uis). Above all, we can conclude that δ+(R(Cd)) = Cd.

1) Queyranne M. An introduction to submodular functions and optimization. http://www.ima.umn.edu/optimization/seminar/queyranne.pdf, 2002.


Since C is a minimum Uvc-Uis separator for (G, M, Uvc, Uis), f(R(Cd) ∩ R(Sd)) ≥ |Cd| = |δ+(R(Cd))| = f(R(Cd)). Therefore, inequality (2) still holds after subtracting f(R(Cd) ∩ R(Sd)) ≥ f(R(Cd)) from both sides:

f(R(Sd)) ≥ f(R(Cd) ∪ R(Sd)).    (3)

Let EG(X) = {[i, j] | [i, j] ∈ E, (i, j) ∈ X} be the edge subset of G corresponding to an arc subset X of DG, and let E(Y) be the edge set of the subgraph of G induced by a vertex set Y. Now we are going to prove that S′ = EG(δ+(R(Cd) ∪ R(Sd))) ∪ (S ∩ E(V\(R(Cd) ∪ R(Sd)))) is a U′-walk separator for (G, M, U′vc, Uis) which is no larger than S. Note that δ+(R(Sd)) ⊆ Sd and S ∩ E(R(Sd)) = ∅, so S can be written as EG(δ+(R(Sd))) ∪ (S ∩ E(V\R(Sd))). From inequality (3), we get |S′| ≤ |S|. In the following, we just need to prove that S′ is a U′-walk separator for (G, M, U′vc, Uis).

From the definition of S′, we know that there is no U′-walk from U′vc to Uis in G\S′. Now, assume that there is a Uis-Uis walk W in G disjoint from S′. Then W must contain an edge e from S; otherwise, S would not be a U-walk separator for (G, M, Uvc, Uis), a contradiction. If e ∈ EG(δ+(R(Cd) ∪ R(Sd))) or e ∈ S ∩ E(V\(R(Cd) ∪ R(Sd))), then e is in S′. Else, e ∈ E(R(Cd) ∪ R(Sd)); without loss of generality, let e = [vi, vi+1], where vi ∈ Vvc, vi+1 ∈ Vis, and let W = {u1, u2, ..., vi, vi+1, ..., u2h}. Since e ∈ E(R(Cd) ∪ R(Sd)), there is a directed walk from a vertex v ∈ U′vc to vi in DG, which means W′ = {v, ..., vi, vi+1, ..., u2h} is a U′-walk from v to u2h in G. But this contradicts the fact that S′ breaks all the U′-walks from U′vc to Uis. Therefore, in both cases, W cannot exist in G\S′.

By the above construction, we have actually shown that there is always a minimum U′-walk separator for (G, M, U′vc, Uis) of the same size as a minimum U-walk separator for (G, M, Uvc, Uis), and the lemma follows directly.

Theorem 2. The vc-kpm* problem with a valid instance (G = (Vvc ∪ Vis, E), M, Uvc, Uis, k) can be correctly solved by algorithm KPM in O*(4^k) time.

Proof. We first prove the correctness of algorithm KPM. By Lemma 1, the correctness of steps 1 and 2 is obvious. In order to break all Uvc-Uis walks and Uis-Uis walks, by Theorem 1, KPM needs to find a vertex set V* of G, consisting of at most k vertex pairs saturated by k matched edges, and remove it from G, such that the resulting graph G\V* contains no U-walk whose end points are both in U\V*. These are handled by step 3 and step 4, respectively.

Now we are going to prove the correctness of step 3. From the proof of Lemma 5 and Theorem 1, we know that when we remove a matched edge e from G, we can remove a matched vertex pair V(e) from G, which is crucial to the design of our algorithm. For any edge e = [u, v], if e is not a matched edge, then it does not need to be removed: there is exactly one matched edge containing u and one matched edge containing v, which means any U-walk containing the unmatched edge e must contain a matched edge e′ adjacent to e. In other words, intuitively, removing e′ is better than removing e for any instance of the vc-kpm* problem. Let e = [v1, v2] be the edge chosen in step 3. Three cases need to be considered for e. Case 1: v2 ∈ Uis. Case 2: min(Uvc-Uis) of (G, M, Uvc, Uis) equals min(U′vc-Uis) of (G, M, U′vc, Uis), where U′vc = Uvc ∪ N(v2). Case 3: v2 ∉ Uis and min(Uvc-Uis) of (G, M, Uvc, Uis) is not equal to min(U′vc-Uis) of (G, M, U′vc, Uis). For Case 1, obviously, e must be removed from G, since otherwise there would exist a Uvc-Uis walk {v1, v2} in G; this is handled in step 3.1. If v2 ∉ Uis, there exists a short walk whose first two vertices are v1 and v2. We compare min(Uvc-Uis) of (G, M, Uvc, Uis) with min(U′vc-Uis) of (G, M, U′vc, Uis). If min(Uvc-Uis) of (G, M, Uvc, Uis) equals min(U′vc-Uis) of (G, M, U′vc, Uis) (Case 2), by Lemma 7, there exists a U′-walk separator for (G, M, U′vc, Uis) which is not larger than the minimum U-walk separator for (G, M, Uvc, Uis). Therefore, algorithm KPM just adds N(v2) into Uvc and applies itself recursively on (G, M, U′vc, Uis, k), which is handled in step 3.2. Otherwise (Case 3), there are two branches: one is to remove {v1, v2} from G, and the other is to add N(v2) into Uvc; these are handled in step 3.3.1 and step 3.3.3. By Lemma 6, algorithm KPM in step 4 can choose an edge correctly, and apply itself recursively on (G, M, Uvc ∪ {v1}, Uis, k) and (G, M, Uvc ∪ {v2}, Uis, k).

Finally, we analyze the time complexity of algorithm KPM. As in [20,21], the execution of the algorithm is recursive and can be depicted by a search tree ST, where each node in ST is


associated with an instance of the vc-kpm* problem. The root of the search tree ST(G, M, Uvc, Uis, k) is associated with (G, M, Uvc, Uis, k). Furthermore, let T(k, r) be the number of leaves of the search tree ST(G, M, Uvc, Uis, k), where r denotes min(Uvc-Uis) of (G, M, Uvc, Uis). From KPM(G, M, Uvc, Uis, k), we can list the following inequalities:

by step 3.1,
T(k, r) ≤ T(k − 1, r1),    (4)
where r1 is min(U′vc-U′is) of (G\{v1, v2}, M\[v1, v2], Uvc\{v1}, Uis\{v2}), U′vc = Uvc\{v1}, U′is = Uis\{v2};

by step 3.2,
T(k, r) ≤ T(k, r2),    (5)
where r2 is min(U′vc-Uis) of (G, M, Uvc ∪ N(v2), Uis), U′vc = Uvc ∪ N(v2);

by step 3.3,
T(k, r) ≤ T(k − 1, r3) + T(k, r4),    (6)
where r3 is min(U′vc-Uis) of (G\{v1, v2}, M\[v1, v2], Uvc\{v1}, Uis), and r4 is min(U′vc-Uis) of (G, M, Uvc ∪ N(v2), Uis);

by step 4,
T(k, r) ≤ T(k, r5) + T(k, r6),    (7)
where r5 is min(U′vc-Uis) of (G, M, Uvc ∪ {v1}, Uis), r6 is min(U″vc-Uis) of (G, M, Uvc ∪ {v2}, Uis), U′vc = Uvc ∪ {v1} and U″vc = Uvc ∪ {v2}.

It is obvious that r − 1 ≤ r1, r3 ≤ r. Note that, in step 3.2, min(U′vc-Uis) of (G, M, Uvc ∪ N(v2), Uis) equals min(Uvc-Uis) of (G, M, Uvc, Uis); from Lemma 7, we get r2 = r. In step 3.3, since min(Uvc-Uis) of (G, M, Uvc, Uis) is not equal to min(U′vc-Uis) of (G, M, U′vc, Uis), we have r4 ≥ r + 1. The precondition for executing step 4 is that G has no Uvc-Uis walk, i.e., min(Uvc-Uis) of (G, M, Uvc, Uis) = 0. But for the edge e = [v1, v2] chosen in step 4, if algorithm KPM adds the vertex v1 or v2 to Uvc, then there exists at least one U′vc-Uis walk or U″vc-Uis walk in G, which means r5, r6 ≥ 1 = r + 1. It is easy to see that 0 ≤ r ≤ k. Let T(k, r) = T′(2k − r), and let t = 2k − r. By inequality (4), T′(t) ≤ T′(t − 1); by (5), t does not decrease, but |Uvc| increases and cannot exceed |V|/2; by (6) and (7), T′(t) ≤ 2T′(t − 1). Obviously T′(1) = 1. Thus T(k, r) = T′(t) ≤ 2^t ≤ 2^(2k) = 4^k, and the number of nodes in ST is bounded by O*(4^k). By Lemma 4, the running time at each node is polynomial. Therefore, the running time of algorithm KPM is bounded by O*(4^k).
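As a quick illustration of the measure argument above (ours, not part of the paper): with t = 2k − r, the recurrence T′(t) ≤ 2T′(t − 1) with T′(1) = 1 indeed keeps the number of leaves below 4^k.

```python
# T'(t) <= 2*T'(t-1) with T'(1) = 1 gives T'(t) <= 2^(t-1) <= 2^t <= 4^k.
def t_bound(t):
    return 1 if t <= 1 else 2 * t_bound(t - 1)

k = 8
assert all(t_bound(2 * k - r) <= 4 ** k for r in range(k + 1))  # 0 <= r <= k
print("leaf bound 4^k confirmed for k =", k)
```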

4 Solving the vc-pm* problem

In this section, we will show how to solve the vc-pm* problem.

Theorem 3. The vc-pm* problem can be solved in time O*(9^k).

Proof. For a given instance (G, k) of the vc-pm* problem, assume that M = {m1, m2, ..., m_{n/2}} is a perfect matching of G. The iterative compression technique is used to solve the vc-pm* problem. It is worth mentioning that our procedure of iterative compression differs in some details from the general one for solving problems on graphs: in our procedure, the increment of the iterative part is not a single vertex of G but two vertices (the two end points of an edge mi in M), and the compression step does not compress the solution size from k + 1 to k, but from q + k + 1 to q + k (1 ≤ q ≤ n/2). Let Vq = V(m1) ∪ V(m2) ∪ ... ∪ V(mq), and Mq = {m1, m2, ..., mq} (q = 1, 2, ..., n/2); i.e., Vq is the set of endpoints of the first q edges in M. It is easy to see that any vertex cover of G[Vq] is larger than or equal to |Vq|/2. If G[Vq−1] has a vertex cover X of size q − 1 + k, then X ∪ V(mq), with size q + k + 1, must be a vertex cover of G[Vq]. Now, we briefly prove the following claim.

Claim 1. If there does not exist a vertex cover of G[Vq] with size at most q + k, then there does not exist a vertex cover of G with size at most n/2 + k.


Proof. Obviously, G[V\Vq] is a graph with perfect matching, |V\Vq| = n − 2q, and the minimum vertex cover of G[V\Vq] has size at least n/2 − q. Therefore, if the minimum vertex cover of G[Vq] has size larger than q + k, then the minimum vertex cover of G has size larger than q + k + n/2 − q = n/2 + k; i.e., if there does not exist a vertex cover of G[Vq] with size at most q + k, then there does not exist a vertex cover of G with size at most n/2 + k. This proves Claim 1.

Let M* = {[u, v] | {u, v} ⊆ X, [u, v] ∈ M} ∪ {mq}. It is easy to see that G[Vq] is a graph with perfect matching, and |M*| = (q − 1 + k) − (q − 1) + 1 = k + 1. Obviously, G[Vq\V(M*)] is a graph with a perfect matching of size q − 1 − k. By enumerating all partitions (D1, D2) of V(M*), we will find a vertex cover X′ of G[Vq] with size at most q + k satisfying the requirement that X′ contains all vertices of D1 and none of D2. For each vertex v ∈ D2, N(v) must belong to X′; therefore, all vertices in U* = N(D2) ∩ (Vq\V(M*)) belong to X′. For the reason that N(D2) ⊆ X′, a partition (D1, D2) is called a valid partition if N(D2) ∩ D2 = ∅. For each edge e ∈ M*, at least one endpoint of e must belong to X′. Let k + 1 − i be the number of edges of M* whose two endpoints both belong to X′. Then there are at most Σ_{i=1}^{k+1} C(k+1, i) 2^i valid partitions, where C(k+1, i) denotes the number of choices of the i edges for which exactly one endpoint belongs to X′.

The compression step of our algorithm is based on valid partitions and determines whether there is a vertex cover X″ of G[V(Mq\M*)] such that 1) |X″| = (q − 1 − k) + (i − 1); 2) U* ⊆ X″. Let G′ = (V′ = (V′vc ∪ V′is), E′), where V′ = Vq\V(M*), V′vc = X\V(M*), V′is = V′\V′vc, E′ = {e | e ∈ G[V(Mq\M*)]}, M′ = Mq\M*, U′ = U*, and k′ = (q − 1 − k) + (i − 1) − |V′|/2 = i − 1. If there exist two vertices vi and vj in U′ such that [vi, vj] ∈ M′, we delete vi and vj from G′ and decrease k′ by 1. For the reason that X is a vertex cover of G[Vq−1], V′vc = X\V(M*) is a vertex cover of G′ and V′is is an independent set. Furthermore, each edge in M′ saturates exactly one vertex in V′vc and one vertex in V′is. By Proposition 2, G′ is a König graph. Let U′vc = V′vc ∩ U′ and U′is = V′is ∩ U′. So (G′ = (V′vc ∪ V′is, E′), M′, U′vc, U′is, k′) is a valid instance of the vc-kpm* problem. By Theorem 2, the vc-kpm* problem with input (G′ = (V′vc ∪ V′is, E′), M′, U′vc, U′is, k′) can be solved in O*(4^{k′}) = O*(4^{i−1}) time. Therefore, the vc-pm* problem can be solved in time O*(Σ_{i=1}^{k+1} C(k+1, i) 2^i 4^i) = O*(9^k).
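The closing bound rests on the binomial identity Σ_i C(k+1, i) 2^i 4^i = (1 + 8)^{k+1} = 9^{k+1}; the following one-off numerical check is ours, not part of the paper.

```python
# Sanity check of sum_i C(k+1, i) * 2^i * 4^i = 9^(k+1) (binomial theorem).
from math import comb

for k in range(1, 12):
    assert sum(comb(k + 1, i) * 2 ** i * 4 ** i for i in range(k + 2)) == 9 ** (k + 1)
print("sum_i C(k+1,i) 8^i = 9^(k+1) for k = 1..11")
```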

5 Conclusion

In this paper, we study the Above Guarantee Vertex Cover problem on graphs with perfect matching. By using an implicitly enforced parameter and the iterative compression technique, a parameterized algorithm with running time O*(9^k) is presented, which greatly improves the previous best result of O*(15^k). Furthermore, through simple reductions, this algorithm can also be used to solve the König vertex deletion problem and the r/b-split vertex deletion problem in time O*(9^k) [28]. Recently, several new results have been obtained on the Above Guarantee Vertex Cover problem. Raman et al. [29] gave a parameterized algorithm of running time O*(9^k) using important separators and the iterative compression technique. Based on LP-relaxation, Cygan et al. [30] presented an O*(4^k) time parameterized algorithm. By analyzing the LP value in the LP-relaxation, Narayanaswamy et al. [31] gave a parameterized algorithm of running time O*(2.6181^k). We remark that our results were obtained independently. Compared with the above results, in this paper new structural properties of König graphs are studied, and different algorithmic techniques are developed to solve the Above Guarantee Vertex Cover problem, which are our major contributions.

Acknowledgements This work was supported by National Natural Science Foundation of China (Grant Nos. 61232001, 61173051, 61128006, 71221061), Hunan Provincial Innovation Foundation For Postgraduate (Grant No. CX2011B088).

References
1 Garey M R, Johnson D S. Computers and Intractability: a Guide to the Theory of NP-completeness. New York: W. H. Freeman and Company, 1979


2 Roth-Korostensky C. Algorithms for building multiple sequence alignments and evolutionary trees. Dissertation for Doctoral Degree. ETH Zürich 13550, 2000
3 Stege U. Resolving conflicts from problems in computational biology. Dissertation for Doctoral Degree. ETH Zürich 13364, 2000
4 Hochbaum D. Approximating covering and packing problems: set cover, vertex cover, independent set, and related problems. In: Hochbaum D, ed. Approximation Algorithms for NP-Hard Problems. Boston: PWS Publishing Company, 1997. 94–143
5 Balasubramanian R, Fellows M R, Raman V. An improved fixed parameter algorithm for vertex cover. Inf Process Lett, 1998, 65: 163–168
6 Chen J, Kanj I A, Jia W. Vertex cover: further observations and further improvements. J Algorithm, 2001, 41: 280–301
7 Downey R G, Fellows M R. Parameterized computational feasibility. In: Clote P, Remmel J, eds. Feasible Mathematics II. Boston: Birkhauser, 1995. 219–244
8 Niedermeier R, Rossmanith P. Upper bounds for vertex cover further improved. Lect Note Comput Sci, 1999, 1563: 561–570
9 Khot S, Regev O. Vertex cover might be hard to approximate to within 2 − ε. J Comput Syst Sci, 2008, 74: 335–349
10 Chen J, Kanj I A, Xia G. Improved parameterized upper bounds for vertex cover. Lect Note Comput Sci, 2006, 4162: 238–249
11 Cai L, Juedes D W. On the existence of subexponential parameterized algorithms. J Comput Syst Sci, 2003, 67: 789–807
12 Cai S, Su K, Sattar A. Local search with edge weighting and configuration checking heuristics for minimum vertex cover. Artif Intell, 2011, 175: 1672–1696
13 Johnson D S, Trick M A, eds. Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. American Mathematical Society, 1996. Benchmarks available at ftp://dimacs.rutgers.edu/pub/challenges
14 Xu K, Boussemart F, Hemery F, et al. Random constraint satisfaction: easy generation of hard (satisfiable) instances. Artif Intell, 2007, 171: 514–534. Benchmarks available at http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graphbenchmarks.htm
15 Chen J, Kanj I A. On approximating minimum vertex cover for graphs with perfect matching. Theor Comput Sci, 2005, 337: 305–318
16 Imamura T, Iwama K, Tsukiji T. Approximated vertex cover for graphs with perfect matchings. Trans Inf Syst, 2009, E89-D(8): 2405–2410
17 Chlebik M, Chlebikova J. Minimum 2SAT-deletion inapproximability results and relations to minimum vertex cover. Discrete Appl Math, 2007, 155: 172–179
18 Mahajan M, Raman V. Parametrizing above guaranteed values: MaxSat and MaxCut. J Algorithm, 1999, 31: 335–354
19 Mahajan M, Raman V, Sikdar S. Parameterizing MAX SNP problems above guaranteed values. Lect Note Comput Sci, 2006, 4169: 38–49
20 Mahajan M, Raman V, Sikdar S. Parameterizing above or below guaranteed values. J Comput Syst Sci, 2009, 75: 137–153
21 Razgon I, O'Sullivan B. Almost 2-SAT is fixed-parameter tractable. Berlin: Springer, 2008. 551–562
22 Mishra S, Raman V, Saurabh S, et al. The complexity of finding subgraphs whose matching number equals the vertex cover number. Lect Note Comput Sci, 2007, 4835: 268–279
23 Reed B, Smith K, Vetta A. Finding odd cycle transversals. Oper Res Lett, 2004, 32: 299–301
24 Deming R. Independence numbers of graphs—an extension of the König-Egerváry theorem. Discrete Math, 1979, 27: 23–33
25 Vazirani V. A theory of alternating paths and blossoms for proving correctness of the O(√V E) general graph maximum matching algorithm. Combinatorica, 1994, 14: 71–109
26 Korach E, Nguyen T, Peis B. Subgraph characterization of red/blue-split graphs and König-Egerváry graphs. In: Proceedings of 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2006. 842–850
27 Cormen T, Leiserson C, Rivest R, et al. Introduction to Algorithms. 2nd ed. Cambridge: MIT Press, 2001
28 Mishra S, Raman V, Saurabh S, et al. König deletion sets and vertex covers above the matching size. Lect Note Comput Sci, 2008, 5369: 836–847
29 Raman V, Ramanujan S, Saurabh S. Paths, flowers and vertex cover. Lect Note Comput Sci, 2011, 6942: 382–393
30 Cygan M, Pilipczuk M, Pilipczuk M, et al. On multiway cut parameterized above lower bounds. Lect Note Comput Sci, 2011, 7112: 1–12
31 Narayanaswamy S, Raman V, Ramanujan S, et al. LP can be a cure for parameterized problems. In: Proceedings of 29th International Symposium on Theoretical Aspects of Computer Science, 2011. 338–349

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072108:1–072108:12 doi: 10.1007/s11432-013-4854-1

Proof systems for planning under 0-approximation semantics

SHEN YuPing1 & ZHAO XiShun2*

1Department of Philosophy, Institute of Logic and Cognition, Sun Yat-sen University, Guangzhou 510275, China;
2Institute of Logic and Cognition, Department of Philosophy, Sun Yat-sen University, Guangzhou 510275, China

*Corresponding author (email: [email protected])

Received April 25, 2013; accepted September 25, 2013; published online January 16, 2014

Abstract In this paper we propose Hoare style proof systems called PR0D and PRKW0D for plan generation and plan verification under the 0-approximation semantics of the action language AK. In PR0D (resp. PRKW0D), a Hoare triple of the form {X}c{Y} (resp. {X}c{KWp}) means that all literals in Y become true (resp. p becomes known) after executing plan c in a state satisfying all literals in X. The proof systems are shown to be sound and complete, and, more importantly, they give a way to efficiently generate and verify longer plans from existing verified shorter plans by applying the so-called composition rule, provided that enough shorter plans have been properly stored. The idea behind this is a tradeoff between space and time; we refer to it as off-line planning and point out that it can be applied to general planning problems.

Keywords Hoare proof systems, off-line planning, plan generation, plan verification, automated reasoning

Citation Shen Y P, Zhao X S. Proof systems for planning under 0-approximation semantics. Sci China Inf Sci, 2014, 57: 072108(12), doi: 10.1007/s11432-013-4854-1

1 Introduction

Planning refers to the procedure of finding a sequence of actions (i.e., a plan) that leads a possible world from an initial state to a goal state. Recently, planning under incomplete knowledge has earned a lot of attention [1–6]; one of the well-established logical frameworks is the action language AK [7]. Planning in AK is in general PSPACE-complete [8], i.e., it is a highly intractable problem. Therefore, simplified semantics for AK called i-approximations (i = 0, 1, ..., ω) were proposed in [7]. Although planning under the 0-approximation remains PSPACE-complete, it becomes NP in some interesting restricted cases [8]. Thus, the 0-approximation is considered a moderate semantics. A 0-approximation planner implemented in [3] appears quite successful in finding shorter plans. However, it still suffers from NP-completeness and thus is not good at finding longer plans. Furthermore, the work presented in [3] provides no obvious way to verify the correctness of a generated plan.

Generally speaking, plan generation and plan verification are regarded as two challenging issues in the area of planning [9–12]. Many logical approaches have been proposed for plan generation, e.g., planning as propositional satisfiability (SAT) [13–15], planning as quantified Boolean formulas (QBF) [16,17],


planning as deduction [18], etc. Moreover, a number of approaches [7,18–21] have been dedicated to showing the provable correctness of a given plan by using first order logic (FOL). Although these approaches appear efficient at producing or verifying shorter plans, they still face a great challenge in generating or verifying longer plans due to the high computational complexity.

In this paper, we shall propose sound and complete Hoare style proof systems [22,23] for the 0-approximation of AK, which not only serve for plan generation and verification, but, more importantly, allow one to efficiently generate and verify longer plans from existing verified shorter plans. We briefly explain how this works: for a given domain description D (i.e., a theory describing the planning problem), two sets X, Y of fluent literals, and a plan c, we first consider the verification problem of determining whether D |= {X}c{Y}, that is, whether all literals of Y become true after executing c in any initial state in which all literals of X are true. This is achieved by giving a derivation in the corresponding Hoare style proof system whose last element is of the form {X}c{Y}. Provided that the proof system is sound, the derivation is naturally a certificate stating the correctness of the plan c.

To see how the above idea applies to plan generation, we may first recall Raymond Reiter's words [18]: "... plans can be synthesized as a side-effect of theorem-proving." Precisely speaking, given D and X, Y, constructing a derivation of {X}c{Y} for some c equals finding a plan c which leads the world from {X} to {Y}. If there exists such a plan, there will always be such a derivation, provided that the proof system is complete. Now it is intuitive to see that from D |= {X}c1{Y} and D |= {Y}c2{Z} we should obtain D |= {X}c1; c2{Z}. That is, "from {X}c1{Y} and {Y}c2{Z} infer {X}c1; c2{Z}" should be a valid rule in the proof system.

One important observation on the above proof systems is that, when the agent (i.e., the plan generator or executor) is free from tasks, she could compute as many shorter verified plans (i.e., proofs) as possible and store them in a well-maintained database. Such a database consists of a huge number of proofs of the form {X}c{Y}. Without loss of generality, we may assume these proofs are stored in a graph, where {X}, {Y} are nodes and c is a connecting edge. Provided the database contains enough shorter proofs, the agent can answer plan-existence queries quickly. Precisely speaking, asking whether a plan c exists from state {X′} to {Y′} is equivalent to looking for a path c from {X′} to {Y′} in the graph (a minimal sketch is given at the end of this section). This is known as the PATH problem and can be computed easily (it is NL-complete, see [24]). We refer to the above processes as off-line planning and on-line query, respectively. The idea of off-line planning can be considered a tradeoff between space and time [25]. For example, the compilation of a dictionary is a time-consuming off-line process; however, once the dictionary is done, extracting knowledge from it (i.e., an on-line query) is very easy.

The main contributions of the paper include not only the invention of proof systems for plan generation/verification under incomplete information, but also the off-line concept as a way to handle the high complexity of planning. In other words, our work provides a theoretical foundation for a new kind of method for efficient plan generation/verification at the cost of spare-time preprocessing.

The paper is organized as follows. Section 2 recalls the language AK and the 0-approximation semantics. Sections 3 and 4 are devoted to the construction of the proof systems and to proving their soundness and completeness. Section 5 provides some discussions and concludes the paper.
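The minimal sketch promised above (ours, not an implementation from the paper) illustrates the off-line/on-line split: verified triples {X}c{Y} are stored as labelled edges over states, and an on-line query is a breadth-first PATH search whose answer, by the composition rule, is itself a verified plan. All names here are hypothetical, and states are abbreviated to opaque labels.

```python
from collections import deque

def store(db, X, c, Y):
    """Off-line phase: record a verified triple {X}c{Y} as an edge X --c--> Y."""
    db.setdefault(X, []).append((c, Y))

def query(db, X, Y):
    """On-line phase: return a composed plan leading from X to Y, or None."""
    frontier, seen = deque([(X, [])]), {X}
    while frontier:
        state, plan = frontier.popleft()
        if state == Y:
            return plan          # concatenation mirrors the composition rule
        for c, nxt in db.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [c]))
    return None

db = {}
store(db, "X", "c1", "Y")        # two hypothetical verified short plans
store(db, "Y", "c2", "Z")
print(query(db, "X", "Z"))       # -> ['c1', 'c2']
```

In a real system the nodes would be canonical encodings of literal sets and the database would be persisted; plain BFS suffices here because the query is just reachability.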

2 The language AK

The language AK [7], proposed by Baral & Son, is a well-known framework for reasoning about sensing actions and conditional planning. In this section we recall the syntax and the 0-approximation semantics of AK. In addition, we prove several new properties (e.g., the monotonicity of the 0-transition function; see Lemma 1 below) which will be used in the next section.

2.1 Syntax of AK

Two disjoint non-empty sets of symbols, called fluent names (or fluents) and action names (or actions), are introduced as the alphabet of the language AK. A fluent literal is either a fluent f or its negation ¬f. For a fluent f, by ¬¬f we mean f. For a fluent literal p, we define fln(p) := f if p is the fluent f or is ¬f. Given a set X of fluent literals, ¬X is defined as {¬p | p ∈ X}, and fln(X) is defined as {fln(p) | p ∈ X}. The language AK uses four kinds of propositions for describing a domain. An initial-knowledge proposition (which is called a v-proposition in [7]) is an expression of the form initially p,

(1)

where p is a fluent literal. Roughly speaking, the above proposition says that p is initially known to be true. An effect proposition (ef-proposition for short) is an expression of the form a causes p if p1 , . . . , pn ,

(2)

where a is an action and p, p1 , . . . , pn are fluent literals. We say p and {p1 , . . . , pn } are the effect and the precondition of the proposition, respectively. The intuitive meaning of the above proposition is that p is guaranteed to be true after the execution of action a in any state of the world where p1 , . . . , pn are true. If the precondition is empty then we drop the if part and simply say: a causes p. An executability proposition (ex-proposition for short) is an expression of the form executable a if p1 , . . . , pn ,

(3)

where a is an action and p1 , . . . , pn are fluent literals. Intuitively, it says that the action a is executable whenever p1 , . . . , pn are true. For convenience, we call {p1 , . . . , pn } the ex-preconditions of the proposition. A knowledge proposition (k-proposition for short) is of the form a determines f,

(4)

where a is an action and f is a fluent. Intuitively, the above proposition says that after a is executed the agent will know whether f is true or false. A proposition is either an initial-knowledge proposition, or an ef-proposition, or an ex-proposition, or a k-proposition. Two initial-knowledge propositions initially f and initially g are called contradictory if f = ¬g. Two effect propositions “a causes f if p1 , . . . , pn ” and “a causes g if q1 , . . . , qm ” are called contradictory if f = ¬g and {p1 , . . . , pn } ∩ {¬q1 , . . . , ¬qm } is empty. Definition 1 ( [7]). A domain description in AK is a set of propositions D which does not contain 1) contradictory initial-knowledge propositions, 2) contradictory ef-propositions. Actions occurring in knowledge propositions are called sensing actions, while actions occurring in effect propositions are called non-sensing actions. In this paper we request that for any domain description D the set of sensing actions in D and the set of non-sensing actions in D should be disjoint. Definition 2 (conditional plan [7]). A conditional plan is inductively defined as follows: 1) The empty sequence of actions, denoted by [ ], is a conditional plan. 2) If a is an action then a is a conditional plan. 3) If c1 and c2 are conditional plans then the combination c1 ; c2 is a conditional plan. 4) If c1 , . . . , cn (n  1) are conditional plans and ϕ1 , . . . , ϕn are conjunctions of fluent literals (which are mutually exclusive but not necessarily exhaustive) then the following is a conditional plan (also called a case plan): case ϕ1 → c1 . . . . .ϕn → cn . endcase. 5) Nothing else is a conditional plan. Propositions are used to describe a domain, whereas queries are used to ask questions about the domain. For a plan c, a set X of fluent literals, and a fluent literal p, we have two kinds of queries: Knows X after c,

(5)


Kwhether p after c.

(6)

Intuitively, a query of the form (5) asks whether all literals in X will be known to be true after executing c, while a query of the form (6) asks whether p will be either known to be true or known to be false after executing c.
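For concreteness, Definition 2 admits a direct rendering as an abstract syntax. The sketch below is ours (not from the paper) and assumes guards are encoded as sets of literal strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

@dataclass
class Action:
    name: str                 # a single action, item 2) of Definition 2

@dataclass
class Seq:
    c1: "Plan"                # c1 ; c2, item 3)
    c2: "Plan"

@dataclass
class Case:
    # case phi1 -> c1 . ... . phin -> cn . endcase, item 4); each branch is a
    # (guard, subplan) pair whose guard is a set of fluent literals.
    branches: List[Tuple[frozenset, "Plan"]]

Plan = Optional[Union[Action, Seq, Case]]   # None encodes the empty plan [ ]
```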

2.2 0-approximation semantics

In this subsection we arbitrarily fix a domain description D without contradictory propositions. From now on, when we speak of fluent names and action names we mean those occurring in propositions of D. According to [7], an a-state is a pair (T, F) of two disjoint sets of fluent names. A fluent f is true (resp. false) in (T, F) if f ∈ T (resp. f ∈ F). Dually, ¬f is true (resp. false) if f is false (resp. true). For a fluent name f outside T ∪ F, both f and ¬f are unknown. A fluent literal p is called possibly true if it is not false (i.e., true or unknown). In the following we often use σ, δ to denote a-states. For a set X = {p1, ..., pm} of fluent literals, we say X is true in an a-state σ if and only if every pi is true in σ, i = 1, ..., m. An action a is said to be 0-executable in an a-state σ if there exists an ex-proposition executable a if p1, ..., pn such that p1, ..., pn are true in σ.

The following notations were introduced in [7].
1) e+a(σ) := {f | f is a fluent and there exists "a causes f if p1, ..., pn" in D such that p1, ..., pn are true in σ}.
2) e−a(σ) := {f | f is a fluent and there exists "a causes ¬f if p1, ..., pn" in D such that p1, ..., pn are true in σ}.
3) F+a(σ) := {f | f is a fluent and there exists "a causes f if p1, ..., pn" in D such that p1, ..., pn are possibly true in σ}.
4) F−a(σ) := {f | f is a fluent and there exists "a causes ¬f if p1, ..., pn" in D such that p1, ..., pn are possibly true in σ}.
5) K(a) := {f | f is a fluent and "a determines f" is in D}.

For an a-state σ = (T, F) and a non-sensing action a 0-executable in σ, the result of executing a is defined as Res0(a, σ) := ((T ∪ e+a(σ)) \ F−a(σ), (F ∪ e−a(σ)) \ F+a(σ)). The extension order ⊑ on a-states is defined as follows [7]: (T1, F1) ⊑ (T2, F2) if and only if T1 ⊆ T2 and F1 ⊆ F2. Please note that if (T1, F1) ⊑ (T2, F2), then for a fluent literal p we have: 1) if p is true (resp. false) in (T1, F1) then p is true (resp. false) in (T2, F2); 2) if p is unknown in (T2, F2) then p must be unknown in (T1, F1); and 3) if p is possibly true in (T2, F2) then p is possibly true in (T1, F1).

Consequently, for any non-sensing action a and a-states σ1 and σ2 such that σ1 ⊑ σ2 and a is 0-executable in σ1, we have: 1) a is 0-executable in σ2; 2) e+a(σ1) ⊆ e+a(σ2) and e−a(σ1) ⊆ e−a(σ2); 3) F+a(σ2) ⊆ F+a(σ1) and F−a(σ2) ⊆ F−a(σ1). Then we have the following proposition.

Proposition 1. For any non-sensing action a and a-states σ1 and σ2 such that σ1 ⊑ σ2 and a is 0-executable in σ1, we have Res0(a, σ1) ⊑ Res0(a, σ2).

The 0-transition function Φ0 of D is defined as follows [7].
1) If a is not 0-executable in σ, then Φ0(a, σ) := {⊥}.
2) If a is 0-executable in σ and a is a non-sensing action, Φ0(a, σ) := {Res0(a, σ)}.
3) If a is 0-executable in σ = (T, F) and a is a sensing action, then Φ0(a, σ) := {(T′, F′) | (T, F) ⊑ (T′, F′) and T′ ∪ F′ = T ∪ F ∪ K(a)}.
4) Φ0(a, Σ) := ⋃_{σ∈Σ} Φ0(a, σ).

Letting Σ1, Σ2 be two sets of a-states, we write Σ1 ⊑ Σ2 if for every a-state δ in Σ2 there is an a-state σ in Σ1 such that σ ⊑ δ. The next proposition follows directly from Proposition 1 and the definition of Φ0(a, σ) above.

Proposition 2. Suppose σ1 ⊑ σ2 and a is an action 0-executable in σ1. Then Φ0(a, σ1) ⊑ Φ0(a, σ2).
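Under these definitions Res0 is directly computable. The following executable sketch is ours (not the paper's): literals are strings with '~' marking negation, effect propositions are hypothetical (action, effect, precondition) triples, and the 0-executability test is omitted.

```python
def lit_true(p, T, F):
    """Literal p is true in the a-state (T, F)."""
    return p[1:] in F if p.startswith("~") else p in T

def lit_false(p, T, F):
    """Literal p is false in (T, F), i.e., its complement is true."""
    return lit_true(p[1:] if p.startswith("~") else "~" + p, T, F)

def res0(a, T, F, effects):
    holds = lambda pre: all(lit_true(p, T, F) for p in pre)
    possibly = lambda pre: all(not lit_false(p, T, F) for p in pre)
    e_plus  = {f for b, f, pre in effects if b == a and not f.startswith("~") and holds(pre)}
    e_minus = {f[1:] for b, f, pre in effects if b == a and f.startswith("~") and holds(pre)}
    F_plus  = {f for b, f, pre in effects if b == a and not f.startswith("~") and possibly(pre)}
    F_minus = {f[1:] for b, f, pre in effects if b == a and f.startswith("~") and possibly(pre)}
    return (T | e_plus) - F_minus, (F | e_minus) - F_plus

# Flipping a switch while alarm_off is unknown: both effects are only possibly
# applicable, so alarm_off stays unknown in the successor a-state.
effects = [("switch", "~alarm_off", ["alarm_off"]),
           ("switch", "alarm_off", ["~alarm_off"])]
print(res0("switch", set(), set(), effects))   # -> (set(), set())
```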


The extended 0-transition function Φ̂0, which maps pairs of conditional plans and a-states into sets of a-states, is defined inductively as follows.

Definition 3 ([7]). 1) Φ̂0([ ], σ) := {σ}.
2) Φ̂0(a, σ) := Φ0(a, σ).
3) When c is a case plan case ϕ1 → c1 . . . . .ϕk → ck. endcase,
Φ̂0(c, σ) := Φ̂0(cj, σ) if ϕj is true in σ, and Φ̂0(c, σ) := {⊥} if none of ϕ1, ..., ϕk is true in σ.
4) Φ̂0(c1; c2, σ) := ⋃_{σ′∈Φ̂0(c1,σ)} Φ̂0(c2, σ′).
5) Φ̂0(c, ⊥) := {⊥}.
6) Φ̂0(c, Σ) := ⋃_{σ∈Σ} Φ̂0(c, σ).
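Definition 3 likewise becomes a short recursion. This sketch is ours: it reuses lit_true from the Res0 sketch above, encodes a-states as pairs of frozensets (so that they can live inside Python sets), encodes plans as nested tuples, and abstracts the one-step function Φ0 as a parameter.

```python
BOTTOM = "bottom"   # stands for the failure state ⊥

def phi0_hat(c, sigma, phi0):
    """Extended 0-transition: returns a set of a-states (or BOTTOM markers)."""
    if sigma == BOTTOM:
        return {BOTTOM}                                   # clause 5)
    if c is None:
        return {sigma}                                    # clause 1): empty plan
    if isinstance(c, str):
        return phi0(c, sigma)                             # clause 2): one action
    if c[0] == "seq":                                     # clause 4): c1 ; c2
        return {s2 for s1 in phi0_hat(c[1], sigma, phi0)
                   for s2 in phi0_hat(c[2], s1, phi0)}
    if c[0] == "case":                                    # clause 3): case plan
        for guard, ci in c[1]:
            if all(lit_true(p, *sigma) for p in guard):
                return phi0_hat(ci, sigma, phi0)
        return {BOTTOM}
    raise ValueError("unknown plan form")
```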

Remark 1. From the definitions above we know that the transition functions Φ0 and Φ̂0 of a domain description D do not depend on any initial-knowledge proposition. In other words, if two domain descriptions D1 and D2 contain the same non-initial-knowledge propositions, then their transition functions coincide.

A conditional plan c is 0-executable in σ if ⊥ ∉ Φ̂0(c, σ).

Lemma 1 (Monotonicity Lemma). Let c be a plan, and Σ1, Σ2 be two sets of a-states. Suppose Σ1 ⊑ Σ2, and c is 0-executable in every a-state in Σ1. Then Φ̂0(c, Σ1) ⊑ Φ̂0(c, Σ2).

Proof. We proceed by induction on the structure of the plan c.

1) Suppose c consists of only an action a. Consider an arbitrary a-state σ2′ ∈ Φ0(a, Σ2). Then there is an a-state σ2 = (T2, F2) ∈ Σ2 such that σ2′ ∈ Φ0(a, σ2). Since Σ1 ⊑ Σ2, pick σ1 = (T1, F1) ∈ Σ1 such that σ1 ⊑ σ2. It suffices to show that σ1′ ⊑ σ2′ for some σ1′ ∈ Φ0(a, σ1). If a is a non-sensing action, the assertion follows directly from Proposition 2. Suppose a is a sensing action. Then σ2′ must be of the form (T2 ∪ X, F2 ∪ Y), where X ∪ Y = K(a). Then clearly (T1 ∪ X, F1 ∪ Y) must be in Φ0(a, σ1). The assertion follows since (T1 ∪ X, F1 ∪ Y) ⊑ (T2 ∪ X, F2 ∪ Y).

2) Suppose c is a case plan case ϕ1 → c1 . . . . .ϕk → ck. endcase. Consider any a-state σ2′ ∈ Φ̂0(c, Σ2). Let σ1 ∈ Σ1, σ2 ∈ Σ2 be such that σ1 ⊑ σ2 and σ2′ ∈ Φ̂0(c, σ2). Since c is 0-executable in σ1, some ϕi is true in σ1. Then ϕi is also true in σ2 since σ1 ⊑ σ2. By the induction hypothesis, Φ̂0(c, σ1) = Φ̂0(ci, σ1) ⊑ Φ̂0(ci, σ2) = Φ̂0(c, σ2). Thus, there is σ1′ ∈ Φ̂0(c, Σ1) such that σ1′ ⊑ σ2′. Consequently, Φ̂0(c, Σ1) ⊑ Φ̂0(c, Σ2).

3) Suppose c = c1; c2. By the induction hypothesis, Φ̂0(c1, Σ1) ⊑ Φ̂0(c1, Σ2). Then by the definition of Φ̂0 we have

Φ̂0(c, Σ1) = ⋃_{σ′∈Φ̂0(c1,Σ1)} Φ̂0(c2, σ′) ⊑ ⋃_{σ′∈Φ̂0(c1,Σ2)} Φ̂0(c2, σ′) = Φ̂0(c, Σ2).

An a-state σ is called an initial a-state of D if, for any fluent literal p, if the initial-knowledge proposition "initially p" is in D then p is true in σ. Suppose D is a domain description, c is a conditional plan, X is a set of fluent literals, and p is a literal. The semantics for the queries is given below.

Definition 4 ([7]). 1) D |=0 Knows X after c if for every initial a-state σ, the plan c is 0-executable in σ, and X is true in every a-state in Φ̂0(c, σ).
2) D |=0 Kwhether p after c if for every initial a-state σ, the plan c is 0-executable in σ, and p is either true or false in every a-state in Φ̂0(c, σ).

Let TD := {f | "initially f" ∈ D}, FD := {f | "initially ¬f" ∈ D}. Obviously, (TD, FD) is the least initial a-state of D, that is, (TD, FD) ⊑ σ for any initial a-state σ. The following lemma follows easily from Lemma 1.

Lemma 2. 1) D |=0 Knows X after c if and only if the plan c is 0-executable in (TD, FD), and X is true in every a-state in Φ̂0(c, (TD, FD)).
2) D |=0 Kwhether p after c if and only if the plan c is 0-executable in (TD, FD), and p is either true or false in every a-state in Φ̂0(c, (TD, FD)).

3 The proof system PR0D for Knows

A consistent set X of literals determines a unique a-state (TX, FX) by TX := {f | f ∈ X} and FX := {f | ¬f ∈ X}. Conversely, an a-state (T, F) uniquely determines the set S(T,F) := T ∪ ¬F. Obviously, p ∈ X if and only if p is true in (TX, FX) for any literal p. In the following we will not distinguish sets of literals and a-states from each other. For example, Res0(a, X) is nothing but Res0(a, (TX, FX)), which can be regarded as a set of literals. Analogously, we have the notations Φ0(c, X) and Φ̂0(c, X), which can be regarded as collections of sets of literals.

Definition 5. Let D be a domain description without initial-knowledge propositions. Suppose X, Y are two sets of fluent literals. By D |=0 {X}c{Y} we mean D ∪ ini(X) |=0 Knows Y after c. Here ini(X) := {initially p | p ∈ X}.

Suppose D is a general domain description (that is, initial-knowledge propositions are allowed). Let D′ be the set of all non-initial-knowledge propositions of D, and let X := {p | "initially p" is in D}. Then D′ |=0 {X}c{Y} is equivalent to D |=0 Knows Y after c. By Lemma 2, D |=0 {X}c{Y} if and only if Y is true in every a-state in Φ̂0(c, X).

In the remainder of this section we fix a domain description D without initial-knowledge propositions. We always use X, Y, X′, Y′ to denote consistent sets of fluent literals. The proof system PR0D consists of the following groups of axioms and rules 1–6.

1) AXIOM 1 (Empty): {X}[ ]{X}.
2) AXIOM 2 (Non-sensing action): {X}a{Res0(a, X)}, where a is a non-sensing action 0-executable in X.
3) RULE 3 (Sensing action): from the premises {X ∪ X1}c{Y}, ..., {X ∪ Xm}c{Y} derive {X}a; c{Y}, where a is a sensing action 0-executable in X, and X1, ..., Xm are all the sets X′ of fluent literals such that fln(X′) = K(a) and X ∪ X′ is consistent.
4) RULE 4 (Case): from ϕi ⊆ X and {X}ci; c′{Y} derive {X}c; c′{Y}, where c is the case plan case ϕ1 → c1 . . . . .ϕm → cm. endcase and 1 ≤ i ≤ m.
5) RULE 5 (Composition): from {X}c1{Y′} and {Y′}c2{Y} derive {X}c1; c2{Y}.
6) RULE 6 (Consequence): from X′ ⊆ X, {X′}c{Y′} and Y ⊆ Y′ derive {X}c{Y}.

Definition 6. A proof sequence (or derivation) of PR0D is a sequence {X1}c1{Y1}, ..., {Xn}cn{Yn} such that each {Xi}ci{Yi} is either an axiom of PR0D or is obtained from some of {X1}c1{Y1}, ..., {Xi−1}ci−1{Yi−1} by applying a rule of PR0D. By D ⊢0 {X}c{Y} we mean that {X}c{Y} appears in some proof sequence of PR0D; that is, {X}c{Y} can be derived from the axioms and rules of PR0D.

Example 1 ([7]). A bomb can only be safely defused if its alarm is switched off. Flipping the switch causes the alarm off if it is on, and vice versa. At the beginning the agent only knows that the bomb is not disarmed and not exploded; however, it does not know whether or not the alarm is on, i.e., the


knowledge about the initial state of the domain is incomplete. Let

D := { check determines alarm_off,
       defuse causes disarmed if alarm_off,
       defuse causes exploded if ¬alarm_off,
       switch causes ¬alarm_off if alarm_off,
       switch causes alarm_off if ¬alarm_off,
       executable check if ¬exploded,
       executable switch if ¬exploded,
       executable defuse if ¬exploded }.

Let c′ be the case plan case ¬alarm_off → switch. alarm_off → [ ]. endcase, and let c = check; c′; defuse. Then the following is a proof sequence of PR0D:

1) {X1}switch{Y1} (AXIOM 2),
2) {X1}c′{Y1} ((1) and RULE 4),
3) {Y1}[ ]{Y1} (AXIOM 1),
4) {Y1}c′{Y1} ((3) and RULE 4),
5) {X2}check; c′{Y1} ((2), (4) and RULE 3),
6) {Y1}defuse{Y2} (AXIOM 2),
7) {X2}c{Y2} ((5), (6) and RULE 5),

where X1 = {¬disarmed, ¬exploded, ¬alarm_off}, Y1 = {¬disarmed, ¬exploded, alarm_off}, X2 = {¬disarmed, ¬exploded}, and Y2 = {disarmed, ¬exploded, alarm_off}.

Theorem 1 (Soundness of PR0D). PR0D is sound. That is, for any conditional plan c and any consistent sets X, Y of fluent literals, D ⊢0 {X}c{Y} implies D |=0 {X}c{Y}.

Proof. Suppose D ⊢0 {X}c{Y}. Then {X}c{Y} has a derivation. We proceed by induction on the length of the derivation. Let Φ0 and Φ̂0 be the 0-transition functions of D. Please note that for any set S of fluent literals, the 0-transition functions of D ∪ ini(S) are the same as Φ0 and Φ̂0, respectively (see Remark 1).

1) Suppose {X}c{Y} is an axiom in AXIOM 1. Then X = Y and c = [ ]. Clearly, D |=0 {X}[ ]{X}.

2) Suppose {X}c{Y} is an axiom in AXIOM 2, i.e., c consists of only a non-sensing action a which is 0-executable in X, and Y = Res0(a, X). Since Φ̂0(a, X) = {Res0(a, X)}, it follows that D |=0 {X}a{Y}.

3) Suppose {X}c{Y} is obtained by applying a rule in RULE 3. Then c = a; c1 for some sensing action a 0-executable in X, and {X}c{Y} is obtained from {X ∪ X1}c1{Y}, ..., {X ∪ Xm}c1{Y}, where X1, ..., Xm are all the sets X′ of fluent literals such that fln(X′) = K(a) and X ∪ X′ is consistent. By the induction hypothesis, D |=0 {X ∪ Xi}c1{Y} for i = 1, ..., m; that is, all literals in Y are true in every set in Φ̂0(c1, X ∪ Xi). Please note that Φ0(a, X) = {X ∪ X1, ..., X ∪ Xm}. By the definition of Φ̂0 (see Definition 3), Φ̂0(c, X) = ⋃_{i=1}^{m} Φ̂0(c1, X ∪ Xi). Therefore, D |=0 {X}c{Y}.

4) Suppose {X}c{Y} is obtained by applying a rule in RULE 4. That is, c is a plan c1; c2, where c1 is a case plan case ϕ1 → c1′ . . . . .ϕn → cn′. endcase such that for some i ∈ {1, ..., n}, ϕi ⊆ X and {X}ci′; c2{Y} has been derived. By the induction hypothesis, we have D |=0 {X}ci′; c2{Y}. By Definition 3, we have Φ̂0(c, X) = Φ̂0(c2, Φ̂0(c1, X)) = Φ̂0(c2, Φ̂0(ci′, X)) = Φ̂0(ci′; c2, X). Then all literals of Y are true in Φ̂0(c, X). Thus, D |=0 {X}c{Y}.

5) Suppose {X}c{Y} is obtained from {X}c1{Y′} and {Y′}c2{Y} by applying a rule in RULE 5. By the induction hypothesis, D |=0 {X}c1{Y′} and D |=0 {Y′}c2{Y}. Then for any S ∈ Φ̂0(c1, X), we have Y′ ⊆ S (i.e., (TY′, FY′) ⊑ (TS, FS)). Thus, by Lemma 1, Φ̂0(c2, Y′) ⊑ Φ̂0(c2, S). Then

Φ̂0(c2, Y′) ⊑ ⋃_{S∈Φ̂0(c1,X)} Φ̂0(c2, S) = Φ̂0(c, X).


It follows that D |=0 {X}c{Y}.

6) Suppose {X}c{Y} is obtained by applying a rule in RULE 6. That is, there are X′ ⊆ X and Y′ ⊇ Y such that {X′}c{Y′} has been derived. By the induction hypothesis, all literals in Y′ are known to be true in Φ̂0(c, X′), and so are the literals in Y. By Lemma 1 we have Φ̂0(c, X′) ⊑ Φ̂0(c, X). Therefore, D |=0 {X}c{Y}. Altogether, this completes the proof.

Theorem 2 (Completeness of PR0D). PR0D is complete. That is, for any conditional plan c and any consistent sets X, Y of fluent literals, D |=0 {X}c{Y} implies D ⊢0 {X}c{Y}.

Proof. Suppose D |=0 {X}c{Y}. We shall show D ⊢0 {X}c{Y} by induction on the structure of c.

1) Suppose c consists of only an action a. Then a is 0-executable in X.
i) Case 1: a is a non-sensing action. Then all literals in Y are true in Res0(a, X), that is, Y ⊆ Res0(a, X). By AXIOM 2, D ⊢0 {X}a{Res0(a, X)}. Then by RULE 6, we obtain D ⊢0 {X}a{Y}.
ii) Case 2: a is a sensing action. Consider any p ∈ Y. We shall show p ∈ X. Suppose otherwise; then X′ := X ∪ {¬p} is still consistent. Then Φ0(a, X) ⊑ Φ0(a, X′). Thus p should also be true in every a-state in Φ0(a, X′). On the other hand, ¬p is true in every a-state in Φ0(a, X′) since ¬p ∈ X′. This is a contradiction. Thus Y ⊆ X. Then for any set X′ such that fln(X′) = K(a) and X ∪ X′ is consistent, we have D ⊢0 {X ∪ X′}[ ]{Y}. Now applying RULE 3 we obtain D ⊢0 {X}a{Y}.

2) Suppose c is a case plan case ϕ1 → c1 . . . . .ϕm → cm. endcase. Since D |=0 {X}c{Y}, it follows that ϕi ⊆ X for some i (otherwise, c would not be 0-executable in X). Then D |=0 {X}ci{Y}. By the induction hypothesis, D ⊢0 {X}ci{Y}. By RULE 4 we have D ⊢0 {X}c{Y}.

3) Suppose c is a composition plan c1; c2. We shall show the assertion by induction on the structure of c1.
i) c1 is a non-sensing action a. By Definition 3, Φ̂0(a; c2, X) = Φ̂0(c2, Res0(a, X)). By the induction hypothesis, D ⊢0 {Res0(a, X)}c2{Y}. By AXIOM 2 and RULE 5, we obtain D ⊢0 {X}c{Y}.
ii) c1 is a sensing action a. Consider any X′ such that fln(X′) = K(a) and X ∪ X′ is consistent. Since D |=0 {X}a; c2{Y}, it follows that D |=0 {X ∪ X′}c2{Y}. Then by the induction hypothesis we have D ⊢0 {X ∪ X′}c2{Y}. By RULE 3 we obtain D ⊢0 {X}a; c2{Y}.
iii) c1 is a case plan case ϕ1 → c1′ . . . . .ϕm → cm′. endcase. Since c is 0-executable in X, it follows that ϕi ⊆ X for some i. Then D |=0 {X}ci′; c2{Y}. By the induction hypothesis, D ⊢0 {X}ci′; c2{Y}. By RULE 4 we have D ⊢0 {X}c1; c2{Y}.
iv) c1 is c1′; c1″ such that c1′ and c1″ are not empty. Then c is c1′; (c1″; c2). Now c1′ is shorter. By the induction hypothesis, D ⊢0 {X}c{Y}. Altogether, we complete the proof.
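To see how the premise sets of RULE 3 (and of RULE 10 below) are enumerated in practice, here is a small sketch of ours (names hypothetical): for K(a) = {f1, ..., fj}, it yields every consistent extension X ∪ X′ with fln(X′) = K(a).

```python
from itertools import product

def complement(p):
    return p[1:] if p.startswith("~") else "~" + p

def sensing_premises(X, sensed):
    """All consistent extensions of X by one literal per sensed fluent."""
    for choice in product(*[(f, "~" + f) for f in sensed]):
        X2 = set(X) | set(choice)
        if all(complement(p) not in X2 for p in X2):
            yield X2

for prem in sensing_premises({"~exploded", "~disarmed"}, ["alarm_off"]):
    print(sorted(prem))   # the two premise sets used in step 5 of Example 1
```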

4 The proof system PRKW0D for Knows-Whether

In this section we shall construct a proof system for reasoning about Kwhether p after c (here p is a fluent literal). We again fix an arbitrary domain description D without initial-knowledge propositions. Similar to the notation {X}c{Y}, we introduce the notation {X}c{KWp}.

Definition 7. Let c be a plan, X a consistent set of fluent literals, and p a fluent literal. By D |=0 {X}c{KWp} we mean D ∪ ini(X) |=0 Kwhether p after c.

The proof system PRKW0D consists of the axioms and rules of groups 1–6 in Section 3 and the following groups 7–12.

7) AXIOM 7: {X}a{KWf}, where a is a sensing action 0-executable in X, and f is a fluent name such that the k-proposition "a determines f" belongs to D.
8) RULE 8: from {X}c{{p}} derive {X}c{KWp}.
9) RULE 9: from {X}c{KWp} derive {X}c{KW¬p}.
10) RULE 10 (Sensing action): from {X ∪ X1}c{KWp}, ..., {X ∪ Xm}c{KWp} derive {X}a; c{KWp}, where a is a sensing action 0-executable in X, and X1, ..., Xm are all the sets X′ of fluent literals such that fln(X′) = K(a) and X ∪ X′ is consistent.
11) RULE 11 (Composition): from {X}c1{Y} and {Y}c2{KWp} derive {X}c1; c2{KWp}.
12) RULE 12 (Case): from ϕi ⊆ X and {X}ci; c′{KWp} derive {X}c; c′{KWp}, where c is the case plan case ϕ1 → c1 . . . . .ϕn → cn. endcase and 1 ≤ i ≤ n.

Definition 8 (proof sequence of PRKW0D). A proof sequence (or derivation) of PRKW0D is a sequence of elements of the form {S}c{T} or {S}c{KWp} such that each element is either an axiom of PRKW0D or is obtained from some of the previous elements by applying a rule of PRKW0D. By D ⊢0 {S}c{KWp} we mean that {S}c{KWp} appears in some proof sequence of PRKW0D; that is, {S}c{KWp} can be derived from the axioms and rules of PRKW0D.

Remark 2. Please note that {X}c{KWp} never appears as a premise in a rule whose consequence is of the form {X′}c′{Y′}. Thus, {X}c{Y} is derivable in PRKW0D if and only if it is derivable in PR0D. So, for derivability of {X}c{Y} in PRKW0D, we still employ the notation D ⊢0 {X}c{Y}.

Theorem 3 (Soundness of PRKW0D). Given a plan c, D ⊢0 {X}c{KWp} implies D |=0 {X}c{KWp} for any consistent set X of fluent literals and any fluent literal p.

Proof. The theorem can be shown by induction on the length of derivations. By the soundness of PR0D, there are six cases, according to whether {S}c{KWp} is an axiom in AXIOM 7 or is obtained by applying a rule in groups 8–12. For each case the proof is easy; we omit it.

Theorem 4 (Completeness of PRKW0D). Given a plan c, D |=0 {X}c{KWp} implies D ⊢0 {X}c{KWp} for any consistent set X of fluent literals and any fluent literal p.

Proof. We proceed by induction on the structure of c. Suppose D |=0 {X}c{KWp}.

1) c is empty. Then it must be that p ∈ X or ¬p ∈ X. Then {X}[ ]{{p}} or {X}[ ]{{¬p}} is derivable, and by RULES 8 and 9 we can derive {X}[ ]{KWp}.

2) c consists of only a sensing action a. Then a is 0-executable in X. If p ∈ X, it is clear that {X}a{{p}} is derivable; from RULE 8 we derive {X}a{KWp}. By the same argument, if ¬p ∈ X, then D ⊢0 {X}a{KW¬p}, and we can then derive {X}a{KWp} by applying RULE 9. Now suppose neither p nor ¬p is in X. We claim that the k-proposition "a determines fln(p)" belongs to D (otherwise, p and ¬p would remain unknown in every a-state in Φ0(a, X), which contradicts the assumption D |=0 {X}a{KWp}). Now we have an axiom {X}a{KW fln(p)}. If p itself is a fluent name then we are done; else we derive {X}a{KWp} by applying RULE 9.

3) c consists of only a non-sensing action a. Since D |=0 {X}a{KWp}, it follows that a is 0-executable in X and either p or ¬p is true in Res0(a, X); that is, p ∈ Res0(a, X) or ¬p ∈ Res0(a, X). Since D ⊢0 {X}a{Res0(a, X)}, we have D ⊢0 {X}a{{p}} or D ⊢0 {X}a{{¬p}}. Then either {X}a{KWp} or {X}a{KW¬p} can be derived by applying RULE 8. If {X}a{KW¬p} is derivable, then we obtain {X}a{KWp} by applying RULE 9.

4) c is a case plan of the form case ϕ1 → c1 . . . . .ϕn → cn. endcase. Then there must be some i ∈ {1, ..., n} such that ϕi ⊆ X; otherwise, c would not be 0-executable. Then we can see that


D |=0 {X}ci{KWp}. By the induction hypothesis, we have D ⊢0 {X}ci{KWp}. Then we can derive {X}c{KWp} by RULE 12.

5) Suppose c = c1; c2 such that c1 and c2 are non-empty. We show D ⊢0 {X}c{KWp} by induction on the structure of c1.
i) c1 is a sensing action a. Let X1, ..., Xm be all the sets X′ of fluent literals such that fln(X′) = K(a) and X ∪ X′ is consistent. Consider an arbitrary Xi. We have D |=0 {X ∪ Xi}c2{KWp} since Φ̂0(c2, X ∪ Xi) ⊆ Φ̂0(a; c2, X). By the induction hypothesis, D ⊢0 {X ∪ Xi}c2{KWp}. Now by RULE 10 we can derive {X}a; c2{KWp}.
ii) c1 is a non-sensing action a. Then a is 0-executable in X. Since Φ̂0(c2, Res0(a, X)) = Φ̂0(a; c2, X), it follows that D |=0 {Res0(a, X)}c2{KWp}. By the induction hypothesis, {Res0(a, X)}c2{KWp} is derivable. Please note that {X}a{Res0(a, X)} is an axiom in AXIOM 2. By RULE 11, we can derive {X}a; c2{KWp}.
iii) c1 is a case plan case ϕ1 → c1′ . . . . .ϕn → cn′. endcase. We know that ϕi ⊆ X for some i ∈ {1, ..., n}. It follows that D |=0 {X}ci′; c2{KWp} since we have assumed D |=0 {X}c1; c2{KWp}. By the induction hypothesis, D ⊢0 {X}ci′; c2{KWp}. Now applying RULE 12 we can derive {X}c1; c2{KWp}.
iv) c1 = c1′; c2′ such that c1′, c2′ are not empty plans. Then c = c1′; (c2′; c2). Now c1′ is shorter. Then {X}c{KWp} is derivable by the induction hypothesis. Altogether, we complete the proof.

Please note that the construction of the proof system PRKW0D depends essentially on the monotonicity property of Φ̂0 (see Lemma 1). According to [7], an action a is 1-executable in an a-state σ if it is 0-executable in every complete a-state extending σ. And if a non-sensing action a is 1-executable in σ, then Res1(a, σ) is defined as the intersection of all Res0(a, σ′), σ′ ∈ Comp(σ), where Comp(σ) is the set of all complete a-states extending σ. Obviously, Res1 is monotonic; that is, if σ ⊑ δ then Res1(a, σ) ⊑ Res1(a, δ). Thus the transition functions Φ1 and Φ̂1 (for precise definitions please see [7]) are also monotonic. Therefore, if in PRKW0D we replace {X}a{Res0(a, X)} in AXIOM 2 by {X}a{Res1(a, X)} and replace "0-executable" by "1-executable" in all groups, we obtain a sound and complete proof system PRKW1D for plan verification under 1-approximation. Please note, however, that since 1-executability is unlikely to be solvable in polynomial time, determining whether a rule in PRKW1D is applicable seems intractable.
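For contrast with the 0-approximation, Res1 can be computed from the earlier res0 sketch by intersecting over all complete extensions; the rough sketch below is ours, and its running time is exponential in the number of unknown fluents, in line with the intractability remark above.

```python
from itertools import product

def complete_extensions(T, F, fluents):
    """All complete a-states (no unknown fluent) extending (T, F)."""
    unknown = sorted(set(fluents) - T - F)
    for bits in product([True, False], repeat=len(unknown)):
        yield (T | {f for f, b in zip(unknown, bits) if b},
               F | {f for f, b in zip(unknown, bits) if not b})

def res1(a, T, F, effects, fluents):
    results = [res0(a, t, f, effects)
               for t, f in complete_extensions(T, F, fluents)]
    return (set.intersection(*(r[0] for r in results)),
            set.intersection(*(r[1] for r in results)))
```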

5 Discussion

As pointed out in [26], there are two main streams of methodology in the area of planning: search and logic. The recent search methods follow the PDDL language [9] and require third-party plan verification tools like VAL [11], PDVer [10], etc. In this paper our choice is the latter, i.e., the logical approach. The advantages of our approach include: 1) a logical framework that integrates plan generation/verification; 2) a solid theoretical foundation for off-line planning. It is widely accepted that logical systems are well-suited for verifying theories [27–29]. In fact, search methods lack formal features for plan verification, and it is not clear how to incorporate the off-line concept into search methods. A closely related area involving formal verification is model checking [30,31]. Generally speaking, model checking and planning have some common features, e.g., both perform state-space exploration of a given transition system, and both adapt some logical framework for domain specification and/or representation. In the past years, a cross-fertilization between the two areas has emerged. For example, in [32,33], model checking techniques based on ordered binary decision diagrams (OBDDs) have been applied to planning, and in [34], SAT-based planning techniques are applied to symbolic model checking. One of the difficulties in applying model checking techniques to planning is the so-called state-space explosion problem. In contrast, our Hoare proof system based approach can be considered an attempt to conquer the state-space explosion problem via an interactive procedure, as pointed out and discussed in [23]. Many other logical approaches have been proposed for plan generation in the literature. For example, SAT-based planning [13–15] transforms a given domain into a propositional theory, and plan generation is achieved by computing a model of the theory via an existing SAT solver. A similar idea also applies to QBF [16,17]. In [3,6,7,35,36] the planning problem is reduced to logic programming such that a plan can


be extracted from the corresponding model of the translated logic program. One weakness of the above approaches is that they only use logical semantics for model generation, instead of using logical deduction to construct a plan (i.e., a proof). In contrast, the deductive planning approaches proposed in [18,21,37–39] adopt automated reasoning procedures to construct plans, so that all generated plans are verified, since they are essentially proofs. However, these frameworks differ from ours in that they do not provide theoretical support for off-line planning; more precisely, they have no composition rule for merging existing proofs. There exists some other interesting related work. For example, plan repair [40,41] refers to updating an existing plan to match changes in a dynamic environment. In this sense, the method does generate a new plan from an existing one. However, plan repair is theoretically as hard as generating a new plan [40], while in our method, provided that the plan database is large enough, generating new plans can be much easier.

6 Conclusion

In this paper we propose Hoare style proof systems PR0D and PRKW0D for plan generation and verification under incomplete information. The proof systems are sound and complete w.r.t. the 0-approximation semantics of AK. The contributions of the paper include not only the invention of the above proof systems, but also the off-line concept as a way to handle the high complexity of planning; i.e., our work provides a theoretical foundation for a new kind of method for efficient plan generation/verification at the cost of spare-time preprocessing. In future work we shall implement the proof systems PR0D and PRKW0D on top of interactive theorem provers, together with the plan storage and query architecture. One of the challenges is how to design good tactics so that interactive theorem provers can automatically generate as many short plans as possible.

Acknowledgements This research was partially supported by National Natural Science Foundation of China (Grant Nos. 60970040, 61272059), Ministry of Education (Grant Nos. 11JJD720020, 10JZD0006), and Guangdong Young Scientists Programme (Grant Nos. GD10YZX03, WYM10114).

References
1 Moore R C. A formal theory of knowledge and action. In: Hobbs J R, Moore R C, eds. Formal Theories of the Commonsense World. New Jersey: Norwood, 1985. 319–358
2 Scherl R B, Levesque H J. Knowledge, action, and the frame problem. Artif Intell, 2003, 144: 1–39
3 Tu P H, Son T C, Baral C. Reasoning and planning with sensing actions, incomplete information, and static causal laws using answer set programming. Theory Pract Log Program, 2007, 7: 377–450
4 Petrick R P A, Bacchus F. Extending the knowledge-based approach to planning with incomplete information and sensing. In: Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning, Whistler, 2004. 613–622
5 Oglietti M. Understanding planning with incomplete information and sensing. Artif Intell, 2005, 164: 171–208
6 Nieuwenborgh D V, Eiter T, Vermeir D. Conditional planning with external functions. In: Proceedings of the 9th International Conference on Logic Programming and Nonmonotonic Reasoning, Tempe, 2007. 214–227
7 Son T C, Baral C. Formalizing sensing actions: a transition function based approach. Artif Intell, 2001, 125: 19–91
8 Baral C, Kreinovich V, Trejo R. Computational complexity of planning and approximate planning in the presence of incompleteness. Artif Intell, 2000, 122: 241–267
9 Gerevini A E, Haslum P, Long D, et al. Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artif Intell, 2009, 173: 619–668
10 Raimondi F, Pecheur C, Brat G. PDVer, a tool to verify PDDL planning domains. In: Proceedings of Workshop on Verification and Validation of Planning and Scheduling Systems, Thessaloniki, 2009. 76–82
11 Howey R, Long D. VAL's progress: the automatic validation tool for PDDL2.1 used in the international planning competition. In: Proceedings of the 13th International Conference on Automated Planning and Scheduling, Trento, 2003. 52–61


12 Traverso P, Ghallab M, Nau D. Automated Planning: Theory and Practice. Amsterdam: Morgan Kaufmann Publishers, 2004
13 Rintanen J. Planning and SAT. In: Handbook of Satisfiability. Amsterdam: IOS Press, 2009. 483–503
14 Kautz H A, Selman B, Hoffmann J. Planning as satisfiability. In: Proceedings of the 10th European Conference on Artificial Intelligence, Vienna, 1992. 359–363
15 Dimopoulos Y, Hashmi M A, Moraitis P. μ-satplan: multi-agent planning as satisfiability. Knowl-Based Syst, 2012, 29: 54–62
16 Otwell C, Remshagen A, Truemper K. An effective QBF solver for planning problems. In: MSV/AMCS. CSREA Press, 2004. 311–316
17 Luca M D, Giunchiglia E, Narizzano M, et al. "Safe planning" as a QBF evaluation problem. In: Proceedings of the 2nd RoboCare Workshop, Genova, 2005
18 Reiter R. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. Cambridge: MIT Press, 2001
19 Kartha G N. Soundness and completeness theorems for three formalizations of action. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambéry, 1993. 724–731
20 Lin F, Shoham Y. Provably correct theories of action. J ACM, 1995, 42: 293–320
21 Lin F, Reiter R. How to progress a database. Artif Intell, 1997, 92: 131–167
22 Hoare C A R. An axiomatic basis for computer programming. Commun ACM, 1969, 12: 576–580
23 Apt K R, de Boer F S, Olderog E-R. Verification of Sequential and Concurrent Programs. Berlin: Springer, 2009
24 Arora S, Barak B. Computational Complexity: a Modern Approach. Cambridge: Cambridge University Press, 2009
25 Cadoli M, Donini F M. A survey on knowledge compilation. AI Commun, 1997, 10: 137–150
26 Russell S J, Norvig P. Artificial Intelligence: a Modern Approach. Englewood Cliffs: Prentice Hall, 2009
27 Robinson A, Voronkov A, eds. Handbook of Automated Reasoning. Cambridge: Elsevier and MIT Press, 2001
28 Li W. Logical verification of scientific discovery. Sci China Inf Sci, 2010, 53: 677–684
29 Hinchey M, Bowen J P, Vassev E. Formal methods. In: Marciniak J J, ed. Encyclopedia of Software Engineering. Taylor & Francis, 2010. 308–320
30 Grumberg O, Clarke E M, Peled D A. Model Checking. Cambridge: MIT Press, 1999
31 Grumberg O, Veith H. 25 Years of Model Checking—History, Achievements, Perspectives. Berlin/Heidelberg: Springer-Verlag, 2008
32 Cimatti A, Roveri M, Traverso P. Automatic OBDD-based generation of universal plans in non-deterministic domains. In: Proceedings of the 15th National/10th Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, Madison, 1998. 875–881
33 Cimatti A, Roveri M. Conformant planning via symbolic model checking. J Artif Intell Res, 2000, 13: 305–338
34 Biere A, Cimatti A, Clarke E M, et al. Symbolic model checking without BDDs. In: Proceedings of the 5th International Conference on Tools and Algorithms for Construction and Analysis of Systems, London, 1999. 193–207
35 Tu P H, Son T C, Gelfond M, et al. Approximation of action theories and its application to conformant planning. Artif Intell, 2011, 175: 79–119
36 Eiter T, Faber W, Leone N, et al. Answer set planning under action costs. J Artif Intell Res, 2003, 19: 25–71
37 Stephan W, Biundo S. A new logical framework for deductive planning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambéry, 1993. 32–38
38 Levesque H J, Reiter R, Lin F, et al. GOLOG: a logic programming language for dynamic domains. J Log Progr, 1997, 31: 59–83
39 Thielscher M. Flux: a logic programming method for reasoning agents. Theor Pract Log Prog, 2005, 5: 533–565
40 Krogt R V D, Weerdt M D. Plan repair as an extension of planning. In: Proceedings of the 15th International Conference on Automated Planning and Scheduling, Monterey, 2005. 161–170
41 Fox M, Gerevini A, Long D, et al. Plan stability: replanning versus plan repair. In: Proceedings of the 16th International Conference on Automated Planning and Scheduling, Ambleside, 2006. 212–221

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072109:1–072109:9 doi: 10.1007/s11432-013-5052-x

An upper (lower) bound for Max (Min) CSP

HUANG Ping1,2 & YIN MingHao2*

1 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
2 School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China

* Corresponding author (email: [email protected])

Received May 9, 2013; accepted July 12, 2013; published online February 27, 2014

Abstract The random constraint satisfaction problem (CSP) instances generated by Model RB have been widely used in the field of CSP and have some nice features. In this paper, we consider two optimization versions of CSP of Model RB, i.e., the maximum constraint satisfaction problem (Max-CSP) and the minimum constraint satisfaction problem (Min-CSP). The Max-CSP problem is to find an assignment to all the variables such that the maximum number of constraints is satisfied, and the Min-CSP problem is to find an assignment to all the variables such that the minimum number of constraints is satisfied. We use the first moment method to prove that when r > 2α(1/p − 1) (or p > 2α/(2α + r)), an upper bound of Max-CSP can be derived. Similarly, we prove that when r > 2α(1/p − 1) (or p > 2α/(2α + r)), a lower bound of Min-CSP can be derived.

Keywords Max CSP, Min CSP, RB model, upper bound, lower bound

Citation Huang P, Yin M H. An upper (lower) bound for Max (Min) CSP. Sci China Inf Sci, 2014, 57: 072109(9), doi: 10.1007/s11432-013-5052-x

1 Introduction

Constraint satisfaction problems (CSPs for short) involve the assignment of values to variables subject to a set of constraints [1]. In the past decades, this problem has been one of the most active and prolific research areas, since it provides a unifying framework to study various kinds of combinatorial problems, from propositional logic to graph theory [2,3]. Moreover, it has been a central and well-studied problem in the field of computer science due to its significance in both academic and engineering applications. Generally speaking, CSP tasks are computationally intractable (NP-hard), which means that unless P equals NP, one cannot find a polynomial algorithm to solve CSPs. Since the pioneering work by Cheeseman et al. in [4], there has been an increasing interest in the phase transition of computationally hard problems. A phase transition is usually a sudden transformation from one state to another when a particular parameter is varied. However, it is always difficult to obtain the location of the parameter values where the phase transition occurs; the obtained phase transition points are usually in the form of some loose but hard-won theoretical bounds. For instance, for the propositional satisfiability problem (SAT), the parameter controlled for phase transition is the clause density (i.e., number of clauses/number of variables).


Until now, to our knowledge, the best lower bound and upper bound for SAT are 3.52 [5] and 4.49 [6], respectively. The phase transition of the constraint satisfaction problem, a generalized version of SAT, has also received great attention, from both experimental and theoretical aspects. Smith and Dyer showed in [7] that the variance of the number of solutions can be used to set bounds on the phase transition. Achlioptas et al. indicated in [8] that CSP instances randomly generated by the standard random CSP model, Model B, do not have an asymptotic threshold due to the presence of flawed variables. Accordingly, in the seminal paper of Xu and Li, a new type of random CSP model, called Model RB, was introduced. Model RB is a revision of the standard random Model B and has some nice features. First, the existence of a phase transition can be proven, and the exact location of the phase transition points can be quantified. Second, it has been shown both theoretically and experimentally that Model RB can be used to generate hard satisfiable instances by translating CSPs into CNF formulas [9,10]. In [11], we further proved that the phase transition of counting problems of CSP does exist as the number of variables approaches infinity, and that the critical values where phase transitions occur can be precisely located.

In this paper, we follow this line of research by considering two optimization versions of constraint satisfaction problems, specifically maximum constraint satisfaction problems (Max-CSP for short) and minimum constraint satisfaction problems (Min-CSP for short), both of which provide non-idempotent optimization frameworks with many applications in domains such as intelligent planning, scheduling, bioinformatics, and probabilistic reasoning. The Max-CSP problem of a given CSP instance is to find an assignment that maximizes the number of satisfied constraints. One of the most well-known examples of Max-CSP is its propositional counterpart, the maximum satisfiability problem (Max-SAT for short). Recently, several results on bound analysis of Max-SAT have been given in [12–14]. Coppersmith et al. presented some results on bound analysis of Max-SAT by studying the expectation of the maximum number of satisfied clauses [15]. In a recent paper, Xu et al. provided a tighter upper bound for Max 2-SAT by using the first moment argument and correcting the error items [16]. The Min-CSP problem of a given CSP instance is to find an assignment that minimizes the number of satisfied constraints. Li et al. in [17] introduced a branch and bound algorithm for solving its propositional counterpart, the minimum satisfiability problem (Min-SAT for short). Since, to our knowledge, no paper has discussed the bound analysis of the general form of Max-CSP or Min-CSP, in this paper we take the first step in studying the phase transition of Max-CSP and Min-CSP by considering Model RB. For Max-CSP, we prove that, when $r$ is large enough, the expected maximum number of satisfied constraints satisfies $f(k, n, \alpha, r, p) \leq (1 - p + \sqrt{2\alpha p(1-p)/r})\, rn\ln n$; in this way, we present an upper bound of $f(k, n, \alpha, r, p)$ for Max-CSP. For Min-CSP, we prove that when $r$ is large enough, the expected minimum number of satisfied constraints satisfies $g(k, n, \alpha, r, p) \geq (p - \sqrt{2\alpha p(1-p)/r})\, rn\ln n$; thus, we present a lower bound of $g(k, n, \alpha, r, p)$ for Min-CSP. Note that the upper bound of Max-CSP is always a loose upper bound of Min-CSP, because, by the definitions of Max-CSP and Min-CSP, the number of constraints satisfied in Max-CSP is always larger than the number satisfied in Min-CSP of the same CSP instance.

The rest of this paper is organized as follows. First, we review some basic concepts in Section 2. Then, in Sections 3 and 4, we present the upper bound of Max-CSP and the lower bound of Min-CSP of Model RB, respectively. Conclusions are provided in Section 5.

2 Preliminary

Definition 1 (constraint satisfaction problem (CSP) instance). A constraint satisfaction problem (CSP) instance is defined as a triple $(X, D, C)$, where $X = (x_1, x_2, \ldots, x_n)$ is a set of $n$ variables; $D$ is a mapping from $X$ to a set of domains $D = (D(x_1), D(x_2), \ldots, D(x_n))$, where $D(x_i)$, $D_i$ for short, is the finite domain of the possible values of $x_i$; and, for $2 \leq k \leq n$, a constraint $C_{i_1, i_2, \ldots, i_k} \in C$ is defined as a pair $(X_i, R_i)$ such that $X_i = (x_{i_1}, \ldots, x_{i_k})$ is a subset of $X$ called the constraint scope, and $R_i$, called the constraint relation, is a subset of the Cartesian product $D_{i_1} \times \cdots \times D_{i_k}$ specifying the allowed combinations of values for


the variables in $C_i$.

Definition 2 (CSP assignment). An assignment for a given CSP instance $(X, D, C)$ is a map $f$ from $X$ to the disjoint union $\bigcup_{i=1}^{n} D_i$ with each $f(x_i) \in D_i$. An assignment satisfies a constraint $((x_{i_1}, \ldots, x_{i_k}), R_i)$ iff $f(X_i) = (f(x_{i_1}), \ldots, f(x_{i_k})) \in R_i$.

Definition 3 (constraint satisfaction problem, CSP). The constraint satisfaction problem (CSP) for a CSP instance consists in deciding whether there exists an assignment satisfying all the constraints.

In Model B [7], for each instance there are $p_1 n(n-1)/2$ constraints, and for each constraint $(1-p_2)d^2$ consistent tuples are randomly selected. The formal definition of Model B is as follows.

Definition 4 (Model B). A class of random CSP instances of Model B will be denoted by a tuple $(k, n, d, p_1, p_2)$ where, for each instance:
1) $k \geq 2$ denotes the arity of each constraint;
2) $n \geq 2$ denotes the number of variables;
3) $d \geq 2$ denotes the size of each domain;
4) $1 \geq p_1 > 0$ determines the number $m = p_1 \binom{n}{k}$ of constraints;
5) $1 > p_2 > 0$ determines the number $t = p_2 d^k$ of disallowed tuples of each relation.

Achlioptas et al. [18] pointed out that the instances generated using Model B suffer from (trivial) insolubility when the problem size increases. Therefore, Xu and Li [10] introduced an alternative random model, i.e., Model RB.

Definition 5 (Model RB [10]). A class of random CSP instances of Model RB will be denoted by a tuple $(k, n, \alpha, r, p)$ where, for each instance:
1) $k \geq 2$ denotes the arity of each constraint;
2) $n \geq 2$ denotes the number of variables;
3) $\alpha > 0$ determines the domain size $d = n^{\alpha}$ of each domain;
4) $r > 0$ determines the number $m = rn\ln n$ of constraints;
5) $1 > p > 0$ determines the number $t = pd^k$ of disallowed tuples of each relation.

The main difference between Model RB and Model B is that the domain size in Model RB grows with an increase in the number of variables. The generation of a random CSP instance in Model RB is done as follows (see the sketch below): (1) Select $m = rn\ln n$ random constraints (with repetition), each one formed by randomly selecting $k$ of the $n$ variables (without repetition). (2) For each constraint, select $t = pd^k$ incompatible tuples of values (without repetition); i.e., each constraint relation contains exactly $(1-p)d^k$ compatible tuples of values.
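The following is a minimal generator sketch of this procedure in Python; the function name model_rb_instance, the seed parameter, and the rounding of d, m, and t to integers are our own illustrative choices rather than part of the model's definition.

```python
import itertools
import math
import random

def model_rb_instance(k, n, alpha, r, p, seed=0):
    """Sample one random CSP instance from Model RB (Definition 5):
    d = n^alpha values per domain, m = r*n*ln(n) constraints chosen with
    repetition, each forbidding t = p*d^k tuples chosen without repetition."""
    rng = random.Random(seed)
    d = round(n ** alpha)              # domain size
    m = round(r * n * math.log(n))     # number of constraints
    t = round(p * d ** k)              # disallowed tuples per constraint
    constraints = []
    for _ in range(m):
        scope = tuple(rng.sample(range(n), k))     # k distinct variables
        all_tuples = list(itertools.product(range(d), repeat=k))
        disallowed = set(rng.sample(all_tuples, t))
        constraints.append((scope, disallowed))
    return d, constraints

# a small binary instance: k=2, n=8, alpha=0.8, r=1.5, p=0.25
d, cons = model_rb_instance(2, 8, 0.8, 1.5, 0.25)
```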

The following theorems show that the instances generated by Model RB have an exact phase transition.

Theorem 1 [10]. Let $p_{cr} = 1 - e^{-\alpha/r}$, where $\alpha > 1/k$ and $r > 0$ are two constants, and $k$, $\alpha$ and $r$ satisfy the inequality $k\, e^{-\alpha/r} \geq 1$. Then
$$\lim_{n\to\infty} \Pr[\mathrm{SAT}] = 1 \ \text{ when } p < p_{cr}, \qquad \lim_{n\to\infty} \Pr[\mathrm{SAT}] = 0 \ \text{ when } p > p_{cr}.$$

Theorem 2 [10]. Let $r_{cr} = -\alpha/\ln(1-p)$, where $\alpha > 1/k$ and $0 < p < 1$ are two constants satisfying $k \geq 1/(1-p)$. Then
$$\lim_{n\to\infty} \Pr[\mathrm{SAT}] = 1 \ \text{ when } r < r_{cr}, \qquad \lim_{n\to\infty} \Pr[\mathrm{SAT}] = 0 \ \text{ when } r > r_{cr}.$$

Theorems 1 and 2 show that when $p < p_{cr}$ ($r < r_{cr}$), as $n \to \infty$, the instances are almost all satisfiable, while when $p > p_{cr}$ ($r > r_{cr}$), the instances are nearly all unsatisfiable. The exact point $p_{cr}$ ($r_{cr}$) is precisely located.

Definition 6 (Max-CSP). Given a CSP instance $P$ generated by Model RB, the Max-CSP problem is to find an assignment to all the variables such that the maximum number of constraints is satisfied.


Definition 7 (Min-CSP). Given a CSP instance $P$ generated by Model RB, the Min-CSP problem is to find an assignment to all the variables such that the minimum number of constraints is satisfied.
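To make Definitions 6 and 7 concrete, the short sketch below computes max P and min P of a small instance by exhaustive enumeration, reusing the (scope, disallowed) representation from the generator sketch above; it is exponential in n and intended only as an illustration on tiny instances.

```python
from itertools import product

def max_min_csp(n, d, constraints):
    """Return (max P, min P): the largest and smallest numbers of
    constraints satisfied over all d^n assignments."""
    best, worst = 0, len(constraints)
    for assign in product(range(d), repeat=n):
        sat = sum(tuple(assign[v] for v in scope) not in disallowed
                  for scope, disallowed in constraints)
        best, worst = max(best, sat), min(worst, sat)
    return best, worst
```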

3 An upper bound of Max-CSP

In this section, we present the upper bound for random Max-CSP of Model RB. Considering a random CSP instance $P(k, n, \alpha, r, p)$ generated by Model RB, we use $\max P$ to denote the largest number of satisfied constraints of $P$, and we focus on the functional behavior of $\max P$. Let $f(k, n, \alpha, r, p) = E(\max P)$, where $E$ denotes the expectation of the maximum number of satisfied constraints over the random instances. For simplicity, in this paper we write $x$ in lieu of $\lceil x \rceil$ and omit the notation $\lceil \cdot \rceil$.

Theorem 3. When $r > 2\alpha(1/p - 1)$ and $p \leq 1/2$, let $f(k, n, \alpha, r, p) = E(\max P)$, where $E$ is the expectation over the random instances. Then
$$f(k, n, \alpha, r, p) \leq \Big(1 - p + \sqrt{2\alpha p(1-p)/r}\Big)\, rn\ln n. \tag{1}$$

Theorem 4. When $r > -\alpha \big/ \sum_{j=2}^{\infty} \beta_j p^j$ and $p \leq 1/2$, let $f(k, n, \alpha, r, p) = E(\max P)$, where $E$ is the expectation over the random instances. Then
$$f(k, n, \alpha, r, p) \leq (1 - p + x)\, rn\ln n, \tag{2}$$
where $x$ is the sole positive root of the equation
$$\frac{\alpha}{r} + \sum_{j=2}^{\infty} \beta_j x^j = 0, \tag{3}$$
and $\beta_j = \big[(-1)^{j-1} - ((1-p)/p)^{j-1}\big]\big/\big((1-p)^{j-1} j(j-1)\big)$, $j \geq 2$.

Proof. Let $\delta$ denote a real number, $0 \leq \delta \leq 1$. If $\max P > (1-\delta)\, rn\ln n$, there must exist a satisfied sub-CSP instance $P'$ whose number of constraints is more than $(1-\delta)\, rn\ln n$. Each constraint is satisfied by any given assignment with probability $1-p$; that is, it is dissatisfied with probability $p$. Then the following inequality holds:
$$\Pr = \Pr(\exists \text{ satisfying } P') \leq d^n \sum_{i=0}^{\delta rn\ln n} \binom{rn\ln n}{i} (1-p)^{rn\ln n - i}\, p^{i}, \tag{4}$$
where $\Pr$ denotes the probability function. Note that the last term of (4) is maximal when $\delta < p$, so we have
$$\Pr \leq d^n (\delta rn\ln n + 1) \binom{rn\ln n}{\delta rn\ln n} (1-p)^{rn\ln n - \delta rn\ln n}\, p^{\delta rn\ln n}. \tag{5}$$
According to Stirling's formula $n! \approx \sqrt{2\pi n}\,(n/e)^n$, we can simplify inequality (5) into
$$\Pr \leq \frac{1}{\sqrt{2\pi \delta(1-\delta)\, rn\ln n}}\, d^n (\delta rn\ln n + 1) \big(\delta^{-\delta} (1-\delta)^{\delta-1} (1-p)^{1-\delta} p^{\delta}\big)^{rn\ln n}. \tag{6}$$
Since $d = n^{\alpha}$, according to (6) we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \ln\big(\delta^{-\delta} (1-\delta)^{\delta-1} (1-p)^{1-\delta} p^{\delta}\big). \tag{7}$$
Let $\delta = p - \varepsilon$, and note that $\ln(1+x) = \sum_{i=1}^{\infty} (-1)^{i-1} x^i / i$ and $p \leq 1/2$. Then


$$\begin{aligned}
\ln\big(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{1-\delta}p^{\delta}\big)
&= \ln(1-p) + (\delta-1)\ln(1-\delta) - \delta\ln\delta - \delta\ln((1-p)/p) \\
&= \ln(1-p) + (p-1-\varepsilon)\ln(1-p+\varepsilon) - (p-\varepsilon)\ln(p-\varepsilon) - (p-\varepsilon)\ln((1-p)/p) \\
&= \ln(1-p) + (p-1-\varepsilon)\ln(1-p) + (p-1-\varepsilon)\ln(1+\varepsilon/(1-p)) \\
&\quad - (p-\varepsilon)\ln((1-p)/p) - (p-\varepsilon)\ln p - (p-\varepsilon)\ln(1-\varepsilon/p) \\
&= (p-1-\varepsilon)\ln(1+\varepsilon/(1-p)) - (p-\varepsilon)\ln(1-\varepsilon/p) \\
&= (p-1-\varepsilon)(x_1\varepsilon + x_2\varepsilon^2 + x_3\varepsilon^3 + \cdots) \\
&\quad + (-p+\varepsilon)\big(x_1(\varepsilon(p-1)/p) + x_2(\varepsilon(p-1)/p)^2 + x_3(\varepsilon(p-1)/p)^3 + \cdots\big),
\end{aligned} \tag{8}$$
where $x_i = (-1)^{i-1}/(i(1-p)^i)$, $i \geq 1$. So $\ln(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{1-\delta}p^{\delta}) = \sum_{j=2}^{\infty}\beta_j\varepsilon^j$, where
$$\beta_j = \big[(-1)^{j-1} - ((1-p)/p)^{j-1}\big]\big/\big((1-p)^{j-1} j(j-1)\big), \quad j \geq 2. \tag{9}$$

Then we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \sum_{j=2}^{\infty}\frac{(-1)^{j-1} - ((1-p)/p)^{j-1}}{(1-p)^{j-1}\, j(j-1)}\,\varepsilon^j. \tag{10}$$
When $j \geq 2$ and $p \leq 1/2$, $\beta_j < 0$ holds, so we can simplify formula (10) by truncating the series $\sum_{j=2}^{\infty}\beta_j\varepsilon^j$: if we reserve the $\varepsilon^2$ term, the $O(\varepsilon^3)$ terms can be omitted immediately. So we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \frac{-1 - (1-p)/p}{2(1-p)}\,\varepsilon^2. \tag{11}$$

To get $\Pr \to 0$ as $n \to \infty$, we must make sure that $\frac{\alpha}{r} + \frac{-1-(1-p)/p}{2(1-p)}\,\varepsilon^2 < 0$, so we have to obtain the root of the following equation:
$$\frac{\alpha}{r} + \frac{-1-(1-p)/p}{2(1-p)}\,\varepsilon^2 = 0. \tag{12}$$
Solving the equation, we get $\varepsilon = \pm\sqrt{2\alpha p(1-p)/r}$. Recalling that $\varepsilon < p$, we get the following result:
$$\sqrt{2\alpha p(1-p)/r} < \varepsilon < p, \quad \text{so} \quad r > 2\alpha(1/p - 1). \tag{13}$$
That is to say, when $r > 2\alpha(1/p - 1)$, $\Pr(\exists \text{ satisfying } P') \to 0$. So
$$\max P < (1-\delta)\, rn\ln n = (1-(p-\varepsilon))\, rn\ln n = (1-p+\varepsilon)\, rn\ln n, \tag{14}$$
and hence $f(k, n, \alpha, r, p) \leq (1-p+\sqrt{2\alpha p(1-p)/r})\, rn\ln n$. This finishes the proof of Theorem 3.

To prove Theorem 4, reserving the whole series $\sum_{j=2}^{\infty}\beta_j\varepsilon^j$, we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \sum_{j=2}^{\infty}\beta_j\varepsilon^j. \tag{15}$$
In order to make $\frac{\alpha}{r} + \sum_{j=2}^{\infty}\beta_j\varepsilon^j < 0$, we have to obtain the root $x$ of the equality $\frac{\alpha}{r} + \sum_{j=2}^{\infty}\beta_j x^j = 0$: for any $y > x$ with $y \in [0, p]$, we have $\frac{\alpha}{r} + \sum_{j=2}^{\infty}\beta_j y^j < 0$, i.e., $\frac{\alpha}{r} < \sum_{j=2}^{\infty}(-\beta_j) y^j \leq \sum_{j=2}^{\infty}(-\beta_j) p^j$, and according to the Leibniz criterion $\sum_{j=2}^{\infty}(-\beta_j) p^j$ converges to a real number, which gives the bound on $r$ stated in Theorem 4. This finishes the proof of Theorem 4.

Note that Theorem 4 can be viewed as an improved version of Theorem 3 and therefore provides a more refined upper bound of Max-CSP. Using $p$ as the parameter, we can get similar results.
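Since x in (3) has no closed form, it can be found numerically. The sketch below is our own illustration (the function names and the truncation of the series at 60 terms are arbitrary choices); it compares the refined root of Theorem 4 with the closed-form ε of Theorem 3 under the paper's conditions p ≤ 1/2 and r large enough.

```python
import math

def beta(j, p):
    # beta_j from Eq. (9)
    return ((-1) ** (j - 1) - ((1 - p) / p) ** (j - 1)) / ((1 - p) ** (j - 1) * j * (j - 1))

def theorem4_x(alpha, r, p, terms=60):
    """Positive root x of alpha/r + sum_{j>=2} beta_j x^j = 0 (Eq. (3)),
    by bisection on (0, p): the truncated f is positive near 0 and
    negative near p when r is large enough."""
    f = lambda x: alpha / r + sum(beta(j, p) * x ** j for j in range(2, terms))
    lo, hi = 1e-12, p - 1e-12
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

alpha, r, p = 0.8, 20.0, 0.3
print(theorem4_x(alpha, r, p))                 # refined x of Theorem 4
print(math.sqrt(2 * alpha * p * (1 - p) / r))  # epsilon of Theorem 3
```

As expected, the root x comes out slightly smaller than ε, so the bound (2) is slightly tighter than (1).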

Figure 1  The curves show the upper bound of $f(k, n, \alpha, r, p)/(rn\ln n)$ in Theorem 5.

Theorem 5. When $p > 2\alpha/(2\alpha + r)$ and $e^{-\alpha/r} \leq 1/2$, let $f(k, n, \alpha, r, p) = E(\max P)$, where $E$ is the expectation over the random instances. Then
$$f(k, n, \alpha, r, p) \leq \Big(1 - p + \sqrt{2\alpha p(1-p)/r}\Big)\, rn\ln n. \tag{16}$$
The proof of Theorem 5 is similar to that of Theorem 3, so we omit it here. Note that when $p < p_0 = 2\alpha/(2\alpha + r)$, the estimation of the upper bound by the first moment in Theorem 5 becomes invalid, as can be seen in Figure 1; a similar situation occurs when $r < 2\alpha(1/p - 1)$ in Theorem 3. In Figure 1, the $X$-axis denotes the values of $r$ and the $Y$-axis denotes the upper bound of $f(k, n, \alpha, r, p)/(rn\ln n)$.

4 A lower bound for Min-CSP

In this section, we present the lower bound for random Min-CSP of Model RB. Considering a random CSP instance $P(k, n, \alpha, r, p)$ generated by Model RB, we use $\min P$ to denote the least number of satisfied constraints of $P$, and then consider the functional behavior of $\min P$. Let $g(k, n, \alpha, r, p) = E(\min P(k, n, \alpha, r, p))$, where $E$ denotes the expectation of the minimum number of satisfied constraints over the random instances. As in the proof of the upper bound for Max-CSP, we use a first-moment method to prove the lower bound for Min-CSP.

Theorem 6. When $r > 2\alpha(1/p - 1)$ and $p \leq 1/2$, let $g(k, n, \alpha, r, p) = E(\min P)$, where $E$ is the expectation of the minimum number of satisfied constraints over the random instances. Then
$$g(k, n, \alpha, r, p) \geq \Big(p - \sqrt{2\alpha p(1-p)/r}\Big)\, rn\ln n. \tag{17}$$
Proof. Let $\delta$ denote a real number. If $\min P < \delta\, rn\ln n$, there must exist a satisfied CSP sub-instance $P'$, and the number of constraints in $P'$ is less than $\delta\, rn\ln n$. Each constraint is satisfied by any given assignment with probability $1-p$ and dissatisfied with probability $p$. Then the following inequality holds:
$$\Pr = \Pr(\exists \text{ satisfying } P') \leq d^n \sum_{i=0}^{\delta rn\ln n} \binom{rn\ln n}{i} (1-p)^{i}\, p^{rn\ln n - i}. \tag{18}$$
Note that the last term of formula (18) is maximal when $\delta < 1-p$, so we have
$$\Pr \leq d^n (\delta rn\ln n + 1) \binom{rn\ln n}{\delta rn\ln n} (1-p)^{\delta rn\ln n}\, p^{rn\ln n - \delta rn\ln n}. \tag{19}$$
According to Stirling's formula $n! \approx \sqrt{2\pi n}\,(n/e)^n$, we can simplify inequality (19) into
$$\Pr \leq \frac{1}{\sqrt{2\pi\delta(1-\delta)\, rn\ln n}}\, d^n (\delta rn\ln n + 1) \big(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{\delta} p^{1-\delta}\big)^{rn\ln n}. \tag{20}$$


Since $d = n^{\alpha}$, we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \ln\big(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{\delta} p^{1-\delta}\big). \tag{21}$$
Let $\delta = 1 - p - \varepsilon$, and note that $\ln(1+x) = \sum_{i=1}^{\infty}(-1)^{i-1} x^i / i$ and $p \leq 1/2$. Then
$$\begin{aligned}
\ln\big(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{\delta} p^{1-\delta}\big)
&= \ln p + (\delta-1)\ln(1-\delta) - \delta\ln\delta + \delta\ln((1-p)/p) \\
&= \ln p + (-p-\varepsilon)\ln(p+\varepsilon) - (1-p-\varepsilon)\ln(1-p-\varepsilon) + (1-p-\varepsilon)\ln((1-p)/p) \\
&= \ln p + (-p-\varepsilon)\ln p + (-p-\varepsilon)\ln(1+\varepsilon/p) - (1-p-\varepsilon)\ln(1-p) \\
&\quad - (1-p-\varepsilon)\ln(1-\varepsilon/(1-p)) + (1-p-\varepsilon)\ln((1-p)/p) \\
&= (-p-\varepsilon)\ln(1+\varepsilon/p) - (1-p-\varepsilon)\ln(1-\varepsilon/(1-p)) \\
&= (-p-\varepsilon)(x_1\varepsilon + x_2\varepsilon^2 + x_3\varepsilon^3 + \cdots) \\
&\quad + (p-1+\varepsilon)\big(x_1(\varepsilon p/(p-1)) + x_2(\varepsilon p/(p-1))^2 + x_3(\varepsilon p/(p-1))^3 + \cdots\big),
\end{aligned} \tag{22}$$
where $x_i = (-1)^{i-1}/(i p^i)$, $i \geq 1$. So $\ln(\delta^{-\delta}(1-\delta)^{\delta-1}(1-p)^{\delta} p^{1-\delta}) = \sum_{j=2}^{\infty}\beta_j\varepsilon^j$, where
$$\beta_j = \big[(-1)^{j-1} - (p/(1-p))^{j-1}\big]\big/\big(p^{j-1} j(j-1)\big), \quad j \geq 2. \tag{23}$$

Then we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \sum_{j=2}^{\infty}\frac{(-1)^{j-1} - (p/(1-p))^{j-1}}{p^{j-1}\, j(j-1)}\,\varepsilon^j. \tag{24}$$
When $j \geq 2$ and $p \leq 1/2$, $\beta_j < 0$ holds, so we can simplify formula (24) by truncating the series $\sum_{j=2}^{\infty}\beta_j\varepsilon^j$: if we reserve the $\varepsilon^2$ term, the $O(\varepsilon^3)$ terms can be omitted immediately. So we have
$$\frac{\ln \Pr}{rn\ln n} \leq \frac{\alpha}{r} + \frac{\ln(\delta rn\ln n + 1)}{rn\ln n} + \frac{-1 - p/(1-p)}{2p}\,\varepsilon^2. \tag{25}$$
To get $\Pr \to 0$ as $n \to \infty$, we must make sure that $\frac{\alpha}{r} + \frac{-1 - p/(1-p)}{2p}\,\varepsilon^2 < 0$, which requires $\sqrt{2\alpha p(1-p)/r} < \varepsilon < p$; so we get $r > 2\alpha(1/p - 1)$. That is to say, when $r > 2\alpha(1/p - 1)$, as $n \to \infty$, $\Pr(\exists \text{ satisfying } P') \to 0$. So
$$\min P > \delta\, rn\ln n = (p - \varepsilon)\, rn\ln n. \tag{26}$$
Then $g(k, n, \alpha, r, p) \geq \big(p - \sqrt{2\alpha p(1-p)/r}\big)\, rn\ln n$. This finishes the proof of Theorem 6.

Theorem 7. When $p > 2\alpha/(2\alpha + r)$ and $e^{-\alpha/r} \leq 1/2$, let $g(k, n, \alpha, r, p) = E(\min P)$, where $E$ is the expectation over the random instances. Then
$$g(k, n, \alpha, r, p) \geq \Big(p - \sqrt{2\alpha p(1-p)/r}\Big)\, rn\ln n. \tag{27}$$

Theorem 7 uses $p$ as the parameter. Similar to Theorem 5, when $p < p_0 = 2\alpha/(2\alpha + r)$, the estimation of the lower bound by the first moment becomes invalid (Figure 2). In Figure 2, the $X$-axis denotes the values of $p$ and the $Y$-axis denotes the lower bound of $g(k, n, \alpha, r, p)/(rn\ln n)$.
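Both closed-form bounds and their common validity region can be evaluated directly. The sketch below (our own illustration, with a hypothetical function name) returns None outside the region p > 2α/(2α + r), equivalently r > 2α(1/p − 1), where the first-moment estimates of Theorems 3/5 and 6/7 hold; this echoes the invalid regions visible in Figures 1 and 2.

```python
import math

def first_moment_bounds(alpha, r, p):
    """Upper bound of f/(r*n*ln n) (Theorems 3 and 5) and lower bound of
    g/(r*n*ln n) (Theorems 6 and 7); (None, None) where the first-moment
    estimation is invalid."""
    if p <= 2 * alpha / (2 * alpha + r):
        return None, None
    eps = math.sqrt(2 * alpha * p * (1 - p) / r)
    return 1 - p + eps, p - eps

alpha = 0.8
for r in (1, 5, 10, 20, 40):
    print(r, first_moment_bounds(alpha, r, 0.5))
```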

Figure 2  The curves show the lower bound of $g(k, n, \alpha, r, p)/(rn\ln n)$ in Theorem 7.

5 Conclusion and future work

In this paper, we focus on two optimization versions of CSP of Model RB. When $r > 2\alpha(1/p - 1)$ (or $p > 2\alpha/(2\alpha + r)$), we derive an upper bound of Max-CSP; moreover, a lower bound of Min-CSP is also derived. Note that the upper bound of Max-CSP is only a loose estimation for Min-CSP, and the lower bound of Min-CSP is also a loose estimation for Max-CSP. Since Model RB for standard CSP has been proven to have an exact phase transition, we conjecture that Max-CSP and Min-CSP may have exact phase transition points too, and a more complicated analysis than the first-moment argument, say a second-moment argument, may help to prove this [19–23].

Acknowledgements This research was supported by Program for New Century Excellent Talents in University (Grant No. NCET-130724), Natural Science Foundation of Jilin Province (Grant No. 201215006), and Natural Science Foundation of China (Grant Nos. 61370156, 61370052).

References
1 Shannon C E. A mathematical theory of communication. Bell Syst Tech J, 1948, 27: 379–423
2 Creignou N, Khanna S, Sudan M. Complexity Classifications of Boolean Constraint Satisfaction Problems. In: Monographs on Discrete Mathematics and Applications, Vol. 7. SIAM, 2001
3 Hell P, Nesetril J. Graphs and Homomorphisms. Oxford: Oxford University Press, 2004
4 Cheeseman P, Kanefsky B, Taylor W M. Where the really hard problems are. In: Proceedings of IJCAI-91, Sydney, 1991. 331–337
5 Kaporis A C, Kirousis L M, Lalas E G. The probabilistic analysis of a greedy satisfiability algorithm. Random Struct Algor, 2006, 28: 444–480
6 Díaz Cort J, Lefteris K, Mitsche D, et al. A new upper bound for 3-SAT. In: Proceedings of FSTTCS'08, Bangalore, 2008. 163–174
7 Smith B M, Dyer M E. Locating the phase transition in binary constraint satisfaction problems. Artif Intell, 1996, 81: 155–181
8 Achlioptas D, Kirousis L M, Kranakis E, et al. Random Constraint Satisfaction: A More Accurate Picture. Berlin/Heidelberg: Springer, 1997. 107–120
9 Xu K, Boussemart F, Hemery F, et al. Random constraint satisfaction: easy generation of hard (satisfiable) instances. Artif Intell, 2007, 171: 514–534
10 Xu K, Li W. Exact phase transitions in random constraint satisfaction problems. J Artif Intell Res, 2000, 12: 93–103
11 Huang P, Yin M H, Xu K. Exact phase transitions and approximate algorithm of #CSP. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, San Francisco, 2011. 1790–1791
12 Bollobás B, Borgs C, Chayes J T, et al. The scaling window of the 2-SAT transition. Random Struct Algor, 2001, 18: 201–256


13 Gramm J, Hirsch E A, Niedermeier R, et al. Worst-case upper bounds for MAX-2-SAT with an application to MAX-CUT. Discrete Appl Math, 2003, 130: 139–155
14 Hirsch E A. A new algorithm for MAX-2-SAT. In: Proceedings of the 17th International Symposium on Theoretical Aspects of Computer Science. Lect Notes Comput Sci, vol. 1770. Springer-Verlag, 2000. 65–73
15 Coppersmith D, Gamarnik D, Hajiaghayi M T, et al. Random MAX SAT, random MAX CUT, and their phase transitions. Random Struct Algor, 2004, 24: 502–545
16 Xu X L, Gao Z S, Xu K. A tighter upper bound for random MAX 2-SAT. Inf Process Lett, 2011, 111: 115–119
17 Li C M, Zhu Z, Manyà F, et al. Minimum satisfiability and its applications. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Bellaterra, 2011. 605–610
18 Achlioptas D, Gomes C, Kautz H, et al. Generating satisfiable problem instances. In: Proceedings of the 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence, 2000. 256–261
19 Liu T, Lin X, Wang C, et al. Large hinge width on sparse random hypergraphs. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, vol. 1, 2011. 611–616
20 Xu K, Li W. Many hard examples in exact phase transitions. Theor Comput Sci, 2006, 355: 291–302
21 Zhou J, Huang P, Yin M, et al. Phase transitions of EXPSPACE-complete problems. Int J Found Comput Sci, 2010, 21: 1073–1088
22 Zhou J, Yin M, Li X, et al. Phase transitions of EXPSPACE-complete problems: a further step. Int J Found Comput Sci, 2012, 23: 173–184
23 Gao J, Yin M, Xu K. Phase Transitions in Knowledge Compilation: an Experimental Study. Berlin/Heidelberg: Springer, 2011. 364–366

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072110:1–072110:11 doi: 10.1007/s11432-014-5096-6

What is the effective key length for a block cipher: an attack on every practical block cipher

HUANG JiaLin & LAI XueJia*

Cryptography and Information Security Lab, Department of Computer Science, Shanghai Jiaotong University, Shanghai 200240, China

* Corresponding author (email: [email protected])

Received December 4, 2013; accepted March 6, 2014; published online May 5, 2014

Abstract Recently, several important block ciphers have been considered broken by brute-force-like cryptanalysis, with a time complexity faster than exhaustive key search: the attacks go over the entire key space but perform less than a full encryption for each possible key. Motivated by this observation, we describe a meet-in-the-middle attack that can always be mounted successfully against any practical block cipher, with success probability one. The data complexity of this attack is the smallest possible according to the unicity distance. The time complexity can be written as $2^k(1-\epsilon)$, where $\epsilon > 0$ for all practical block ciphers. Previously, the commonly accepted security bound was the length $k$ of the given master key. Our result points out that this $k$-bit security is always overestimated and can never be reached, because of an inevitable loss of key bits. No amount of clever design can prevent this loss, but increasing the number of rounds can reduce it as much as possible. We give more insight into the problem of the upper bound of effective key bits in block ciphers, and show a more accurate bound. A suggestion about the relationship between the key size and block size is given: when the number of rounds is fixed, it is better to take a key size equal to the block size. Also, the effective key bits of many well-known block ciphers are calculated and analyzed, which confirms lower security margins than previously thought. The results in this article motivate us to reconsider the real complexity that a valid attack should be compared to.

Keywords block cipher, effective key bits, meet-in-the-middle, brute-force attack

Citation Huang J L, Lai X J. What is the effective key length for a block cipher: an attack on every practical block cipher. Sci China Inf Sci, 2014, 57: 072110(11), doi: 10.1007/s11432-014-5096-6

1 Introduction

As one of the fundamental primitives in symmetric cryptography, block ciphers play an important role in today's secure communication. They protect data against unauthorized access and tampering in an insecure communication channel. Also, the design of many cryptographic schemes, such as secure encryption modes and authentication modes, is based on the security of block ciphers. Therefore, their security evaluation has been a hot research issue over the decades, giving rise to different analysis techniques. One line of research is the so-called provable security approach [1,2], such as indistinguishability analysis. This approach usually studies design principles or cipher structures by assuming the pseudorandomness of some components. Another line of research focuses on practical security, that is, on whether


any cryptanalytic attack can be mounted successfully on a block cipher, such as differential attacks, linear attacks, meet-in-the-middle attacks, related-key attacks, and other existing cryptanalysis techniques [3–5]. A block cipher is considered secure when it resists all known attacks. Traditionally, the strength of a cryptanalytic attack is measured by comparing it to exhaustive search over the entire key space; hence, the security of a block cipher relies heavily on its key length. Recently, the full version of AES has been called broken because of the biclique attack [6], which performs faster than exhaustive search. In [7], the authors proposed a complex meet-in-the-middle attack on KASUMI using various subtle weaknesses of the cipher. In [8], the authors proposed several techniques to speed up exhaustive key search on the full IDEA (by combining the BD-relation), KASUMI, and GOST. All of the above attacks have the following in common: by going over the entire key space while performing less than a full encryption for each possible key, the full rounds of the ciphers are targeted with a time complexity slightly faster than exhaustive key search ($2^{254.4}$ for AES-256, $2^{125.8}$ for KASUMI, $2^{126.8}$ for IDEA). These results are far from being any threat to the use of the ciphers in practice. However, they motivate us to consider the realistic complexity that an attack should be compared to; that is, in a real-world context, what, at most, should the time complexity of a valid attack be?

1.1 Related work

In [8], Biham et al. recalled two well-known techniques to marginally reduce the time complexity of exhaustive key search for almost any block cipher. One is the distributive technique, which extracts the key bits that are not used in the first (or last) few operations of the encryption process. Another is the early abort technique referred to in [9], which discards a wrong key before computing the full ciphertext. Assume that a subset K(1) of the key bits is not used in the first few operations, and a (possibly different) subset K(2) is not used in the last few operations. Then, Biham et al. proposed a more advanced algorithm using the meet-in-the-middle technique, as follows. For each value of the bits in K \ K(1) \ K(2), perform the following:
1. For each value of the bits in K(2) \ K(1), perform the first few operations of the encryption process for the given plaintext. Store the intermediate value and the corresponding value of K(2) \ K(1) in a table.
2. For each value of the bits in K(1) \ K(2), perform the last few operations in the decryption direction for the given ciphertext. Then, guess the value of the remaining bits in K(2), and complete the rest of the computation up to the intermediate value. Check for a match with the values in the table.
The above algorithm (called Biham's algorithm in this article) is enhanced further with the splice-and-cut technique, by considering the common key bits that are not used in the operations between the plaintext and a pre-chosen intermediate value and in the last few operations, at the cost of increasing the data complexity (we call this the splice-and-cut version of Biham's algorithm). Based on the cipher structures and weaknesses of the key schedules, Biham et al. showed the speedup for IDEA, GOST, and KASUMI.

1.2 Our contribution

Most block ciphers in common use are designed to have security equal to their key length (an exception is Triple-DES). Given that a key consists of $k$ bits, exhaustive search of the key space takes $2^k$ encryptions 1), with success probability one when the number of plaintext–ciphertext pairs satisfies the unicity distance. In this article, by giving a universal attack with a time complexity of $2^k(1-\epsilon)$, where $\epsilon > 0$, we point out that the previously assumed bound of $k$ effective key bits can never be achieved for almost all practically used block ciphers. The data complexity of this attack is the smallest possible according to the unicity distance, and the success probability is about one. We present a formulated description, measuring the effective key length explicitly with some general parameters, such as block size, key size, and number of rounds.

1) Note that in the average case this complexity is $2^{k-1}$, but in this article we consider the worst case.


Also, our algorithm is applied to many well-known block ciphers and their effective key bits are calculated. As predicted, the effective key bits of these ciphers are all less than the master key size $k$. Compared with previous work, our analysis is based on more general structures and weaker assumptions, which have nothing to do with the specifics of key schedules. No matter how clever and secure a practical block cipher is, our algorithm is always available to the cryptanalyst. The data complexity of our algorithm is also greatly reduced. Only three instances were given for the splice-and-cut version of Biham's algorithm: IDEA, KASUMI, and GOST; this indicates that weak key schedules (all three ciphers have simple linear key schedules) and a large data complexity are required for those attacks, and no instances were given for the basic Biham's algorithm. We do a partial match in the middle instead of using the early abort technique on the ciphertext. More details, such as the computational complexity, and not just a rough description of the algorithm, are presented. With the explicit quantization of the real bound of effective key lengths, the relationship between the key size and block size, and the effect of increasing the number of rounds, can be considered from a new point of view.

This article is organized as follows. In Section 2, we introduce the basic notation and construction of block ciphers. In Section 3, a generic attack is proposed and its computational complexities are studied; the upper bound of effective key bits is also investigated in this section. In Section 4, we give several widely used block ciphers as examples to show their effective key lengths. Section 5 discusses and concludes with our results.

2 The construction, notations, and conventions

Based on Shannon's conception of confusion and diffusion, most modern block ciphers are designed to use many iterations of substitution (a nonlinear layer) and permutation (a linear layer) to obtain enough security; each iteration is referred to as one round. We first give the following notation.
• $P$: plaintext
• $C$: ciphertext
• $n$: the block size
• $K$: master key
• $k$: the master key size
• $R$: the number of rounds
• $S$: the nonlinear layer
• $L$: the linear layer
• $K^r$: the subkey used in round $r$; $K_i^r$ is the $i$th sub-block of $K^r$
• $X^r$: the input block to round $r$, where $X^0 = P$; $X_i^r$ is the $i$th sub-block of $X^r$
• $Y^r$: the output block of the key mixing in round $r$; $Y_i^r$ is the $i$th sub-block of $Y^r$
• $Z^r$: the output block of the nonlinear layer in round $r$; $Z_i^r$ is the $i$th sub-block of $Z^r$
For almost all block ciphers used in practice, the $R$-round generic structure is depicted in Figure 1. There can be more than one nonlinear or linear transformation in each round function. Usually the key mixing layer adds the subkey to the current state block using linear operations, such as XOR and modular addition. Note that a round function in practice cannot be designed as a random permutation. For a block cipher with key size $k$, the easiest and most universal attack an adversary can mount is to simply try and guess each possible key. The probability of correctly guessing the key at the first attempt is $2^{-k}$; adding one bit to the length of the key halves this probability. The time required to exhaust the whole key space is proportional to the time required to perform $2^k$ encryption operations.


Figure 1  Structure of an R-round block cipher E (key mixing, nonlinear layer, and linear layer iterated over R rounds on m b-bit sub-blocks).

3 A generic attack

We introduce a generic meet-in-the-middle attack that can be mounted on every practical block cipher. The attack is given in Algorithm 1. $S_1$ is an internal state that can be calculated from $P$ with only $k_1$ bits of subkeys, where $k_1$ is the maximum such value smaller than $k$ that can be obtained. Similarly, $S_2$ is an internal state that can be derived from $C$ with only (other) $k_1$ bits of subkeys. For any block cipher, the states $S_1$ and $S_2$ can certainly be found. The algorithm has two phases: the meet-in-the-middle phase, which generates a candidate list containing $2^{k-M}$ keys, where $M$ is the size of the met intermediate value; and the check phase, which examines the keys in the list. For further discussion, we make two assumptions that are reasonable for practical block ciphers. First, nonlinear transformations are assumed to consume much more time than linear transformations; hence, as in previous work, only nonlinear operations are counted [6,10]. Second, key schedules are assumed negligible, since they are usually simpler than the encryption function. We now discuss the time complexity, data complexity, memory complexity, and success probability of Algorithm 1.

3.1 Time complexity

Based on the above assumptions, the time complexity is as follows; for any block cipher, it is always smaller than $2^k$:
$$T_{comp} = 2^{k_1}\left(\frac{N_{P\to S_1}}{N_{total}} + \frac{N_{C\to S_2}}{N_{total}} + 2^{k-k_1}\,\frac{N_{total} - N_{P\to S_1} - N_{C\to S_2} - N_{disc}}{N_{total}}\right) + 2^{k-M} + 2^{k-M-n} + 2^{k-M-2n} + \cdots \tag{1a}$$
$$\approx 2^k\left(\frac{N_{total} - N_{P\to S_1} - N_{C\to S_2} - N_{disc}}{N_{total}}\right) = 2^k\left(1 - \frac{N_{P\to S_1} + N_{C\to S_2} + N_{disc}}{N_{total}}\right), \tag{1b}$$
where $N_{total}$ is the total number of nonlinear components required in a full encryption.


Algorithm 1: the generic meet-in-the-middle attack
Data: ⌈k/n⌉ + 1 pairs of plaintext and ciphertext
Result: the output key K
for each value of the first k1 key bits do
    Compute S1 from P with these k1 bits;
    for each value of the remaining k − k1 key bits do
        Compute Z_0^{⌊R/2⌋} from S1;
        Store Z_0^{⌊R/2⌋} in a table corresponding to the guessed key;
    end
end
for each value of the last k1 key bits do
    Compute S2 from C with these k1 bits;
    for each value of the remaining k − k1 key bits do
        Compute Z_0^{⌊R/2⌋} from S2;
        if the Z_0^{⌊R/2⌋} corresponding to the guessed key is in the table then
            add the guessed key to the candidate list; move on to the next guess;
        else
            move on to the next guess;
        end
    end
end
Check the keys in the candidate list with the other ⌈k/n⌉ plaintext–ciphertext pairs;

Denote by $N_{P\to S_1}$ the number of nonlinear components required in the calculation from $P$ to $S_1$, and by $N_{C\to S_2}$ the number required in the calculation from $C$ to $S_2$. $N_{disc}$ is the number of nonlinear components that do not need to be computed when partial matching techniques are used in the middle. The partial matching can filter $M$ bits of key information after the meet-in-the-middle phase. If $(N_{P\to S_1} + N_{C\to S_2} + N_{disc})/N_{total}$ is written as $\epsilon$, then (1b) is $2^k(1-\epsilon)$, where $\epsilon > 0$.
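To make the flow of Algorithm 1 concrete, here is a self-contained toy sketch in Python on a hypothetical 16-bit SPN-like cipher. The cipher, its key schedule, and all names are our own illustrative constructions, deliberately degenerate so that the first half of the rounds depends only on the low key byte and the second half only on the high byte; in a real cipher the inner loops of Algorithm 1 over the remaining k − k1 bits are not trivial as they are here.

```python
R = 4                                   # toy rounds; meet after R // 2 rounds
SBOX = [(7 * x + 3) % 16 for x in range(16)]       # toy 4-bit nonlinear map
SBOX_INV = [SBOX.index(y) for y in range(16)]

def sub_nibbles(s, box):                # nonlinear layer on four 4-bit words
    return sum(box[(s >> (4 * i)) & 0xF] << (4 * i) for i in range(4))

def rotl(s, r):                         # rotation as the linear layer
    return ((s << r) | (s >> (16 - r))) & 0xFFFF

def round_enc(s, k):
    return rotl(sub_nibbles(s ^ k, SBOX), 5)

def round_dec(s, k):
    return sub_nibbles(rotl(s, 11), SBOX_INV) ^ k

def enc(p, key):                        # rounds 0-1 use key&0xFF, 2-3 key>>8
    for r in range(R):
        p = round_enc(p, (key >> 8) & 0xFF if r >= R // 2 else key & 0xFF)
    return p

def mitm(pairs):
    """Algorithm 1 on the toy cipher: meet on the full middle state
    (so M = 16 here), then check candidates against the remaining pairs."""
    p0, c0 = pairs[0]
    table = {}
    for k_lo in range(256):             # forward phase: P -> middle state
        s = p0
        for _ in range(R // 2):
            s = round_enc(s, k_lo)
        table.setdefault(s, []).append(k_lo)
    candidates = [(k_hi << 8) | k_lo
                  for k_hi in range(256)            # backward phase
                  for k_lo in table.get(
                      round_dec(round_dec(c0, k_hi), k_hi), [])]
    return [k for k in candidates       # check phase (unicity distance)
            if all(enc(p, k) == c for p, c in pairs[1:])]

key = 0xBEEF
pairs = [(p, enc(p, key)) for p in (0x0123, 0x4567)]
assert key in mitm(pairs)
```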

3.2 Data complexity

The required number of plaintext–ciphertext pairs here is $U + 1$, where $U = \lceil k/n \rceil$ is the smallest data complexity according to the unicity distance. We use the first pair of data to filter out part of the wrong keys and generate the candidate list; we then require at most another $U$ pairs for finding the right key. If we store the internal states before and after the meet-in-the-middle state, we can use the first data pair to filter another $n - M$ bits, so that the first pair also provides all of its $n$-bit information for checking, the same as the other pairs; the data complexity can then be reduced to $U$. Since the data complexity is already $U + 1$, which is small enough, this tradeoff is unnecessary.

3.3 Memory complexity

Algorithm 1 has a memory complexity of $2^k \cdot M$ bits. If more memory can be sacrificed, the data complexity can be lowered as mentioned above. The time–memory tradeoff is not our concern here.

3.4 Success probability

In the meet-in-the-middle phase, a wrong key is eliminated with probability $1 - 2^{-M}$. On examination with the second data pair, a wrong key in the candidate list is discarded with probability $1 - 2^{-(M+n)}$; on examination with the third data pair (if needed), a wrong key is eliminated with probability $1 - 2^{-(M+2n)}$, and so on. The success probability of Algorithm 1 is the product of these probabilities over all $2^k - 1$ wrong keys, which is approximately one. Algorithm 1 is similar to Biham's algorithm, but has several differences. First, Biham's algorithm does not specify where the intermediate values are to meet. We explicitly claim that the meet position does not influence the complexities of Algorithm 1; without loss of generality, we fix this value as some


sub-block in the middle round. Second, instead of aborting the evaluation after computing part of the ciphertext, we partially match in the middle before computing the full intermediate state.

3.5 More information for specific structures of block ciphers

Eq. (1a) can be made more concrete for specific cipher structures. There are two major structures for block ciphers, the SPN and the Feistel structure, as well as their generalized variants and combinations. The SPN structure consists of a layer of keyed confusion (nonlinear operations such as S-boxes) and a layer of diffusion (a linear transformation); widely used examples are Serpent, AES, and ARIA. This structure is a direct implementation of Shannon's confusion and diffusion concepts: by iterating the round function repeatedly, the dependency between the inputs and outputs of the cipher becomes complicated. Consider the $n$-bit internal state $W = (W_0, W_1, \ldots, W_{m-1})$ as a concatenation of $m$ $b$-bit words $W_i$, where $b$ is the size of a nonlinear sub-block. For most SPN ciphers, every nonlinear sub-block is keyed, and we match a $b$-bit word in the middle. Hence, the time complexity is written as
$$T_{comp} = 2^{k_1}\left(2\,\frac{k/b - 1}{Rm} + 2^{k-k_1}\,\frac{Rm - (m - 1 + 2(k/b - 1))}{Rm}\right) + 2^{k-b} + 2^{k-b-n} + 2^{k-b-2n} + \cdots$$
$$\approx 2^k\left(1 - \frac{m - 1 + 2(k/b - 1)}{Rm}\right) = 2^k\left(1 - \frac{1 - 3b/n + 2k/n}{R}\right), \tag{2}$$
where $m > 1$ in practical block ciphers because of the limit on the size of one nonlinear operation, and likewise $k > b$. For an entire encryption there are $Rm$ nonlinear operations. For the first $(k/b - 1)$ operations, we never need to guess all $k$ bits of the key: we can compute $S_1$ by searching only the first $(k - b)$ bits, without guessing the remaining key bits. A time complexity factor of $(k/b - 1)$ operations is saved here, and the multiplication by 2 reflects that the computations from both the plaintext and the ciphertext are considered. In the middle round, using the partial matching technique, we compute only one nonlinear operation to get a $b$-bit filter, saving another $(m - 1)$ operations. The other primary structure is the Feistel structure, with the input to each round divided into two halves: one half is transformed by some nonlinear round function and then XORed to the other half, and the two halves are then swapped, except in the last round. For the Feistel structure, the time complexity can be derived in the same way, and the resulting formula is very similar; because of the half-diffusion property, at least one round of computation can be saved when matching in the middle. For other more detailed structures, such as MISTY (note that it has nonlinear components of different sizes, 7-bit and 9-bit S-boxes) and the Lai–Massey structure, we give examples directly in Section 4. For all these block ciphers, the meet-in-the-middle attack proposed in this article requires that the subkeys affect the round transformation in a separable pattern between different sub-blocks; that is, parts of the subkeys act directly on parts of the internal state, e.g., $K_i^r$ is mixed with $X_i^r$. If the round function were designed as a random permutation, where the subkey acts as a whole, then our attack would fail. Hence, our concern is all block ciphers existing in practice, which always satisfy this condition.
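For SPN parameters, Eq. (2) can be evaluated directly. The following small sketch (our own illustration, with a hypothetical function name) computes the resulting effective key length:

```python
import math

def effective_key_bits(k, n, b, R):
    """Effective key length k + log2(1 - eps) implied by Eq. (2), where
    eps = (1 - 3b/n + 2k/n)/R is the fraction of nonlinear work saved by
    Algorithm 1 on an SPN cipher with m = n/b keyed S-boxes per round."""
    eps = (1 - 3 * b / n + 2 * k / n) / R
    return k + math.log2(1 - eps)

# AES-128 (k = n = 128, b = 8, R = 10): about 127.5 bits, matching the
# direct S-box count in Subsection 4.1 (before the branch-number saving).
print(round(effective_key_bits(128, 128, 8, 10), 1))
```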

3.6 An upper bound of the effective key length

For any practical block cipher, Algorithm 1 is always available to the cryptanalyst. This indicates that a more accurate effective key length can be obtained by taking the logarithm of the time complexity of this universal algorithm. For convenience, we focus on (2) here; the results for other structures are similar. The effective key size is $k + \log(1 - \frac{1 - 3b/n + 2k/n}{R})$, which is always smaller than $k$ for any block cipher. Usually the size $b$ of a nonlinear sub-block is much smaller than $n$ and $k$, and takes only a few fixed values: for conventional block ciphers, the routine S-box size is 4, 8, or 16 bits (for MISTY and KASUMI it is 7 or 9 bits), and for lightweight block ciphers it is 3 or 4 bits. $1 - 3b/n + 2k/n$ is always larger than zero, and so $\log(1 - \frac{1 - 3b/n + 2k/n}{R})$ is smaller than zero. The previously accepted security bound of $k$ key bits therefore cannot actually be achieved.

Denote $\log(1 - \frac{1 - 3b/n + 2k/n}{R})$ as the loss of effective key bits. We ignore the factor $3b/n$ and draw the function $\log(1 - \frac{1 + 2k/n}{R})$ for different fixed $R$ (see Figure 2). When $R$ is the same, the larger $k/n$ is, the greater the key bits loss; when $k/n$ is the same, the more rounds a cipher iterates, the smaller the key bits loss. Thus, we can avoid the loss of key bits as much as possible by increasing the number of rounds or reducing the ratio of $k$ to $n$. This indicates what the relation of key size and block size should be in a secure design; although not very precise, it can still serve as rough guidance for most block ciphers. Also, from the formula we can conclude that when the number of rounds is sufficiently large, the key bits loss is approximately zero.

Figure 2  The relationship of key bits loss with key size, block size, and number of rounds (key bits loss plotted against k/n for R = 10, 15, 20).
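The function plotted in Figure 2 can be tabulated directly; the sketch below (our own illustration) prints the key-bits loss log2(1 − (1 + 2k/n)/R) for the three values of R shown in the figure:

```python
import math

def key_bits_loss(k_over_n, R):
    # loss of effective key bits, ignoring the 3b/n factor as in the text
    return math.log2(1 - (1 + 2 * k_over_n) / R)

for R in (10, 15, 20):
    losses = [round(key_bits_loss(x / 10, R), 2) for x in range(10, 31, 5)]
    print(f"R = {R}:", losses)   # at k/n = 1.0, 1.5, 2.0, 2.5, 3.0
```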

4 Effective key lengths for block ciphers

In this section, many practical block ciphers are analyzed for their actual effective key lengths. For a clear exhibition of (1a), we consider conventional block ciphers and lightweight block ciphers separately 2).

2) Without specification, the notation in this section is as mentioned in Sections 2 and 3.

4.1 Conventional block ciphers

We take AES, currently the most widely used block cipher, as an example. It was selected by NIST in 2000 as the new standard replacing DES. As assumed before, we count the computational complexity in S-boxes; the same measure is taken in [6]. For AES-128, the key size $k$ and block size $n$ are both 128 bits, the size $b$ of the nonlinear sub-blocks (S-boxes) is 8, and the number of rounds $R$ is 10. There are 16 sub-blocks in the state, i.e., $m = 16$; refer to [11] for more details about AES. Note that the whitening key $K^0$ should also be considered here. The detailed application of Algorithm 1 is as follows. $Z_0^5$ is the intermediate value for meeting. Choose $(Z_0^1, Z_1^1, \ldots, Z_{14}^1)$ as $S_1$. Compute $S_1$ from $P$ by guessing $(K_0^0, K_1^0, \ldots, K_{14}^0)$, 120 bits in total. Then, for each guess of $K_{15}^0$, the last 8 bits of the master key $K$ complete the encryption operations from $S_1$ to $Z_0^5$; this requires calculating $Z_{15}^1$, $Z^2$, $Z^3$, $Z^4$, and $Z_0^5$, 50 S-boxes in total. Store $Z_0^5$ in a hash table corresponding to the guessed key. Choose $(X_1^{10}, X_2^{10}, \ldots, X_{15}^{10})$ as $S_2$. Compute $S_2$ from $C$ by guessing $(K_1^{10}, K_2^{10}, \ldots, K_{15}^{10})$, 120 bits again. Then, for each guess of $K_0^{10}$, the last 8 bits of a mapping of the master key $K$ complete the decryption operations from $S_2$ to $Z_0^5$; this needs a computation of $X_0^{10}$, $X^9$, $X^8$, $X^7$, and $X^6$, 65 S-boxes in total. There are $10 \times 16 = 160$ S-boxes in the full AES-128. Thus, for each guess of the 128-bit key $K$, only 115 S-boxes need to be computed, i.e., 115/160 of one full 10-round encryption per guess. The time complexity of Algorithm 1 here is about $2^{128} \times 115/160 = 2^{127.5}$ (this value can also be derived directly from (2)). Thus, the effective key length can be regarded as 127.5 bits.

Table 1  Time complexity of Algorithm 1 for conventional block ciphers, which also indicates effective key lengths

Block cipher | n | k | R | Time complexity of Algorithm 1 | Previously best time complexity on full rounds
AES-128 | 128 | 128 | 10 | 2^127.2 | 2^126.1 [6]
AES-192 | 128 | 192 | 12 | 2^191.1 | 2^189.7 [6]
AES-256 | 128 | 256 | 14 | 2^255.1 | 2^254.4 [6]
SHACAL2 1) | 256 | 512 | 64 | 2^511 | NO
MISTY1 [12] | 64 | 128 | 8 | 2^127.6 | NO
ARIA-128 [13] | 128 | 128 | 12 | 2^127.4 | NO
ARIA-192 | 128 | 192 | 14 | 2^191.4 | NO
ARIA-256 | 128 | 256 | 16 | 2^255.3 | NO
IDEA [14] | 64 | 128 | 8.5 | 2^127.4 | 2^126.1 [10]
KASUMI | 64 | 128 | 8 | 2^127.4 | 2^125.8 [7]

1) Handschuh H, Naccache D. SHACAL: a family of block ciphers. Submission to the NESSIE project, 2002.

length can be regarded as 127.5 bits. We consider a little more of the structure of AES, that is, its branch number in the diffusion layer. Only four bytes knowledge of Z 4 is needed for computing Z05 , and four bytes knowledge of X 6 is needed. This can save additional 24 S-boxes, and the time complexity of Algorithm 1 is reduced to 2127.2 . Similarly, the time complexity of Algorithm 1 for AES-256 is 2255.1 . Compared with our upper bound of key bits, the best attack result so far on AES-256 with a time complexity of 2254.4 has much less gain than expected, since the effective key bits of AES-256 is actually only 255.1 bits. We compute effective key lengths for other well-known block ciphers listed in Table 1. We briefly explain KASUMI [15]. Assume that the most time consuming sub-functions are three FI in each round for KASUMI. Only seven 16-bit words of the key require to be guessed before going to the third FI of round 1. Also, there is no need to guess all 128 bits of the key when the three FI operations are completed in round 8. Besides, the Feistel structure saves one more round in the middle, such that 127.4 there are 16 FI calculated for each guessed key. The time complexity is given as 2128 × 16 . 24 = 2 The above ciphers are all recommended as standards or used by the industry for secure communications. According to our analysis, their security margin needs to be reconsidered. For example, if an attack on SHACAL2 has a time complexity larger than 2511 , then this attack should be regarded as invalid. The best attack on full IDEA that was thought to have optimized 1.9 bits now should be regarded as only 1.3 bits optimization. 4.2

4.2 Lightweight block ciphers

Secure communication on extremely constrained devices, such as RFID tags and sensor nodes, is an expanding field. The constraints are mainly driven by cost and result in highly limited computing power, chip area, and power supply, which means that much of conventional block cipher design must be left behind. Thus, the development of lightweight block ciphers is progressing rapidly, resulting in more and more aggressive designs that often show two features. First, innovative techniques are used to improve existing ciphers. Second, the security margins that block ciphers are traditionally equipped with are reduced as much as possible to optimize performance. Because of these differences from conventional block ciphers, we discuss the application of Algorithm 1 to lightweight block ciphers separately. Take GOST as an example. GOST is the former Soviet encryption standard GOST 28147-89, standardized as the Russian encryption standard in 1989. It is well suited for compact hardware implementations because of its simple structure, and its most compact implementation requires only 651 GE [16]. Therefore, GOST is considered ultra lightweight. GOST has a 32-round Feistel structure with a 64-bit block size n and a 256-bit key size k. The F-function consists of eight S-boxes. Refer to [17] for more details. The application of Algorithm 1 is as follows.

Table 2  Time complexity of Algorithm 1 for lightweight block ciphers, which also indicates effective key lengths

Block cipher       n         k    R    Time complexity of Algorithm 1    Previously best time complexity on full rounds
GOST               64        256  32   2^254.8                           2^224 [18]
PRESENT-80 [19]    64        80   31   2^79.7                            NO
PRESENT-128        64        128  31   2^127.6                           NO
KATAN [20]         32/48/64  80   254  2^79.4                            NO
KTANTAN [20]       32/48/64  80   254  2^79.4                            2^75.2 [21]
HIGHT [22]         64        128  32   2^127.1                           NO
XTEA [23]          64        128  64   2^127.7                           NO
Piccolo-80 [24]    64        80   25   2^79.7                            NO
Piccolo-128        64        128  31   2^127.6                           NO

Because of the Feistel structure, we can check whether R_15 equals L_16 (R_i and L_i denote the right and left halves of the input to round i). Compute S1 from P by guessing the seven 32-bit subkeys of the first seven rounds and the least significant 28 bits of the subkey of round 8, 252 bits in total. Then, for each guess of the most significant 4 bits of the subkey of round 8, complete the encryption from S1 to R_15. This requires a calculation of 6 rounds plus the last S-box of round 8, 49 S-boxes in total. Store the first 4 bits of R_15 in a hash table indexed by the guessed key. Similarly, compute S2 from C by guessing the seven 32-bit subkeys of the last seven rounds and the least significant 28 bits of the subkey of round 25, again 252 bits in total. Then, for each guess of the most significant 4 bits of the subkey of round 25, complete the decryption operations from S2 to L_16. This needs a computation from round 24 down to round 16, plus the last S-box of round 25, 73 S-boxes in total. Thus, for each guess of the 256-bit master key, 122 S-boxes must be computed, which is 122/256 of a full 32-round encryption (there are 8 × 32 = 256 S-boxes in the full GOST). The time complexity of Algorithm 1 is therefore about 2^256 × 122/256 = 2^254.9, and so the effective key length is 254.9 bits. Moreover, we can match only part of R_15 with part of L_16, e.g., their least significant 4 bits. To compute these 4 bits of R_15, only two S-boxes need to be calculated in round 14; similarly, only two S-boxes are needed in round 16 for the matched 4 bits of L_16. Twelve S-boxes are saved in this way, so the time complexity is slightly reduced to 2^254.8. Note that previous attacks on full GOST exploit its self-similarity property and relatively simple key schedule. We only consider the basic structure, which means that even if the key schedule were much more complicated, Algorithm 1 still could not be avoided. Other results for lightweight block ciphers are summarized in Table 2. Some lightweight block ciphers have no S-box-type nonlinear components, e.g., XTEA. In this situation, the different linear operations in the round function are considered to cost the same time, or we can simply take the round function as the unit when computing the time complexity.
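The same arithmetic, using the effective_key_length helper from the sketch in Subsection 4.1 (our illustration, not the authors' code), reproduces the GOST figures.

```python
# GOST: 122 of 256 S-boxes per guess; 110 of 256 after matching only
# 4 bits of R15 against L16 (saving 12 S-boxes).
print(round(effective_key_length(256, 122, 256), 1))  # 254.9
print(round(effective_key_length(256, 110, 256), 1))  # 254.8
```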

5 Discussion and conclusion

Recently, there have been significant improvements in meet-in-the-middle attacks, as well as in other brute-force-like cryptanalysis. This motivates us to consider a universal attack on all block ciphers other than the traditional exhaustive search. In a practical cryptographic primitive, there are always some independent sub-modules. Computing these sub-modules independently, instead of in combination, saves overall time complexity. We describe a generic meet-in-the-middle attack that can always be mounted against any practical block cipher. No amount of clever design can prevent it, no matter how many rounds the cipher has or how complicated its structure and key schedule are. Note that having many rounds is still an important and expedient protection, since a larger number of rounds brings a higher complexity for Algorithm 1. We give a more accurate upper bound on the effective key length of practical block ciphers, and claim that no cipher can reach its expected security margin, namely the full length of its master key. Previously, exhaustive key search was generally considered the benchmark against which other attacks are measured.


A theoretical break (or academic break) of a block cipher is an attack with time complexity less than that of exhaustive key search, i.e., 2^k. Our analysis shows that a tiny sacrifice of key bits is inevitable. Thus, if an attack has computational complexity larger than that of Algorithm 1 (even if still faster than exhaustive search), it cannot be regarded as a valid attack. Algorithm 1 has also been applied to many well-known block ciphers, and their effective key lengths have been calculated. As predicted, the effective key lengths of these ciphers are all less than the master key size k. However, our attack poses no real threat to existing block ciphers, because it is limited by having to perform at least one operation for each possible key. Another interesting discussion concerns the relationship between the block size and the master key size. Shannon's work on information theory shows that to achieve perfect secrecy, the key size must be at least as large as the block size, that is, k >= n. According to our analysis in Section 3, when the number of rounds is fixed, the larger k/n is, the more effective key bits are lost. Hence, k = n is the best choice in block cipher design in this context. In exhaustive key search, having to go through the entire key space before finding the correct key would be very unlucky, while being correct on the first guess would be very lucky; thus, the expected time to recover a k-bit key is 2^(k-1) encryptions. Note that most of the effective key lengths we calculate for existing block ciphers are larger than this average case, although some are still smaller. Given that the time complexity of Algorithm 1 in this article is for the worst case, analyzing the average case and then comparing the result with 2^(k-1) is left for future work.

Acknowledgements This work was supported by National Natural Science Foundation of China (Grant Nos. 61073149, 61272440), Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20090073110027), State Key Laboratory of ASIC & System (Grant No. 11KF002), and Key Lab of Information Network Security, Ministry of Public Security (Grant No. C11603).

References
1 Luby M, Rackoff C. How to construct pseudo-random permutations from pseudo-random functions. In: Proceedings of Advances in Cryptology. Berlin/Heidelberg: Springer, 1986. 447–447
2 Even S, Mansour Y. A construction of a cipher from a single pseudorandom permutation. In: Proceedings of Advances in Cryptology. Berlin/Heidelberg: Springer, 1993. 210–224
3 Zhang B, Jin C H. Practical security against linear cryptanalysis for SMS4-like ciphers with SP round function. Sci China Inf Sci, 2012, 55: 2161–2170
4 Lv J Q. Differential attack on five rounds of the SC2000 block cipher. J Comput Sci Technol, 2011, 26: 722–731
5 Su B Z, Wu W L, Zhang W T. Security of the SMS4 block cipher against differential cryptanalysis. J Comput Sci Technol, 2011, 26: 130–138
6 Bogdanov A, Khovratovich D, Rechberger C. Biclique cryptanalysis of the full AES. In: Proceedings of the 17th International Conference on the Theory and Application of Cryptology and Information Security. Berlin/Heidelberg: Springer-Verlag, 2011. 344–371
7 Jia K, Yu H, Wang X. A meet-in-the-middle attack on the full KASUMI. Cryptology ePrint Archive, Report 2011/466, 2011
8 Biham E, Dunkelman O, Keller N, et al. New data-efficient attacks on reduced-round IDEA. Cryptology ePrint Archive, Report 2011/417, 2011
9 Lu J, Wei Y, Kim J, et al. Cryptanalysis of reduced versions of the Camellia block cipher. IET Inf Secur, 2012, 6: 228–238
10 Khovratovich D, Leurent G, Rechberger C. Narrow-Bicliques: cryptanalysis of full IDEA. Lect Note Comput Sci, 2012, 7237: 392–410
11 Daemen J, Rijmen V. AES proposal: Rijndael. In: Proceedings of the 1st Advanced Encryption Standard (AES) Conference, Ventura, 1998
12 Matsui M. New block encryption algorithm MISTY. Lect Note Comput Sci, 1997, 1267: 54–68
13 Kwon D, Kim J, Park S, et al. New block cipher: ARIA. Lect Note Comput Sci, 2004, 2971: 432–445
14 Lai X J, Massey J L, Murphy S. Markov ciphers and differential cryptanalysis. Lect Note Comput Sci, 1991, 547: 17–38


15 3rd Generation Partnership Project. Technical Specification Group Services and System Aspects, 3G Security, Specification of the 3GPP Confidentiality and Integrity Algorithms: KASUMI Specification. V3.1.1. 2001
16 Poschmann A, Ling S, Wang H. 256 bit standardized crypto for 650 GE: GOST revisited. In: Proceedings of the 12th International Conference on Cryptographic Hardware and Embedded Systems. Berlin/Heidelberg: Springer-Verlag, 2010. 219–233
17 National Soviet Bureau of Standards. Information Processing System—Cryptographic Protection—Cryptographic Algorithm GOST 28147-89. 1989
18 Dinur I, Dunkelman O, Shamir A. Improved attacks on full GOST. In: Proceedings of Fast Software Encryption. Berlin/Heidelberg: Springer, 2012. 9–28
19 Bogdanov A, Knudsen L R, Leander G, et al. PRESENT: an ultra-lightweight block cipher. Lect Note Comput Sci, 2007, 4727: 450–466
20 Cannière C D, Dunkelman O, Knezevic M. KATAN and KTANTAN—a family of small and efficient hardware-oriented block ciphers. Lect Note Comput Sci, 2009, 5747: 272–288
21 Bogdanov A, Rechberger C. A 3-subset meet-in-the-middle attack: cryptanalysis of the lightweight block cipher KTANTAN. Lect Note Comput Sci, 2010, 6544: 229–240
22 Hong D, Sung J, Hong S, et al. HIGHT: a new block cipher suitable for low-resource device. Lect Note Comput Sci, 2006, 4249: 46–59
23 Needham R M, Wheeler D J. TEA Extensions. Technical Report, Cambridge University, Cambridge, 1997
24 Shibutani K, Isobe T, Hiwatari H, et al. Piccolo: an ultra-lightweight block cipher. Lect Note Comput Sci, 2011, 6917: 342–357

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072111:1–072111:5 doi: 10.1007/s11432-013-4983-6

Cryptanalysis of a signcryption scheme with fast online signing and short signcryptext

ZHOU DeHua1,2, WENG Jian1,3*, GUAN ChaoWen3, DENG Robert3, CHEN MinRong4,5 & CHEN KeFei2

1 Department of Computer Science, Jinan University, Guangzhou 510632, China;
2 Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China;
3 School of Information Systems, Singapore Management University, Singapore 178902, Singapore;
4 College of Information Engineering, Shenzhen University, Shenzhen 518060, China;
5 School of Computer, South China Normal University, Guangzhou 510631, China

Received April 15, 2013; accepted May 23, 2013; published online May 9, 2014

Abstract  Signcryption is a functional combination of encryption and signature, and is more efficient than signing and encrypting separately. Recently, Youn et al. presented a new signcryption scheme which has fast online signing and short signcryptext, and is efficient enough for mobile applications. This scheme is claimed to be both existentially unforgeable and semantically secure. However, in this paper we show that it is not existentially unforgeable.

Keywords  signcryption, existential unforgeability, semantical security, insider attack, bilinear pairing

Citation Zhou D H, Weng J, Guan C W, et al. Cryptanalysis of a signcryption scheme with fast online signing and short signcryptext. Sci China Inf Sci, 2014, 57: 072111(5), doi: 10.1007/s11432-013-4983-6

1 Introduction

Unforgeability and confidentiality are two basic security requirements in secure digital communications. Usually, the first requirement is achieved with digital signatures, and the latter is ensured with encryption schemes. In Crypto'97, Zheng [1] introduced the notion of signcryption, which can be viewed as a functional combination of encryption and signature whose efficiency is higher than that of signing and encrypting separately. Since then, a number of signcryption schemes have been proposed, e.g., [2–10], including variants for different settings, e.g., identity-based signcryption [11–14] and certificateless signcryption [15–17]. Since more and more mobile devices are used for communications, it is increasingly necessary to design cryptographic schemes with lower computational overhead and less bandwidth. To this end, Youn and Hong [18] recently presented a signcryption scheme (denoted the YH scheme hereinafter) which enjoys fast online signing and short signcryptext, and is hence quite suitable for mobile communications. This scheme is claimed to satisfy existential unforgeability under chosen-message insider attacks. Unfortunately, in this paper we show that it does not. We present two kinds of attacks against the scheme: one is in accordance with their security model [18], and the other is outside their security model but is meaningful in the real world.

* Corresponding author (email: [email protected])


2 Preliminaries

Since the YH scheme [18] relies on some fundamental background, we first review it here.

2.1 Bilinear pairing

Suppose (G1, +) is a cyclic additive group with generator P and prime order p, and (G2, ·) is a cyclic multiplicative group whose order is also p. We say that a map e: G1 × G1 → G2 is a bilinear pairing if it simultaneously satisfies the following properties:
• ∀P1, P2 ∈ G1 and ∀x, y ∈ Z*_p, e(xP1, yP2) = e(P1, P2)^{xy};
• there exist Q1, Q2 ∈ G1 satisfying e(Q1, Q2) ≠ 1_{G2}, where 1_{G2} is the identity element of G2;
• there is an algorithm that efficiently computes e(P1, P2) for all P1, P2 ∈ G1.
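For readers who want to check the algebra of the attacks below, the following toy Python model represents G1 elements by their discrete logarithms and pairing values by their exponents; it is our illustrative sketch only (the prime p and all names are ours) and has no cryptographic meaning.

```python
# Toy symbolic model of a bilinear pairing: the point a*P is represented
# by the integer a mod p, and e(aP, bP) = e(P, P)^(a*b) is represented by
# the exponent a*b mod p. For algebra checks only -- not secure.
p = 1_000_000_007  # stand-in prime group order

def pairing(a, b):
    """Exponent of e(aP, bP) to the base e(P, P)."""
    return (a * b) % p

def inv(a):
    """a^(-1) mod p, the scalar of the point (a^(-1))P."""
    return pow(a, -1, p)

# Bilinearity check: e(x*P1, y*P2) = e(P1, P2)^(x*y).
x, y, a1, a2 = 12, 34, 56, 78
assert pairing(x * a1, y * a2) == (x * y * pairing(a1, a2)) % p
```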

2.2 Signcryption

Usually, a signcryption scheme consists of the following three algorithms [18]:
• KGen. Given a security parameter k, this algorithm outputs a private/public key pair (sk, pk).
• Signcrypt. Given a message (plaintext) m ∈ M (where M denotes the message space), the private key sk_S of a sender and the public key pk_R of a recipient, this algorithm returns a signcryptext c.
• Unsigncrypt. On input a signcryptext c, the public key pk_S of the sender and the secret key sk_R of the recipient, this algorithm returns a message m, or a special symbol "reject" indicating that c is invalid.

2.3 Security notions for signcryption schemes

In [18], Youn and Hong formalized two security notions for signcryption: one is existential unforgeability against chosen-message insider attacks, and the other is semantic security against adaptively chosen-signcryptext insider attacks. Since the attacks presented in this paper are against the existential unforgeability of their scheme, we only review the first notion here. The security notion of existential unforgeability under chosen-message attacks (EUF-CMA) for signcryption is defined by the following game (denoted Game_SC^EUF-CMA), played between an adversary F and a challenger:
• Initialization. The challenger first runs algorithm KGen to obtain a public/secret key pair (pk_U*, sk_U*), and then gives pk_U* to adversary F, while sk_U* is kept secret by the challenger.
• Queries. In this phase, adversary F can adaptively issue a polynomial number of queries as follows:
– Signcrypt query: F submits the public key pk_R (≠ pk_U*) of a recipient and a message m ∈ M, and is given a corresponding signcryptext c.
– Unsigncrypt query: F submits the public key pk_S of a sender and a signcryptext c; the challenger runs algorithm Unsigncrypt on input (pk_S, sk_U*, c) and returns its output to F.
• Forgery. In this phase, F outputs a signcryptext c* and the private key sk_R of a recipient corresponding to a public key pk_R.
We say that adversary F wins the above game if the following two conditions are satisfied: (1) Unsigncrypt(c*, sk_R, pk_U*) = m*; (2) adversary F has not issued a signcrypt query on (pk_R, m*). Note that Game_SC^EUF-CMA in fact describes an insider-security model [18]. We define adversary F's advantage as

Adv(F) = Pr[F wins Game_SC^EUF-CMA].    (1)

Definition 2.1. We say that a signcryption scheme is existentially unforgeable against chosen-message insider attacks (SC-EUF-CMA) if no polynomial-time adversary F has non-negligible advantage Adv(F) in the above game Game_SC^EUF-CMA.

3 Review of YH scheme

The YH scheme [18] is specified by the following algorithms:
System parameters. On input a security parameter k, this algorithm generates two bilinear groups (G1, +) and (G2, ·) of prime order p (where p is determined by the security parameter). Let the bilinear map be e: G1 × G1 → G2. It then chooses a generator P ∈ G1 and two secure hash functions H1: {0,1}* → Z_p and H2: G1 × G1 × G1 → {0,1}^{k1}, where k1 denotes the key length of an IND-CCA secure symmetric encryption scheme (Enc, Dec). Finally, it outputs the system parameters

params = {G1, G2, P, e, H1, H2, Enc, Dec, k1, k2}.    (2)

KGen. Via this algorithm, each user U randomly chooses x_U ∈ Z_p and defines his/her public/private key pair to be (pk_U, sk_U) = (Q_U, x_U), where Q_U = x_U P.
Signcrypt. On input a message m ∈ {0,1}*, the private key x_S of the sender S, and the public key Q_R of the recipient R, this algorithm first randomly chooses r ∈ {0,1}^{k2}. Then it computes u = (x_S + H1(m‖r))^{-1} mod p, U = uP and V = Enc_K(m‖r‖Q_S), where K = H2(U, Q_R, uQ_R). Finally, it sends c = (U, V) as the signcryptext to recipient R.
Unsigncrypt. On input a signcryptext c = (U, V), the public key pk_S = Q_S of the sender, and the secret key sk_R = x_R corresponding to the public key pk_R of the recipient, this algorithm first computes K = H2(U, Q_R, x_R U). Then it computes Dec_K(V) to recover m‖r‖Q_S. Next, it checks whether e(U, Q_S + H1(m‖r)P) = e(P, P) holds. If yes, it outputs the message m; otherwise, it outputs "reject", indicating that c is invalid.

4 Cryptanalysis of YH scheme

In [18], the YH scheme is claimed to be existentially unforgeable against chosen-message insider attacks. Unfortunately, this is not true, since there exist chosen-message insider attacks against the scheme. We first show how an adversary F can break the existential unforgeability of the YH scheme inside the security model defined in [18]. Concretely, adversary F works as follows:
1. In the Initialization stage, F is given a public key pk_U* = Q_U* = x_U* P.
2. In the Query stage, F issues a signcrypt query on a public key pk_B = Q_B = x_B P and a message m* ∈ M (note that the corresponding secret key sk_B = x_B can be known to F, since he can generate this public key himself), and is given a signcryptext c = (U, V). According to the YH scheme, the signcryptext c = (U, V) has the following form:

U = uP,  V = Enc_K(m*‖r‖Q_U*),    (3)

where r ∈ {0,1}^{k2}, u = (x_U* + H1(m*‖r))^{-1} mod p and K = H2(U, Q_B, uQ_B).
3. With sk_B = x_B, adversary F computes K = H2(U, Q_B, x_B U), and then recovers (m*‖r‖Q_U*) by computing Dec_K(V).
4. F randomly picks x_R ∈ Z_p and defines another user R's public key to be pk_R = Q_R = x_R P. Then it computes K′ = H2(U, Q_R, x_R U) and V′ = Enc_{K′}(m*‖r‖Q_U*).
5. Finally, F returns the signcryptext c* = (U, V′) and the private key sk_R = x_R of the recipient corresponding to public key pk_R = x_R P.
Observe that c* is indeed a valid signcryptext with respect to the sender U*, the recipient R and the message m*, since

e(U, Q_U* + H1(m*‖r)P) = e((x_U* + H1(m*‖r))^{-1} P, Q_U* + H1(m*‖r)P)
  = e((x_U* + H1(m*‖r))^{-1} P, x_U* P + H1(m*‖r)P)
  = e((x_U* + H1(m*‖r))^{-1} P, (x_U* + H1(m*‖r))P)
  = e(P, P)^{(x_U* + H1(m*‖r))^{-1} (x_U* + H1(m*‖r))}
  = e(P, P).    (4)
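In the toy exponent model sketched in Subsection 2.1, the success of this forgery reduces to the identity u · (x_U* + H1(m*‖r)) ≡ 1 (mod p); the concrete numbers below are arbitrary stand-ins.

```python
# Forgery 1: the verification e(U, Q_U* + H1(m*||r)P) = e(P, P) holds for
# any recipient key, because u = (x_U* + H1(m*||r))^(-1) mod p.
x_star = 123456          # U*'s secret key (never learned by F)
h = 789                  # stand-in for H1(m* || r)
u = inv(x_star + h)      # discrete log of U from the queried signcryptext
assert pairing(u, x_star + h) == 1
# F only re-encrypts m*||r||Q_U* under K' = H2(U, Q_R, x_R * U), which it
# can compute for any self-chosen recipient key x_R.
```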


Note also that F has never submitted the signcrypt query (m*, pk_R). Thus the above adversary F wins Game_SC^EUF-CMA with non-negligible advantage, and hence the YH scheme is not existentially unforgeable against chosen-message insider attacks.
The above attack is in accordance with the security model defined in [18]. Next, we show another attack which is outside their security model but is meaningful in the real world. Roughly speaking, given a signcryptext c generated by a sender S for a recipient R with respect to a message m, the adversary can forge another signcryptext c* on behalf of another sender S* (even though the underlying secret key is unknown to the adversary) for another recipient R* with respect to any message m*. Concretely, suppose the given signcryptext c = (U, V) has the form

U = uP,  V = Enc_K(m‖r‖Q_S),    (5)

where r ∈ {0,1}^{k2}, u = (x_S + H1(m‖r))^{-1} mod p and K = H2(U, Q_R, uQ_R). The adversary randomly chooses r* ∈ Z*_p and an arbitrary message m*, and defines the public key of the sender S* as pk_S* = Q_S* = Q_S + (H1(m‖r) − H1(m*‖r*))P. Note that the public key of the sender S* is in fact of the form pk_S* = Q_S* = (x_S + H1(m‖r) − H1(m*‖r*))P, although the underlying secret key sk_S* = (x_S + H1(m‖r) − H1(m*‖r*)) mod p is unknown to the adversary. Next, the adversary picks x_R* ∈ Z*_p and defines another recipient's public/secret key pair to be pk_R* = Q_R* = x_R* P and sk_R* = x_R*. Then the adversary computes K* = H2(U, Q_R*, x_R* U), V* = Enc_{K*}(m*‖r*‖Q_S*), and defines c* = (U, V*) as the signcryptext with respect to the sender S*, the recipient R* and the message m*. Note that this is indeed a valid signcryptext, since

e(U, Q_S* + H1(m*‖r*)P) = e((x_S + H1(m‖r))^{-1} P, (x_S + H1(m‖r) − H1(m*‖r*))P + H1(m*‖r*)P)
  = e((x_S + H1(m‖r))^{-1} P, (x_S + H1(m‖r))P)
  = e(P, P)^{(x_S + H1(m‖r))^{-1} (x_S + H1(m‖r))}
  = e(P, P).    (6)
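Again in the toy exponent model, the key observation is that shifting the sender key by H1(m‖r) − H1(m*‖r*) makes the old u verify under the new hash value; the values below are stand-ins.

```python
# Forgery 2: pk_S* = Q_S + (H1(m||r) - H1(m*||r*))P has discrete log
# x_S + h_old - h_new, so u = (x_S + h_old)^(-1) still verifies.
x_s, h_old, h_new = 24680, 111, 222   # stand-ins for x_S, H1(m||r), H1(m*||r*)
u = inv(x_s + h_old)                  # reused from the observed signcryptext
x_s_star = (x_s + h_old - h_new) % p  # secret key of S* (unknown to anyone)
assert pairing(u, x_s_star + h_new) == 1
```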

5 Conclusion

In this paper, we analyzed a short signcryption scheme recently proposed by Youn and Hong. By presenting two concrete attacks, we indicated that their scheme is not existentially unforgeable against chosen-message insider attacks. Thus it would be interesting to present new secure and efficient signcryption schemes for mobile communications.

Acknowledgements This work was supported by National Science Foundation of China (Grant Nos. 61272413, 61005049, 61133014, 61070249, 61272415), Fok Ying Tung Education Foundation (Grant No. 131066), Program for New Century Excellent Talents in University (Grant No. NCET-12-0680), Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20134401110011), Opening Project of Shanghai Key Laboratory of Integrate Administration Technologies for Information Security (Grant No. AGK2011003), R&D Foundation of Shenzhen Basic Research Project (Grant No. JC201105170617A), Guangdong Natural Science Foundation (Grant No. S2011010001206), Foundation for Distinguished Young Talents in Higher Education of Guangdong (Grant No. 2012LYM 0027), Fundamental Research Funds for the Central Universities and A*STAR SERC (Grant No. 102 101 0027 in Singapore).

References
1 Zheng Y L. Digital signcryption or how to achieve cost(signature & encryption) ≪ cost(signature) + cost(encryption). In: Proceedings of CRYPTO'97, Santa Barbara, 1997. 165–179

Three simple facts are: 1) ρ_{s,c}(Z) = ρ_{s,−c}(Z); 2) ρ_{s,c}(Z) decreases as c increases over [0, 0.5]; and 3) ρ_{s,c}(Z)/ρ_{s,0.0}(Z) increases as s > 0 increases, for any fixed c ∈ (0, 0.5]. Klein [6] and Gentry, Peikert et al. [7] clearly knew these simple facts, and they needed them for designing their Gaussian sampling algorithms. For a message m, the algorithm proceeds as follows:
1. Choose a string r and compute x = H(m, r) ∈ Z^n.
2. Let v^(n) ← (0, 0, ..., 0), c^(n) ← x, rate ← 1. For i ← n, n−1, ..., 1, do:
 (a) t_i = frac(⟨c^(i), b^(i)⟩/⟨b^(i), b^(i)⟩), and let T_i = ⟨c^(i), b^(i)⟩/⟨b^(i), b^(i)⟩ − t_i.
 (b) Choose z_i ∈ D_{Z,d_i,t_i}.
 (c) Let c^(i−1) ← c^(i) − (T_i + z_i)a^(i), and let v^(i−1) ← v^(i) + (T_i + z_i)a^(i).
 (d) rate ← rate × ρ_{d_i,t_i}(Z)/ρ_{d_i,0.0}(Z).
3. With probability rate output y = x − v^(0); otherwise go to step 1.
At the end of step 2, we obtain y and rate = ∏_{i=1}^{n} ρ_{d_i,t_i}(Z)/ρ_{d_i,0.0}(Z). We call rate the accepting rate of y.
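The per-coordinate operations of step 2 can be sketched as follows; this is our reading of the algorithm, with a truncated-sum ρ and an inversion sampler standing in for whatever exact 1-D sampler D_{Z,d_i,t_i} an implementation would use.

```python
import math, random

def rho(s, c, tail=40):
    """rho_{s,c}(Z) = sum over z in Z of exp(-(z - c)^2 / s^2), truncated."""
    return sum(math.exp(-((z - c) ** 2) / s ** 2) for z in range(-tail, tail + 1))

def sample_dz(s, c, tail=40):
    """Sample the 1-D discrete Gaussian D_{Z,s,c} by inversion."""
    u = random.random() * rho(s, c, tail)
    for z in range(-tail, tail + 1):
        u -= math.exp(-((z - c) ** 2) / s ** 2)
        if u <= 0:
            return z
    return tail

def step2_rate(ds, ts):
    """Accepting rate from step 2(d): prod_i rho_{d_i,t_i}(Z)/rho_{d_i,0}(Z)."""
    rate = 1.0
    for d, t in zip(ds, ts):
        rate *= rho(d, t) / rho(d, 0.0)
    return rate

# Step 3: accept y with probability rate, otherwise restart from step 1, e.g.
# accepted = random.random() < step2_rate(ds, ts)
```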

3.2 Correctness

Lemma 1. Take x and y as obtained at the end of step 1 and at the end of step 2 of the novel Gaussian sampling algorithm, respectively. Then
(1) x ≡ y (mod Λ).
(2) y = x − ((T_1 + z_1)a^(1) + (T_2 + z_2)a^(2) + ··· + (T_n + z_n)a^(n)) = (t_1 − z_1)b^(1) + (t_2 − z_2)b^(2) + ··· + (t_n − z_n)b^(n). Therefore (t_1, t_2, ..., t_n) corresponds uniquely to y, and is independent of x.
(3) The conditional probability of "at the end of step 2 we obtain y" given "at the end of step 1 we obtain x" is exp(−‖y‖²/s²)/∏_{i=1}^{n} ρ_{d_i,t_i}(Z). Therefore this conditional probability is independent of x, and can be taken as the conditional probability of "at the end of step 2 we obtain y" given "at the end of step 1 we obtain x such that x ≡ y (mod Λ)".
Proof. Lemma 1 is clear by [7].
We take H(·) to be a universal hash, so that Assumption 1 holds trivially.


Assumption 1. For any fixed m, the randomness of the string r makes the distribution of x indistinguishable from the uniform distribution over Z^n/Λ, where x = H(m, r) ∈ Z^n.
Lemma 2 ([6,7]). Suppose {(x_k, P(x_k)), k = 1, 2, ..., K} is a probability distribution. Perform the following random experiment: choose a sample x_k from this distribution, output x_k with probability q(x_k), and otherwise repeat. Then we obtain the probability distribution {(x_k, P(x_k)q(x_k)/∑_{i=1}^{K} P(x_i)q(x_i)), k = 1, 2, ..., K}.
Proposition 1. Under the novel sampling algorithm, the distribution of the final output y = x − v^(0) is D_{Z^n,s,(0,0,...,0)}.
Proof. By Lemma 1 and Assumption 1, if step 3 of the novel sampling algorithm were changed into "output y = x − v^(0)", the probability of the final output y would be exp(−‖y‖²/s²)/(|Z^n/Λ| × ∏_{i=1}^{n} ρ_{d_i,t_i}(Z)). Now consider Lemma 2, and we know that

exp(−‖y‖²/s²)/(|Z^n/Λ| × ∏_{i=1}^{n} ρ_{d_i,t_i}(Z)) × ∏_{i=1}^{n} ρ_{d_i,t_i}(Z)/ρ_{d_i,0.0}(Z) = exp(−‖y‖²/s²)/(|Z^n/Λ| × ∏_{i=1}^{n} ρ_{d_i,0.0}(Z)).

So that, under the novel sampling algorithm, the probability of the final output y is

[exp(−‖y‖²/s²)/(|Z^n/Λ| × ∏_{i=1}^{n} ρ_{d_i,0.0}(Z))] / [∑_{u∈Z^n} exp(−‖u‖²/s²)/(|Z^n/Λ| × ∏_{i=1}^{n} ρ_{d_i,0.0}(Z))] = exp(−‖y‖²/s²)/∑_{u∈Z^n} exp(−‖u‖²/s²).

Proposition 1 is proven.
Lemma 2 and Proposition 1 show the role of the accepting rate in correcting the output distribution. Proposition 1 tells us that, for any s > 0, the novel Gaussian sampling algorithm outputs a Gaussian variable with deviation s.

3.3 Efficiency

Notice that step 2 of the novel Gaussian sampling algorithm is almost the original Gaussian sampling algorithm, except for the additional computation of the accepting rate rate. The computation of rate is simple because ρ_{s,c}(Z) can be looked up in Table 1, which lists the values of ρ_{s,c}(Z) for c ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5} and s² ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8}. Table 1 is a crude table, but it is enough for our analysis; for practical usage, it should be more precise. Hence the novel Gaussian sampling algorithm is essentially a repeated implementation of the original Gaussian sampling algorithm, with a random number of repetitions. We will see that a smaller deviation s means more repetitions.
∏_{i=1}^{n} ρ_{d_i,0.5}(Z)/ρ_{d_i,0.0}(Z) is a lower bound on the accepting rate. By the mean of a geometric distribution, ∏_{i=1}^{n} ρ_{d_i,0.0}(Z)/ρ_{d_i,0.5}(Z) is an upper bound on the average number of times step 2 of the novel algorithm is implemented. That is, we have Proposition 2.
Proposition 2. Step 2 of the novel Gaussian sampling algorithm needs to be implemented on average not more than ∏_{i=1}^{n} ρ_{d_i,0.0}(Z)/ρ_{d_i,0.5}(Z) times to output y = x − v^(0). Hence the total average time cost of the novel Gaussian sampling algorithm is not more than (∏_{i=1}^{n} ρ_{d_i,0.0}(Z)/ρ_{d_i,0.5}(Z))(T_s + T_h).

3.4 Results

Denote d = min{d_1, d_2, ..., d_n}. Then ∏_{i=1}^{n} ρ_{d_i,0.5}(Z)/ρ_{d_i,0.0}(Z) ≥ (ρ_{d,0.5}(Z)/ρ_{d,0.0}(Z))^n, and we have the following results.
Suppose n = 50. To guarantee that the distribution of v^(0) of the original Gaussian sampling algorithm is D_{Z^n,s,(0,0,...,0)}, it must hold that s ≥ 1.116B. With s = 0.71B, on average not more than e^1.45 implementations of step 2 of the novel Gaussian sampling algorithm will output y.

Table 1  The values of ρ_{s,c}(Z)

s²\c    0.0         0.1         0.2         0.3         0.4         0.5
0.1     1.0000908   0.9051465   0.6719822   0.4140163   0.2292202   0.1641700
0.2     1.0134759   0.9710097   0.8602396   0.7241362   0.6146861   0.5730356
0.3     1.0713512   1.0521431   1.0018654   0.9397380   0.8894914   0.8703026
0.4     1.1642607   1.1559982   1.1343668   1.1076292   1.0859984   1.0777363
0.5     1.2713416   1.2678986   1.2588849   1.2477434   1.2387297   1.2352867
0.6     1.3802971   1.3788913   1.3752112   1.3706625   1.3669823   1.3655766
0.7     1.4859043   1.4853384   1.4838569   1.4820256   1.4805442   1.4799783
0.8     1.5865115   1.5862861   1.5856957   1.5849661   1.5843759   1.5841503
0.9     1.6819640   1.6818750   1.6816416   1.6813531   1.6811198   1.6810306
1.0     1.7726372   1.7726022   1.7725105   1.7723972   1.7723055   1.7722704
1.1     1.8590369   1.8590232   1.8589875   1.8589431   1.8589073   1.8588936
1.2     1.9416538   1.9416485   1.9416345   1.9416173   1.9416033   1.9415981
1.3     2.0209191   2.0209172   2.0209117   2.0209050   2.0208995   2.0208974
1.4     2.0971999   2.0971990   2.0971971   2.0971944   2.0971923   2.0971916
1.5     2.1708055   2.1708050   2.1708043   2.1708033   2.1708024   2.1708021
1.6     2.2419970   2.2419970   2.2419968   2.2419963   2.2419960   2.2419958
1.7     2.3109972   2.3109972   2.3109972   2.3109970   2.3109968   2.3109968
1.8     2.3779964   2.3779964   2.3779964   2.3779964   2.3779962   2.3779962
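The table entries can be reproduced numerically; the following sketch (ours, with our own function names) also approximates the averaged factor 2∫₀^{0.5} ρ_{s,t}(Z)dt/ρ_{s,0.0}(Z) used in Sections 3.5 and 3.6.

```python
import math

def rho_s2(s2, c, tail=40):
    """rho_{s,c}(Z) with s^2 = s2: sum over z of exp(-(z - c)^2 / s^2)."""
    return sum(math.exp(-((z - c) ** 2) / s2) for z in range(-tail, tail + 1))

print(round(rho_s2(1.0, 0.0), 7))  # 1.7726372, as in the table
print(round(rho_s2(0.1, 0.5), 7))  # 0.16417

def avg_accept_factor(s2, steps=10000):
    """Midpoint-rule approximation of 2*int_0^0.5 rho_{s,t}(Z) dt / rho_{s,0}(Z)."""
    dt = 0.5 / steps
    integral = sum(rho_s2(s2, (i + 0.5) * dt) * dt for i in range(steps))
    return 2.0 * integral / rho_s2(s2, 0.0)
```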

Suppose n = 100. To guarantee that the distribution of v^(0) of the original Gaussian sampling algorithm is D_{Z^n,s,(0,0,...,0)}, it must hold that s ≥ 1.210B. With s = 0.77B, on average not more than e implementations of step 2 of the novel Gaussian sampling algorithm will output y.
Suppose n = 200. To guarantee that the distribution of v^(0) of the original Gaussian sampling algorithm is D_{Z^n,s,(0,0,...,0)}, it must hold that s ≥ 1.298B. With s = 0.77B, on average not more than e² implementations of step 2 will output y.
Suppose n = 500. To guarantee that the distribution of v^(0) of the original Gaussian sampling algorithm is D_{Z^n,s,(0,0,...,0)}, it must hold that s ≥ 1.406B. With s = 0.84B, on average not more than e² implementations of step 2 will output y.
Suppose n = 1000. To guarantee that the distribution of v^(0) of the original Gaussian sampling algorithm is D_{Z^n,s,(0,0,...,0)}, it must hold that s ≥ 1.483B. With s = 0.89B, on average not more than e^1.5 implementations of step 2 will output y.
All data above can be obtained by computing (ρ_{d,0.5}(Z)/ρ_{d,0.0}(Z))^n according to Table 1.

3.5 Efficiency under Assumption 2

More accurately, E(rate) = E[∏_{i=1}^{n} ρ_{d_i,t_i}(Z)/ρ_{d_i,0.0}(Z)] is the average value of the accepting rate, and it is the probability of the event that, at step 3, we output y instead of going back to step 1. By the mean of a geometric distribution, 1/E(rate) is the average number of times step 2 of the novel algorithm is implemented. To obtain E(rate) we present Assumption 2. According to our experiments, Assumption 2 is approximately true for random lattices with dimension n ≥ 20 and deviation s ≥ 0.2B.
Assumption 2. Let Λ be an n-dimensional lattice, and s > 0 be a parameter. Implement step 2 of the novel Gaussian sampling algorithm, with this s, to obtain y and the corresponding {t_1, t_2, ..., t_n}. It is assumed that each of {t_1, t_2, ..., t_n} is uniformly distributed over [−0.5, 0.5], and that E(∏_{i=1}^{n} ρ_{d_i,t_i}(Z)/ρ_{d_i,0.0}(Z)) = ∏_{i=1}^{n} (E(ρ_{d_i,t_i}(Z))/ρ_{d_i,0.0}(Z)).
By Assumption 2 and Table 1, we can approximately obtain the average value of the accepting rate of step 2, not only its lower bound. We have E(rate) = ∏_{i=1}^{n} (E(ρ_{d_i,t_i}(Z))/ρ_{d_i,0.0}(Z)) = ∏_{i=1}^{n} 2∫₀^{0.5} ρ_{d_i,t}(Z)dt/ρ_{d_i,0.0}(Z). This immediately results in Proposition 3.


Proposition 3. Suppose Assumption 2 is true. Then step 2 of the novel Gaussian sampling algorithm needs to be implemented on average ∏_{i=1}^{n} ρ_{d_i,0.0}(Z)/(2∫₀^{0.5} ρ_{d_i,t}(Z)dt) times to output y = x − v^(0). Hence the total average time cost of the novel Gaussian sampling algorithm is (∏_{i=1}^{n} ρ_{d_i,0.0}(Z)/(2∫₀^{0.5} ρ_{d_i,t}(Z)dt))(T_s + T_h).

3.6 Results under Assumption 2

Denote d = min{d_1, d_2, ..., d_n}. Then ∏_{i=1}^{n} 2∫₀^{0.5} ρ_{d_i,t}(Z)dt/ρ_{d_i,0.0}(Z) ≥ (2∫₀^{0.5} ρ_{d,t}(Z)dt/ρ_{d,0.0}(Z))^n, and we have the following results.
Suppose n = 50. With s = 0.64B, on average not more than e^1.6 implementations of step 2 of the novel Gaussian sampling algorithm will output y. With s = 0.71B, on average not more than e^0.8 implementations of step 2 will output y.
Suppose n = 100. With s = 0.71B, on average not more than e^1.6 implementations of step 2 will output y. With s = 0.77B, on average not more than e^0.6 implementations of step 2 will output y.
Suppose n = 200. With s = 0.77B, on average not more than e^1.2 implementations of step 2 will output y. With s = 0.84B, on average not more than e^0.4 implementations of step 2 will output y.
Suppose n = 500. With s = 0.84B, on average not more than e implementations of step 2 will output y. With s = 0.89B, on average not more than e^0.48 implementations of step 2 will output y.
Suppose n = 1000. With s = 0.89B, on average not more than e^0.96 implementations of step 2 will output y. With s = 0.95B, on average not more than e^0.57 implementations of step 2 will output y.
All data above can be obtained by computing 2∫₀^{0.5} ρ_{d,t}(Z)dt according to Table 1.
These results show an interesting feature: if the deviation of the novel Gaussian sampling algorithm is not smaller than 0.64 of that of the original Gaussian sampling algorithm, the time cost is not clearly increased. The reason is that the average time cost of the original Gaussian sampling algorithm, excluding the use of the hash function, increases linearly with the parameter s. Here we take just one of the above results as an example. Suppose that we have a 200-dimensional lattice and that, to guarantee a Gaussian distribution, the deviation of the original sampling must be at least 1.298B. Under Assumption 2, a Gaussian sample with deviation s = 0.84B can be obtained by on average not more than e^0.4 implementations of step 2 of the novel Gaussian sampling algorithm. The total average time cost of the original Gaussian sampling algorithm is not smaller than T_{1.298B} + T_h. The total average time cost of the novel Gaussian sampling algorithm is not larger than e^0.4(T_{0.84B} + T_h) = 0.967T_{1.298B} + 1.5T_h. Although we cannot determine T_h, a reasonable assumption is that T_h is much smaller than T_{1.298B}. In many special cases, the results are better.
• Example. Consider the case B = ‖b^(1)‖ = ··· = ‖b^(n/2)‖ = 2‖b^(n/2+1)‖ = ··· = 2‖b^(n)‖. This means d = d_1 = d_2 = ··· = d_{n/2} = (1/2)d_{n/2+1} = ··· = (1/2)d_n and E(rate) = (2∫₀^{0.5} ρ_{d,t}(Z)dt/ρ_{d,0.0}(Z))^{n/2} × (2∫₀^{0.5} ρ_{2d,t}(Z)dt/ρ_{2d,0.0}(Z))^{n/2}.
Suppose n = 50. With s = 0.64B, on average e^0.96 implementations of step 2 of the novel Gaussian sampling algorithm will output y. With s = 0.71B, on average e^0.35 implementations of step 2 will output y.
Suppose n = 100. With s = 0.71B, on average e^0.7 implementations of step 2 will output y. With s = 0.77B, on average e^0.28 implementations of step 2 will output y.
Suppose n = 200. With s = 0.77B, on average e^0.56 implementations of step 2 will output y. With s = 0.84B, on average e^0.2 implementations of step 2 will output y.
Suppose n = 500. With s = 0.84B, on average e^0.5 implementations of step 2 will output y. With s = 0.89B, on average e^0.35 implementations of step 2 will output y.
Suppose n = 1000. With s = 0.89B, on average e^0.7 implementations of step 2 will output y. With s = 0.95B, on average e^0.5 implementations of step 2 will output y.
All data above can be obtained by computing (2∫₀^{0.5} ρ_{d,t}(Z)dt/ρ_{d,0.0}(Z)) × (2∫₀^{0.5} ρ_{2d,t}(Z)dt/ρ_{2d,0.0}(Z)) according to Table 1.
In general, if the deviation of the novel Gaussian sampling algorithm is smaller than 0.64 of that of the original Gaussian sampling algorithm, the average time cost may be clearly larger, and it may increase rapidly as the deviation decreases further.

4 Summary

In this paper we presented a novel Gaussian sampling algorithm, for the purpose of decreasing the deviation and therefore reducing the space sizes of lattice-based public-key ciphers. The novel Gaussian sampling algorithm can decrease the deviation to 0.64-0.75 of that of the original Gaussian sampling algorithm without clearly increasing the average time cost; beyond that, the average time cost may increase rapidly with further decrease of the deviation.

Acknowledgements This work was supported by National Natural Science Foundation of China (Grant Nos. 61173151, 61303198) and Science and Technology on Communication Security Laboratory (Grant No. 9140C110201110C1102). This work was also supported by Huawei Co. (Grant No. YBCB2012026).

References
1 Goldreich O, Goldwasser S, Halevi S. Public-key cryptosystems from lattice reduction problems. In: Proceedings of CRYPTO'1997, Santa Barbara, 1997. 112–131
2 Hoffstein J, Howgrave-Graham N, Pipher J, et al. NTRUSign: digital signatures using the NTRU lattice. In: Proceedings of CT-RSA'2003, San Francisco, 2003. 122–140
3 Szydlo M. Hypercubic lattice reduction and analysis of GGH and NTRU signatures. In: Proceedings of EUROCRYPT'2003, Warsaw, 2003. 433–448
4 Nguyen P Q, Regev O. Learning a parallelepiped: cryptanalysis of GGH and NTRU signatures. In: Proceedings of EUROCRYPT'2006, Saint Petersburg, 2006. 271–288
5 Hu Y P, Wang B C, He W C. NTRUSign with a new perturbation. IEEE Trans Inf Theory, 2008, 54: 3216–3221
6 Klein P. Finding the closest lattice vector when it's unusually close. In: Proceedings of SODA'2000, San Francisco, 2000. 937–941
7 Gentry C, Peikert C, Vaikuntanathan V. How to use a short basis: trapdoors for hard lattices and new cryptographic constructions. In: Proceedings of STOC'2008, Victoria, 2008. 197–206
8 Peikert C. An efficient and parallel Gaussian sampler for lattices. In: Proceedings of CRYPTO'2010, Santa Barbara, 2010. 80–97
9 Babai L. On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica, 1986, 6: 1–13
10 Cash D, Hofheinz D, Kiltz E, et al. Bonsai trees, or how to delegate a lattice basis. In: Proceedings of EUROCRYPT'2010, Nice, 2010. 523–552
11 Agrawal S, Boneh D, Boyen X. Efficient lattice (H)IBE in the standard model. In: Proceedings of EUROCRYPT'2010, Nice, 2010. 553–572
12 Rückert M. Lattice-based blind signatures. In: Proceedings of ASIACRYPT'2010, Singapore, 2010. 413–430
13 Gordon S D, Katz J, Vaikuntanathan V. A group signature scheme from lattice assumptions. In: Proceedings of ASIACRYPT'2010, Singapore, 2010. 395–412
14 Rückert M. Strongly unforgeable signatures and hierarchical identity-based signatures from lattices without random oracles. In: Proceedings of PQCrypto'2010, Darmstadt, 2010. 182–200
15 Wang F H, Hu Y P, Wang B C. Lattice-based linearly homomorphic signature scheme over binary field. Sci China Inf Sci, 2013, 56: 112108

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072113:1–072113:11 doi: 10.1007/s11432-013-4898-2

Real-time control of human actions using inertial sensors

LIU HuaJun1, HE FaZhi1*, ZHU FuXi1 & ZHU Qing2

1 School of Computer, Wuhan University, Wuhan 430072, China;
2 School of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China

Received March 6, 2013; accepted March 26, 2013; published online June 20, 2013

Abstract  Our study proposes a new local model to accurately control an avatar using six inertial sensors in real-time. Creating such a system to assist interactive control of a full-body avatar is challenging because control signals from our performance interfaces are usually inadequate to completely determine the whole-body movement of human actors. We use a pre-captured motion database to construct a group of local regression models, which are used along with the control signals to synthesize whole-body human movement. By synthesizing a variety of human movements based on actors' control in real-time, this study verifies the effectiveness of the proposed system. Compared with previous models, our proposed model synthesizes more accurate results. Our system is suitable for common use because it is much cheaper than commercial motion capture systems.

Keywords  avatars, motion capture/editing/synthesis, interaction techniques, game interaction, animation simulation

Citation Liu H J, He F Z, Zhu F X, et al. Real-time control of human actions using inertial sensors. Sci China Inf Sci, 2014, 57: 072113(11), doi: 10.1007/s11432-013-4898-2

1 Introduction

The ability to synthesize human action precisely in real-time gives a user/trainee the chance to control a virtual avatar using his/her own body movements, navigate the virtual world, or accomplish a virtual task. Such a system could also be used in real sports training, rehabilitation, and real-time control of game characters or robotic systems such as tele-operation. The challenge has already been partially solved by commercial motion capture (mocap) equipment; however, such equipment is quite expensive for common use. Because these systems generally require the performer to wear skin-tight clothing along with no fewer than 40 retro-reflective markers, 18 magnetic or inertial sensors, or a full-body exoskeleton, they are cumbersome for actors. Recently, major game console companies, including Microsoft, Sony, and Nintendo, have developed next-generation hardware devices to capture the online performance of individual players. These control interfaces are suitable as performance interfaces because of their low cost and unobtrusiveness. However, control signals from these devices are often noisy and low-dimensional, and therefore cannot be used to control human movement accurately.

* Corresponding author (email: [email protected])


Figure 1  The actor wearing six sensors to accurately control a virtual character.

This study presents a new approach to performance animation that uses six inertial sensors to create a system that controls an entire avatar accurately (see Figure 1). Our system employs inertial sensors because they are low-cost, compact, and highly accurate. However, constructing a performance animation interface is challenging because the control signals from this equipment are fairly low-dimensional and often inadequate for determining the full-body movement of actors. (Usually, more than fifty degrees of freedom (DOF) are used to represent a virtual human character.) Our approach is to learn an online dynamic model from a pre-captured motion database and employ it to constrain the synthesized pose to look natural. The proposed model predicts the current pose q_t from its previous m poses q_{t-1}, ..., q_{t-m} via a group of mathematical functions. Generally, it is difficult to predict how people move because human action is highly nonlinear. Instead of learning a global dynamic motion model, which is often not appropriate for modeling the nonlinear properties of complex human movements, this study constructs online local regression models. At run time, we use K-nearest neighbor search to find the K sequences in the pre-captured database closest to the recently synthesized poses. These examples, along with their subsequent poses, are employed as training data to obtain a prediction function that captures the relationship between the previous m poses and the current pose. At each moment, our system produces a new local model for the next pose. The proposed model is effective for human motion because it takes the heterogeneity of a pre-captured database fully into consideration. Using the constrained maximum a posteriori (MAP) inference approach, the problem of online motion reconstruction is formulated using prior information from the online local models together with a likelihood term imposed by the control signals.

2 Related work

• Performance animation interfaces. Commercial mocap equipment is one of the most popular technologies used to control a virtual character. Mocap systems are based on passive/active optical, exoskeleton, or magnetic sensors, all of which are capable of capturing an actor's movements in real-time. However, these systems are quite expensive for common use. In addition, they are complex, cumbersome and tedious, because they require the performer to wear skin-tight clothing along with more than 40 retro-reflective markers, which must be carefully positioned, 18 magnetic or inertial sensors, or an exoskeleton.
Synthesizing full character motion using only a few sensors has already been well explored. To control a standing avatar in real-time, Badler [1] proposed an inverse kinematics (IK) method together with four magnetic sensors, using a heuristic to reduce the kinematic redundancy. In contrast, our study adopts a data-driven approach. Compared with Badler's solution, Semwal [2] used eight magnetic sensors and added an analytic method to the IK technique. Using a foot pressure sensor, Yin [3] created a system that searches and duplicates motion from a prerecorded database.


However, that method can only reconstruct whole human movements for a narrow range of motions, and foot pressure sensors cannot provide enough information to synthesize upper-body actions accurately. Using five cheap inertial sensors, Slyper's method [4] can control and synthesize upper-body movement. Different from their work, we control whole-body human actions using just six sensors. In addition, we create a group of time-varying local models to constrain and synthesize real-time human motion rather than searching for the closest example and replaying it. Recently, Ha [5] created a system that uses one foot pressure sensor and the Sony PlayStation Move to reconstruct upper-body motion, similar to Slyper's method; our method, however, achieves not just upper-body but full-body control. Tautges [6] used accelerometers as the signal-providing equipment, and his method was free of space restrictions, but accelerometers cannot provide positional information, so the reconstructed results were not natural enough; compared with our method, theirs cannot achieve accurate motion control. Liu [7] achieved full-body human motion control using six inertial sensors; we, however, use a more powerful model that enables dimensionality reduction and achieves better results. More recently, some researchers have developed accurate human motion control methods using depth information provided by a depth camera [8,9], whereas our interactive human motion control system is based on six inertial sensors. Unlike vision sensors, inertial sensors do not suffer from occlusion problems. In this study, we focus on marker-based approaches.
• Data-driven animation. A number of data-driven approaches have been developed, typically of three quite different kinds: interpolation [10–12], motion graphs [13–18], and statistical model-based approaches [7, 19–25]. Because the first two approaches cannot meet real-time requirements, we construct a statistical dynamic model to predict the current pose from previously synthesized poses. To date, statistical motion models have been widely applied to synthesize realistic human motion. They have been used for inverse kinematics [20], interactive control of human action using several retro-reflective markers [19], perturbation of natural-looking human motion [21], editing human motion with manipulation interfaces [23], synthesizing human motion with the Gaussian process latent variable model (GPLVM) [25], performance animation using a global model [22], real-time motion control using a local principal component regression (PCR) model [7], building physically-valid motion models for human motion synthesis [24], and others.
Among the above-mentioned statistical models, ours is most similar to the local models constructed in a subspace for online control of human motion [7,19], because all of them are built at runtime from training data close to the current example. Nevertheless, there is an important difference. For regression learning approaches, the training data can be divided into two parts: input and output data. The principal component analysis (PCA) model used in [19] focuses only on dimensionality reduction of the input data, and the PCR model used in [7] focuses on dimensionality reduction of both the input and output data. However, these two models fail to capture the projection relationship between the input and output training data. The model we propose in this study estimates an input-output projection with a linear combination of basis regression functions; therefore, it can encode more of the spatial-temporal relationships present in a pre-captured database than previous models. Our test in Section 7 shows that the proposed model can synthesize more natural-looking human actions than previous local pose models. In addition, with the online local dynamic models it is easier to find suitable structures than with high-dimensional global models.

3 Overview of performance interface

Our performance interface automatically transforms control inputs from six inertial sensors into realistic human actions by building sequential local models at runtime and then using them to interpret the performer's action (see Figure 2). Our performance interface includes the following components.
• Calibration of the local coordinates for sensors and skeletal sizing. A calibration step is implemented for two reasons: first, different actors have different skeletal sizes; second, even for the same user, the way he/she wears the sensors may vary, so our system needs to map the coordinates of each sensor to the control coordinates of the user's body. Thus, a new calibration approach is introduced,

Figure 2  System overview. An actor wearing six inertial sensors performs the desired motions using the InterSense IS-900 system. The motion performance step automatically reconstructs the 3D orientation and position of each sensor for every time step in real-time. While running, the performance interface automatically transforms control signals into high-quality human motion using a motion database. (Diagram: motion performance feeds online motion synthesis, which combines an online local model built by online local modeling from preprocessed motion capture data.)

Figure 3  Sensors for our avatar control system.

which is robust to both different users and various sensor placements. Our performance interface requires the user to wear six inertial sensors on the head, center of torso, both hands and both ankles for performance-driven animation, as shown in Figure 3. By guiding the user through eight "calibration" poses, the calibration step estimates the user's skeletal size and each sensor's local coordinates at the same time.
• Online modeling of human dynamic behavior. A novel statistical local model is presented for our online motion synthesis. Our performance interface uses sequential local linear models, constructed from a pre-captured database, to model various human actions on the fly. One advantage of this modeling is that our models can predict the movements of actors in local regions of the configuration space.
• Online motion reconstruction. While running, the actor performs the desired motion using six inertial sensors. The global 3D orientations and 3D positions of all sensors are recorded simultaneously by our performance interface, [c_1, ..., c_t]. This information is useful because it describes the trajectories of special points and vectors on the body of the avatar. By combining the current control signals c_t provided by the sensors and the local probabilistic model constructed from the previous m reconstructed


poses Q~ = [q~_{t-1}, ..., q~_{t-m}], our system can synthesize the user's pose q_t in a constrained MAP framework:

max_{q_t} Pr(q_t | c_t, Q~) ∝ max_{q_t} Pr(c_t | q_t) · Pr(q_t | Q~).    (1)

Eprior

where Econtrol is the likelihood term that measures the extent to which the synthesized pose qt matches the current signals ct , and Eprior is the prior term that describes the prior distribution of human motion. Conceptually, the prior term tests the naturalness of the reconstructed pose. The calibration step is completed offline, however, the motion modeling and reconstruction process are executed online. In the following sections, we give a detailed description of these three components.

4

Calibration of local coordinates for skeletal size and sensors

An InterSense IS-900 system was used to record 3D orientation/position data of all inertial sensors (40 fps) in real-time from our performance interfaces. The IS-900 processes motion signals from a tracking device to compute 3-DOF orientation and position data, where the orientation data is integrated from magnetometers, gyroscopes, and accelerometers, and the position data is provided by ultrasonic sensors. The calibration step proposed above ensures that the performance interfaces are adequate for various sensor placements and for actors with different skeletal lengthes. Furthermore, the skeletal size calibration step aims to calculate the size of the actor’s skeleton and the sensors’ local coordinates calibration step computes each inertial sensor’s local coordinates. Eight “calibration” poses are used for calibration. The interface guides the user to perform the same pose as the green pose (target pose) which is shown on the screen, and the performance interface records the global orientation and location of inertial measurement sensors under these calibration poses (see the accompanying video). To reduce the ambiguity in the process of modeling a human skeleton, a lowdimensional eigen model is built based on data from a human skeleton. All skeletal data in our experiments are from the Carnegie Mellon University (CMU) motion capture database1) and are represented in the format of the Acclaim Skeleton File. The skeletal size is represented by a vector s which records each bone’s length. The vectors oj and pj which are related to the sensor’s local coordinate systems, are used to represent the orientations and positions of the j th inertial measurement sensor. The vectors qi , i = 1, . . . , 8 are used to represent the calibration poses. Hence, we can now solve the nonlinear optimization problem for our calibration step: "2 " H "2 " "2 " " "  " " " " j" j" arg min λh eh " . "f (s, oj ; qi ) − di " + "f (s, pj ; qi ) − li " + α "s − e0 − " " {oj },{pj },{λh },s i

j

(3)

h=1

In the above formula, given the orientations o_j and positions p_j of the j th sensor and the actor's skeletal size s, the forward kinematics (FK) function f calculates the calibration poses' joint angles q_i. The vectors d_i^j and l_i^j are the recorded global directions and locations of the j th inertial sensor for the ith calibration pose. The scalar α weighs the importance of the skeletal model priors learned from the prerecorded data. The vector e_0 is the mean value of the skeletal model, and e_h, h = 1, . . . , H are the eigen vectors. To calculate o_j, p_j, and s, we run an optimization using the Levenberg-Marquardt algorithm [14].
1) http://mocap.cs.cmu.edu.
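For concreteness, here is a minimal sketch of how the calibration objective in (3) could be set up with SciPy's Levenberg-Marquardt solver. The toy forward-kinematics function `fk`, the dimensions, and the synthetic data are placeholder assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy dimensions (our assumptions, not the paper's): 8 calibration poses,
# 6 sensors, a 9-entry bone-length vector s, and H = 4 eigen vectors.
N_POSE, N_SENS, N_BONE, H = 8, 6, 9, 4

def fk(s, local, pose_idx):
    """Placeholder forward kinematics: maps the skeleton s and a sensor's
    local offset to a global 3-vector for calibration pose `pose_idx`.
    The real system evaluates the articulated skeleton here."""
    return s[:3] * local + 0.1 * pose_idx

def residuals(x, d, l, e0, E, alpha):
    """Stacked residuals of Eq. (3)."""
    o = x[:N_SENS * 3].reshape(N_SENS, 3)              # sensor orientations o_j
    p = x[N_SENS * 3:N_SENS * 6].reshape(N_SENS, 3)    # sensor positions p_j
    lam = x[N_SENS * 6:N_SENS * 6 + H]                 # eigen weights lambda_h
    s = x[N_SENS * 6 + H:]                             # bone lengths s
    res = []
    for i in range(N_POSE):
        for j in range(N_SENS):
            res.append(fk(s, o[j], i) - d[i, j])       # direction term
            res.append(fk(s, p[j], i) - l[i, j])       # location term
    res.append(np.sqrt(alpha) * (s - e0 - E @ lam))    # eigen-skeleton prior
    return np.concatenate(res)

rng = np.random.default_rng(0)
d = rng.normal(size=(N_POSE, N_SENS, 3))               # recorded directions d_i^j
l = rng.normal(size=(N_POSE, N_SENS, 3))               # recorded locations l_i^j
e0, E = np.ones(N_BONE), 0.1 * rng.normal(size=(N_BONE, H))
x0 = 0.5 * rng.normal(size=N_SENS * 6 + H + N_BONE)
fit = least_squares(residuals, x0, args=(d, l, e0, E, 0.5), method='lm')
```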




Figure 4 The key concept of our online local modeling. The points on the top line are recently reconstructed poses, and the other lines are the K motion examples from the motion capture database that are close to the recently reconstructed poses. We establish the relationships between these K motion examples and their subsequent poses to predict the next pose on the top line.

5 Online modeling of human dynamic behavior

The motion control problem is challenging because the information from six inertial sensors attached to a user cannot fully constrain a full-body avatar's joint angles: the control signals are low-dimensional while the full-body joint angles are high-dimensional. Our approach is to automatically build sequential online local regression models to adequately constrain the synthesized pose within the natural-looking solution space. We assume human action can be represented by an m-order Markov chain, so the current pose q_t can be considered to depend only on the previous m poses: Pr(q_t | q_{t-1}, . . . , q_1) = Pr(q_t | q_{t-1}, . . . , q_{t-m}). Nevertheless, modeling the dynamic behavior of human motion is difficult because human action is nonlinear, and a global dynamic model may not be sufficient to model complex movement. To solve this problem, sequential local regression models are constructed on the fly to predict how humans move. To predict the current pose at frame t, the first step is to search the motion database captured in advance and find the motion segments closest to the recently constructed motion segment Q̃ = [q̃_{t-1}, . . . , q̃_{t-m}]. The K closest motion segments [q_{tk-1}, . . . , q_{tk-m}], along with their subsequent poses q_{tk}, k = 1, . . . , K, are then used as training data to learn a prediction function g that maps the previous m poses to the current pose, as shown in Figure 4. Suppose a linear relationship exists between an input joint angle vector x = [q_{t-1}, . . . , q_{t-m}] and an output y = q_t. For simplicity, the proposed model is represented using linear regression as

$$y = \alpha^{\mathrm T} x + \beta_y, \tag{4}$$

where the input joint angle vector x is an (m × D)-dimensional vector, D is the number of DOFs of the human character, and y is the output joint angle value. The regression coefficients α form a vector, and β_y is a homoscedastic noise variable independent of x. Given the K motion examples {(x_k; y_k)}, k = 1, . . . , K, which are similar to the current synthesized poses, and minimizing the expected error $E = \sum_{k=1}^{K} \| y_k - \alpha^{\mathrm T} x_k \|^2$, we can obtain the coefficients α:

$$\alpha = (X^{\mathrm T} X)^{-1} X^{\mathrm T} y. \tag{5}$$

The rows of the matrix X contain the input joint angle vectors x_k, k = 1, . . . , K, and the K output joint angle values are stacked in the vector y. In our implementation, we first put the input motion data X and output motion data y together, represented as A = [X y], and principal component analysis is then applied to A. The eigenvectors are extracted from the covariance matrix C = A^T A, so the principal subspace captures the directions of the joint angle data distribution. When we perform dimensionality reduction, we thereby keep the directions of the input joint angle space that have high predictive value. In our implementation,


by mapping the input motion joint angle data as close as possible to the principal subspace, this subspace is directly used for the regression operation. We can decompose the eigenvector matrix U into U_x and U_y, U^T = [U_x^T, U_y^T], where U_x corresponds to the input joint angle space and U_y to the output joint angle space. To obtain a mapping from the input to the output joint angle space, we first minimize ‖x − U_x v‖² with respect to the eigen coefficient vector v, which gives v = (U_x^T U_x)^{-1} U_x^T x, and the output is y = U_y v. Thus we obtain the regression coefficients

$$\alpha = U_x \left( U_x^{\mathrm T} U_x \right)^{-1} U_y^{\mathrm T}. \tag{6}$$

Because the matrix U is orthogonal, U_x^T U_x + U_y^T U_y = I, and for invertible square matrices E and S the identity (E + U S U^T)^{-1} = E^{-1} − E^{-1} U (S^{-1} + U^T E^{-1} U)^{-1} U^T E^{-1} holds, so we can obtain an easier-to-calculate form of the coefficients α:

$$\alpha = U_x \left( U_y^{\mathrm T} - U_y^{\mathrm T} \left( U_y U_y^{\mathrm T} - I \right)^{-1} U_y U_y^{\mathrm T} \right). \tag{7}$$

Suppose the noise variable β_y is Gaussian distributed; its standard deviation σ can be estimated from the residuals y_k − α^T x_k, k = 1, . . . , K. In our experiments, a prediction function is constructed for each DOF of the synthesized pose; therefore, to predict the d th DOF of the pose, we can describe our local regression model as

$$q_{t,d} = \alpha_d^{\mathrm T} \tilde{Q} + N(0, \sigma_d), \tag{8}$$

where q_{t,d} and σ_d are scalars, q_{t,d} is the d th DOF of the pose at frame t, and σ_d is the standard deviation of the d th prediction function. α_d and Q̃ are vectors: α_d are the regression coefficients for the d th DOF, and Q̃ stacks the reconstructed poses before the current synthesized pose.
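The following sketch illustrates the per-DOF local model of Eqs. (4)–(8): find the K nearest past segments, apply PCA to the stacked input/output data, and regress in the principal subspace. The database layout, K, and the number of principal components are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def local_pcr_predict(Q_tilde, db_segments, db_next, K=20, n_pc=10):
    """Predict one DOF of the next pose from the m previous poses.
    db_segments: (N, m*D) past segments from the database; db_next: (N,)
    the next-frame value of this DOF. K and n_pc are illustrative."""
    x = Q_tilde.ravel()                                  # query segment (m*D,)
    # K nearest motion examples in the database (Figure 4).
    idx = np.argsort(((db_segments - x) ** 2).sum(axis=1))[:K]
    X, y = db_segments[idx], db_next[idx]
    mean_x, mean_y = X.mean(axis=0), y.mean()
    A = np.column_stack([X - mean_x, y - mean_y])        # stacked [X y]
    # Principal subspace of C = A^T A via SVD (rows of Vt are eigenvectors).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    U = Vt[:n_pc].T
    Ux, Uy = U[:-1, :], U[-1:, :]                        # input / output blocks
    # Regression in the principal subspace, Eq. (6): y = Uy (Ux^T Ux)^-1 Ux^T x.
    alpha_T = Uy @ np.linalg.solve(Ux.T @ Ux, Ux.T)
    pred = mean_y + float(alpha_T @ (x - mean_x))
    resid = y - (mean_y + (alpha_T @ (X - mean_x).T).ravel())
    return pred, resid.std()                             # Eq. (8): N(pred, sigma_d)
```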

6 Online motion synthesis

In this section, we solve the problem of how to synthesize sequential poses from the control information provided by six inertial sensors. During runtime, our performance animation system automatically combines the control signals and the online local regression models, and synthesizes the performer's poses frame by frame.

6.1 Control stability

The stability of our control system is very important. However, the control signals c_t provided by the inertial sensors are perturbed by Gaussian noise. Let σ be the standard deviation of this noise; the control term of the sensors can then be defined as

$$E_{\mathrm{control}} = -\ln \Pr(c_t \mid q_t) \propto \frac{\left\| f(q_t; \tilde{s}, L) - c_t \right\|^2}{2\pi\sigma^2}, \tag{9}$$

where q_t, s̃, L, c_t are vectors: q_t is the synthesized pose, s̃ is the avatar's skeletal size, L are the inertial sensors' local coordinates, and c_t are the observations provided by the sensors. The FK function f calculates the global coordinate values for the current pose. Outliers exist in the control signals provided by the inertial sensors, especially in the positional data. Because of the ultrasonic sensors and occlusion problems, the positional data may be corrupted by outliers, missing data and error accumulation. To address these problems, we adopt the Lorentzian robust estimator to filter the noisy data. The matching cost term can thus be defined as follows:

$$\rho(e) = \log\left( 1 + \frac{e^2}{2\sigma^2} \right), \tag{10}$$

where e is the distance between the predicted signals and the observed signals, and the parameter σ is for the robust estimator.
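As a quick illustration (not the paper's code), the Lorentzian cost of (10) can be written directly:

```python
import numpy as np

def lorentzian(e, sigma):
    """Lorentzian robust cost of Eq. (10): grows only logarithmically with
    the prediction error e, so positional outliers from the ultrasonic
    sensors are strongly down-weighted relative to a quadratic cost."""
    return np.log1p(e ** 2 / (2.0 * sigma ** 2))
```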

6.2 Motion priors

We use the prior term to constrain the synthesized motion to follow the probabilistic distribution of similar motion data in the local region. In our animation system, the prior term is defined as follows:

$$\Pr(q_t \mid \tilde{Q}) \propto \prod_{d=1}^{D} \exp\left( -\frac{(q_{t,d} - \alpha_d^{\mathrm T} \tilde{Q})^2}{2\pi\sigma_d^2} \right), \tag{11}$$

where q_{t,d}, d = 1, . . . , D is the d th DOF of the current pose q_t, α_d and σ_d are the regression coefficients and standard deviation of the d th prediction model, and the vector Q̃ sequentially records the m previously synthesized poses [q̃_{t-1}, . . . , q̃_{t-m}].

We obtain the following energy formulation by taking the negative log of Pr(q_t | Q̃):

$$E_{\mathrm{prior}} = \sum_{d} \frac{(q_{t,d} - \alpha_d^{\mathrm T} \tilde{Q})^2}{2\pi\sigma_d^2}. \tag{12}$$

6.3 Implementation details

We adopted gradient-based optimization using the Levenberg-Marquardt method2) for the objective function defined in (2), and used the most similar motion example already in the database to initialize the optimization. Thanks to this good initialization, the optimization converged quickly. The computational efficiency of our animation system mainly depends on the search scope in the motion database, so we accelerated the K nearest neighbor search with a strategy similar to that in [19]. Our system reached an average frame rate of 37 fps for real-time synthesis.

7 Results and evaluation

The database we captured includes five full-body behavior movements: golf swing (2537 frames), basketball (6582), boxing (29852), walking (20866) and running (5772). All of them were recorded using a Vicon mocap system3) with a frame rate of 120 fps. To match the inertial sensors' frame rate, the original mocap data were downsampled to 40 fps. We verified the effectiveness of our proposed approach on various movements based on a large motion database and evaluated the reconstructed results against ground-truth data.
• Testing on performance control and evaluation. We used all the control signals provided by the six inertial sensors to control a full-body avatar in real time (please watch the video). The video also shows the performance comparison between our method and the IK method. Because an inertial sensor can provide two types of data (orientation and position), we used the position data for the center of torso, head and both ankles, and the orientation data for both hands. The results show that without prior data, we cannot achieve real-time control based only on the IK technique. In addition, we used the leave-one-out error evaluation method to evaluate the quality of the synthesized actions. Figure 5 shows the average synthesis errors. The errors were calculated in degrees per joint angle per frame, using the average distance between the motion captured by the Vicon mocap system and the synthesized motion. We considered three types of control information: 1) 3D position and 3D orientation; 2) only 3D orientation; 3) only 3D position. We found that if we used both position and orientation constraints, the reconstruction errors were the lowest of the three combinations. In addition, compared with 3D orientation information, 3D position information was more useful and complied better with the constraints in motion reconstruction.
• Calibration for skeleton and local coordinates. Our system needs to be robust for different users and various sensor wearing styles, so we used an average skeleton size calculated from different skeleton files in the CMU mocap database. It was used as the standard subject for skeleton
2) Lourakis M I A. Levmar: Levenberg-Marquardt Nonlinear Least Squares Algorithms in C/C++. 2009.
3) http://www.vicon.com.

Figure 5 Comparisons with inverse kinematics, local principal component analysis and local principal component regression algorithms. (a) Motion synthesis using only orientation signals provided by all the inertial sensors; (b) motion synthesis using only position signals provided by all the inertial sensors; (c) motion synthesis using both orientation and position signals provided by all the inertial sensors. The bars from left to right are the mean errors of IK, local PCA, local PCR and our method.

Figure 6 User study comparing motion quality with and without calibration. Score 9 means most realistic, and score 0 means least realistic.

Table 1 Skeleton size comparison for calibration

Skeleton size       Femur  Tibia  Back  Neck  Head  Clavicle  Humerus  Radius  Wrist
Standard subject     7.23   7.54  7.89  4.21  1.93      3.89     6.57    4.02   1.85
Calibration data 1   6.59   7.38  7.39  3.97  2.48      3.73     5.56    3.06   1.51
Ground truth 1       6.53   7.41  7.42  3.94  2.59      3.75     5.51    3.09   1.54
Calibration data 2   6.58   6.89  7.03  3.45  1.76      3.49     5.04    2.85   1.37
Ground truth 2       6.57   6.81  6.99  3.39  1.72      3.47     4.99    2.71   1.35

calibration. We tested different users, and Table 1 shows the calibration results for several skeletons of two different users. We found that, after our calibration step, the user's skeleton size was close to his/her ground truth data captured by the Vicon mocap system. After the calibration process, we also obtained the local coordinates of each sensor. We asked sixteen users to provide a score (1–9) for the online synthesized motions, without telling them whether we had calibrated or not. The users were chosen from undergraduate students with little experience of 3D animation. We tested different human movements. Figure 6 shows the comparison of motion quality with and without calibration. We found that users usually chose the motion after calibration as the better one and gave it a higher score. The results of the user study show that our calibration step is important and useful for the quality of our online motion control.
• Comparisons with previous algorithms. To test its performance, we compared the IK technique,

Figure 7 Frame-by-frame comparison for one testing sequence. (a) Walking motion; (b) boxing motion. The lines from top to bottom are the reconstruction errors of local PCA, local PCR and our method.

Table 2 Comparison of the average reconstruction errors for different methods and different databases

Method       65,609 poses   1.1 M poses
IK                   4.76          4.76
LPCA                 1.97          1.42
LPCR                 1.75          1.19
Our method           1.43          0.87

local PCA models in [19] and local PCR models in [7] with the proposed model in our study. Figure 5 shows the standard deviations and mean errors of the reconstruction errors for various movements (golf swing, basketball, boxing, walking and running). Figure 7 shows the frame-by-frame comparison of reconstruction errors on single test data for the local PCA model, the local PCR model and our proposed model. The assessment results indicate that the synthesis results using our proposed method were better than those created by the other two methods.
• Different information from sensors. The video of our study analyzes four combinations of input signals from the inertial sensors. The results show that the more constraints are used, the smaller the reconstruction errors. It is not surprising that when the total information from all sensors was used, the reconstruction errors were the smallest.
• Testing on different databases. Table 2 gives the average reconstruction errors of five different actions from four algorithms on two different training databases. One database has 65,609 poses based on five captured motion sequences, and the other has 1.1 M poses downloaded from the CMU database. The reconstruction errors were calculated using both 3D orientation and position constraints from six inertial sensors. We found that when the size of the training database was increased, the reconstruction error decreased. By testing on different databases, we also verified the benefits of the proposed model.

8 Conclusion

In this study, a new local model was introduced for real-time control of a virtual character using only six inertial sensors. The proposed method, based on a data-driven approach, uses several nearest motion examples to construct sequential online local regression models for online motion synthesis. One limitation of the proposed method is that the motion data must be prepared in advance for the online search. However, the proposed model demonstrated better performance than previous models, and our performance interface, which uses only six sensors, is much cheaper and less intrusive for full-body avatar control.


Acknowledgements This work was supported by National Science and Technology Support Program (Grant No. 2012BAH35B02), National Natural Science Foundation of China (Grant No. 61070078), and the Fundamental Research Funds for the Central Universities. Particular thanks for the kind guidance of Prof. Jinxiang Chai at Texas A&M University.

References
1 Badler N I, Hollick M, Granieri J. Realtime control of a virtual human using minimal sensors. Presence, 1993, 2: 82–86
2 Semwal S, Hightower R, Stansfield S. Mapping algorithms for real-time control of an avatar using eight sensors. Presence, 1998, 7: 1–21
3 Yin K, Pai D K. FootSee: an interactive animation system. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, San Diego, 2003. 329–338
4 Slyper R, Hodgins J. Action capture with accelerometers. In: Proceedings of 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Dublin, 2008. 193–199
5 Ha S, Bai Y, Liu C. Human motion reconstruction from force sensors. In: Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Vancouver, 2011. 129–138
6 Tautges J, Zinke A, Kruger B, et al. Motion reconstruction using sparse accelerometer data. ACM Trans Graph, 2011, 30: 18
7 Liu H J, Wei X L, Chai J X, et al. Realtime human motion control with a small number of inertial sensors. In: Proceedings of the 2011 Symposium on Interactive 3D Graphics and Games. New York: ACM, 2011. 133–140
8 Shotton J, Fitzgibbon A, Cook M, et al. Real-time human pose recognition in parts from single depth images. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. Washington DC: IEEE Computer Society, 2011. 1297–1304
9 Wei X L, Zhang P Z, Chai J X. Accurate realtime full-body motion capture using a single depth camera. ACM Trans Graph, 2012, 31: 188
10 Kovar L, Gleicher M. Automated extraction and parameterization of motions in large data sets. ACM Trans Graph, 2004, 23: 559–568
11 Kwon T, Shin S Y. Motion modeling for online locomotion synthesis. In: ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Los Angeles, 2005. 29–38
12 Mukai T, Kuriyama S. Geostatistical motion interpolation. ACM Trans Graph, 2005, 24: 1062–1070
13 Heck R, Gleicher M. Parametric motion graphs. In: Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games. New York: ACM, 2007. 129–136
14 Kovar L, Gleicher M, Pighin F. Motion graphs. ACM Trans Graph, 2002, 21: 473–482
15 Lee Y, Wampler K, Bernstein G, et al. Motion fields for interactive character locomotion. ACM Trans Graph, 2010, 29: 1–8
16 Levine S, Wang J, Haraux A, et al. Continuous character control with low-dimensional embeddings. ACM Trans Graph, 2012, 31: 28
17 Min J Y, Chai J X. Motion graphs++: a compact generative model for semantic motion analysis and synthesis. ACM Trans Graph, 2012, 31: 153
18 Safonova A, Hodgins J K. Construction and optimal search of interpolated motion graphs. ACM Trans Graph, 2007, 26: 108
19 Chai J X, Hodgins J. Performance animation from low-dimensional control signals. ACM Trans Graph, 2005, 24: 686–696
20 Grochow K, Martin S L, Hertzmann A, et al. Style-based inverse kinematics. ACM Trans Graph, 2004, 23: 522–531
21 Lau M, Chai J X, Xu Y Q, et al. Face poser: interactive modeling of 3D facial expressions using facial priors. ACM Trans Graph, 2009, 29: 3
22 Liu H J, He F Z, Cai X T, et al. Performance-based control interfaces using mixture of factor analyzers. Visual Comput, 2011, 27: 595–603
23 Min J Y, Chen Y L, Chai J X. Interactive generation of human animation with deformable motion models. ACM Trans Graph, 2009, 29: 9
24 Wei X L, Min J Y, Chai J X. Physically valid statistical models for human motion generation. ACM Trans Graph, 2011, 30: 19
25 Ye Y, Liu C. Synthesis of responsive motion using a dynamic model. Comput Graph Forum, 2010, 29: 555–562

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072114:1–072114:11 doi: 10.1007/s11432-013-4934-2

Video color conceptualization using optimization

CAO XiaoChun1*, ZHANG YuJie1, GUO XiaoJie1 & CHEUNG Yiu-Ming2

1 School of Computer Science and Technology, Tianjin University, Tianjin 300072, China;
2 Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, Hong Kong 999077, China

Received July 6, 2013; accepted October 21, 2013

Abstract Color conceptualization aims to propagate "color concepts" from a library of natural color images to the input image by changing its main color. However, the existing method may lead to spatial discontinuities in images because of the absence of a spatial consistency constraint. In this paper, to solve this problem, we present a novel method that forces neighboring pixels with similar intensities to have similar colors. Using this constraint, color conceptualization is formalized as an optimization problem with a quadratic cost function. Moreover, we further expand two-dimensional (still image) color conceptualization to three dimensions (video), and use the information of neighboring pixels in both space and time to improve the consistency between neighboring frames. The performance of our proposed method is demonstrated on a variety of images and video sequences.

Keywords color conceptualization, color discontinuity, optimization, color correspondence, video sequence

Citation Cao X C, Zhang Y J, Guo X J, et al. Video color conceptualization using optimization. Sci China Inf Sci, 2014, 57: 072114(11), doi: 10.1007/s11432-013-4934-2

1 Introduction

Images and videos provide visual perception. There are many aspects to the content of an image, each providing different information. An important aspect providing much of the visual perception of an image is the composition of colors. Csurka et al. [1] abstracted look-and-feel concepts (e.g. capricious, classic, cool, and delicate impressions) from images according to color combinations. In practice, one may want to edit an image or video according to different task demands or personal preferences. Generally, altering the color of the image or video is a popular and intuitive way to meet such requirements [2–11]. Reinhard et al. [5] proposed a method of borrowing the color characteristics of an image via simple statistical analysis. Researchers [2–4] have proposed different colorization or color transfer methods that obtain colors from given reference images taking a color correspondence approach. Automatic colorization methods that search for reference images on the Internet using various filtering algorithms have also been proposed [6,7]. The success of these methods [1–7] depends heavily on finding a suitable reference image, which can be a rigorous and time-consuming task. The colorization methods employed in [8,9] are based on a set of chrominance scribbles; the process is tedious and does not always provide natural-looking results. Cohen-Or et al. [10] and Tang et al. [11] changed the colors of pictures to give the sense of a more harmonious state using empirical harmony templates of color distribution.

∗ Corresponding author (email: [email protected])



Figure 1 (a) Verbal terms extracted by clustering many images into different moods according to their hue distributions. The left three columns are some of the clustered images, while the right column shows the hue distributions of the images; (b) the input image and its hue distribution.

However, this technique cannot change colors to flexibly meet the demands of users. Hou et al. [12] first introduced a novel technique to change the image color intentionally, called "image color conceptualization". In their work [12], prototypes of color distributions were generated by clustering a vast number of images, and the mood of the input image was then changed by transferring the color distributions to it. Xu et al. [13] also proposed a method with which to change the emotion conveyed by images. They used a learning framework for discovering emotion-related knowledge, such as color and texture. They then constructed emotion-specific models from features of the image super-pixels. To change the conveyed emotion, they defined a piece-wise linear transformation to align the feature distribution of the target image to the statistical model. The goal of their method was to change the high-level emotion, while the method that we propose here focuses on changing color using low-level features. The method proposed in this paper is most closely related to the work of Hou et al. [12]. Hou and Zhang designed a clustering model to generate prototypes of color distributions from an input library of natural landscape and architectural images, and labeled each distribution with a verbal term such as "warm" or "cold" (see Figure 1(a)). The main component of each color distribution (i.e., the color concept), which corresponds to the representative color mood of the image, is then extracted. The propagation of a certain color concept to the target image is manipulated by adopting the peak-mapping method. However, since the hue wheel is shifted without consideration of spatial information, some artifacts may be introduced during the propagation. In this paper, we use an optimization method to solve the problem, employing a simple premise: neighboring pixels in space-time with similar intensities should have similar colors [8]. With consideration of the spatial information, the spatial continuity of colors in the generated image is ensured. Moreover, the optimization plays an important role in expanding the color conceptualization technique to three dimensions (i.e., video). Color conceptualization for video is much more attractive and challenging than that for a still image. However, to the best of our knowledge, no such system exists. The most straightforward idea is to apply the color conceptualization technique to each frame individually. However, this does not exploit the coherence between adjacent frames. In fact, a video may contain many different shots and, even in the same shot, there are many significant changes such as different illumination and the movement of objects. Therefore, the result obtained by simply applying the method to individual frames is often far from satisfactory. Even in the same shot, adjacent frames probably differ in terms of their color conceptualization, which results in flickering. Another possible solution for color conceptualization of video is video object segmentation [14,15], in which changes are made across frames of the same shot to avoid flickering. Unfortunately, current three-dimensional segmentation techniques are not precise enough. In this paper, we alternatively apply the optimization method to video color conceptualization to ensure the continuity of colors both spatially and temporally, and thus provide a pleasant experience when watching the output video. Levin et al. [8] proposed a method of coloring image sequences

Figure 2 The left half of each picture is the original input image, and the right half is the color-conceptualized result.

making the simple assumption that nearby pixels in space-time with similar gray levels should have similar colors. We further employ this assumption in the most important step of our video color conceptualization. Experiments demonstrate that the proposed method effectively solves the discontinuity problem. The paper is organized as follows. In Section 2, we discuss the existing method for image color conceptualization and formulate the problem of color conceptualization. In Section 3, we propose an optimization method to improve the existing technique, and extend the proposed color conceptualization method to three dimensions (i.e., video). We also detail problems arising in implementation and the corresponding solutions. Various experiments are carried out in Section 4 for both images and video frames. Section 5 presents conclusions.

2 Related work

The main goal of color conceptualization is to extract color concepts by clustering images and to change the mood of an input image by propagating the expected color concept to it (as shown in Figure 2). All the work in this paper is conducted in the HSV [16] color space and is based on the hue wheel representation.

2.1 Hue wheel representation of an image

In the HSV color space, hue is distributed around a circle, starting with the red primary at 0°, passing through the green primary at 120° and the blue primary at 240°, and then wrapping back to red at 360° (as shown in Figure 1(b)). Given an input image, we first convert it into the HSV color space. The hue wheel H_X(i) of the input image is then defined as [12]

$$H_X(i) = \sum_{\frac{(i-1)\pi}{180} \leqslant H(p) < \frac{i\pi}{180}} S(p) \cdot V(p). \tag{1}$$

Here H(p), S(p) and V(p) are the hue, saturation and value of pixel p in image X, and i ∈ [1, 360] is an integer. The range of hue is divided into 360 orientations, so we obtain 360 bins around the hue wheel. Subsequently, by calculating the value of H_X(i) for every i, the histogram of the hue wheel representation H_X is obtained as shown in Figure 1(b). This expression respects the fact that pixels with high saturation and high brightness always attract more attention. For one image, there might be multiple peaks in the hue wheel. However, the dominant color is represented by the dominant peak; therefore, we choose the strongest peak as the main color of the image. To cut the main hue peak at the proper position in the hue wheel, we adopt the three following steps (as shown in the upper half of Figure 3).
1. Fit peak P(k) by a Gaussian function G_{θ,σ} (where θ and σ are the mean and variance of G).
2. Set the left cut position θ_L = θ − 2.5σ and the right cut position θ_R = θ + 2.5σ.
3. Save P(k) (θ_L ≤ k ≤ θ_R) as the main hue peak of the image (since about 98.76% of the Gaussian distribution lies within θ_L ≤ k ≤ θ_R).
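A minimal sketch of the hue wheel of Eq. (1) and the peak-cutting steps follows; the 61-bin window around the peak and the moment-based Gaussian fit are our simplifications, not the paper's procedure.

```python
import numpy as np

def hue_wheel(hsv):
    """Hue wheel of Eq. (1): 360 bins over hue, each pixel weighted by
    saturation * value. `hsv` is an (..., 3) float array, H in [0, 2*pi)."""
    h, s, v = (hsv[..., k].ravel() for k in range(3))
    bins = np.minimum((h * 180.0 / np.pi).astype(int), 359)
    return np.bincount(bins, weights=s * v, minlength=360)

def main_peak(H_X, half_win=30):
    """Cut the dominant peak at theta +/- 2.5*sigma (steps 1-3). A local
    moment-based Gaussian fit in a +/-30-bin window stands in for whatever
    fitting procedure the paper actually uses."""
    peak = int(np.argmax(H_X))
    k = np.arange(peak - half_win, peak + half_win + 1)
    w = H_X[k % 360]
    w = w / w.sum()
    theta = float((k * w).sum())
    sigma = float(np.sqrt((((k - theta) ** 2) * w).sum()))
    return theta - 2.5 * sigma, theta + 2.5 * sigma      # (theta_L, theta_R)
```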

2.2 Clustering images

Numerous color naming models aim to relate a numerical color space to semantic color names in natural language, such as "grass green" and "light sky blue" [17–19]. The terms relate to color impressions;


Figure 3 (Upper) The hue wheel of the input image; (lower) the hue wheel of the color concept. There are two alternatives for the hue values in [θ′_L, θ_L): a shift to the range [θ_{C−r}, θ_{LC}] or no change; there are two alternatives for the hue values in (θ_R, θ′_R]: a shift to the range [θ_{RC}, θ_{C+r}] or no change.

e.g., "light sky blue" distinguishes a particular color mood from other color distributions [12]. Most images convey an atmosphere by a main color. By clustering images through the Kullback-Leibler (KL) divergence of the distributions of hue wheels, we can extract typical moods. The KL divergence of hue wheels D(X ‖ C) is defined as

$$D(X \,\|\, C) = \sum_{i=1}^{360} H_X(i) \log \frac{H_X(i)}{H_C(i)}, \tag{2}$$

where H_X is the hue wheel of the input image and H_C is the hue wheel of an image category. Given an image library, we use the algorithm proposed in [12] to cluster images into different categories. Images in the same category have the same mood, and we label each category with a subjective description such as "warm" or "cold". For an image category C, the hue wheels of all the images in the category are calculated according to (1) and together form the hue wheel of the category, represented by H_C. The dominant peak of H_C represents the category's main mood, which we call the "color concept".
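The clustering distance of Eq. (2) is straightforward to compute; normalizing the wheels and the epsilon guard against empty bins are our additions, not part of the paper's formulation.

```python
import numpy as np

def kl_hue(H_X, H_C, eps=1e-12):
    """KL divergence of Eq. (2) between two hue wheels. Normalizing the
    wheels and guarding empty bins with eps are our choices."""
    p = H_X / (H_X.sum() + eps) + eps
    q = H_C / (H_C.sum() + eps) + eps
    return float(np.sum(p * np.log(p / q)))
```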

2.3 Propagating the color concept

Color conceptualization is the process of replacing the hue peak of the input image with the desired color concept. Here we normalize the hue peak according to

$$R(i, H_X) = \frac{\sum_{t=\theta_L}^{i} H_X(t)}{\sum_{t=\theta_L}^{\theta_R} H_X(t)}, \tag{3}$$

and then use the algorithm in [12] (which we call the color mapping algorithm for convenience) to propagate the color concept as follows (a sketch of this mapping is given after the splitting discussion below).
1. For each i ∈ (θ_L, θ_R), calculate R(i, H_X).
2. For each i, find j that satisfies R(i, H_X) = R(j, H_C).
3. Assign i = j.
In the color manipulations made using the color mapping algorithm, the peak of a hue wheel is uniformly cut off at i = θ_L and i = θ_R. In a real implementation, however, this may result in artifacts introduced by the "splitting" of a contiguous region of the image [10]. An example is presented in Figure 4 (middle). The splitting occurs in regions of similar color, with part of a region falling within the peak and the other part falling outside it, which leads to discontinuity of color after the color transformation. To solve this problem, Hou et al. [12] used a local minimum position to cut the peak, and achieved good results for most images. However, this method is not always effective (Figure 4 (middle)). In many cases, directly cutting off the hue peak at any position will similarly result in discontinuity. Therefore, it is necessary to explore a new approach that uses spatial information to enforce spatial continuity.
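As promised above, here is a sketch of the color mapping algorithm: the normalized cumulative mass R of the image peak (Eq. (3)) is matched to that of the concept peak. Working on integer hue bins with linear interpolation is our simplification.

```python
import numpy as np

def map_peak(hues, H_X, H_C, thL, thR, thLC, thRC):
    """Color mapping of Eq. (3): match R(i, H_X) of the image peak to
    R(j, H_C) of the concept peak. `hues` is a float array of per-pixel
    hue bins; bins outside [thL, thR] are left untouched."""
    i_src = np.arange(thL, thR + 1)
    R_src = np.cumsum(H_X[i_src]) / H_X[i_src].sum()
    j_dst = np.arange(thLC, thRC + 1)
    R_dst = np.cumsum(H_C[j_dst]) / H_C[j_dst].sum()
    out = hues.astype(float).copy()
    in_peak = (hues >= thL) & (hues <= thR)
    r = np.interp(hues[in_peak], i_src, R_src)   # step 1: R(i, H_X)
    out[in_peak] = np.interp(r, R_dst, j_dst)    # steps 2-3: j with equal mass
    return out
```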


Figure 4 (Left) The input image; (middle) the result obtained using the method in [12], with color discontinuities on the petal; (right) the result obtained using our method.

3 Color conceptualization using optimization

3.1 Spatially consistent image color conceptualization

Inspired by image and video colorization assisted by optimization [8], we combine a cost function and optimization of the hue wheel to solve the peak boundary problem. The main steps are elaborated below.
1. Fit the hue peak P(k) of the input image with a Gaussian function G_{θ,σ}, as in [12] (see the red fitting line in the upper half of Figure 3). The left cut position is initialized as θ_L = θ − 2.5σ, and the right cut position as θ_R = θ + 2.5σ. The hue peak falling in [θ_L, θ_R] is changed to the desired color concept using the color mapping algorithm mentioned above.
2. Define two new cut positions, θ′_L = θ_L − d and θ′_R = θ_R + d, and keep the hue values falling in [0, θ′_L) and (θ′_R, 2π] (i.e., to the left of θ′_L and to the right of θ′_R in the upper half of Figure 3) unchanged. The parameter d will be discussed in Section 4.
3. There are two alternatives for the pixels with hue values falling in [θ′_L, θ_L) or (θ_R, θ′_R] (the parts below the black curly braces in the upper half of Figure 3): to change to the color concept or not. In the case that the color concept is adopted, the hue values of pixels falling in [θ′_L, θ_L) are changed to θ_CL, and the hue values of pixels falling in (θ_R, θ′_R] are changed to θ_CR. Here θ_CL and θ_CR are respectively the left and right borders of the desired color concept (as shown in Figure 3).
The optimal scheme B(X) of the given image X is determined by minimizing the following function over the choices for all undetermined pixels:

$$B(X) = \arg\min \sum_{p \in X} \left| H(p) - \sum_{q \in N(p)} w_{pq} H(q) \right|^2, \tag{4}$$

where H(p) is the hue value of pixel p in the input image X, and N(p) is the set of eight neighbors of pixel p. Note that w_{pq} is the weight coefficient satisfying [20]

$$w_{pq} \propto e^{-d(x_p, x_q)/2\sigma_1^2}, \tag{5}$$

where d(x_p, x_q) is the squared difference between the intensities of pixels p and q. Here σ_1 is the variance of intensity in a window around pixel p. Obviously, w_{pq} increases as the difference between intensities decreases. For a given pixel p, Σ_{q∈N(p)} w_{pq} = 1. The minimization in (4) guarantees that neighboring pixels have similar colors if their intensities are similar.
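A sketch of the linear system implied by (4) and (5) on a small gray image follows; the dense loop and the solver choice are for clarity only, and all names are ours, since the paper does not specify solver details.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def optimize_hues(intensity, hue0, fixed, sigma1=0.01):
    """Quadratic objective of Eq. (4) on an image grid: every undetermined
    pixel should equal the affinity-weighted mean of its 8 neighbors
    (weights per Eq. (5)); pixels marked in `fixed` keep their mapped hue."""
    h, w = intensity.shape
    A, b = lil_matrix((h * w, h * w)), np.zeros(h * w)
    for y in range(h):
        for x in range(w):
            p = y * w + x
            A[p, p] = 1.0
            if fixed[y, x]:
                b[p] = hue0[y, x]                # boundary condition
                continue
            nbrs = [(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
            wgt = np.array([np.exp(-(intensity[y, x] - intensity[q]) ** 2
                                   / (2.0 * sigma1)) for q in nbrs])
            wgt /= wgt.sum()                     # Eq. (5), sigma1 = variance
            for (qy, qx), wq in zip(nbrs, wgt):
                A[p, qy * w + qx] = -wq          # H(p) - sum_q w_pq H(q) = 0
    return spsolve(A.tocsr(), b).reshape(h, w)
```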

3.2 Video color conceptualization

Compared with still-image color conceptualization, video color conceptualization is much more attractive and challenging because it involves the coherence and changes between adjacent frames. In addition, there may be various scenes in one video, and their theme contents and main colors can vary. If conceptualized uniformly, the video will likely appear awkward and distorted. Moreover, the color conceptualization one desires should be based on the video content, rather than being arbitrary. Therefore, scene segmentation is essential.


Figure 5 (a) Four successive frames of an input video; (b) color conceptualization results obtained using Hou and Zhang's method [12] for each frame individually, with discontinuous and varying red regions on a leaf; (c), (d) color conceptualization results obtained using our method.

State-of-the-art shot-detection methods [21,22] can be used in our framework. To demonstrate the performance of our method, we use a simple and effective method to distinguish different scenes in the video based on the squared absolute difference of gray values. In practice, we compute the average value of the squared absolute difference between adjacent frames from the first frame:

$$M_f = \sum_{k=1}^{n} \left| I_f(k) - I_{f+1}(k) \right|^2, \tag{6}$$

where I_f(k) is the gray value of pixel k in frame f. Frame f is treated as the beginning of a new scene if M_f is equal to or greater than a pre-defined threshold; a sketch of this segmentation step is given at the end of this subsection. The remaining work concentrates on each single scene. Even within the same scene, video color conceptualization cannot be as simple as image color conceptualization. Applying image color conceptualization to each frame individually usually leads to flickering artifacts in the output video; e.g., Figure 5(b). There are two main reasons for this. First, the hue wheels of two adjacent frames are highly unlikely to be exactly the same, so different colors need to be changed in the two frames. Second, the edges of the objects changed during the conceptualization process are unstable because of the absence of a time consistency constraint. Instead of calculating the hue wheel representation of each single frame separately, a hue wheel representation of the whole shot can be computed using (1). Similar to the first two steps of propagating the color concept described in Subsection 3.1, the hue peak of the video shot is fitted with a Gaussian function G_{θ,σ}, and the left and right borders are θ_L = θ − 2.5σ and θ_R = θ + 2.5σ, respectively. Two additional cut positions are θ′_L = θ_L − d_v and θ′_R = θ_R + d_v. Subsequently, the hue values falling in [θ_L, θ_R] are changed to [θ_LC, θ_RC] according to the color mapping algorithm mentioned above, while the hue values falling in [0, θ′_L) or (θ′_R, 2π] remain unchanged. There are also two options for the pixels with hue values falling in [θ′_L, θ_L) or (θ_R, θ′_R]: a shift in the hue value or no shift. However, as opposed to the case of image color conceptualization, we use both spatial and temporal information to structure the optimization problem so that the best scheme for the whole shot can be obtained. Analogously, according to the principle that neighboring frames with similar intensities are expected to have similar colors, the objective function to be minimized can be formalized as

$$B(X) = \sum_{p \in V} \left( H(p) - \sum_{q \in N(p)} w_{pq} H(q) \right)^2, \tag{7}$$


where H(p) is the hue value of pixel p in the input video, and w_{pq} is the weight coefficient satisfying (5). As opposed to the case of image manipulation, N(p) here represents the 26 neighboring pixels in spatial-temporal space [23].
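For the scene segmentation step of Eq. (6) described above, a minimal sketch (the array layout and the pixel-averaged threshold are our assumptions):

```python
import numpy as np

def shot_boundaries(frames, thresh):
    """Scene segmentation of Eq. (6): squared absolute gray-value difference
    between adjacent frames; frame f+1 starts a new scene when M_f reaches
    `thresh`. We average over pixels (the text's "average value") so the
    threshold is resolution-independent; `frames` is a (T, H, W) array."""
    diff = (frames[:-1].astype(float) - frames[1:].astype(float)) ** 2
    M = diff.reshape(len(frames) - 1, -1).mean(axis=1)
    return [f + 1 for f in range(len(M)) if M[f] >= thresh]
```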

3.3 Color correspondence

In the case that the hue of a pixel is to be changed, we have previously changed the hue to θ_LC if the hue value falls in [θ′_L, θ_L), and to θ_RC if the hue value falls in (θ_R, θ′_R]. However, this would still result in artifacts, since pixels with different hue values may change to the same value. Consequently, instead of changing all pixels to the same value, we employ a more elaborate scheme [10] to achieve correspondence of color appearance [24]:

$$H'(p) = \theta_C + r\left( 1 - G\!\left(|H(p) - \theta_X|\right) \right), \tag{8}$$

where p is a pixel with a hue value falling in (θ_R, θ′_R], H′(p) is the hue value that pixel p will change to if it needs to change, H(p) is the original hue value of pixel p, and r (> θ_RC) is a parameter that will be discussed later. θ_X and θ_C are the mean values of the Gaussian functions fitting the hue peak of the shot and the concept peak, respectively. G is a Gaussian function with mean zero and standard deviation σ_2, ranging continuously from 0 to 1. From (8), we find that the hue values of pixels falling in (θ_R, θ′_R] will be distributed near θ_RC in the same order as their original values, but more compactly (as shown in Figure 3). The hue values of pixels falling in [θ′_L, θ_L) are changed to values near θ_LC using a similar method.
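The correspondence mapping of Eq. (8) is a one-liner; here `hue`, θ_X, θ_C, r and σ_2 follow the notation above:

```python
import numpy as np

def correspond(hue, theta_X, theta_C, r, sigma2):
    """Color correspondence of Eq. (8): undetermined pixels are moved near
    the concept border in their original order, compressed by a zero-mean
    Gaussian G with standard deviation sigma2."""
    G = np.exp(-((hue - theta_X) ** 2) / (2.0 * sigma2 ** 2))
    return theta_C + r * (1.0 - G)
```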

3.4 Circle problem of hue

The main principle of our method is that neighboring pixels in space-time that have similar intensities should have similar colors. Under this assumption, we decide the hue value of each undetermined pixel according to the weighted sum of its adjacent pixels. However, the hue values are distributed on a circular ring, where hue = 0 and hue = 2π represent the same color. As an extreme example, if the hue of an undetermined pixel depends on two neighboring pixels with weighting coefficients w1 = 0.5 and w2 = 0.5, and the hue values of the two pixels are H1 = 0 and H2 = 2π, then using the proposed method the expected hue value of the undetermined pixel is π. This means that the color of a pixel in a pile of red pixels may change to green, which is obviously unreasonable. The simplest solution is to make the hue distribution linear by disconnecting the hue wheel at an appropriate point according to the specific input picture. The undetermined points and their neighboring points are always in or near the hue peaks of the input image and the color concept, so we should find a cutoff point as far from both hue peaks as possible. We can then guarantee that there is only one distance between any two neighboring pixels among the undetermined points, and nearby hue values will not be pulled apart. Let A1 and A2 be the median points between θ_X and θ_C in the two directions around the wheel; the one with the larger distance to θ_X is the farthest point from the two main peaks. Therefore, the median point A1 or A2 with the larger distance to θ_X is selected as the cutoff point.

4 Experimental results

In this section, we present various image and video results obtained using our proposed method. We first note that color conceptualization differs from color transfer in two respects. First, the main purpose of color conceptualization is to change the mood of a picture, which is not the case for color transfer. Second, color conceptualization generates color concepts by clustering a number of pictures once, while a color transfer method has to find a suitable reference picture for each target image. We experimentally investigate the performance of our method on a variety of pictures and videos. The parameter d (introduced in Subsection 3.1) is crucial because it decides the number of pixels with undetermined hue values. If the value of d is too small, too few pixels have undetermined hue values (as shown in Figure 6(b), the color of the mountain on the left is not consistent).

Figure 6 (a) The input image; (b), (c), (d) the resulting images obtained using our method with d = 10, 140, and 60, respectively.

Figure 7 (a) The upper image is the input image; the white areas in the bottom image are undetermined pixels; (b), (c), (d) resulting images obtained using our method with r = 2.5σ_C, r = 3σ_C, and r = 4σ_C, respectively.

Figure 8 The first picture is the input image and the other pictures are the output conceptualized images.

On the other hand, if the value of d is too large, some background pixels are wrongly labeled as undetermined (as shown in Figure 6(c), almost the whole image becomes the same color). In our implementation, we set d = 60, as shown in Figure 6(d). The parameters r and σ_2 (introduced in Subsection 3.3) jointly decide the closeness of the hue distribution of the undetermined pixels. The hue values in (θ_R, θ′_R] change to [θ_RC, θ_{C+r}], where r decides the maximum distribution width and σ_2 decides the specific distribution, as shown in Figure 7. r must be larger than 2.5σ_C because the distribution range must include (θ_C − 2.5σ_C, θ_C + 2.5σ_C] according to our method. On the other hand, the value of r cannot be arbitrarily large: if it is too large, there may be unexpected colors among the undetermined pixels because the distribution width of the hue is too large. Figure 7 shows results for an image with different values of r. The results show that our method is not sensitive to the parameter r; even the magnified views show only minor differences with respect to varying r. We use r = 3σ_C throughout our experiments. The value of σ_2 should guarantee that θ_R changes exactly to θ_RC. Therefore, we obtain the value of σ_2 by substituting r = 3σ_C, H′(p) = θ_RC and H(p) = θ_R into (8). For image color conceptualization, we should choose a certain color concept from the existing concepts (here we cluster six color concepts using the CVCL database [25] as the image library). Figure 2 shows two examples of image color conceptualization. For the first picture, the change in color concept implies a change of season, because leaves can be yellow in autumn and tend to be green in spring. The different colors in the second picture suggest different weather. Figure 8 shows another natural scene conceptualized using our method. As we improve Hou and Zhang's method [12] by taking spatial information into consideration, our proposed method performs better in some cases, especially when there are color differences in the same region of an object. In Figure 9, the magnified images show the performance


Figure 9 (a) Three input images; (b) output images obtained using Hou and Zhang's image color conceptualization [12]; (c) output images obtained using our image color conceptualization method.

Figure 10 (a) The input image of a crocus artwork and its hue wheel representation; (b) the effect of coloring the crocus yellow and the hue wheel representation of the output image; (c) the effect of coloring the crocus green and the hue wheel representation of the output image.

Figure 11 (a) Hue wheel representations of the three frames in (b) and of the whole video; (b) three frames of the input video; (c), (d), (e) the resulting frames obtained using Hou and Zhang's image method [12], our image method, and our video method, respectively.


Figure 12 Three groups of video color conceptualization results. In each group, the upper row shows five frames of the input video, and the lower row shows the output.

improvement over the existing method. Moreover, this technique is applicable not only to the field of image processing, but also to the previewing of artwork coloring. An example is shown in Figure 10. Experiments further demonstrate that our method performs well for video. Simply applying Hou and Zhang's image color conceptualization [12] to each frame individually leads to color discontinuity and flickering, as demonstrated in Figure 5(b). For a better view, see our supplemental video material. Since we take temporal information into account, the results obtained using our method, as shown in Figure 5(c) and (d), are significantly better. Figure 11 presents more comparisons, not only between our video color conceptualization and Hou and Zhang's image method applied to individual video frames, but also between our video color conceptualization and our new image method applied to video frames individually. This comparison helps us observe the role of temporal information in overcoming the flickering problem and shows the advantage of the video method. Figure 11(b) shows frames of an input video, and Figure 11(a) shows the hue wheel representations of the three frames and of the whole video. We see a difference in the hue wheel representation between frames. Figure 11(c), (d) and (e) shows three groups of frames of the resulting video obtained using Hou and Zhang's image method, our image method considering only spatial information, and our video method considering both spatial and temporal information. Some artifacts are observed in the magnified views of (c) and (d). Figure 12 shows other video examples. Color conceptualization can be applied in many fields, such as image and video processing, advertising and music television processing, and mood consistency regulation in image cut-and-paste.

5 Discussion and conclusion

We proposed an image color conceptualization method based on an existing method [12] and an optimization algorithm [8], and expanded it to video processing. Our main contributions include taking the spatial information into account to improve color continuity, and expanding our image-based method to


video color conceptualization by enforcing spatio-temporal consistency. Experiments carried out on both images and videos demonstrated the performance of our proposed method.

References
1 Csurka G, Skaff S, Marchesotti L, et al. Building look & feel concept models from color combinations. Vis Comput, 2011, 27: 1039–1053
2 Welsh T, Ashikhmin M, Mueller K. Transferring color to greyscale images. ACM Trans Graph, 2002, 21: 277–280
3 Irony R, Cohen-Or D, Lischinski D. Colorization by example. In: Proceedings of the 16th Eurographics Conference on Rendering Techniques. Switzerland: Eurographics Association Aire-la-Ville, 2005. 201–210
4 Charpiat G, Hofmann M, Scholkopf B. Automatic image colorization via multimodal predictions. In: Proceedings of the 10th European Conference on Computer Vision. Berlin/Heidelberg: Springer-Verlag, 2008. 126–139
5 Reinhard E, Ashikhmin M, Gooch B, et al. Color transfer between images. IEEE Comput Graph Appl, 2001, 21: 34–41
6 Liu X P, Wan L, Qu Y G, et al. Intrinsic colorization. ACM Trans Graph, 2008, 27: 152
7 Chia A, Zhuo S J, Gupta R, et al. Semantic colorization with Internet images. ACM Trans Graph, 2011, 30: 156
8 Levin A, Lischinski D, Weiss Y. Colorization using optimization. ACM Trans Graph, 2004, 23: 689–694
9 Yatziv L, Sapiro G. Fast image and video colorization using chrominance blending. IEEE Trans Image Process, 2006, 15: 1120–1129
10 Cohen-Or D, Sorkine O, Gal R, et al. Color harmonization. ACM Trans Graph, 2006, 25: 624–630
11 Tang Z, Miao Z J, Wan Y L, et al. Color harmonization for images. J Electron Imag, 2011, 20: 023001
12 Hou X D, Zhang L Q. Colour conceptualization. In: Proceedings of the 15th ACM International Conference on Multimedia. New York: ACM, 2007. 265–268
13 Xu M D, Ni B B, Tang J H, et al. Image re-emotionalizing. In: Jin J S, Xu C S, Xu M, eds. The Era of Interactive Media. Berlin: Springer, 2013. 3–14
14 Lee Y, Kim J, Grauman K. Key-segments for video object segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Barcelona, 2011. 1995–2002
15 Zhang B, Zhao H D, Cao X C. Video object segmentation with shortest path. In: Proceedings of the 20th ACM International Conference on Multimedia. New York: ACM, 2012. 801–804
16 Hanbury A. Constructing cylindrical coordinate colour spaces. Patt Recognition Image Process Group, 2008, 29: 494–500
17 Liu Y, Zhang D S, Lu G J, et al. Region-based image retrieval with high-level semantic color names. In: Proceedings of IEEE 11th International Multi-Media Modelling Conference, Melbourne, 2005. 180–187
18 Goldstein E. Sensation and Perception. 5th ed. Brooks/Cole, 1999
19 Berk T, Brownston L, Kaufmann A. A new color-naming system for graphics languages. IEEE Comput Graph Appl, 1982, 2: 37–44
20 Weiss Y. Segmentation using eigenvectors: a unifying view. In: Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, 1999. 975–982
21 Lee H, Yu J, Im Y, et al. A unified scheme of shot boundary detection and anchor shot detection in news video story parsing. Multimed Tools Appl, 2011, 51: 1127–1145
22 Amudha J, Radha D, Naresh P. Video shot detection using saliency measure. Int J Comput Appl, 2012, 45: 17–24
23 Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Patt Anal Mach Intell, 2000, 22: 888–905
24 Morovic J, Luo M. The fundamentals of gamut mapping: a survey. J Imag Sci Technol, 2001, 45: 283–290
25 Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis, 2001, 42: 145–175

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072115:1–072115:19 doi: 10.1007/s11432-014-5112-x

Fractional partial differential equation denoising models for texture image

PU YiFei1*, SIARRY Patrick2, ZHOU JiLiu1, LIU YiGuang1, ZHANG Ni3, HUANG Guo4 & LIU YiZhi5

1 School of Computer Science and Technology, Sichuan University, Chengdu 610065, China;
2 Université de Paris 12 (LiSSi, E.A. 3956), 61 av. du Général de Gaulle, 94010 CRETEIL Cedex, France;
3 Library of Sichuan University, Sichuan University, Chengdu 610065, China;
4 Computer Science College, Leshan Normal University, Leshan 614000, China;
5 Wu Yuzhang Honors College of Sichuan University, Chengdu 610065, China

Received July 9, 2013; accepted September 12, 2013; published online May 9, 2014

Abstract In this paper, a set of fractional partial differential equations based on fractional total variation and the fractional steepest descent approach is proposed to address the traditional drawbacks of PM and ROF multi-scale denoising for texture images. By extending the Green, Gauss, Stokes and Euler-Lagrange formulas to the fractional field, we find that the integer-order formulas are just special cases of the fractional ones. In order to improve the denoising capability, we propose four fractional partial differential equation based multi-scale denoising models, and then discuss their stability and convergence rate. Theoretical deduction and experimental evaluation demonstrate the stability and convergence of the fractional steepest descent approach, and the fractional nonlinear multi-scale denoising capability and the best values of the parameters are also discussed. The experimental results show that the ability of the proposed denoising models to preserve high-frequency edges and complex texture information is clearly superior to that of traditional integer-order algorithms, especially for images rich in texture detail.

Keywords fractional Green formula, fractional Euler-Lagrange equation, fractional steepest descent approach, fractional extreme points, fractional total variation, fractional differential mask

Citation Pu Y F, Siarry P, Zhou J L, et al. Fractional partial differential equation denoising models for texture image. Sci China Inf Sci, 2014, 57: 072115(19), doi: 10.1007/s11432-014-5112-x

1 Introduction

Integer-order partial differential equation based image processing is an important branch of the field of image processing. First, it belongs to low-level image processing, and its results are often taken as intermediate results for further processing by other image processing approaches. Second, with deeper study of the approach and more learned about the essential qualities of images and image processing, people intend to improve traditional image processing approaches with strictly mathematical theories. There is no denying that this is a great challenge to practice-oriented traditional image processing methods.

∗ Corresponding author (email: puyifei [email protected])

c Science China Press and Springer-Verlag Berlin Heidelberg 2014 



It is known that image denoising is a significant research subject in integer-order partial differential equation based image processing, which includes two kinds of denoising approaches: the nonlinear diffusion based method and the minimum energy norm based variational method [1–4]. Correspondingly, there are two classical models: the anisotropic diffusion denoising model proposed by Perona et al. [5] (PM denoising for short) and the total variation model proposed by Rudin, Osher et al. [6] (ROF denoising for short). The PM model simulates the denoising process as thermal energy diffusion, the denoised result being the equilibrium state of the diffusion, while the ROF model describes the above-mentioned thermal energy by total variation. In further studies, some researchers applied the PM and ROF models to color images [7,8], discussed how to choose the parameters of the models [9–13], and studied how to obtain the optimal stopping point of the iteration process [14,15]. Rudin [6] proposed a variable time step method for solving the Euler-Lagrange equation. Vogel et al. [16] proposed to improve the stability of the ROF model by a fixed point iteration approach. Dobson and Vogel [17] proposed to modify the total variation form in order to ensure the convergence of the numerical calculation of the ROF model. Chambolle [18] proposed a fast algorithm based on a dual formula. Darbon et al. [19–21] decomposed the original problem into independent optimization problems of Markov random fields by using level set methods and obtained a globally optimal solution by reconstruction. Others proposed to solve total variation by an iteratively reweighted norm for improving computational efficiency [22]. Catte et al. [23] proposed to perform Gaussian smoothing first so as to make the PM model well-posed. The shortcomings of the PM and ROF models lie in that they tend to lose contrast and texture information and to produce staircase effects [1,24,25]. Some improved models have been proposed to solve these problems. In order to keep contrast and texture information, some proposed to replace the L2 norm by the L1 norm [26–29], Osher [30] proposed an iterative regularization method, Gilboa et al. [31] proposed a denoising method using a spatially adaptive fidelity term, and Esedoglu et al. [32] proposed to decompose images by the anisotropic Rudin-Osher-Fatemi model and keep certain edge directional information. In order to remove the staircase effect, Blomgren [33,34] proposed to extend the total variation denoising model so that it changes with the gradient; others proposed to introduce high-order derivatives into the energy norm [35–40], to integrate high-order derivatives into the original ROF model [41,42], or to adopt a two-stage denoising algorithm that first smooths the corresponding vector field and then fits it by a curved surface [43,44]. The above-mentioned methods achieve some improvement in keeping contrast and texture information and removing the staircase effect; however, they still have drawbacks. Firstly, the improved algorithms greatly increase the computational complexity; in real-time processing especially, the excessive storage and computation render them infeasible. Secondly, the above algorithms are integer-order approaches in essence, and thus they may blur the edge field somewhat, with texture-preserving effects not as good as expected.
Therefore, we propose to introduce a new mathematical method, fractional calculus, to the field of texture image denoising and to implement fractional partial differential equations; that is, a set of fractional total variation and fractional steepest descent approach based multi-scale denoising models is proposed. They can preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly keep high-frequency edge and texture information in areas where grey scale changes frequently, and nonlinearly enhance texture details in areas where grey scale does not change evidently. Over the past 300 years, fractional calculus has attracted increasing interest and has become an important branch of mathematical analysis [45–48]; however, it is still little known to many mathematicians and physical scientists in engineering fields, both at home and abroad. In general, fractional calculus under the Euclidean measure extends integer steps to fractional steps, and the Euclidean measure is required in mathematics [49,50]. Moreover, a random variable in a physical process can be deemed as the displacement of a particle in random motion; so, in the Euclidean measure sense, fractional calculus can be used for analyzing and processing physical states and processes [51–59]. Fractional calculus has an obvious feature: most fractional calculus results are power functions, and the others are superpositions or products of certain functions and power functions [45–50]. Does such a feature foreshadow some law of nature? Scientific research has proved that a fractional-order or fractional-dimensional approach is the best description for many natural phenomena [60–63]. At present, fractional calculus under the Euclidean measure has been applied to


many fields such as diffusion processes, viscoelasticity theory, and random fractal dynamics; however, its major application still focuses on describing the transient states of physical changes and seldom involves systemic evolution processes [51–63]. How to apply fractional calculus to modern signal analysis and processing [62–75], especially to digital image processing [76–83], is an emerging branch that has been seldom studied. Guidotti et al. [84] and Bai et al. [85] respectively pushed the classical anisotropic diffusion model into the fractional field, extended the gradient operator of the energy norm from first order to fractional order, and numerically implemented the fractional partial differential equation in the frequency domain, which has some effect on image denoising. However, the algorithm still has some drawbacks. First, it merely took the gradient operator of the energy norm from first order to fractional order and still did not essentially solve the problem of how to nonlinearly keep texture details through the energy norm during anisotropic diffusion, so the texture information is not well kept after denoising. Second, the algorithm did not discuss the effect of the fractional power of the energy norm and the fractional extreme value on nonlinearly keeping texture details. Third, the method did not derive the corresponding Euler-Lagrange equation according to the features of fractional calculus, but directly replaced it according to the complex conjugate transpose property of the Hilbert adjoint operator, which greatly increases the complexity of the numerical implementation of the fractional partial differential equation in the frequency domain. Last, the transfer function of fractional calculus in the Fourier transform domain is $(i\omega)^v$. Its form looks simple, but its inverse Fourier transform belongs to the first kind of Euler integral, which is very difficult to calculate in theory. The algorithm simply turned the first-order difference into a fractional-order difference in form in the frequency domain to replace the fractional differential operator, which does not solve the calculation problem of the first kind of Euler integral. There is earlier work by Tarasov putting forward a fractional Green formula [86], but it is not an applicable one for the purpose here. In view of the above problems, we study the features of fractional calculus. Contrary to integer-order differentials, the fractional differential of a direct current or low-frequency signal is usually nonzero, and fractional calculus can nonlinearly enhance the complex texture details of fractal-like structures [67,78–83]. On the basis of these features and from the viewpoint of system evolution, we implement fractional partial differential equation based multi-scale denoising models. When denoising, they can preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly keep high-frequency edge and texture information in areas where grey scale changes frequently, and nonlinearly enhance texture details in areas where grey scale does not change evidently.

2  Implement a set of fractional partial differential equations based denoising models for texture image

2.1  Deduce fractional Green formula

It is known that fractal theory has greatly changed the traditional view of measure, because fractal geometry denies the existence of the Newton-Leibniz derivative. Hausdorff measure based fractal theory is still an incomplete mathematical theory, even after more than 90 years of development; until now, the theoretical construction of fractional calculus under the Hausdorff measure has not been established and is far from perfect. Among the whole family of measures, fractional calculus under the Euclidean measure is comparatively complete, and thus the Euclidean measure is widely used in mathematics. The commonly used definitions under the Euclidean measure are the Grünwald-Letnikov definition, the Riemann-Liouville definition and the Caputo definition [45–50]. In order to implement fractional partial differential equations for texture image denoising, we must implement the fractional Euler-Lagrange formula first, and the prerequisite for the fractional Euler-Lagrange formula is the fractional Green formula. Therefore, we extend the traditional Green formula from integer order to fractional order and derive the fractional Green formula. Suppose $\Omega$ is a simply-connected plane region taking the piecewise smooth curve $C$ as its boundary, the differintegrable functions $P(x,y)$ and $Q(x,y)$ are continuous in $\Omega$ and on $C$, and their fractional continuous partial derivatives with respect to $x$ and $y$ exist. Let $D^1$ denote the first-order differential operator and $D^v$ the v-order fractional differential operator; $I^1 = D^{-1}$ denotes the first-order integral operator and $I^v = D^{-v}$ denotes the v-order fractional integral operator for $v > 0$.


Let $\iint_\Omega I_x^{v}I_y^{v}$ denote the v-order surface integral operator over the plane region $\Omega$, which extends the Riemann-Liouville definition of the fractional integral from one dimension to two dimensions; let $\int_{C(AC_1B)} I^{v}$ be the v-order curve integral operator on the section $AC_1B$ of curve $C$ along the direction $\overrightarrow{AC_1B}$, and $\oint_{C^-} I^{v}$ the v-order fractional closed-curve integral operator along the closed curve $C$ in the counter-clockwise direction. We can then derive the fractional Green formula as

$$\iint_\Omega I_x^{v_2}I_y^{v_2}\left[D_x^{v_1}Q(x,y)-D_y^{v_1}P(x,y)\right]=\oint_{C^-}I_x^{v_2}\left\{D_y^{v_1-v_2}P(x,y)-D_y^{v_1-v_2}\left[P(x,y)-D_y^{-v_1}D_y^{v_1}P(x,y)\right]\right\}+\oint_{C^-}I_y^{v_2}\left\{D_x^{v_1-v_2}Q(x,y)-D_x^{v_1-v_2}\left[Q(x,y)-D_x^{-v_1}D_x^{v_1}Q(x,y)\right]\right\}. \quad (1)$$

In the special case when $D^{v_1}$ and $D^{-v_1}$ are reciprocal, that is, $\varphi - D^{-v_1}D^{v_1}\varphi = 0$, one has $D^{v_1}D^{v_2}\varphi = D^{v_1+v_2}\varphi$. This requirement is strict and difficult to satisfy. In this special case, the fractional Green formula simplifies to

$$\iint_\Omega I_x^{v_2}I_y^{v_2}\left[D_x^{v_1}Q(x,y)-D_y^{v_1}P(x,y)\right]=\oint_{C^-}I_x^{v_2}D_y^{v_1-v_2}P(x,y)+\oint_{C^-}I_y^{v_2}D_x^{v_1-v_2}Q(x,y). \quad (2)$$

When $v_1 = v_2 = 1$, (2) reduces to the traditional Green formula $\iint_\Omega I_x^1 I_y^1\left[D_x^1 Q(x,y)-D_y^1 P(x,y)\right]=\oint_{C^-}\left[I_x^1 P(x,y)+I_y^1 Q(x,y)\right]$, and when $v_1 = v_2 = v$, it has $\iint_\Omega I_x^v I_y^v\left[D_x^v Q(x,y)-D_y^v P(x,y)\right]=\oint_{C^-}\left[I_x^v P(x,y)+I_y^v Q(x,y)\right]$ [86]. The above evidence shows that the traditional integer-order Green formula is only a special case of the fractional Green formula.
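Before moving to the Euler-Lagrange derivation, it may help to see one of the definitions mentioned above in numerical form. The following is a minimal sketch (our illustration, not part of the original derivation) of a one-dimensional Grünwald-Letnikov v-order derivative; the function name and truncation strategy are our own choices.

```python
import numpy as np

def gl_fractional_derivative(f, v, h=1.0):
    """Grunwald-Letnikov v-order derivative of a uniformly sampled signal f:
    D^v f(x) ~ h**(-v) * sum_k w_k * f(x - k*h), with w_k = (-1)**k * C(v, k).
    The weights are built recursively, w_k = w_{k-1} * (k - 1 - v) / k,
    which avoids overflowing gamma functions."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - v) / k
    out = np.empty_like(f)
    for i in range(n):
        # sum over all available samples to the left of (and including) index i
        out[i] = np.dot(w[:i + 1], f[i::-1]) / h**v
    return out

# A property used throughout this paper: the fractional derivative of a
# constant signal is nonzero, unlike its integer-order derivative.
print(gl_fractional_derivative(np.ones(64), v=0.5)[-1])
```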

2.2  Deduce fractional Euler-Lagrange formula for two-dimensional image processing

There are some earlier studies on the fractional Euler-Lagrange calculus, but they are not fit for image processing [87,88]. In order to solve this problem, we further derive the fractional Euler-Lagrange formula for two-dimensional images on the basis of the above fractional Green formula. Suppose the differintegrable scalar distribution function in two-dimensional space is $u(x,y)$ and the differintegrable vector function is $\varphi(x,y) = i\varphi_x + j\varphi_y$. The v-order fractional differential operator is $D^v = i\frac{\partial^v}{\partial x^v} + j\frac{\partial^v}{\partial y^v} = iD_x^v + jD_y^v = (D_x^v, D_y^v)$. Here, $D^v$ is a linear operator; when $v = 0$, $D^0$ represents neither a differential nor an integral but the identity operator, and $i$ and $j$ denote the unit vectors in the x- and y-directions, respectively. Thanks to the fractional Riemann-Liouville definition and (1), we can get the fractional Euler-Lagrange formula corresponding to $\iint_\Omega I_x^{v_2}I_y^{v_2}\,D^{v_1}u\cdot\varphi = 0$ as

$$-\binom{v_1}{1}\frac{\Gamma(1-v_1)}{\Gamma(-v_1)}\left(D_x^1\varphi_x + D_y^1\varphi_y\right) = D_x^1\varphi_x + D_y^1\varphi_y = 0. \quad (3)$$

Further, if $\Phi_1(D^{v_1}u)$ is a scalar distribution function of the vector function $D^{v_1}u$, and $\Phi_2(\varphi)$ is a scalar distribution function of the differintegrable vector function $\varphi(x,y) = i\varphi_x + j\varphi_y$, we can similarly get the fractional Euler-Lagrange formula corresponding to $\iint_\Omega I_x^{v_2}I_y^{v_2}\,\Phi_1(D^{v_1}u)\Phi_2(\varphi)\,D^{v_1}u\cdot\varphi = 0$ as

$$-\binom{v_1}{1}\frac{\Gamma(1-v_1)}{\Gamma(-v_1)}\left[D_x^1\big(\Phi_2(\varphi)\varphi_x\big) + D_y^1\big(\Phi_2(\varphi)\varphi_y\big)\right] = D_x^1\big(\Phi_2(\varphi)\varphi_x\big) + D_y^1\big(\Phi_2(\varphi)\varphi_y\big) = 0. \quad (4)$$

Since for any v-order fractional calculus one has $D_{x-a}^v[0] \equiv 0$, the fractional Euler-Lagrange formulas (3) and (4) are independent of the order $v_2$ of the fractional surface integral $\iint_\Omega I_x^{v_2}I_y^{v_2}$. Therefore, when we discuss below the energy norm of the fractional partial differential equation models for texture image denoising, the first-order surface integral $\iint_\Omega I_x^1 I_y^1$ is used instead of the fractional surface integral $\iint_\Omega I_x^v I_y^v$.
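The prefactor $\Gamma(1-v_1)/\Gamma(-v_1)$, which appears in (3) and recurs in the models below, simplifies to $-v_1$ by the recurrence $\Gamma(z+1) = z\Gamma(z)$. A one-line numerical check (our illustration):

```python
from math import gamma

v1 = 0.7
# Gamma(1 - v1) / Gamma(-v1) = (-v1) * Gamma(-v1) / Gamma(-v1) = -v1
print(gamma(1 - v1) / gamma(-v1), -v1)  # both print -0.7 (up to rounding)
```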

2.3  Implement fractional steepest descent approach and discuss its stability and convergence rate

In order to implement the fractional partial differential equations—the fractional total variation and fractional steepest descent based multi-scale denoising models—according to the above fractional Euler-Lagrange formula, we must further solve for the fractional minimum of the fractional-variation based energy norm instead of the traditional integer-order minimum. Thus, we must extend the integer-order steepest descent approach to the fractional field. To demonstrate the difference between the fractional and integer-order steepest descent approaches, we still take the first-order extreme value to construct the energy norm, $E = E_{\min}^1 + \eta(s^{1*}-s)^2$, where $\eta \neq 0$ is a constant that controls the concavity and convexity of the parabola, $E_{\min}^1$ is the first-order extreme value of $E$, and $E$ is a quadratic energy norm whose performance curve is a parabola that is often unknown. Suppose we perform an incremental search along the negative direction of the v-order fractional gradient by the fractional steepest descent algorithm with $v > 0$; then $s_{k+1} = s_k + \mu(-D_{s_k}^v)$, where $k$ is the iteration step, $s_k$ is the current value of $s$, $s_{k+1}$ is the new value, $D_{s_k}^v$ is the v-order fractional gradient of the energy norm $E$ at $s = s_k$, and $\mu$ is a constant coefficient parameter that controls the stability and convergence rate. Suppose the first-order extreme (stationary) point of $E$ corresponds to $E_{\min}^1$ and the v-order fractional extreme (fractional stationary) point corresponds to $E_{\min}^v$; $s^{1*}$ is the first-order optimal independent variable corresponding to $E_{\min}^1$, $s^{v*}$ is the fractional optimal independent variable corresponding to $E_{\min}^v$, and $s_0$ is the randomly chosen initial guess for the iterative search. One can deduce that $D_{s_k}^v$ is a non-constant-coefficient, nonlinear expression in $s_k$, whose analytical dependence on $v$ is difficult to obtain directly. When the iterative search reaches the stable state, that is, $s_{k+1} = s_k = s^{v*}$, one has $D_{s_k}^v = 0$ and $E = E_{\min}^v$. Further, the v-order fractional extreme point of the energy norm $E$ may not be unique, and two fractional extreme points of the same order are often asymmetric about $s^{1*}$. In particular, when the v-order extreme value is unique, the relation between $s^{v*}$ and $s^{1*}$ is $s^{v*} = \frac{\Gamma(3-v)s^{1*}}{2\Gamma(2-v)}$ for $v \neq 1, 2, 3$. The dynamics of the search process from the initial value $s_0$ to the optimal value $s^{v*}$ are then $s_{k+1} = s_k - \frac{2\mu\eta}{\Gamma(3-v)}(s_k-s^{v*})^2 s_k^{-v}$ for $v \neq 1, 2, 3$. This is a nonlinear, non-ordinary difference equation, whose general term $s_k$ cannot be deduced by mathematical induction from the first iterations. In order to simplify the nonlinear calculation, we take $s_k$ as a discrete sampling of a continuous function $s(t)$ of time $t$, so that the first-order difference approximates the first-order differential, $D_t^1 s(t) \cong s_{k+1}-s_k$. In addition, the power series expansion of $s^v$ is $s^v = \sum_{n=0}^{\infty}\frac{\Gamma(1+v)(s^{v*})^{v-n}(s-s^{v*})^n}{\Gamma(n+1)\Gamma(1-n+v)}$. For further simplification, we keep only the $n = 0$ and $n = 1$ terms, which gives $s^v \cong v(s^{v*})^{v-1}s$, and we solve the resulting equation by separation of variables. Sampling $s(t)$ discretely in time then yields the general term

$$s_k \cong s^{v*} + \exp\!\left(\frac{-2\mu\eta k}{\Gamma(3-v)\,v\,(s^{v*})^{v-1}}\right), \quad v \neq 1, 2, 3. \quad (5)$$
Eq. (5) shows that, for the iterative search process to converge, it must satisfy $\lim_{k\to+\infty} 2\mu\eta k/\left[\Gamma(3-v)\,v\,(s^{v*})^{v-1}\right] = +\infty$, that is, $\frac{2\mu\eta}{\Gamma(3-v)\,v\,(s^{v*})^{v-1}} = \chi > 0$, where $\chi$ is a positive constant. The constant coefficient parameter $\mu$ thus satisfies

$$\mu \cong \frac{\chi\,\Gamma(3-v)\,v\,(s^{v*})^{v-1}}{2\eta}, \quad v \neq 1, 2, 3. \quad (6)$$

It is known that for the integer-order steepest descent method, the relation between $s_k$ and $k$ is a geometric series [89], and accordingly the iterative search process is divided into three types: over-damped, critically damped and under-damped oscillation [89]. In contrast, we know from (5) and (6) that the relation between $s_k$ and $k$ for the fractional steepest descent method is not a geometric series but approximately a negative power series. When $\chi = 1$, one has $\mu \cong \Gamma(3-v)v(s^{v*})^{v-1}/(2\eta)$ for $v \neq 1, 2, 3$; substituting it into (5) gives $s_k \cong s^{v*} + \exp(-k)$, so the iterative search process of the fractional steepest descent method converges as the first exponential power of $\exp(-k)$. When $\chi = 2$, one has $\mu \cong \Gamma(3-v)v(s^{v*})^{v-1}/\eta$ for $v \neq 1, 2, 3$; substituting it into (5) gives $s_k \cong s^{v*} + \exp(-2k)$, so the iterative search process converges as the second exponential power of $\exp(-k)$. When $\chi$ is an integer and $\chi \geqslant 3$, the iterative search process converges as a high integer exponential power of $\exp(-k)$; when $n-1 < \chi < n$, it converges as a fractional exponential power of $\exp(-k)$. In general, the fractional steepest descent method on a multi-dimensional quadratic surface expands the single variable to a multi-dimensional space. The multi-dimensional energy norm and the incremental search process of the fractional steepest descent method are, respectively, $E = E_{\min}^1 + \eta(s^{1*}-s)^2$ and $s_{k+1} = s_k + \mu(-D_{s_k}^v)$. Here, the constant coefficient parameter $\mu$ is extended from a scalar to a multi-dimensional vector, $\mu \cong \{\chi\Gamma(3-v)v(s^{v*})^{v-1}/(2\eta)\}$, for $v \neq 1, 2, 3$.
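As a toy illustration of the convergence behavior described above (our sketch, not the paper's implementation), the scalar search dynamic can be iterated directly; choosing μ so that χ = 1 gives the exp(−k)-type convergence:

```python
import math

def fractional_descent(s_init, s_star, mu, eta, v, n_iter=200):
    """Iterate s_{k+1} = s_k - (2*mu*eta/Gamma(3-v)) * (s_k - s_star)**2 * s_k**(-v),
    the scalar search dynamic derived above (v != 1, 2, 3)."""
    s = s_init
    c = 2.0 * mu * eta / math.gamma(3.0 - v)
    for _ in range(n_iter):
        s = s - c * (s - s_star) ** 2 * s ** (-v)
    return s

v, eta, s_star = 0.5, 1.0, 2.0
# choose mu so that chi = 2*mu*eta / (Gamma(3-v)*v*s_star**(v-1)) equals 1
mu = math.gamma(3.0 - v) * v * s_star ** (v - 1.0) / (2.0 * eta)
print(fractional_descent(4.0, s_star, mu, eta, v))  # approaches s^{v*} = 2
```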

2.4  Implement the first fractional partial differential equation based denoising model

Guidotti et al. [84] and Bai et al. [85] respectively pushed the classical anisotropic diffusion model into the fractional field and extended the gradient operator of the energy norm from first order to fractional order. Their numerical implementation of the fractional partial differential equation in the frequency domain has some effect on image denoising; however, the algorithm still has drawbacks. First, it did not discuss the effect of the fractional power of the energy norm on nonlinearly preserving texture details. Second, it did not derive the corresponding Euler-Lagrange equation according to the features of fractional calculus, but directly replaced it according to the complex conjugate transpose property of the Hilbert adjoint operator, which greatly increases the complexity of the numerical implementation in the frequency domain. Third, it simply turned the first-order difference into a fractional-order one in form in the frequency domain to replace the fractional differential operator, which does not solve the calculation problem of the first kind of Euler integral. To solve these problems, we introduce a fractional power of the energy norm and apply the properties of fractional calculus to derive the fractional Euler-Lagrange formula; we thereby implement the first fractional partial differential equation based multi-scale denoising model, which admits a simple numerical implementation in the time domain.

Suppose $s(x,y)$ represents the gray value of pixel $(x,y)$, where $\Omega \subset R^2$ is the image region and $(x,y) \in \Omega$. Suppose $s(x,y)$ stands for the degraded image contaminated by noise and $s_0(x,y)$ denotes the desired clean image. When the noise is multiplicative, we can transform it into additive noise by a logarithmic transformation; when the noise is convolutive, we can transform it into additive noise by a frequency-domain transformation followed by a logarithmic transformation. Without loss of generality, we assume $n(x,y)$ represents additive noise, so that $s(x,y) = s_0(x,y) + n(x,y)$. Similar to the $\delta$-cover of the Hausdorff measure [90,91], we suppose the fractional variation of the image $s$ is $|D^{v_1}s|^{v_2} = \big(\sqrt{(D_x^{v_1}s)^2 + (D_y^{v_1}s)^2}\big)^{v_2}$, where $v_2$ is any real power and $|D^{v_1}s|^{v_2}$ is a hyper-cube measure. As mentioned above, since the fractional Euler-Lagrange formulas (3) and (4) have no relation with the order of the fractional surface integral, we still adopt the first-order surface integral to construct the energy norm. Suppose the energy norm of the fractional variation is $E_{\mathrm{FHTV}}(s) = \iint_\Omega I_x^1 I_y^1\left[f(|D^{v_1}s|^{v_2})\right] = \iint_\Omega f(|D^{v_1}s|^{v_2})\,\mathrm{d}x\mathrm{d}y$, where $\Omega \subset R^2$ $((x,y) \in \Omega)$ is the image field of $s(x,y)$. We suppose $s$ is the first-order extremal surface of $E_{\mathrm{FHTV}}$ and the test function $\xi(x,y) \in C_0^\infty(\Omega)$ is an admissible surface neighbouring the extremal surface. Merging $s$, $\xi$ and a small parameter $\beta$, we obtain the surface family $s + \beta\xi$, which is the extremal surface when $\beta = 0$. Thus, anisotropic diffusion can be explained as a first-order minimized energy diffusion process for solving the fractional energy norm. Suppose $\Psi_1(\beta) = E_{\mathrm{FHTV}}(s + \beta\xi)$ and $\Psi_2(\beta) = E_n(s + \beta\xi) = \iint_\Omega \frac{\lambda}{2}(s + \beta\xi - s_0)^2\,\mathrm{d}x\mathrm{d}y$, where $\sigma^2 = \iint_\Omega (s - s_0)^2\,\mathrm{d}x\mathrm{d}y$ is the variance of the image noise $n(x,y)$, $E_n(s) = \iint_\Omega \frac{\lambda}{2}(s - s_0)^2\,\mathrm{d}x\mathrm{d}y = \frac{\lambda}{2}\sigma^2$ is the fidelity term, and $\lambda$ is the regularization parameter. We define the fractional total variation based fractional energy norm on the surface family $s + \beta\xi$ as

$$\Psi(\beta) = \Psi_1(\beta) + \Psi_2(\beta) = \iint_\Omega\left[f\big(|D^{v_1}s + \beta D^{v_1}\xi|^{v_2}\big) + \frac{\lambda}{2}(s + \beta\xi - s_0)^2\right]\mathrm{d}x\mathrm{d}y. \quad (7)$$


If the first-order derivatives of $\Psi_1(\beta)$ and $\Psi_2(\beta)$ exist, we obtain the first-order minimum by the traditional first-order steepest descent method, which can be expressed as

$$\frac{\partial s}{\partial t} = -\frac{\Gamma(1-v_1)}{\Gamma(-v_1)}v_2\left[D_x^1\big(|D^{v_1}s|^{v_2-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2}D_y^{v_1}s\big)\right] - \lambda(s - s_0). \quad (8)$$

We must compute $\lambda(t)$. If the image noise $n(x,y)$ is white noise, one has $\iint_\Omega n(x,y)\,\mathrm{d}x\mathrm{d}y = \iint_\Omega (s - s_0)\,\mathrm{d}x\mathrm{d}y = 0$. When $\partial s/\partial t = 0$, Eq. (8) converges to the stable state. We merely multiply both sides of (8) by $(s - s_0)$ and integrate over $\Omega$; the left side of (8) vanishes, and we then have

$$\lambda(t) = \frac{-\Gamma(1-v_1)\,v_2}{\sigma^2\,\Gamma(-v_1)}\iint_\Omega\left[D_x^1\big(|D^{v_1}s|^{v_2-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2}D_y^{v_1}s\big)\right](s - s_0)\,\mathrm{d}x\mathrm{d}y. \quad (9)$$

We denote the fractional partial differential equation based denoising model in (8) and (9) as YiFeiPU-1. As we know, fractional calculus can preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly keep high-frequency edge and texture information in areas where grey scale changes frequently, and nonlinearly enhance texture details in areas where grey scale does not change evidently [78–83]. Therefore, keeping the texture details conflicts with completely filtering the faint noise left in the low-frequency and direct-current components when denoising. YiFeiPU-1 denoising can preserve texture details to the greatest extent, but it is difficult for it to remove the faint noise in the low-frequency and direct-current components. In order to remove this faint noise, the most intuitive approach is to reduce its convexity in areas where the gradient does not change greatly; thus, during the numerical iterative implementation, we need to apply a low-pass filter to the low-frequency and direct-current components. The fractional differential order $v_1$ and the fractional exponential power $v_2$ of the fractional total variation model nonlinearly adjust the denoising effect. From (7)–(9), when $v_1 \neq 1$ and $v_2 = 1$ or $v_2 = 2$, YiFeiPU-1 denoising becomes the fractional anisotropic diffusion denoising model [84,85]; when $v_1 = 1$ and $v_2 = 1$, YiFeiPU-1 becomes the traditional PM and ROF models [5,6]; when $v_1 = 2$ and $v_2 = 2$ or $v_2 = 4$, YiFeiPU-1 denoising becomes the high-order PM and ROF models [35–42].
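For intuition, the following is a highly simplified numerical sketch of one explicit time step of (8) (our illustration, not the authors' implementation): the fractional derivatives are approximated by truncated Grünwald-Letnikov masks with periodic boundaries, λ is held fixed rather than updated by (9), the leading gamma-function constants are folded into the time step, and the low-pass step is omitted. All function names and parameter values are our own choices.

```python
import numpy as np

def gl_weights(v, n):
    # w_k = (-1)**k * C(v, k), computed recursively
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - v) / k
    return w

def frac_diff(img, v, axis, n_terms=8):
    """Truncated Grunwald-Letnikov v-order difference along one image axis."""
    w = gl_weights(v, n_terms)
    out = np.zeros_like(img, dtype=float)
    for k in range(n_terms):
        out += w[k] * np.roll(img, k, axis=axis)  # periodic boundary for simplicity
    return out

def yifeipu1_step(s, s_obs, v1=1.75, v2=2.25, lam=0.05, dt=1e-2, eps=1e-8):
    """One explicit Euler step of a simplified Eq. (8); constants folded into dt."""
    dx, dy = frac_diff(s, v1, axis=0), frac_diff(s, v1, axis=1)
    mag = np.sqrt(dx ** 2 + dy ** 2) + eps      # |D^{v1} s|, regularized
    fx, fy = mag ** (v2 - 2) * dx, mag ** (v2 - 2) * dy
    # first-order divergence D_x^1(.) + D_y^1(.) of the fractional flux
    div = np.diff(fx, axis=0, prepend=fx[:1]) + np.diff(fy, axis=1, prepend=fy[:, :1])
    return s + dt * (div - lam * (s - s_obs))

# usage sketch: s evolves from the noisy observation, which also anchors the fidelity term
rng = np.random.default_rng(0)
noisy = rng.random((64, 64)) + 0.1 * rng.standard_normal((64, 64))
s = noisy.copy()
for _ in range(20):
    s = yifeipu1_step(s, noisy)
```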

2.5  Implement the second fractional partial differential equation based denoising model

For applications with lower accuracy requirements, we can directly extend the first-order Euler-Lagrange formula to the fractional field in form, utilizing the fact that the fractional differential operator can be expressed through the first-order one. We thereby implement an approximate calculation model of YiFeiPU-1 denoising, namely the second fractional partial differential equation based multi-scale denoising model, which has high speed but lower accuracy. In YiFeiPU-1 denoising, suppose $v_2 = 1$ and $f(\eta) = \eta$; the fractional variation is then $|D^{v_1}s| = \sqrt{(D_x^{v_1}s)^2 + (D_y^{v_1}s)^2}$ and the fractional total variation is $E_{\mathrm{FTV}}(s) = \iint_\Omega |D^{v_1}s|\,\mathrm{d}x\mathrm{d}y$. Thanks to the theory of Tikhonov regularization [92], we assume the energy norm of the fractional variation is

$$\Psi(s) = \iint_\Omega F\big(x, y, s, D_x^{v_1}s, D_y^{v_1}s\big)\,\mathrm{d}x\mathrm{d}y = \frac{\lambda}{2}\|s - s_0\|_{L^2(\Omega)}^2 + E_{\mathrm{FTV}}(s) = \iint_\Omega\left[\frac{\lambda}{2}(s - s_0)^2 + \big((D_x^{v_1}s)^2 + (D_y^{v_1}s)^2\big)^{\frac{1}{2}}\right]\mathrm{d}x\mathrm{d}y, \quad (10)$$

where $\sigma^2 = \iint_\Omega (s - s_0)^2\,\mathrm{d}x\mathrm{d}y$ is the variance of the noise $n(x,y)$, $\frac{\lambda}{2}\|s - s_0\|_{L^2(\Omega)}^2 = \iint_\Omega \frac{\lambda}{2}(s - s_0)^2\,\mathrm{d}x\mathrm{d}y = \frac{\lambda}{2}\sigma^2$ is the fidelity term, and $\lambda$ is the regularization parameter. Since the fractional differential is the continuous interpolation of its integer-order counterparts, the fractional differential operator can be linearly expressed by the first-order differential operator [46,58,83]; it can be proved that $D_{x-a}^{v_1}f = \sum_{n=0}^{\infty}\frac{(-1)^n f^{(n)}(x-a)^{n-v_1}}{\Gamma(-v_1)(n-v_1)\,n!}$. Taking $D^{v_1}$ as a function of $D^1$, that is, $D^{v_1}s = \psi(D^1 s)$, and noting that $D^{v_1}$ and $D^1$ are linear operators in essence, the function $\psi$ has an inverse, that is, $D^1 s = \psi^{-1}(D^{v_1}s)$. To simplify the calculation, we directly extend the first-order Euler-Lagrange formula to a fractional one in form, obtaining an approximate fractional


Euler-Lagrange formula for $D^{v_1}$. This cannot avoid errors, but it is simple and convenient. Similarly, solving (10) by the first-order steepest descent method, we have

$$\frac{\partial s}{\partial t} = D_x^1\!\left(\frac{D_x^{v_1}s}{|D^{v_1}s|}\right) + D_y^1\!\left(\frac{D_y^{v_1}s}{|D^{v_1}s|}\right) - \lambda(s - s_0). \quad (11)$$

We must compute $\lambda(t)$. If the image noise $n(x,y)$ is white noise, one has $\iint_\Omega n(x,y)\,\mathrm{d}x\mathrm{d}y = \iint_\Omega (s - s_0)\,\mathrm{d}x\mathrm{d}y = 0$. When $\partial s/\partial t = 0$, (11) converges to the stable state. We merely multiply both sides of (11) by $(s - s_0)$ and integrate over $\Omega$; the left side of (11) vanishes, and we then have

$$\lambda(t) = \frac{1}{\sigma^2}\iint_\Omega\left[D_x^1\!\left(\frac{D_x^{v_1}s}{|D^{v_1}s|}\right) + D_y^1\!\left(\frac{D_y^{v_1}s}{|D^{v_1}s|}\right)\right](s - s_0)\,\mathrm{d}x\mathrm{d}y. \quad (12)$$

We denote the fractional partial differential equation based denoising model in (11) and (12) as YiFeiPU-2 denoising. As with YiFeiPU-1 denoising, in order to completely filter the faint noise in the low-frequency and direct-current components, we also need to apply a low-pass filter during the numerical iterative implementation.
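A matching sketch for the adaptive regularization parameter of (12) (again our illustration, reusing frac_diff from the YiFeiPU-1 sketch above, with the surface integrals replaced by plain pixel sums):

```python
import numpy as np

def lambda_t(s, s_obs, v1=1.75, eps=1e-8):
    """Discrete estimate of lambda(t) in Eq. (12): divergence of the normalized
    fractional gradient, correlated with (s - s0), divided by sigma^2.
    Assumes frac_diff from the previous sketch is in scope."""
    dx, dy = frac_diff(s, v1, axis=0), frac_diff(s, v1, axis=1)
    mag = np.sqrt(dx ** 2 + dy ** 2) + eps
    nx, ny = dx / mag, dy / mag
    div = np.diff(nx, axis=0, prepend=nx[:1]) + np.diff(ny, axis=1, prepend=ny[:, :1])
    sigma2 = np.sum((s - s_obs) ** 2)   # discrete form of the noise variance
    return np.sum(div * (s - s_obs)) / sigma2
```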

2.6  Implement the third fractional partial differential equation based denoising model

From the above discussion, we know that the fractional anisotropic diffusion denoising models proposed by Guidotti et al. [84] and Bai et al. [85] have the following drawbacks besides those mentioned for YiFeiPU-1 denoising. Firstly, they merely take the gradient operator of the energy norm from first order to fractional order, which still does not essentially solve the problem of how to nonlinearly keep texture details during anisotropic diffusion; the texture information is therefore not well kept after denoising. Secondly, the algorithms do not discuss the effect of the fractional extreme value of the energy norm on nonlinearly preserving texture details. To solve these problems, we first refer to the properties of fractional calculus, take the gradient operator of the energy norm from first order to fractional order, and construct a fractional total variation energy norm using a fractional exponential power and a fractional extreme value. We then seek the fractional extreme value of the energy norm instead of the first-order one. In addition, we derive the corresponding fractional Euler-Lagrange formula according to the properties of fractional calculus, which greatly reduces the numerical calculation complexity. Lastly, we implement the third fractional partial differential equation based multi-scale denoising model by the fractional steepest descent method. To construct the energy norm directly from the fractional extreme value, we assume the curved surface family is $s + (\beta-1)\xi$; when $\beta = 1$, it is the $v_3$-order extremal surface $s$. Suppose $\Psi_1(\beta) = E_{\mathrm{FHTV}}[s + (\beta-1)\xi]$ and $\Psi_2(\beta) = \iint_\Omega \lambda[s + (\beta-1)\xi - s_0]s_0\,\mathrm{d}x\mathrm{d}y$, where $\Psi_2(\beta)$ is the integrated energy between the noise signal $[s + (\beta-1)\xi - s_0]$ and the clean signal $s_0$, a measure of the similarity between them. Solving for the minimum of $\Psi_2(\beta)$ can thus be deemed the process of minimizing the similarity between the noise and the clean signal. In denoising, $\Psi_2(\beta)$ plays a nonlinear fidelity role and $\lambda$ is the regularization coefficient. The fractional total variation based fractional energy norm defined on $s + (\beta-1)\xi$ can then be given by

$$\Psi(\beta) = \Psi_1(\beta) + \Psi_2(\beta) = \iint_\Omega\left[f\big(|D^{v_1}s + (\beta-1)D^{v_1}\xi|^{v_2}\big) + \lambda[s + (\beta-1)\xi - s_0]s_0\right]\mathrm{d}x\mathrm{d}y. \quad (13)$$

If the $v_3$-order fractional derivatives of $\Psi_1(\beta)$ and $\Psi_2(\beta)$ exist, we take $v_3 \neq 1, 2, 3$ and solve the fractional minimum of (13) by the $v_3$-order fractional steepest descent method, which gives

$$\frac{\partial^{v_3}s}{\partial t^{v_3}} = \frac{-\Gamma(1-v_1)}{\Gamma(-v_1)\Gamma(-v_3)}\sum_{k=0}^{\infty}\frac{\prod_{\tau=1}^{2k}(v_2-\tau+1)}{(2k)!}\left[D_x^1\big(|D^{v_1}s|^{v_2-2k-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2k-2}D_y^{v_1}s\big)\right] - \frac{\lambda s_0}{\Gamma(1-v_3)\Gamma(2-v_3)}, \quad (14)$$


where $\left.\prod_{\tau=1}^{n}(v_2-\tau+1)\right|_{n=0} \equiv 1$. We must compute $\lambda(t)$. When the image noise $n(x,y)$ is white noise, one has $\iint_\Omega n(x,y)\,\mathrm{d}x\mathrm{d}y = \iint_\Omega (s - s_0)\,\mathrm{d}x\mathrm{d}y = 0$. When $\partial^{v_3}s/\partial t^{v_3} = 0$, (14) converges to the stable state. We merely multiply both sides of (14) by $(s - s_0)^2$ and integrate by parts over $\Omega$; the left side of (14) vanishes, and we then have

$$\lambda(t) = \frac{-\Gamma(1-v_1)\Gamma(1-v_3)\Gamma(2-v_3)}{\sigma^2\,\Gamma(-v_1)\Gamma(-v_3)\,s_0}\iint_\Omega\sum_{k=0}^{\infty}\frac{\prod_{\tau=1}^{2k}(v_2-\tau+1)}{(2k)!}\left[D_x^1\big(|D^{v_1}s|^{v_2-2k-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2k-2}D_y^{v_1}s\big)\right](s - s_0)^2\,\mathrm{d}x\mathrm{d}y. \quad (15)$$

We denote the fractional partial differential equation based denoising model in (14) and (15) as YiFeiPU-3 denoising. As with YiFeiPU-1 denoising, in order to completely remove the faint noise in the low-frequency and direct-current components, we need to apply a low-pass filter during the numerical iterative implementation. Comparing (14) and (15) with (8) and (9), we see that, relative to YiFeiPU-1 denoising, YiFeiPU-3 denoising not only increases the nonlinear adjustment effect of the order $v_2$ through the product $\prod_{\tau=1}^{2k}(v_2-\tau+1)$ and the power $v_2-2k-2$ of $|D^{v_1}s|$, but also increases the nonlinear adjustment effect of $v_3$ through $\Gamma(-v_3)$ in the denominator. In addition, from (14), when $v_3 = 0$, YiFeiPU-3 reduces to the traditional potential (elliptic) equation; when $v_3 = 1$, to the traditional heat (parabolic) equation; when $v_3 = 2$, to the traditional wave (hyperbolic) equation; when $0 < v_3 < 1$, YiFeiPU-3 denoising is the continuous interpolation between the traditional potential and heat equations; and when $1 < v_3 < 2$, it is the continuous interpolation between the traditional heat and wave equations. Therefore, in the mathematical and physical sense, YiFeiPU-3 denoising extends the traditional heat equation based anisotropic diffusion algorithm to a vaster domain.
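As a quick numerical look at the series in (14) (our illustration), the coefficient Π_{τ=1}^{2k}(v2 − τ + 1)/(2k)! decays rapidly for moderate v2, so a numerical implementation can truncate the sum after a few terms:

```python
from math import factorial

def series_coeff(v2, k):
    """prod_{tau=1}^{2k} (v2 - tau + 1) / (2k)! from Eq. (14);
    the empty product at k = 0 is defined to be 1."""
    prod = 1.0
    for tau in range(1, 2 * k + 1):
        prod *= v2 - tau + 1
    return prod / factorial(2 * k)

for k in range(5):
    print(k, series_coeff(2.25, k))  # magnitudes fall off quickly for v2 = 2.25
```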

2.7  Implement the fourth fractional partial differential equation based denoising model

From the above discussion, we know that YiFeiPU-3 denoising constructs the fractional total variation based energy norm directly from the fractional extreme value in (13). In other words, it first moves the reference coordinate from the first-order extreme value to the fractional one and then constructs the corresponding energy norm, which differs from the earlier models that use the first-order extreme value. If we proceed as for YiFeiPU-3 but adopt the first-order extreme value instead of the fractional one to construct the energy norm, we obtain the fourth fractional partial differential equation based multi-scale denoising model. First, we reconstruct $\Psi_1(\beta)$ in (13), still taking the first-order extreme value to construct the energy norm. According to the fractional steepest descent algorithm, we suppose $\Psi_1(\beta) = \iint_\Omega I_x^1 I_y^1\left[f(\|\phi\|^{v_2})\right] = \iint_\Omega f(\|\phi\|^{v_2})\,\mathrm{d}x\mathrm{d}y$. We define the vector $\phi\left[D^{v_1}s, (\beta-1)D^{v_1}\xi\right] = (\beta-1)D^{v_1}\xi - \frac{2\Gamma(2-v_3)}{\Gamma(3-v_3)}D^{v_1}s$ and the norm of $\phi$ as

$$\|\phi\| = \sqrt{\frac{4\Gamma(2-v_3)-\Gamma(3-v_3)}{\Gamma(3-v_3)}(D^{v_1}s)^2 - \frac{4(\beta-1)\Gamma(2-v_3)}{\Gamma(3-v_3)}D^{v_1}s\cdot D^{v_1}\xi + (\beta-1)^2(D^{v_1}\xi)^2} = \sqrt{(\phi)^2 + C},$$

where $C = \frac{4\Gamma(2-v_3)\Gamma(3-v_3)-\Gamma^2(3-v_3)-4\Gamma^2(2-v_3)}{\Gamma^2(3-v_3)}(D^{v_1}s)^2$, and the symbol $\cdot$ denotes the inner product. For the vectors $\phi$, $D^{v_1}s$ and $D^{v_1}\xi$, we have, respectively, $(\phi)^2 = \|\phi\|^2 = \big(\sqrt{(\phi)^2}\big)^2 = \phi\cdot\phi$, $(D^{v_1}s)^2 = |D^{v_1}s|^2 = D^{v_1}s\cdot D^{v_1}s$ and $(D^{v_1}\xi)^2 = |D^{v_1}\xi|^2 = D^{v_1}\xi\cdot D^{v_1}\xi$. Second, we keep $\Psi_2(\beta) = \iint_\Omega \lambda[s + (\beta-1)\xi - s_0]s_0\,\mathrm{d}x\mathrm{d}y$ of (13) unchanged. The fractional total variation based energy norm on the curved surface family $s + (\beta-1)\xi$ is then given by

$$\Psi(\beta) = \Psi_1(\beta) + \Psi_2(\beta) = \iint_\Omega\left[f(\|\phi\|^{v_2}) + \lambda[s + (\beta-1)\xi - s_0](s - s_0)\right]\mathrm{d}x\mathrm{d}y. \quad (16)$$


Similarly, if the $v_3$-order fractional derivatives exist, we take $v_3 \neq 1, 2, 3$ and obtain the fractional minimum by the $v_3$-order fractional steepest descent algorithm as

$$\frac{\partial^{v_3}s}{\partial t^{v_3}} = \frac{-\Gamma(1-v_1)}{\Gamma(-v_1)\Gamma(-v_3)\Gamma(3-v_3)}\sum_{k=0}^{\infty}\frac{\prod_{\tau=1}^{2k}(v_2-\tau+1)}{(2k)!}\left[\sqrt{\left|\frac{4\Gamma(2-v_3)-\Gamma(3-v_3)}{\Gamma(3-v_3)}\right|}\,\right]^{v_2-2k-2}\left[D_x^1\big(|D^{v_1}s|^{v_2-2k-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2k-2}D_y^{v_1}s\big)\right] - \frac{\lambda s_0}{\Gamma(1-v_3)\Gamma(2-v_3)}, \quad (17)$$

where $\left.\prod_{\tau=1}^{n}(v_2-\tau+1)\right|_{n=0} \equiv 1$. Deriving similarly to (15), we have

$$\lambda(t) = \frac{-\Gamma(1-v_1)\Gamma(1-v_3)\Gamma(2-v_3)}{\sigma^2\,\Gamma(-v_1)\Gamma(-v_3)\Gamma(3-v_3)\,s_0}\iint_\Omega\sum_{k=0}^{\infty}\frac{\prod_{\tau=1}^{2k}(v_2-\tau+1)}{(2k)!}\left[\sqrt{\left|\frac{4\Gamma(2-v_3)-\Gamma(3-v_3)}{\Gamma(3-v_3)}\right|}\,\right]^{v_2-2k-2}\left[D_x^1\big(|D^{v_1}s|^{v_2-2k-2}D_x^{v_1}s\big) + D_y^1\big(|D^{v_1}s|^{v_2-2k-2}D_y^{v_1}s\big)\right](s - s_0)^2\,\mathrm{d}x\mathrm{d}y. \quad (18)$$

We denote the fractional partial differential equation based denoising model in (17) and (18) as YiFeiPU-4 denoising. As with YiFeiPU-1 denoising, in order to completely remove the faint noise in the low-frequency and direct-current components, we need to apply a low-pass filter during the numerical iterative implementation. Comparing (17) and (18) with (14) and (15), we see that, relative to YiFeiPU-3 denoising, YiFeiPU-4 denoising increases the nonlinear adjustment effect of the order $v_3$ through the factor $\sqrt{|(4\Gamma(2-v_3)-\Gamma(3-v_3))/\Gamma(3-v_3)|}$ in the numerator and $\Gamma(3-v_3)$ in the denominator. In addition, from (17), when $v_3 = 0$, YiFeiPU-4 denoising reduces to the traditional potential (elliptic) equation; when $v_3 = 1$, to the traditional heat (parabolic) equation; when $v_3 = 2$, to the traditional wave (hyperbolic) equation; when $0 < v_3 < 1$, YiFeiPU-4 denoising is the continuous interpolation between the traditional potential and heat equations; and when $1 < v_3 < 2$, it is the continuous interpolation between the traditional heat and wave equations. Therefore, like YiFeiPU-3 denoising, YiFeiPU-4 denoising also extends the traditional heat equation based anisotropic diffusion approach to a vaster domain in the mathematical and physical sense.
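To see how the extra v3-dependent factor in (17) behaves (our illustration), one can evaluate the per-power gain that YiFeiPU-4 applies relative to YiFeiPU-3:

```python
from math import gamma

def yifeipu4_gain(v3):
    """sqrt(|(4*Gamma(2 - v3) - Gamma(3 - v3)) / Gamma(3 - v3)|), the factor
    raised to the power (v2 - 2k - 2) in Eq. (17) but absent from Eq. (14)."""
    return abs((4.0 * gamma(2.0 - v3) - gamma(3.0 - v3)) / gamma(3.0 - v3)) ** 0.5

for v3 in (0.5, 1.05, 1.5):   # avoid v3 = 2, where Gamma(2 - v3) has a pole
    print(v3, yifeipu4_gain(v3))
```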

3  Experiments and result analysis

In order to analyze and explain the good denoising capability of the fractional partial differential equation based models, we choose the better-performing methods from the above experiments—bilateral filtering, wavelet denoising, NLMF, YiFeiPU-3 denoising and YiFeiPU-4 denoising—for contrast experiments on a texture-rich metallographic image of an iron ball. The fractional orders $v_1$, $v_3$ and the exponential power $v_2$ chosen in the experiment may not be the optimal values; they are merely empirical ones. The optimal values of $v_1$, $v_2$ and $v_3$ are discussed at the end of this paper. In addition, the numerical iterative process stops at the point where the peak signal-to-noise ratio (PSNR) is largest; see Figure 1. From the subjective view of visual effect, we observe the following from Figure 1. First, the denoising capabilities of bilateral filtering and wavelet denoising are comparatively worse than the others; they obviously diffuse and smooth high-frequency edges and texture details. In Figure 1 (e) and (h), the edge and texture details are clearly visible in the residuals, that is to say, the noise removed by bilateral filtering and wavelet denoising may not be the same as the added noise. In Figure 1 (f) and (i), the denoised image is blurry, and in Figure 1(i) the denoising is incomplete, which shows that the edge- and texture-preserving capabilities of bilateral filtering and wavelet denoising are worse. Second, the capability of NLMF denoising for preserving edge and texture details is good, but its denoising capability in the neighborhood of edges and texture is poor. In Figure 1(k), though the edge and texture details in the residual are weaker than those in Figure 1 (e) and (h), they can still be seen; that is to say, the noise removed by NLMF denoising is close to the added noise. In Figure 1(l), the blurring of edge and texture details is slight, that is to say, NLMF denoising can well preserve edge and texture details.


Figure 1  Denoising effect for texture-rich metallographic image of iron ball. (a) Original clean image; (b) noisy image (white Gaussian noise added to the original clean image, PSNR = 13.6872); (c) partial enlarged details of the 1/4 part in the gray box of (a); (d) denoised image of bilateral filtering denoising [93–95]; (e) residual plot of bilateral filtering denoising; (f) partial enlarged details of the 1/4 part in the gray box of (d); (g) denoised image of wavelet denoising [95,96]; (h) residual plot of wavelet denoising; (i) partial enlarged details of the 1/4 part in the gray box of (g); (j) denoised image of NLMF denoising [97,98]; (k) residual plot of NLMF denoising; (l) partial enlarged details of the 1/4 part in the gray box of (j); (m) denoised image of YiFeiPU-3 denoising (v1 = 1.75, v2 = 2.25, v3 = 1.05, Δt = 10^{-10}); (n) residual plot of YiFeiPU-3 denoising; (o) partial enlarged details of the 1/4 part in the gray box of (m); (p) denoised image of YiFeiPU-4 denoising (v1 = 1.75, v2 = 2.25, v3 = 1.05, Δt = 10^{-10}); (q) residual plot of YiFeiPU-4 denoising; (r) partial enlarged details of the 1/4 part in the gray box of (p).

Table 1  Comprehensive denoising effect for texture-rich metallographic image of iron ball

Denoising algorithm            PSNR      Correlation coefficient   Contrast   Correlation   Energy   Homogeneity
Noisy image                    13.6872   0.9991                    5.9118     0.2491        0.0197   0.4580
Bilateral filtering denoising  20.4845   0.9995                    1.1200     0.8038        0.0787   0.7092
Wavelet denoising              16.5708   0.9994                    2.1387     0.3910        0.0473   0.5833
NLMF denoising                 21.0494   0.9996                    1.2524     0.7150        0.0728   0.6822
YiFeiPU-3 denoising            21.4826   0.9996                    1.4365     0.6876        0.0688   0.6646
YiFeiPU-4 denoising            21.5482   0.9997                    1.4371     0.6646        0.0688   0.6608

In addition, in Figure 1(k), the neighborhood of edges and texture is smooth, while in Figure 1(l) the residual noise near edges and texture is stronger than in other parts; in other words, the denoising capability of NLMF is rather poor near edges and texture. Third, the denoising capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are the best: they keep high-frequency edge and texture details well and also remove the noise completely. In Figure 1 (n) and (q), the edge and texture details can be seen only indistinctly in the residual image; that is to say, the noise removed by YiFeiPU-3 and YiFeiPU-4 is close to the added noise. In Figure 1 (o) and (r), the blurring of edges and texture is slight and the result is comparatively clear, that is to say, the edge- and texture-preserving capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are the best. From the view of quantitative analysis, we take the PSNR, the correlation coefficient between the noisy or denoised image and the original clean image [99], and the averaged gray level co-occurrence matrix to comprehensively estimate the denoising effect. We calculate the gray level co-occurrence matrix at a pixel distance of 5 in Figure 1 and export the typical coefficients—Contrast, Correlation, Energy and Homogeneity—in the four directions 0°, 45°, 90° and 135°, where 0° represents the projection in the positive y-direction and 90° the projection in the x-direction; we then average these values, see Table 1. From Table 1, the denoising capabilities of the above algorithms are as follows. First, the denoising capabilities of bilateral filtering and wavelet denoising are rather poor, and their PSNR and correlation coefficients are relatively small, which shows that high-frequency edge and texture details are greatly diffused and smoothed, the noise is not completely removed, and the similarity between the denoised image and the original clean image is small. In addition, the Contrast of the averaged gray level co-occurrence matrix of bilateral filtering denoising is small, which shows that there are fewer pixels with great contrast, the texture furrows are shallow, and the image seems fuzzy. The Contrast of wavelet denoising is the biggest, which denotes more pixels with great contrast, but we cannot say the texture furrows are deeper and the visual effect clearer, because the denoising is incomplete. Second, the denoising capabilities of NLMF, YiFeiPU-3 and YiFeiPU-4 denoising are good: their PSNR and correlation coefficients are comparatively high, which denotes that the high-frequency edge and texture details of the denoised image are well preserved, the denoising is complete, and the similarity between the denoised image and the original clean image is great. Among them, the PSNR and correlation coefficient of YiFeiPU-4 are the highest; that is to say, its denoising is the most complete and its similarity to the original clean image the biggest. The Contrast of the averaged gray level co-occurrence matrix of YiFeiPU-4 is the biggest, which denotes more pixels with great contrast, deeper texture furrows and a clearer look. Its Correlation is small, which shows weak local gray correlation and obvious texture details; its Energy is rather small, which shows that the texture variation is not uniform or regular and the texture details are obvious; and its Homogeneity is small too, which shows that the regional variation is intense and the texture details are obvious. Therefore, we can conclude that YiFeiPU-3 and YiFeiPU-4 denoising are the best denoising algorithms. When the Gaussian noise is very strong, especially when the original clean signal is completely drowned, we take an internal organ texture rich abdomen MRI for comparative experiments, again using the above well-performing algorithms—bilateral filtering, wavelet denoising, NLMF, YiFeiPU-3 denoising and YiFeiPU-4 denoising—to further analyze the robustness to noise of the fractional partial differential equation based models.
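The two scalar metrics used in Tables 1–3 are standard; for completeness, here is a minimal sketch of how they can be computed (our illustration, assuming 8-bit images with peak value 255):

```python
import numpy as np

def psnr(clean, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a clean and a test image."""
    mse = np.mean((clean.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two images, flattened."""
    return np.corrcoef(a.ravel().astype(float), b.ravel().astype(float))[0, 1]
```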


Figure 2  Denoising effect for internal organ texture rich abdomen MRI, when Gaussian noise is very strong. (a) Original clean MRI; (b) noisy image (white Gaussian noise added to the original clean MRI, PSNR = 5.4389); (c) denoised image of bilateral filtering denoising [93–95]; (d) partial enlarged details of the 1/4 part in the gray box of (c); (e) denoised image of wavelet denoising [95,96]; (f) partial enlarged details of the 1/4 part in the gray box of (e); (g) denoised image of NLMF denoising [97,98]; (h) partial enlarged details of the 1/4 part in the gray box of (g); (i) denoised image of YiFeiPU-3 denoising (v1 = 1.75, v2 = 2.75, v3 = 1.05, Δt = 5×10^{-6}); (j) partial enlarged details of the 1/4 part in the gray box of (i); (k) denoised image of YiFeiPU-4 denoising (v1 = 1.75, v2 = 2.5, v3 = 1.05, Δt = 5×10^{-6}); (l) partial enlarged details of the 1/4 part in the gray box of (k).

The fractional orders v1, v3 and the exponential power v2 chosen in the experiment may not be the optimal values; they are merely empirical ones. The optimal values of v1, v2 and v3 are discussed at the end of this paper. In addition, the numerical iterative process stops at the point where the PSNR is largest; see Figure 2. From the view of subjective visual effect, we observe the following from Figure 2 for the case where the noise is very strong and the MRI is completely drowned. Firstly, the denoising capabilities of bilateral filtering and wavelet denoising are comparatively worse than the others: in Figure 2 (c)–(f), the contour can be seen only indistinctly, and the edge and texture details of the internal organs can hardly be recognized. Secondly, the denoising capability of NLMF is rather good: in Figure 2 (g) and (h), the contour is clearer, but the edge and texture details are still blurry. Thirdly, the denoising capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are the best: in Figure 2 (i)–(l), not only is the contour the clearest, but the edge and texture details can also be recognized. For quantitative analysis, we measure the denoising effect of the above algorithms in Figure 2 by the PSNR, the correlation coefficient between the noisy or denoised image and the original clean MRI [99], and the averaged gray level co-occurrence matrix; see Table 2. From Table 2, the denoising capabilities of the above algorithms, when the noise is very strong and the MRI is completely drowned, are as follows.

Table 2  Denoising effect for internal organ texture rich abdomen MRI, when Gaussian noise is very strong

Denoising algorithm            PSNR      Correlation coefficient   Contrast   Correlation   Energy   Homogeneity
Noisy image                    5.5925    0.9864                    20.6594    0.0038        0.1182   0.4767
Bilateral filtering denoising  11.7401   0.9905                    1.0538     0.2542        0.1150   0.6724
Wavelet denoising              11.6262   0.9903                    0.7841     0.1444        0.1886   0.7224
NLMF denoising                 12.1964   0.9897                    1.2085     0.1906        0.1070   0.6530
YiFeiPU-3 denoising            17.3251   0.9975                    0.5958     0.7616        0.0974   0.7583
YiFeiPU-4 denoising            17.4988   0.9976                    0.5930     0.6811        0.1193   0.7605

First, the denoising capabilities of bilateral filtering, wavelet and NLMF denoising are rather poor, and their PSNR and correlation coefficients are relatively small, which shows that the noise cannot be removed cleanly and the similarity between the denoised image and the original clean image is small. In addition, the Contrast of the averaged gray level co-occurrence matrix of bilateral filtering, wavelet and NLMF denoising is bigger than that of YiFeiPU-3 and YiFeiPU-4 denoising, which shows more pixels with great contrast; but we cannot say the texture furrows are deeper and the visual effect clearer, because the denoising is incomplete. Second, the denoising capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are rather good: their PSNR and correlation coefficients are comparatively higher than those of bilateral filtering, wavelet and NLMF denoising, which denotes that the noise is removed completely and the similarity between the denoised image and the original clean image is big; among them, the PSNR and correlation coefficient of YiFeiPU-4 are the highest, and the Correlation, Energy and Homogeneity of YiFeiPU-3 and YiFeiPU-4 denoising are all smaller. Therefore, we can conclude that YiFeiPU-3 and YiFeiPU-4 denoising are the best denoising models. When the noise is very strong, especially when the original clean signal is completely drowned, we also take a meteorite crater texture rich moon satellite remote sensing image for comparative experiments, again using the above well-performing algorithms—bilateral filtering, wavelet denoising, NLMF, YiFeiPU-3 denoising and YiFeiPU-4 denoising—to further analyze the robustness to noise of the fractional partial differential equation based models. Here the added noise is a composite of white Gaussian noise, salt & pepper noise and speckle noise. The fractional orders v1, v3 and the exponential power v2 chosen in the experiment may not be the optimal values; they are merely empirical ones. The optimal values of v1, v2 and v3 are discussed at the end of this paper. In addition, the numerical iterative process stops at the point where the PSNR is largest; see Figure 3. From the subjective view of visual effect, we observe the following from Figure 3, where the composite of white Gaussian, salt & pepper and speckle noise is added and the meteorite crater texture rich moon satellite remote sensing image is completely drowned. Firstly, the denoising capabilities of bilateral filtering, wavelet and NLMF denoising are comparatively worse: in Figure 3 (c)–(h), the contour can be seen only indistinctly and the edge and texture details can hardly be recognized. Secondly, the denoising capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are the best: in Figure 3 (i)–(l), not only is the contour the clearest, but the edge and texture details can also be clearly recognized. For quantitative analysis, we measure the denoising effect of the above algorithms in Figure 3 by the PSNR, the correlation coefficient between the noisy or denoised image and the original clean image [99], and the averaged gray level co-occurrence matrix; see Table 3. From Table 3, the denoising capabilities of the above algorithms under this strong composite noise are as follows. First, the denoising capabilities of bilateral filtering, wavelet and NLMF denoising are poor, and their PSNR and correlation coefficients are relatively small, which shows that the noise cannot be removed completely and the similarity between the denoised image and the original clean image is small. Second, the denoising capabilities of YiFeiPU-3 and YiFeiPU-4 denoising are rather good: their PSNR and correlation coefficients are comparatively high, which denotes that the noise is removed cleanly and the similarity between the denoised image and the original clean image is big; among them, the PSNR and correlation coefficient of YiFeiPU-3 denoising are the highest. In addition, the Contrast of the averaged gray level co-occurrence matrix of YiFeiPU-3 and YiFeiPU-4 denoising is rather big.


Figure 3  Denoising effect for meteorite crater texture rich moon satellite remote sensing image, when white Gaussian noise, salt & pepper noise and speckle noise are added together. (a) Original clean image; (b) noisy image (white Gaussian noise (standard variance 0.02), salt & pepper noise (noise density 0.2) and speckle noise (standard variance 0.1) added to the original clean image, PSNR = 8.8564); (c) denoised image of bilateral filtering denoising [93–95]; (d) partial enlarged details of the 1/4 part in the gray box of (c); (e) denoised image of wavelet denoising [95,96]; (f) partial enlarged details of the 1/4 part in the gray box of (e); (g) denoised image of NLMF denoising [97,98]; (h) partial enlarged details of the 1/4 part in the gray box of (g); (i) denoised image of YiFeiPU-3 denoising (v1 = 1.75, v2 = 2.75, v3 = 1.05, Δt = 10^{-10}); (j) partial enlarged details of the 1/4 part in the gray box of (i); (k) denoised image of YiFeiPU-4 denoising (v1 = 1.75, v2 = 2.5, v3 = 1.05, Δt = 10^{-10}); (l) partial enlarged details of the 1/4 part in the gray box of (k).

This shows that there are more pixels with great contrast, the texture furrows are deeper, and the visual effect is clearer. Their Correlation, Energy and Homogeneity are all relatively small. Therefore, we can conclude that YiFeiPU-3 and YiFeiPU-4 denoising are the best denoising models. By contrasting the subjective visual effects and the objective quantitative analysis in Figures 1–3, we can find the following. Firstly, the fractional partial differential equation based multi-scale models have good denoising capability no matter how strong the noise is and of which kind: their PSNR and correlation coefficients are high, the noise is removed comparatively cleanly, and the denoised signal coincides well with the original clean image. Secondly, while denoising, the fractional partial differential equation based multi-scale models can preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly keep high-frequency edge and texture information in areas where grey scale changes frequently, and nonlinearly enhance texture details in areas where grey scale does not change evidently. The reason for this strong denoising capability lies in the properties of fractional calculus. The properties of a signal's fractional differential are as follows [67,68]. First, the fractional differential of a constant is nonzero, while its integer-order differential must be zero; the fractional differential falls from a maximum at a singular leaping point to zero in smooth areas where the signal does not change or does not change greatly.


Table 3  Denoising effect of meteorite crater texture rich moon satellite remote sensing image, when white Gaussian noise, salt & pepper noise and speckle noise are added together

Denoising algorithm            PSNR      Correlation coefficient   Contrast   Correlation   Energy   Homogeneity
Noisy image                    10.9174   0.9975                    8.0222     0.2560        0.0405   0.4985
Bilateral filtering denoising  18.8425   0.9987                    0.5715     0.9016        0.1042   0.7926
Wavelet denoising              18.6702   0.9987                    0.4572     0.7948        0.1628   0.8203
NLMF denoising                 19.5439   0.9885                    0.6112     0.8540        0.0914   0.7625
YiFeiPU-3 denoising            20.2069   0.9993                    0.8024     0.8640        0.0672   0.7304
YiFeiPU-4 denoising            20.1421   0.9992                    0.7697     0.8641        0.0695   0.7357

Note that an integer-order differential in a smooth area approximately equals zero, which is the remarkable difference between fractional and integer-order differentials. Second, the fractional differential at the starting point of a gradient or slope of the signal is nonzero, which nonlinearly enhances the singularity of high-frequency components; as the fractional order increases, the strengthening of the singular signal grows greater—for example, when 0 < v < 1 the strengthening is less than when v = 1—and the integer-order differential is a special case of the fractional one. Third, the fractional differential along the slope of the signal amplitude is neither zero nor constant but a nonlinear curve, whereas the integer-order differential there is constant. From the above discussion, we can see that the fractional differential can nonlinearly preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly enhance high-frequency edge information in areas where grey scale changes frequently, and nonlinearly enhance high-frequency texture details in areas where grey scale does not change evidently [67,78–83].
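These three properties are easy to confirm numerically (our sketch, reusing gl_fractional_derivative from the example at the end of Section 2.1):

```python
import numpy as np

# constant segment followed by a linear ramp
signal = np.concatenate([np.ones(32), 1.0 + np.arange(32) / 32.0])
d_half = gl_fractional_derivative(signal, v=0.5)
d_one = np.gradient(signal)

print(d_half[16], d_one[16])  # nonzero vs. ~0 on the constant segment
print(d_half[40:44])          # nonlinear variation along the ramp, where D^1 is constant
```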

4  Conclusion

We propose to introduce a new mathematical method, the fractional differential, to the field of texture image denoising and to implement fractional partial differential equations, that is, fractional total variation and fractional steepest descent approach based multi-scale denoising models. As we know from previous research, the fractional differential of a direct current or low-frequency signal is usually nonzero, and fractional calculus can nonlinearly enhance the complex texture details of fractal-like structures. Therefore, we conclude by theoretical deduction that such models can preserve the low-frequency contour features in smooth areas to the furthest degree, nonlinearly keep high-frequency edge and texture information in areas where grey scale changes frequently, and nonlinearly enhance texture details in areas where grey scale does not change evidently. To prove this conclusion, we first discuss the fractional Green formula, the fractional Gauss formula and the fractional Stokes formula. On the basis of these three formulas, we discuss the fractional Euler-Lagrange formula for two-dimensional image processing. Second, we further discuss the fractional steepest descent approach and its stability and convergence rate. Third, we implement four fractional partial differential equation based multi-scale denoising models. Last, we prove the stability and convergence of the fractional steepest descent approach and discuss the fractional nonlinear multi-scale denoising capability and the best values of the parameters. The experimental results show that the ability of the proposed models to preserve high-frequency edge and complex texture information is obviously superior to that of traditional integer-order algorithms, especially for texture detail rich images.

Acknowledgements The work was supported by Foundation Franco-Chinoise Pour La Science Et Ses Applications (FFCSA), National Natural Science Foundation of China (Grant Nos. 60972131, 61201438), Returned Overseas Chinese Scholars Project of Education Ministry of China (Grant No. 20111139), Science and Technology Support Project of Sichuan Province of China (Grant Nos. 2011GZ0201, 2013SZ0071), Soft Science Project of Sichuan Province of China (Grant No. 2013ZR0010), and Chengdu Administration of Science and Technology for Transfer of Scientific and Technological Achievements (Grant No. 12DXYB255JH-002).

Pu Y F, et al.

Sci China Inf Sci

July 2014 Vol. 57 072115:17

References 1 Chan T, Esedoglu S, Park F, et al. Recent developments in total variation image restoration. In: Paragios N, Yunmei C, Faugeras O, eds. Handbook of Mathematical Models in Computer Vision. New York: Springer-Verlag, 2005 2 Buades A, Coll B, Morel J M. A review of image denoising algorithms, with a new one. Multiscale Model Simul, 2005, 4: 490–530 3 Weickert J. Anisotropic Diffusion in Image Processing. Stuttgart: Teubner, 1998 4 Aubert G, Kornprobst P. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. New York: Springer-Verlag, 2006 5 Perona P, Malik J. Scale-space and edge detecting using anisotropic diffusion. IEEE Trans Patt Anal Mach Intell, 1990, 12: 629–639 6 Rudin L, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D-Nonlinear Phenom, 1992, 60: 259–268 7 Sapiro G, Ringach D. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans Image Process, 1996, 5: 1582–1586 8 Blomgren P, Chan T. Color TV: total variation methods for restoration of vector-valued images. IEEE Trans Image Process, 1998, 7: 304–309 9 Galatsanos N P, Katsaggelos A K. Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation. IEEE Trans Image Process, 1992, 1: 322–336 10 Li S Z. Close-form solution and parameter selection for convex minimization-based edge-preserving smoothing. IEEE Trans Patt Anal Mach Intell, 1998, 20: 916–932 11 Nguyen N, Milanfar P, Golub G. Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Trans Image Process, 2001, 10: 1299–1308 12 Strong D M, Aujol J F, Chan T F. Scale recognition, regularization parameter selection, and Meyer’s g norm in total variation regularization, multiscale model. Multiscale Model Simul, 2006, 5: 273–303 13 Thompson A M, Brown J C, Kay J W, et al. A study of methods of choosing the smoothing parameter in image restoration by regularization. IEEE Trans Patt Anal Mach Intell, 1991, 13: 326–339 14 Mrazek P, Navara M. Selection of optimal stopping time for nonlinear diffusion filtering. Int J Comput Vis, 2003, 52: 189–203 15 Gilboa G, Sochen N, Zeevi Y Y. Estimation of optimal PDE-based denoising in the SNR sense. IEEE Trans Image Process, 2006, 15: 2269–2280 16 Vogel C R, Oman M E. Iterative methods for total variation denoising. SIAM J Sci Comput, 1996, 17: 227–238 17 Dobson D C, Vogel C R. Convergence of an iterative method for total variation denoising. SIAM J Numer Anal, 1997, 34: 1779–1791 18 Chambolle A. An algorithm for total variation minimization and applications. J Math Imaging Vis, 2004, 20: 89–97 19 Darbon J, Sigelle M. Exact optimization of discrete constrained total variation minimization problems. In: Proccedings of 10th International Workshop on Combinatorial Image Analysis, New Zealand, 2004. 548–557 20 Darbon J, Sigelle M. Image restoration with discrete constrained total variation part I: fast and exact optimization. J Math Imaging Vis, 2006, 26: 261–276 21 Darbon J, Sigelle M. Image restoration with discrete constrained total variation part II: levelable functions, convex priors and non-convex cases. J Math Imaging Vis, 2006, 26: 277–291 22 Wohlberg B, Rodriguez P. An iteratively reweighted norm algorithm for minimization of total variation functionals. IEEE Signal Process Lett, 2007, 14: 948–951 23 Catte F, Lions P L, Morel J M, et al. 
Image selective smoothing and edge detection by nonlinear diffusion. SIAM J Numer Anal, 1992, 29: 182–193 24 Meyer Y. Oscillating Patterns in Image Processing and Nonlinear Evolution Equations: the Fifteenth Dean Jacqueline B. Lewis Memorial Lectures. Boston: American Mathematical Society, 2001 25 Strong D, Chan T. Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl, 2003, 19: 165–187 26 Alliney S. A property of the minimum vectors of a regularizing functional defined by means of the absolute norm. IEEE Trans Signal Process, 1997, 45: 913–917 27 Nikolova M. A variational approach to remove outliers and impulse noise. J Math Imag Vis, 2004, 20: 99–120 28 Chan T, Esedoglu S. Aspects of total variation regularized l1 function approximation. SIAM J Numer Anal, 2005, 65: 1817–1837 29 Nikolova M. Minimizers of cost-functions involving nonsmooth data-fidelity terms. SIAM J Numer Anal, 2002, 40: 965–994 30 Osher S, Burger M, Goldfarb D, et al. An iterative regularization method for total variation based on image restoration. Multiscale Model Simul, 2005, 4: 460–489 31 Gilboa G, Zeevi Y Y, Sochen N. Texture preserving variational denoising using an adaptive fidelity term. In: Proc-

Pu Y F, et al.

Sci China Inf Sci

July 2014 Vol. 57 072115:18

cedings of Variational, Geometric and Level Set Methods in Computer Vision, Nice, 2003. 137–144 32 Esedoglu S, Osher S. Decomposition of images by the anisotropic Rudin-Osher-Fatemi model. Commun Pure Appl Math, 2004, 57: 1609–1626 33 Blomgren P, Chan T, Mulet P. Extensions to total variation denoising. In: Proceedings of SPIE, Advanced Signal Processing: Algorithms, Architectures, and Implementations, Franklin, 1997. 267–375 34 Blomgren P, Mulet P, Chan T, et al. Total variation image restoration: numerical methods and extensions. In: Proceedings of International Conference on Image Processing, Santa Barbara, 1997. 384–387 35 Chan T, Marquina A, Mulet P. High-order total variation-based image restoration. SIAM J Sci Comput, 2000, 22: 503–516 36 You Y L, Kaveh M. Fourth-order partial differential equation for noise removal. IEEE Trans Image Process, 2000, 9: 1723–1730 37 Lysaker M, Lundervold A, Tai X C. Noise removal using fourth order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans Image Process, 2003, 12: 1579–1590 38 Gilboa G, Sochen N, Zeevi Y Y. Image enhancement and denoising by complex diffusion processes. IEEE Trans Patt Anal Mach Intell, 2004, 26: 1020–1036 39 Chambolle A, Lions P. Image recovery via total variation minimization and related problem. Numer Math, 1997, 76: 167–188 40 Osher S, Sole A, Vese L. Image decomposition and restoration using total variation minimization and the H 1 norm. Multiscale Model Simul, 2005, 1: 349–370 41 Lysaker M, Tai X C. Iterative images restoration combining a total variation minimization and a second-order functional. Int J Comput Vis, 2006, 66: 5–18 42 Li F, Shen C M, Fan J S, et al. Image restoration combining a total variational filter and a fourth-order filter. J Vis Commun Image Represent, 2007, 18: 322–330 43 Lysaker M, Osher M, Tai X C. Noise removal using smoothed normals and surface fitting. IEEE Trans Image Process, 2004, 13: 1345–1357 44 Dong F F, Liu Z, Kong D X, et al. An improved lot model for image restoration. J Math Imag Vis, 2009, 34: 89–97 45 Love E R. Fractional derivatives of imaginary order. J London Math Soc, 1971, 3: 241–259 46 Oldham K B, Spanier J. The Fractional Calculus: Integrations and Differentiations of Arbitrary Order. New York: Academic Press, 1974 47 McBride A C. Fractional Calculus. New York: Halsted Press, 1986 48 Nishimoto K. Fractional Calculus. New Haven: University of New Haven Press, 1989 49 Samko S G, Kilbas A A, Marichev O I. Fractional Integrals and Derivatives. Yverdon: Gordon and Breach, 1993 50 Miller K S. Derivatives of noninteger order. Math Mag, 1995, 68: 183–192 51 Samko S G, Kilbas A A, Marichev O I. Fractional Integrals and Derivatives: Theory and Applications. Philadelphia: Gordon and Breach Science Publishers, 1992. 75–85 52 Engheta N. On fractional calculus and fractional multipoles in electromagnetism. IEEE Antennas Propag Mag, 1996, 44: 554–566 53 Engheta N. On the role of fractional calculus in electromagnetic theory. IEEE Antennas Propag Mag, 1997, 39: 35–46 54 Chen M P, Srivastava H M. Fractional calculus operators and their applications involving power functions and summation of series. Appl Math Comput, 1997, 81: 287–304 55 Butzer P L, Westphal U. Applications of Fractional Calculus in Physics. Singapore: World Scientific, 2000 56 Kempfle S, Schaefer I, Beyer H R. Fractional calculus via functional calculus: theory and applications. Nonlinear Dyn, 2002, 29: 99–127 57 Richard L M. 
Fractional calculus in bioengineering, critical reviews in biomedical engineering. CRC Crit Rev Biomed Eng, 2004, 32: 1–377 58 Kilbas A A, Srivastava H M, Trujiilo J J. Theory and Applications of Fractional Differential Equations. Amsterdam: Elsevier, 2006 59 Sabatier J, Agrawal O P, Tenreiro Machado J A. Advances in Fractional Calculus: Theoretical Developments and Applications in Physics and Engineering. Springer, 2007 60 Koeller R C. Applications of the fractional calculus to the theory of viscoelasticity. J Appl Mech, 1984, 51: 294–298 61 Rossikhin Y A, Shitikova M V. Applications of fractional calculus to dynamic problems of linear and nonlinear hereditary mechanics of solids. Appl Mech Rev, 1997, 50: 15–67 62 Manabe S. A suggestion of fractional-order controller for flexible spacecraft attitude control. Nonlinear Dyn, 2002, 29: 251–268 63 Chen W, Holm S. Fractional Laplacian time-space models for linear and nonlinear lossy media exhibiting arbitrary frequency dependency. J Acoust Soc Amer, 2004, 115: 1424–1430 64 Perrin E, Harba R, Berzin-Joseph C, et al. N th-order fractional Brownian motion and fractional Gaussian noises. IEEE Trans Signal Process, 2001, 49: 1049–1059

Pu Y F, et al.

Sci China Inf Sci

July 2014 Vol. 57 072115:19

65 Tseng C C. Design of fractional order digital FIR differentiators. IEEE Signal Process Lett, 2001, 8: 77–79 66 Chen Y Q, Vinagre B M. A new IIR-type digital fractional order differentiator. Signal Process, 2003, 83: 2359–2365 67 Pu Y F. Research on application of fractional calculus to latest signal analysis and processing. Dissertation for the Doctoral Degree. Chengdu: Sichuan University, 2006 68 Pu Y F, Yuan X, Liao K, et al. Five numerical algorithms of fractional calculus applied in modern signal analyzing and processing. J Sichuan Univ (Eng Sci Ed), 2005, 37: 118–124 69 Pu Y F, Yuan X, Liao K, et al. Structuring analog fractance circuit for 1/2 order fractional calculus. In: Proceedings of IEEE 6th International Conference on ASIC, Shanghai, 2005. 1039–1042 70 Pu Y F, Yuan X, Liao K, et al. Implement any fractional order neural-type pulse oscillator with net-grid type analog fractance circuit. J Sichuan Univ (Eng Sci Ed), 2006, 38: 128–132 71 Duits R, Felsberg M, Florack L, et al. α scale spaces on a bounded domain. In: Proceedings of the 4th International Conference on Scale Space Methods in Computer Vision, Isle of Skye, 2003. 494–510 72 Didas S, Burgeth B, Imiya A, et al. Regularity and scale space properties of fractional high order linear filtering. Scale Space PDE Meth Comput Vis, 2005, 3459: 13–25 73 Unser M, Blu T. Fractional splines and wavelets. SIAM Rev, 2000, 42: 43–67 74 Ninness B. Estimation of 1/f noise. IEEE Trans Inf Theory, 1998, 44: 32–46 75 Duits R, Florack L, Graaf J, et al. On the axioms of scale space theory. J Math Imag Vis, 2004, 20: 267–298 76 Mathieu B, Melchior P, Oustaloup A, et al. Fractional differentiation for edge detection. Signal Process, 2003, 83: 2421–2432 77 Liu S C, Chang S. Dimension estimation of discrete-time fractional brownian motion with applications to image texture classification. IEEE Trans Image Process, 1997, 6: 1176–1184 78 Pu Y F. Fractional calculus approach to texture of digital image. In: Proceedings of IEEE 8th International Conference on Signal Processing, Guilin, 2006. 1002–1006 79 Pu Y F. Fractional Differential Filter of Digital Image. China Patent, ZL200610021702.3, 2006 80 Pu Y F. High Precision Fractional Calculus Filter of Digital Image. China Patent, ZL201010138742.2, 2010 81 Pu Y F, Wang W X, Zhou J L, et al. Fractional differential approach to detecting textural features of digital image and its fractional differential filter implementation. Sci China Ser F-Inf Sci, 2008, 38: 2252–2272 82 Pu Y F, Zhou J L. A novel approach for multi-scale texture segmentation based on fractional differential. Int J Comput Math, 2011, 88: 58–78 83 Pu Y F, Zhou J L, Yuan X. Fractional differential mask: a fractional differential-based approach for multiscale texture enhancement. IEEE Trans Image Process, 2010, 19: 491–511 84 Guidotti P, Lambers J V. Two new nonlinear nonlocal diffusions for noise reduction. J Math Imag Vis, 2009, 33: 25–37 85 Bai J, Feng X C. Fractional-order anisotropic diffusion for image denoising. IEEE Trans Image Process, 2007, 16: 2492–2502 86 Vasily E T. Fractional vector calculus and fractional Maxwell’s equations. Ann Phys, 2008, 323: 2756–2778 87 Baleanu D, Machado J, Luo A J. Fractional Dynamics And Control. New York: Springer, 2011 88 Agrawal O P. Generalized Euler-Lagrange equations and transversality conditions for FVPs in terms of the Caputo derivative. J Vib Control, 2007, 13: 1217–1237 89 Snyman J. 
Practical Mathematical Optimization: an Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer, 2005 90 Halmos P R. Measure Theory. Springer-Verlag, 1974 91 Munroe M E. Introduction to Measure and Integration. Addison-Wesley, 1953. 25–90 92 Tychonoff A N, Arsenin V Y. Solution of Ill-Posed Problems. Washington: Winston & Sons, 1977. 45–78 93 Tomasi C, Manduchi R. Bilateral filtering for gray and color images. In: Proceedings of the IEEE International Conference on Computer Vision, Bombay, 1998. 839–846 94 Zhang M, Gunturk B K. Multiresolution bilateral filtering for image denoising. IEEE Trans Image Process, 2008, 17: 2324–2333 95 Yu H, Zhao L, Wang H. Image denoising using trivariate shrinkage filter in the wavelet domain and joint bilateral filter in the spatial domain. IEEE Trans Image Process, 2009, 18: 2364–2369 96 Chen G Y, Bui T D. Multiwavelets denoising using neighboring coefficients. IEEE Signal Process Lett, 2003, 10: 211–214 97 Buades A, Coll B, Morel J M. A non-local algorithm for image denoising. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, 2005. 60–65 98 Buades A, Coll B, Morel J M. Nonlocal image and movie denoising. Int J Comput Vis, 2008, 76: 123–139 99 Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error measurement to structural similarity. IEEE Trans Image Process, 2004, 13: 600–612

SCIENCE CHINA Information Sciences

. RESEARCH PAPER .

July 2014, Vol. 57 072116:1–072116:11 doi: 10.1007/s11432-014-5086-8

Content-based image retrieval using high-dimensional information geometry CAO WenMing1 ∗ , LIU Ning1 , KONG QiCong1 & FENG Hao2 1College

of Information Engineering, Shenzhen University, Shenzhen 518060, China; of Automation, Hangzhou Dianzi University, Hangzhou 310018, China

2College

Received November 11, 2013; accepted December 18, 2013; published online April 21, 2014

Abstract In this paper, a new content-based image retrieval approach is proposed based on high- dimensional information theory. The proposed approach overcomes the disadvantages of the current content-based image retrieval algorithms that suffer from the semantic gap. First, we present a new multidimensional information space’s vector angle cosine algorithm of high- dimensional geometry, then, we provide a detailed description of our images retrieval method including proposal of an overlapping image block method and definition of a similarity degree between images on the non-dimensional information subspaces. Finally, experimental results show the higher retrieval efficiency of the proposed algorithm. Keywords space

image retrieval, angel cosine, high-dimensional information, feature extraction, information sub-

Citation Cao W M, Liu N, Kong Q C, et al. Content-based image retrieval using high-dimensional information geometry. Sci China Inf Sci, 2014, 57: 072116(11), doi: 10.1007/s11432-014-5086-8

1

Introduction

All current content-based image retrieval (CBIR) systems, whether commercial or experimental, are based on commonest features of the images to retrieve stored images from a collection by comparing features automatically extracted from the images themselves. The commonest features used are mathematical measures of color, texture or shape. For example, Google1) and Baidu2) are all appended new image retrieval ways and means based on this theory. However, the user can not be satisfied with contentbased image retrieval, and they would prefer to retrieve according their will, i.e. semantic-based image retrieval. It is difficult to infer the image semantic from the image content, because it involves the understanding of images. Because of the semantic gap between image content and semantic, it is hard to find a universal mathematical model, which is mapped to image semantic from image feature. That is, switch from data to knowledge is a difficult thing. In fact, the first general purpose CBIR system, QBIC [1], was developed nearly 10 years ago. Then a number of general purpose CBIR systems have been built [2,3]. Many approaches have been proposed to extract physical features such as color [4], video ∗ Corresponding

author (email: [email protected]) 1) http://www.google.com.hk/imghp?hl=zh-TW&tab=wi. 2) http://image.baidu.com.

c Science China Press and Springer-Verlag Berlin Heidelberg 2014 

info.scichina.com

link.springer.com

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:2

retrieval [5], sketch [6,7], shape [8,9], structure or a combination of two or more such features. Although it’s difficult, scholars also propose some methods, such as image segmentation, relative feedback, and build complex no-linear model. As image segmentation is an open problem, it is difficult to find an image segmentation arithmetic which is commonly used and fit for people’s understanding. What’s more, the image segmentation with large computational load is difficult to realize real-time computation. Moreover, building complex non-linear model is also difficult, because after extracting the commonest features, the data amount decreased dramatically. In fact, building the semantic model is practically impossible. Consequently, based on the current means, it is a good choice to utilize human-computer interaction. Recently, a new pattern recognition methodbiomimetic pattern recognition theory [10–14] was proposed. It breaks a new path for pattern recognition and image processing and has had a great application in face detection, face recognition, speech recognition, and so on. Content-Based Image Retrieval aims at developing techniques that support effective searching and browsing of large image repositories based on image features. However, the current content-based image retrieval (CBIR) algorithms suffer from the semantic gap, which often leads to distraction in the search. To overcome these disadvantages, in this paper, we combine relative high-dimensional information theory, propose a new content-based image retrieval method which is based on one-sided aspect that people are interested in. That is, how to find out the place that interested people from a complex image. Then the high-dimensional information character of the place is used for the image retrieval. The availability of the proposed method is analyzed and confirmed by the example. Experiment results show that our new method can effectively improve the performance of the accuracy and are practicable in image retrieval.

2

The relative theory of high-dimensional geometry

The theory and calculation of the high-dimensional biomimetic information geometry are all based on the concept of high-dimensional geometry. The subspace is defined by the point set on the space based on the theory, so the complex calculation in high-dimensional space could be implemented by means of the simple geometry operation combination in 2D. The purpose of this section is to present the geometry system on high-dimensional space by Linear algebra tool, and propose the classification theory of high-dimensional information geometry property. 2.1

The basic operator of high-dimensional biomimetic information geometry

For convenience, we could use exterior algebra for computations; which makes the results more clear and the proof simpler. In this section we present some basic theories about exterior algebra which are needed in our paper briefly. Let (E n , ·, ·) be an n-dimensional information inner product space, which will be our ambient space throughout this paper. The concept of 2-inner product spaces and angle was first introduced by Diminnie et al. [15] and Hendra Gunawan et al. [16], The inner product of any two vectors x · y = x, y and is defined by Definition 1. The n-dimensional inner product ·, · |·, . . . , ·  on is defined by [15] $ $ $ x , x  x , x  · · · x , x  $ 0 2 0 n $ $ 0 1 $ $ $ x2 , x1  x2 , x2  · · · x2 , xn  $ $ $ x0 , x1 |x2 , . . . , xn  = $ $. .. .. .. .. $ $ . . . . $ $ $ $ $ xn , x1  xn , x2  · · · xn , xn  $ 1/2

(1)

to denote the induced norm on X. The standard n-information norm is given by We use . = ,  x1 , x2 , . . . , xn  = x1 , x1 |x2 , . . . , xn 1/2 for n  2. For n = 1, the expression x0 , x1 |x2 , . . . , xn  is to be understood as x0 , x1 , which denotes nothing but an inner product on X. Geometrically, being the square root of the Gram’s determinant, x1 , x2 , . . . , xn  represents the volume of the n-dimensional parallelepiped spanned by x1 , x2 , . . . , xn .

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:3 ⊥

βR

A

αR



αR

θ

βR

αB

θ

θ

B Figure 1

Projection of α on B.

Figure 2

1-dimensional information subspaces of E n .

Given two nonzero, finite-dimensional, information subspaces A and B of E n with dim(A)  dim(B). We wish to have a definition of the angle between A and B that can be viewed on high-dimensional information space, in some sense, the usual definition of the angle as follows: (a) Between a 1-dimensional information subspace and a q-dimensional information subspace of E n (Figures 1 and 2); (b) Between two p-dimensional information subspaces intersecting on a common (p − 1)-dimensional subspace of E n . For example, (a) If A = span{α} is a 1-dimensional information subspace and B = span{β  1 , . . . , βq } is a q-dimensional information subspace of E n , the length of the α can be given as α = α, α, then the angle θ between A and B is defined by   α, αB  θ = arccos , (2) α αB  where αB denotes the projection of α on B. (b) If A = span{α, γ1 , . . . , γp−1 } and B = span{β, γ1 , . . . , γp−1 } are p-dimensional information subspaces of E n that intersects on (p − 1)-dimensional information subspace R = span{γ1 , . . . , γp−1 } with p  2, then the angle θ between A and B may be defined by  . /  ⊥ α⊥ R , βR " " " θ = arccos " (3) "α⊥ " "β ⊥ " , R R ⊥ where α⊥ R and βR are the orthogonal complement of A and B respectively, on R. One common property among these two cases is the following. In (a), we may write α = αB + α⊥ B, p q Then the angle θ between two information subspaces A and B , p  q, is defined by   αB  θ = arccos , if p < q, (4) α

α, β , if p = q. (5) α · β Here, the p-dimensional angle θ between two decomposable p-vectors α and β on E n is equal to the usual Euclidean angle between α and β as two vectors on the induced Euclidean space. θ = arccos

Theorem 1. We claim above that the cosine of the angle θ between the two p-dimensional information subspaces A = span{α, γ1 , . . . , γp−1 } and B = span{β, γ1 , . . . , γp−1 } defined by (3) is equal to the ratio between the volume of the p-dimensional parallelepiped spanned by the projection of α, γ1 , . . . , γp−1 on information subspace and the volume of the p-dimensional parallelepiped spanned by information α, γ1 , . . . , γp−1 . That is, 2 αB , γ1 , . . . , γp−1  (6) cos2 θ = 2 . α, γ1 , . . . , γp−1  2

Proof. Firstly, observe that θ satisfies cos2 θ = ⊥ α = αR + α⊥ R and β = βR + βR , we obtain 2

α, β |γ1 , . . . , γp−1 

α, γ1 , . . . , γp−1 2 β, γ1 , . . . , γp−1 2

α, β |γ1 , . . . , γp−1  2

α, γ1 , . . . , γp−1  β, γ1 , . . . , γp−1 2

. Indeed, writing

. ⊥ ⊥ /2 . ⊥ ⊥ /2 4 α ,β αR , βR γ1 , . . . , γp−1  = " R"2 " R "2 = " "2 " "2 "α⊥ " "β ⊥ " γ1 , . . . , γp−1 4 "α⊥ " "β ⊥ " R R R R

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:4

as stated. p−1 Suppose now thatαB = ω · β + k=1 ωk γk , In particular, the scalar ω is given by ω=

α, β |γ1 , . . . , γp−1  2

β, γ1 , . . . , γp−1 

.

Then we have 2

αB , γ1 , . . . , γp−1  = α, αB |γ1 , . . . , γp−1  = ω α, β |γ1 , . . . , γp−1  =

α, β |γ1 , . . . , γp−1  β, γ1 , . . . , γp−1 

2

.

Hence, we obtain 2

cos2 θ =

α, β |γ1 , . . . , γp−1 

α, γ1 , . . . , γp−1 2 β, γ1 , . . . , γp−1 2

=

αB , γ1 , . . . , γp−1  α, γ1 , . . . , γp−1 2

2

.

Definition 2. We define the angle θ between a p-dimensional information subspace A = span{α1 , . . . , αp } and a q-dimensional information subspace B = span{β1 , . . . , βq } of E n (with p  q) by " ∗ " "α , . . . , α∗ "2 1,B p,B 2 , (7) cos θ = 2 α1 , . . . , αp  where α∗i,B denotes the projection of information αi on information subspace B. Theorem 2. Suppose that q-dimensional information subspace B = span{β1 , . . . , βq } is orthonormal, the cosine of the angle θ between information subspaces A and B may be given by   det M M T 2 , (8) cos θ = det [αi , αj ] where αi ∈ A = span{α1 , . . . , αp }, i = 1, . . . , p and ⎡ ⎤ α1 , β1  , α1 , β2  , α1 , β3  , . . . , α1 , βq  ⎢ ⎥ ⎢ α2 , β1  , α2 , β2  , α2 , β3  , . . . , α2 , βq  ⎥ ⎥ M =⎢ ⎢ ⎥ ··· ⎣ ⎦ αp , β1  , αp , β2  , αp , β3  , . . . , αp , βq  is a p × q matrix and M T is its transpose. If {α1 , . . . , αp } happens to be orthonormal, then the formula reduces to   (9) cos2 θ = det M M T . Further, if p = q, det M = det M T , so we get cosθ = det |M |. 2.2

Feature extract based cosine measure

Suppose there are n samples in d dimensions, such as x1 , x2 , . . . , xn . If there exists a vector x0 which has the minimum angle of the sum with each sample of xk , k = 1, 2, . . . , n. That is to say, the angle cosine value is as large as possible. Define angle cosine function as follows:   x0 , xi  (10) cos θ0 (xi ) = = e0 , ei  , x0  xi  x0 xi , ei = . x0  xi  Define the error function E0 (x0 ) based on the cosine measure, as follows:

where e0 =

n

E0 (x0 ) =

1 e0 , ei  . n i=1

(11)

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:5

I

ai θi m Figure 3

xi

1-dimensional information subspaces of Samples.

Theorem 3. There is an unit vector e0 in d-dimension that can make E0 (x0 ) be largest. Proof. The proof is as follows: n

1 1 cos θ0 (xi ) = E0 (x0 ) = n i=1 n Suppose m =



n 

x0 , xi  x0  · xi 

i=1



1 = n

 n 

 e0 , ei 

i=1

6 =

n

1 e0 , ei n i=1

7 .

(12)

1 n ei , there is E0 (x0 ) = e0 · m = |e0 | · |m| cos θ0 . n i=1

By the above proving process, we can see that it is not difficult to find that the angle cosine obtains the maximum value if the mean of e0 equals the mean of the sample unit vectors. That is to say, E0 (x0 ) 1 will take the largest value if e0 = ei . So the expression of original sample unit vectors is the mean of n them. Let the sample mean is denoted by l. Thus, we get x = m + al.

(13)

Here, a is a real scalar, it shows the distance from a point on the line to xi (Figure 3) . As shown in the Figure 3, 1-dimensional expression of one sample xi is the vertical projection ai which can be obtained by projecting the samples xi on a straight line l. And x = m + ai l can be considered an approximation value to xi in one dimensional space. From the Figure 3, we can obtain that ai = |xi − m| cos θi .

(14)

Because the θi is the angle between the vector l and xi , and |l| = 1, Eq. (15) can be represented as ai = |xi − m| cos θi = |xi − m| · |l| cos θi = (xi − m) · |l| = lT · (xi − m).

(15)

Theorem 4. There is the optimal direction of l that makes E0 (e0 ) largest. Proof. We give the following derivation process. n

E1 (x0 ) =

n

1 1 cos θ0 (ei ) = e0 , ei  . n i=1 n i=1

(16)

Formula (14) and (15) are deformed and get into (16), then we can get n

E1 (l) =

n

n

/ 1 1 1 . [m + lT (ei − m)l], ei . e0 , ei  = (m + ai l), e = n i=1 n i=1 n i=1

(17)

Because the first item has nothing to do with l in (17), if we want to make E1 (l) largest, the lT (ei − m)l value should be the largest. We use the Lagrange Multiplier method, the constraint is that |l| = 1. The Lagrange multiplier is denoted by λ. Suppose / . (18) y = lT (ei − m)l, ei .

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:6

To seek partial derivatives of l, we get ∂y = 2(ei − m), ei l − 2λl. ∂l

(19)

Suppose the gradient in (20) is zero, there are (ei − m), ei l = λl.

(20)

Therefore, in order to make the maximization, the eigen vectors that are corresponded to the largest eigenvalue of the matrix (ei − m), ei l should be selected as the direction of the projection line l. Thus, if all units samples of e1 , e2 , . . . , en are projected on the eigen vector that is corresponded to the largest eigenvalue of (ei − m), ei l, the one dimensional expression of the n samples can be obtained under the meaning of minimum angle. This conclusion can also be extended to a d -dimensional space mapping (d  d). Eq. (14) can be deformed as: d  x=m+ a i li . (21) i=1

The square function criterion can be deformed as: n

1 Ed (l1 , l2 , . . . , ld ) = n i=1

6



m+

d 

 a i l i , ei

7 .

(22)

i=1

It is easy to prove that Ed can get the maximum value when the vector l1 , l2 , . . . , ld are respectively the eigen vectors that are corresponded to the d largest eigenvalue of matrix (ei − m), ei l. Because the eigen vectors are orthogonal to each other, it is enough to become a space base vector that can represent any unit vector ei . The coefficient ai in the formula (21) is the coefficient value that an unit vector ei corresponds to the space based on li , which is called the principal component of the angle. In this paper, Input data and feature exaction are vector in inner space. 2.3

The interactive image retrieval

In this section, we provide a detailed description of our Image Retrieval method and discuss the algorithm implementation. From the defined Image angle θ, let θA denotes the angle between the Image A and itself, clearly cos2 θA = 1. Let θA,B = 1 denotes the cosine of the angle θ between Image A and image B, if cos θA,B = 1, then we consider the difference between Image A and Image B is minimal. So we use the cosine of the angle θ to define the similarity between images. Moreover, the proposed image retrieval algorithm based on the general rules that the human understanding of things, just as the follows: (a) Interested in the one-sidedness. (b) Independent character of image understanding. (c) The representative of things. The steps of the algorithm are as the follows: Step 1. Take the region of interest Amn in an image AMN , and measure the n-dimensional information inner product as (we use symbol Amn to express both the region of interest as well as n-dimensional information inner product) Figure 4. Step 2. To process the other pictures in the image library, we take any picture from the image library randomly, mark it as BM  N  , then divide it into several parity of blocks using the method of reference [7]. In this paper, we propose overlapping block method. On some occasions, this method can avoid the above problem. The principles of image blocks are as follows: We can use the first step of Amn as a template. The overlapping block method can be applied to BM  N  . We suppose the total number of blocks is q. (q may be greater than or equal to M × N /m × n. Step 3. We could apply the theory of high-dimensional space geometry into each image sub-block. So the geometric mapping can be applied to the image sub-block. That is, feature extraction in Section 2. Each sub-block after feature extracting is marked as Dlw×e , l = 1, 2, . . . , q. At this time, w × e should

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:7

(a11,a12,a13,…,a1n) (a21,a22,a23,…,a2n)



(am1,am2,am3,…,amn)

Take five regions of interest

[

Amn=

Figure 4

Figure 5

a11,a12,a13,…,a1n a21,a22,a23,…,a2n



am1,am2,am3,…,amn

]

Take the region of interest.

Take any picture from the image library random, mark it as BM  N 

be less than m × n. After feature extracting the query image Amn is marked as Aw×e , where w × e for sub-block is ranks of matrix after feature extraction. , l = 1, . . . , q, the Step 4. We could reshape the matrix Aw×e according to column into a vector aw×e l w×e w×e dimension is w × e. Orthonormalized {aw×e , a , . . . , a }, obtained {β , . . . , β }. Each sub-block of 1 q q 1 2 s×t s×t s×t the image BMN can be marked as Bl , l = 1, 2, . . . , p. The Bl can be reshape to bl , l = 1, 2, . . . , p s×t by the same method, that is {bs×t 1 , . . . , bp }, where s × t for sub-block is ranks of matrix after feature extraction (Figures 5 and 6). Step 5. The measurement of similarity. We intercept a main region which can replace the query image for reducing the computing time. We adopt the way of cross vectors to measure the image, and to measure the similarity between a database image and the query image, we proposed the concept of the similarity degree R, which is given by formula (7). We can take an image shareholding δ. If R  δ, then the image is similar to the query image. In other words, there is a region in the database image that the user is interested in it. If R  δ, there is no region in the database image that the user is interested in it. s×t Definition 3. Suppose that {β1 , . . . , βq } is orthonormal, the similarity degree R between {bs×t 1 , . . . , bp } M T) and {β1 , . . . , βq }. may be given by R = arccos detdet(M , where [bis×t ,bs×t ] j ⎡ . s×t / . / . s×t / . s×t /⎤ b1 , β1 , bs×t 1 , β2 , b1 , β3 , . . . , b1 , βq / . s×t / . s×t / . s×t /⎥ ⎢. ⎥ . s×t / ⎢ bs×t , b , b , . . . , b , β , β , β , β 1 2 3 q 2 2 2 2 ⎥ M = bi , βj = ⎢ ⎥ ⎢ ... ⎦ ⎣ . s×t / . s×t / . s×t / . s×t / bp , β1 , bp , β2 , bp , β3 , . . . , bp , βq

is a p × q matrix and M T is its transpose.

Cao W M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072116:8

Take any picture from the Image library B

Read image A

Take the region of interest Amn

Image block BMN

Feature extracting Dlω×e, l=1,2,…,q

Feature extracting Bls×t, l=1,2,…,p

{β1,β2,…,βq}

{a1ω×e,a2ω×e, …,aqω×e} orthonormalized

R = arccos Figure 6

R L2 > L3 and

Wang X, et al.

Sci China Inf Sci

−0.8

−0.5

z

July 2014 Vol. 57 072201:4

−1.0

−1.5

−1.2

z

−1.0

−1.4

−2.0

−1.6 0.5

−2.5 1.0 0.5

0 −0.5 y

−1.0

−1.0

−0.5 x

0

0.5

1.0

0 −0.5

y

(a)

0.5

0

−0.5 x

(b)

−0.6

−0.5

−0.8 −1.0 z

z

−1.0

−1.5

−1.2

−2.0

−1.4

−2.5 1.0

−1.6 0.5

0 y

−0.5 −1.0

−0.5 −1.0 x

0

0.5

1.0

0.5

0 y

(c)

−0.5

−1.0

−1.0

−1.5

z

z

0

0.5

(d)

−0.5

−2.0 1.0

−0.5

−0.5 x

−1.5 −2.0

0.5

0 y

−0.5

−1.0

−0.5 −1.0 x

0

0.5

1.0

1.0

0.5

0 y

(e)

−0.5 −1.0

−0.5 −1.0 x

0

0.5

1.0

(f)

−0.6

−0.5 z

z

−1.0

−1.0

−1.4 −1.5 −1.8 1.0

0.5

0

−0.5 −0.5 −1.0 −1.0 y x (g)

0

0.5

1.0

−2.0 1.0 0.5

0 −0.5 −1.0 y

−1.0

0

−0.5

0.5

x

(h)

Figure 2 Three-dimensional views of the new system (from Chen-like attractor to Lorenz-like attractor), with (a) s = 0.4, (b) s = 0.447, (c) s = 0.5, (d) s = 0.66, (e) s = 0.7, (f) s = 0.95, (g) s = 1, and (h) s = 1.31774.

Wang X, et al.

Table 1

Sci China Inf Sci

July 2014 Vol. 57 072201:5

Equilibria and eigenvalues of three systems in comparison

Systems

Equations

Equilibria

Eigenvalues

x˙ = 10(y − x)

(0, 0, 0)

−22.8277, −2.6667, 11.8277

System (3), α = 0

y˙ = 28x − y − xz z˙ = − 83 z + xy

(±8.4853, ±8.4853, 27)

−13.8546, 0.0940 ± 0.1945i

x˙ = 35(y − x)

(0, 0, 0)

−30.8357, −3, 23.8359

z˙ = −3z + xy

(±7.9373, ±7.9373, 21)

−18.4288, 4.2140 ± 14.8846i

x˙ = −x − y

(0, 0, 0)

−0.1, 0.6544, −1.6044

System (3), α = 1

Family (5),r = 0.05

Family (5), r = 0.7

System (6), s = 0.2

System (6), s = 0.2915

System (6), s = 0.4

System (6), s = 0.66

System (6), s = 1

System (6), s = 1.31774

System (6), s = 1.34

y˙ = −7x + 28y − xz

y˙ = −x + 0.05y − xz z˙ = −0.1z + xy

(±0.3240, ∓0.3240, −1.05)

−1.05, 0 ± 0.4472i

x˙ = −x − y

(0, 0, 0)

−0.1, 1.1624, −1.4624

z˙ = −0.1z + xy

(±0.4123, ∓0.4123, −1.7)

−0.7446, 0.1723 ± 0.6534i

x˙ = −y − 0.2x

(0, 0, 0)

−1.1050, −0.2, 0.9050

z˙ = −0.2z − x2

(±0.4472, ∓0.0894, −1)

−0.8758, 0.2379 ± 0.6326i

x˙ = −y − 0.2915x

(0, 0, 0)

−1.1563, −0.2, 0.8648

z˙ = −0.2z − x2

(±0.4472, ∓0.1304, −1)

−0.9102, 0.2094 ± 0.6290i

x˙ = −y − 0.4x

(0, 0, 0)

−1.2198, −0.2, 0.8198

z˙ = −0.2z − x2

(±0.4472, ∓0.1789, −1)

−0.9549, 0.1774 ± 0.6224i

x˙ = −y − 0.66x

(0, 0, 0)

−1.3830, −0.2, 0.7230

z˙ = −0.2z − x2

(±0.4472, ∓0.2952, −1)

−1.0805, 0.1102 ± 0.5984i

x˙ = −y − x

(0, 0, 0)

−1.6180, −0.2, 0.6180

z˙ = −0.2z − x2

(±0.4472, ∓0.4472, −1)

−1.2863, 0.0431 ± 0.5560i

x˙ = −y − 1.31774x

(0, 0, 0)

−1.8564, −0.2, 0.5386

z˙ = −0.2z − x2

(±0.4472, ∓0.5893, −1)

−1.5177, 0.0 ± 0.5133i

x˙ = −y − 1.34x

(0, 0, 0)

−1.8737, −0.2, 0.5337

(±0.4472, ∓0.5993, −1)

−1.5352, −0.0024 ± 0.5104i

y˙ = −x + 0.7y − xz

y˙ = −x − xz

y˙ = −x − xz

y˙ = −x − xz

y˙ = −x − xz

y˙ = −x − xz

y˙ = −x − xz

y˙ = −x − xz z˙ = −0.2z − x2

Lyapunov dimension are calculated, where the latter is defined by DL = j +

1

j 

|Lj+1 |

i=1

Li ,

j j+1 in which j is the largest integer satisfying i=1 Li  0 and i=1 Li < 0. Note that system (6) is chaotic if L1 > 0, L2 = 0, L3 < 0 with |L1 | < |L3 |. Figure 3 shows the dependence of the largest Lyapunov exponent on the parameter s. Case 1: s ∈ [0.2, 0.4). In this case, the new system trajectory evolves from periodic orbit to a chaotic attractor which is neither Lorenz-like nor Chen-like, as shown in Figure 1 (a)–(d).The largest Lyapunov exponent becomes positive, which convincingly implies that the system is already chaotic starting from s = 0.3 (see Figure 3).

Wang X, et al.

Sci China Inf Sci

July 2014 Vol. 57 072201:6

0.12 0.10 Lyapunov

0.08 0.08 0.08 0.08 0 0.2

Figure 3

0.4

0.6

0.8 s

1.0

1.2

1.4

The largest Lyapunov exponent of system (6) with s ∈ [0.2, 1.4].

0.5

−0.2

0.0 −0.5

−1.0

z

z

−0.6

−1.5

−1.4

−2.0

−1.8 1.0 0.5

0 y

−0.5 −1.0 (a)

Figure 4

−1.0

−0.5 x

0

0.5

−2.5 2

1

0 y

−1

−1.0

−0.5

0

0.5

x

(b)

Three-dimensional views of new system (after Lorenz-like attractor), with (a) s = 1.32, (b) s = 1.34.

Case 2: s ∈ [0.4, 1.317744]. In this case, the new system has one saddle and two saddle-foci. The two saddle-foci have one negative real eigenvalue and two conjugate complex eigenvalues with a positive real part. Moreover, the largest Lyapunov exponent is larger than zero, which is increased as the parameter s is increased from 0.4 to around 1.317744. At the same time, the attractor of this system is changed from Chen-like to Lorenz-like as the parameter s is increased, as can be seen in Figure 2 (a)–(h). Case 3: s ∈ (1.317744, 1.34). There is a critical value of parameter s around 1.317744. For this critical case, the two nontrivial equilibrium points have the same set of eigenvalues: one negative real eigenvalue and two conjugate complex eigenvalues with zero real parts. When s > 1.317744, those two conjugate complex eigenvalues have negative real parts, thus the two nontrivial equilibrium points become stable. Nevertheless, the largest Lyapunov exponent is still positive for a relative narrow range of parameter s as shown in Figure 4 (a) and (b). This is very interesting because the system is chaotic while the corresponding system has one saddle and two stable node-foci (for reference, see [13]).

4

Concluding remarks

This paper reports the finding of another simple one-parameter family of three-dimensional quadratic autonomous chaotic systems. Its algebraic structure is rather simple with only four linear terms and two quadratic terms. There is a very interesting and important question: what is the essential relationship between system (6) and the generalized Lorenz system family given in [14]. The quilibria and eigenvalues of system (6) together with the generalized Lorenz system family are shown with details in Table 1, which

Wang X, et al.

Sci China Inf Sci

July 2014 Vol. 57 072201:7

indicate that they are not topological equivalent because they have different types of eigenvalues. The quadratic terms are different from the Lorenz and the Chen systems. Any system in this new systems family is not smoothly equivalent to any system in the unified chaotic system which contains the Lorenz and the Chen systems as its two extremes. But by tuning the only parameter this new systems family can also generate both Lorenz-like and Chen-like attractors, thus further reveal the close relation between Lorenz-like and Chen-like attractors. The system (6) only has 6 items on the right. It is therefore interesting to ask whether there exists another simpler chaotic system with only 5 items for generating similar Lorenz-like and Chen-like attractors. Yes, there indeed exits such kind of system, but somehow quite different from the system discussed here, for example in the number and stability of equilibrium points. We will discuss this interesting system in future study. This paper also has further revealed that some systems with different structures can literally generate similar-shaped chaotic attractors. Therefore, for 3D autonomous systems with two quadratic terms, the relation between the system algebraic structure and system chaotic dynamics is an important and interesting issue to be further revealed, understood and analyzed. References 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Lorenz E N. Deterministic nonperiodic flow. J Atmos Sci, 1963, 20: 130–141 Chen G, Ueta T. Yet another chaotic attractor. Int J Bifur Chaos, 1999, 9: 1465–1466 Ueta T, Chen G. Bifurcation analysis of chen’s equation. Int J Bifur Chaos, 2000, 10: 1917–1931 Zhou T, Tang Y, Chen G. Chen’s attractor exists. Int J Bifur Chaos, 2004, 14: 3167–3177 Zhou T, Tang Y, Chen G. Complex dynamical behaviors of the chaotic chen’s system. Int J Bifur Chaos, 2003, 13: 2561–2574 L¨ u J, Chen G, Cheng D, et al. Bridge the gap between the Lorenz system and the Chen system. Int J Bifur Chaos, 2002, 12: 2917–2926 ˇ Celikovsk´ y S, Vaneˇ cˇ ek A. Bilinear systems and chaos. Kybernetika, 1994, 30: 403–424 ˇ Celikovsk´ y S, Chen G. On a generalized lorenz canonical form of chaotic systems. Int J Bifur Chaos, 2002, 12: 1789–1812 ˇ Celikovsk´ y S, Chen G. Hyperbolic-type generalized Lorenz system and its canonical form. In: Proceedings of the 15th Triennial World Congress of IFAC, Barcelona, 2002 ˇ Celikovsk´ y S, Chen G. On the generalized lorenz canonical form. Chaos Solit Fract, 2005, 26: 1271–1276 Wang X, Chen J, Lu J A, et al. A simple yet complex one-parameter family of generalized Lorenz-like systems. Int J Bifur Chaos, 2012, 22: 1250116 L¨ u J, Zhou T. The compound structure of Chens attractor. Int J Bifur Chaos, 2002, 12: 855–858 Yang Q, Chen G. A chaotic system with one saddle and two stable node-foci. Int J Bifur Chaos, 2008, 18: 1393–1414 Chen G, L¨ uJ. Dynamics of the Lorenz System Family: Analysis, Control and Synchronization (in Chinese). Beijing: Science Press, 2003

SCIENCE CHINA Information Sciences

. RESEARCH PAPER .

July 2014, Vol. 57 072202:1–072202:14 doi: 10.1007/s11432-013-5056-6

Further results on state feedback stabilization of stochastic high-order nonlinear systems XIE XueJun1 ∗ , ZHAO CongRan1 & DUAN Na2

2School

1Institute of Automation, Qufu Normal University, Qufu 273165, China; of Electrical Engineering & Automation, Jiangsu Normal University, Xuzhou 221116, China

Received July 29, 2013; accepted October 6, 2013; published online January 16, 2014

Abstract In this paper, a combined homogeneous domination and sign function design approach is presented to state feedback control for a class of stochastic high-order nonlinear systems with time-varying delay. The use of the combined approach relaxes the restriction on nonlinear functions and makes the closed-loop system globally asymptotically stable in probability. Keywords stochastic high-order nonlinear systems, time-varying delay, state feedback, homogeneous domination method, sign function Citation Xie X J, Zhao C R, Duan N. Further results on state feedback stabilization of stochastic high-order nonlinear systems. Sci China Inf Sci, 2014, 57: 072202(14), doi: 10.1007/s11432-013-5056-6

1

Introduction

This paper considers stochastic high-order nonlinear systems with time-varying delay: i (t)dt + fi (t, x ¯i (t), x¯i (t − d(t)))dt + giT (t, x¯i (t), x ¯i (t − d(t)))dω(t), i = 1, · · · , n − 1, dxi (t) = xpi+1

dxn (t) = upn (t)dt + fn (t, x(t), x(t − d(t)))dt + gnT (t, x(t), x(t − d(t)))dω(t),

(1)

where x(t) = (x1 (t), . . . , xn (t))T ∈ Rn and u(t) ∈ R are the system state and control input, respectively, ¯i = (x1 , . . . , xi )T , d(t) : x(t − d(t)) = (x1 (t − d(t)), . . . , xn (t − d(t)))T is the time-delay state vector, x 3 R+ → [0, d] is time-varying delay, pi ∈ Rodd =: {q ∈ R+ : q  3 and q is a ratio of odd integers} is said to be the high-order of the system. ω is an m-dimensional standard Wiener process defined on the complete probability space (Ω, F , P ) with Ω being a sample space, F being a filtration, and P being a probability measure. The mappings fi : R+ × Ri × Ri → R and gi : R+ × Ri × Ri → Rm , i = 1, . . . , n, are assumed to be locally Lipschitz with fi (t, 0, 0) = 0 and gi (t, 0, 0) = 0. When pi = 1 and d(t) = 0, system (1) reduces to the well-known normal form, whose study on feedback control problem has been achieved great development in recent years (see [1–18] and the references therein). When pi > 1, some intrinsic features of (1), such as its Jacobian linearization which is neither controllable nor feedback linearizable, lead to the existing design tools which are hardly applicable to ∗ Corresponding

author (email: [email protected])

c Science China Press and Springer-Verlag Berlin Heidelberg 2014 

info.scichina.com

link.springer.com

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:2

this kind of systems. Motivated by the fruitful deterministic results in [19–22], etc., and based on stochastic stability theory in [1,4,12,23], [24–32] discussed different control problems for stochastic highorder systems with different structures. However, all the results [24–32] are based on the strict high-order restriction p1  · · ·  pn  1 and all the pi s are positive odd integers, and the drift and diffusion terms fi , gi are demanded i to satisfy |fi (t, x¯i (t), x ¯i (t − d(t)))|  a1 j=1 (|xj (t)|ν1 + |xj (t − d(t))|ν1 ), |gi (t, x¯i (t), x¯i (t − d(t)))|   a2 ij=1 (|xj (t)|ν2 + |xj (t − d(t))|ν2 ), where ν1 and ν2 are fixed points. Immediately, one may ask an interesting problem: Is it possible to relax the restriction on ν1 and ν2 ? Recently, the homogeneous domination approach was introduced in stochastic nonlinear systems to solve this problem effectively [33–36], i.e., the value ranges of ν1 and ν2 are intervals rather than fixed points, where ν1 , ν2 are functions of (τ, ri ), τ = m1 /n1 , m1 being an even integer, n1 being an odd integer and ri being an odd number. In [35,36], several interesting problems are proposed, one of which is: Is it possible to relax the assumption on τ and ri ? Under the weaker assumption, can one design a stabilizing controller? In this paper, by introducing the combined homogeneous domination and sign function design approach, and overcoming several troublesome obstacles in the design and analysis procedure, we successfully solve this problem. Main contributions are highlighted as follows: (i) In comparison with the existing results [24–36], the high-order restriction is removed, and nonlinear growth condition on τ and ri is relaxed to be any real number. (ii) A well-defined and meaningful C 2 Lyapunov functions is constructed by introducing sign function. (iii) To deal with the uncertainties, we adopt homogeneous domination approach, which will free us from the tedious procedure of handling the nonlinearities during the controller design. (iv) What needs to be stressed is how to combine homogeneous domination theory and sign function approach skillfully and give the rigorous design and analysis of controller. This paper is organized as follows: Section 2 provides some preliminary results. The design and analysis of controller are given in Section 3 and Section 4, followed by a simulation example in Section 5. Section 6 concludes this paper. The proof of several lemmas are given in the appendices.

2

Preliminary results

The following notations, definitions and lemmas are to be used throughout the paper. T R+ denotes its odd = {q ∈ R+ : q is a ratio of odd integers}. For a given vector or matrix X, X transpose, Tr{X} denotes its trace when X is square, and |X| is the Euclidean norm of a vector X. C([−d, 0]; Rn ) denotes the space of continuous Rn -value functions on [−d, 0] endowed with the norm  ·  b defined by f  = supx∈[−d,0] |f (x)| for f ∈ C([−d, 0]; Rn ); CF ([−d, 0]; Rn ) denotes the family of all F0 0 n measurable bounded C([−d, 0]; R )-valued random variables ξ = {ξ(θ) : −d  θ  0}. C i denotes the set of all functions with continuous ith partial derivatives; C 2,1 (Rn × [−d, ∞); R+ ) denotes the family of all nonnegative functions V (x, t) on Rn × [−d, ∞) which are C 2 in x and C 1 in t. K denotes the set of all functions: R+ → R+ , which are continuous, strictly increasing and vanishing at zero; K∞ is the set of all functions which are of class K and unbounded. To simplify the procedure, we sometimes denote X(t) by X for any variable X(t). Consider the following stochastic time-delay system: dx(t) = f (t, x(t), x(t − d(t)))dt + g T (t, x(t), x(t − d(t)))dω,

∀t  0,

(2)

b with initial data {x(θ) : −d  θ  0} = ξ ∈ CF ([−d, 0]; Rn ), where d(t) : R+ → [0, d] is a Borel 0 measurable function, ω is an m-dimensional standard Wiener process defined on the complete probability space (Ω, F , P ), f : R+ × Rn × Rn → Rn and g : R+ × Rn × Rn → Rm×n are locally Lipschitz with f (t, 0, 0) ≡ 0, g(t, 0, 0) ≡ 0. 2,1 system (2), the differential operator Definition 1 ([12]). For any given V (x(t), > 2t) ∈ C? associated with ∂V ∂V 1 ∂ V T 1 ∂2V T L is defined as LV = ∂t + ∂x f + 2 Tr g ∂x2 g , where 2 Tr{g ∂x2 g } is called the Hessian term of L.

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:3

 Definition 2 ([37]). The homogeneous p-norm is defined as xΔ,p = ( ni=1 |xi |p/ri )1/p , ∀x ∈ Rn , where p  1 is a constant, ri > 0 is the homogeneous weight of xi . For simplicity, in this paper, we choose p = 2 and write xΔ for xΔ,2 . Lemma 1 ([12]). For system (2), if there exist a function V (x(t), t) ∈ C 2,1 (Rn × [−d, ∞); R+ ), two class   K∞ functions α1 , α2 and a class K function α3 such that α1 (|x(t)|)  V (x(t), t)  α2 sup−ds0 |x(t + s)| and LV (x(t), t)  −α3 (|x(t)|), then there exists a unique solution on [−d, ∞) for (2), the equilibrium x(t) = 0 is GAS in probability and P {limt→∞ |x(t)| = 0} = 1. Lemma 2 ([37]). Given a dilation weight  = (r1 , . . . , rn ), suppose that V1 (x) and V2 (x) are homogeneous functions of degree τ1 and τ2 , respectively. Then V1 (x)V2 (x) is also homogeneous with respect to the same dilation weight . Moreover, the homogeneous degree of V1 · V2 is τ1 + τ2 . Lemma 3 ([37]). Suppose that V : Rn → R is a homogeneous function of degree τ0 with respect to the dilation weight . Then (i) ∂V /∂xi is homogeneous of degree τ0 − ri with ri being the homogeneous weight of xi ; (ii) there is a constant c such that V (x)  cxτ 0 . Moreover, if V (x) is positive definite, then V (x)  cxτ 0 , where c is a positive constant. Lemma 4 ([38]). Let c, d be positive constants. For any positive number γ¯, |x|c |y|d  c d ¯ − d |y|c+d . c+d γ

c ¯ |x|c+d c+d γ

+

1

Lemma 5 ([38]). For any x, y ∈ R, if p  1 is a constant, then |x + y|p  2p−1 |xp + y p |, (|x| + |y|) p  1 1 1 1 |x| p + |y| p  21− p (|x| + |y|) p . 1

1

1

Lemma 6. For any x, y ∈ R+ , and p  1 is a constant, then: i) |x−y|p  |xp −y p |, ii) |x p −y p |  |x−y| p , iii) xp + y p  (x+ y)p . Moreover, there is a constant c > 0 such that |xp − y p |  c|x− y||(x− y)p−1 + y p−1 |. Proof. See Appendix A.

3 3.1

Design of state feedback controller Assumptions

In this paper, we need the following assumptions: Assumption 1. For i = 1, . . . , n, there exist positive constants a1 , a2 and τ ∈ (dM , +∞) such that |fi (t, x¯i (t), x ¯i (t − d(t)))|  a1

i ri +τ ri +τ  |xj (t)| rj + |xj (t − d(t))| rj , j=1

i 2ri +τ 2ri +τ  |gi (t, x¯i (t), x ¯i (t − d(t)))|  a2 |xj (t)| 2rj + |xj (t − d(t))| 2rj , j=1

where r1 = 1, ri+1 = (ri + τ )/pi , i = 1, . . . , n; dM = max1in {di }, d1 =

2 − p1 ···p1 n−1 , n−1 1 + s=1 ps ···p1 n−1

di =

1+

2 1 p1 ···pi−1 − p1 ···pn−1  i−1 1 2 s=1 ps ···pn−1 − s=1 ps ···pi−1

n−1

,

i = 2, . . . , n.

˙  γ < 1 for a constant γ. Assumption 2. The time-varying delay d(t) satisfies d(t) Remark 1. Assumption 1 includes the assumptions in all the related results [27–31,33–36]. Specifically, when pi = p and d(t) = 0, by choosing τ = p − 1, Assumption 1 reduces to the same assumption i i p+1 |fi (¯ xi )|  a1 j=1 |xj |p , |gi (¯ xi )|  a2 j=1 |xj | 2 as in [27–31]. When d(t) = 0, Assumption 1 is Assumption 1 in [33,34]. When pi = p, Assumption 1 is changed into Assumption 1 in [35,36].

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:4

Remark 2. Let us explain the problem in Section 1 detailedly. In [33–36], it is assumed that τ = m1 /n1 with m1 being an even integer and n1 being an odd integer. Then ri = (ri−1 + τ )/pi−1 is always an odd number. A natural problem is: Under the weaker assumption of τ and ri being any real number, can a state feedback controller be constructed to make the closed-loop system globally asymptotically stable in probability? In this paper, we will introduce the homogeneous domination approach and sign function design technique to solve this problem. Before giving the design of controller, we give two key lemmas. + Lemma 7. For any pi ∈ R3 odd and τ ∈ (dM , +∞), there exist parameters l > 1 and μ ∈ Rodd such that rn + τ  μ > max1in {2ri , (ri + τ )/l}.

Proof. See Appendix B. Lemma 8. Function sgn(s)|s|ν is C 2 for any ν > 2. Proof. See Appendix C. We first introduce the following coordinates transformation: η1 = x1 ,

ηi =

xi , L κi

v=

u , Lκn+1

(3)

where κ1 = 0, κi+1 = (κi + 1)/pi , i = 1, . . . , n, L > 1 is a designed constant. By (3), (1) can be written as   fi gT pi dηi = Lηi+1 + κi dt + iκi dω, i = 1, · · · , n − 1, L L (4)   fn gnT pn dηn = Lv + κn dt + κn dω. L L In the design and analysis, for simplicity, we use [χ]m to denote sgn(χ)|χ|m for any χ ∈ R, m ∈ R+ . 3.2

State feedback controller design

We construct a state feedback controller for system (4) by using the combined homogeneous domination and sign function design approach. The underlying idea of homogeneous domination approach is that the homogeneous controller is first developed without considering the drift and diffusion terms, and then a scaling gain is introduced to state feedback controller to dominate the drift and diffusion terms. The sign function design technique permits removal of the restriction on τ and ri , and is also important for choosing a C 2 , positive definite and proper Lyapunov function. μ μ μ q1 η Step 1: Introduce ξ1 = [η1 ] r1 and choose V1 = η∗1 [[s] r1 − [η1∗ ] r1 ] μ ds, where η1∗ = 0, q1 = 4lμ − τ − r1 . 1

q1

With Definition 1 and (4), it can be verified that LV1 = L[ξ1 ] μ η2p1 + F1 + G1 , where F1 = G1 = 12 Tr{g1 ∂∂ηV21 g1T }. By choosing the virtual controller 2

∂V1 ∂η1 f1 ,

1

r2

r2

η2∗ = −α1μ [ξ1 ] μ ,

μ

r2 p 1 α1 = c11 ,

c11 > 0,

(5)

q1

one has LV1  −Lc11 |ξ1 |4l + L[ξ1 ] μ (η2p1 − η2∗p1 ) + F1 + G1 . Step i (i = 2, . . . , n): With Lemma 7, we can obtain the following property. Lemma 9. Assume that at step i − 1, there exist a C 2 , positive definite and proper Lyapunov function Vi−1 and a set of virtual controllers η1∗ , . . . , ηi∗ defined by η1∗ = 0,

rk

rk

μ ηk∗ = −αk−1 [ξk−1 ] μ ,

μ

μ

∗ ξk−1 = [ηk−1 ] rk−1 − [ηk−1 ] rk−1 ,

k = 2, . . . , i,

(6)

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:5

such that LVi−1  −L

i−1 

ci−1,j |ξj |4l + L[ξi−1 ]

qi−1 μ

j=1

p

∗pi−1

(ηi i−1 − ηi

) + Fi−1 + Gi−1 ,

(7)

where αj , ci−1,j are positive constants, Fi−1

i−1  ∂Vi−1 fj = , ∂ηj Lκj j=1

Gi−1

 @ i−1  gp ∂ 2 Vi−1 gqT 1 Tr , = 2 Lκp ∂ηp ∂ηq Lκq p,q=1

qi−1 = 4lμ − τ − ri−1 .

Then the ith Lyapunov function ) Vi = Vi−1 + Ui ,

Ui =

ηi

ηi∗

* μ + qi μ ∗ ri μ ri [s] − [ηi ] ds r

(8)



∗ is C 2 , positive definite and proper, and there is ηi+1 = −αi i+1 [ξi ]ri+1 /μ such that

LVi  −L

i 

qi

pi ∗pi cij |ξj |4l + L[ξi ] μ (ηi+1 − ηi+1 ) + Fi + Gi ,

j=1

where

i  ∂Vi fj , Fi = ∂ηj Lκj j=1

i  1 Gi = Tr 2 p,q=1



gp ∂ 2 Vi gqT Lκp ∂ηp ∂ηq Lκq

(9)

@ .

Proof. See Appendix D. Hence at step n, we obtain the control law v=

∗ ηn+1

rn+1 μ

= −αn

[ξn ]

rn+1 μ

+ rn+1 * μ μ μ μ rn−1 r1 r n =− α ¯ n [ηn ] + α ¯ n−1 [ηn−1 ] + ···+ α ¯ 1 [η1 ] ,

(10)

such that LVn  −L

n 

cni |ξi |4l + Fn + Gn ,

(11)

i=1

where α ¯ i = αn · · · αi , i = 1, . . . , n, cn1 , . . . , cnn are positive constants,  @ n n   gp ∂ 2 Vn gqT ∂Vn fj 1 Fn = Tr . , Gn = ∂ηj Lκj 2 Lκp ∂ηp ∂ηq Lκq p,q=1 j=1 Systems (4) and (10) can be written as the following compact form: dη = LE(η)dt + F (t, η, η(t − d(t)))dt + GT (t, η, η(t − d(t)))dω,

(12)

p

where η = (η1 , . . . , ηn )T , E = (η2p1 , . . . , ηnn−1 , v pn )T , F = (f1 , f2 /Lκ2 , . . . , fn /Lκn )T , G = (g1 , g2 /Lκ2 , . . . , gn /Lκn ). Introducing the dilation weight  = (r1 , r2 , . . . , rn ), by (6) and (8), one obtains   ! for

Vn (Δε (η)) =

n )  i=1

εri ηi

εri ηi∗

= ε4lμ−τ

*

n ) ηi  i=1

μ

μ

[s] ri − εμ [ηi∗ ] ri

ηi∗

*

μ

+ qμi

μ

[θ] ri − [ηi∗ ] ri

+

η1 ,...,ηn

ds

s=εri θ

=

n )  i=1

qi μ

ηi ηi∗

* + qμi μ μ εμ [θ] ri − εμ [ηi∗ ] ri εri dθ

dθ = ε4lμ−τ Vn (η),

(13)

from which and the definition of weighted homogeneity in [37], we know that Vn (η) is homogeneous of degree 4lμ − τ .

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:6

Remark 3. Let us explain the importance of sign function for choosing a Lyapunov function. Owing to the general form of τ and ri , if we still use the method in [35,36] to choose Lyapunov function V (x), the positive definiteness of V (x) cannot be guaranteed. Consider an example: 5 1 sin x1 (t)|x1 (t − d(t))| 2 dω(t), 12 11 17 1 1 dx2 (t) = u 3 (t)dt + x31 (t) sin x2 (t − d(t))dt + | sin x2 (t)| 8 dω(t), 50 12

dx1 (t) = x32 (t)dt +

(14)

where d(t) = 16 (1 + sin t). By Lemma 7, we have d1 = 54 , d2 = 12 , τ ∈ ( 54 , ∞). Choosing τ = 3, l = 14 11 , r1 +τ 4 44 43 μ = 11 , and from r = 1, one has r = = , q = 4lμ − τ − r = , q = 4lμ − τ − r = . By 1 2 1 1 2 2 3 p1 3 3 3 5

13

13

17

Lemma 4, one gets |f1 | = 0, |g1 |  a2 |x1 (t − d(t))| 2 , |f2 |  a1 (|x1 | 3 + |x2 (t − d(t))| 4 ), |g2 |  a2 |x2 | 8 , 1 1 ˙ = 1 cos t < 1. Introduce Assumption 1 is satisfied with a1 = 50 , a2 = 12 . Assumption 2 holds with d(t) 6 η1 = x1 ,

η2 =

x2 L

1 3

,

v=

u 4

L 11

.

(15)

In [35,36], V1 (η1 ) is chosen as ) V1 (η1 ) =

η1

η1∗

) 11 4 ∗ 11 3 3 s − η1 ds =

η1

44

s 3 ds = 0

3 47 η3, 47 1

which is not a positive definite function, while in this paper, we choose ) η1 * + 11 11 4 47 3 [s] 3 − [η1∗ ] 3 ds = V1 (η1 ) = |η1 | 3 , ∗ 47 η1 whose positive definiteness can be guaranteed. By introducing the sign function in the controller design, Lemma 9 successfully solves the positive definiteness problem. But the introduction of sign function will unavoidably produce other difficulties in the design and analysis of controller. Please see Remark 4.

4

Stability analysis

We state the main result in this paper. Theorem 1. If Assumptions 1 and 2 hold for system (1), under the state feedback controller u = Lκn+1 v and (10), then the closed-loop system has a unique solution on [−d, ∞), and the equilibrium at the origin of the closed-loop system is GAS in probability. Proof. We prove Theorem 1 by four steps. Step 1: We first prove that υ pn (η) in (10) is C 1 . From (10) and (C2), one has −μ $ $ rn +τ μ μ−ri μ μ ∂v pn (η) rn + τ $ $ μ =− α ¯i |ηi | ri $¯ ¯ n−1 [ηn−1 ] rn−1 + · · · + α ¯ 1 [η1 ] r1 $ , αn [ηn ] rn + α ∂ηi ri

(16)

where i = 1, . . . , n. By Lemma 7, one obtains (μ − ri )/ri > 1, μ/ri > 2, (rn + τ − μ)/μ  0, from which and (16), we know that ∂v pn (η)/∂ηi is continuous. Then v pn (η) is C 1 . Since fi (·), gi (·) (i = 1, . . . , n) are assumed to be locally Lipschitz, the system consisting of (4) and (10) satisfies the locally Lipschitz condition. Step 2: Consider the following Lyapunov-Krasovskii functional: V (η(t)) = Vn (η(t)) +

c¯02 + c¯03 1−γ

)

t

t−d(t)

η(σ)4lμ

dσ,

(17)

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:7

where c¯02 and c¯03 are positive parameters to be determined. It is easy to verify that V (η(t)) is C 2 on η(t). Since Vn (η(t)) is continuous, positive definite and radially unbounded, by Lemma 4.3 in [39], there exist two class K∞ functions β1 and α21 such that β1 (|η(t)|)  Vn (η(t))  α21 (|η(t)|).

(18)

By Lemma 3 and Lemma 4.3 in [39], there exist positive constants c and c¯, class K∞ functions α22 and α ¯ 22 , and a positive definite function U (η(t)) whose homogeneous degree is 4lμ such that ¯η(t)4lμ cη(t)4lμ

 U (η(t))  c

,

α22 (|η(t)|)  U (η(t))  α ¯ 22 (|η(t)|).

From d(t) : R+ → [0, d] and (19), it follows that ) ) t c¯02 + c¯03 t η(σ)4lμ dσ  c ˜ α ¯ 22 (|η(σ)|)dσ

1−γ t−d(t) t−d(t)

σ=s+t



)

0

c˜ −d



 c sup α ¯ 22 (|η(s + t)|)  α22 −ds0

(19)

α ¯ 22 (|η(s + t)|)d(s + t)

 sup |η(s + t)| ,

−ds0

(20)

where c˜, c are positive constants, and α22 is a class K∞ function. Since |η(t)|  sup−ds0 |η(s + t)|, α21 (|η(t)|)  α21 (sup−ds0 |η(s + t)|), setting β2 = α21 + α22 , by (17), (18) and (20), one gets   (21) β1 (|η(t)|)  V (η(t))  β2 sup |η(s + t)| . −ds0

Step 3: By Lemmas 2, 3 and (11), there exists a positive constant c01 such that ∂Vn (η(t)) LE(η(t))  −c01 Lη(t)4lμ

. ∂η(t)

(22)

By Definition 2, Assumption 1, (3) and L > 1, one has  i  $ $ i ri +τ ri +τ   $ fi (t, η¯i (t), η¯i (t − d(t))) $ a 1 κ κ r r j j $ $ |L ηj (t)| j + |L ηj (t − d(t))| j $ $ L κi Lκi j=1 j=1  i  i ri +τ ri +τ   1−γ r r i1  δ¯1 L |ηj (t)| j + |ηj (t − d(t))| j j=1

 δ1 L

1−γi1

j=1

(η(t)r i +τ

+ η(t − d(t))r i +τ ),

(23)

i−1 i−1 where δ¯1 , δ1 and γi1 = 1/(1 + s=1 pp1s ···p ···pi−1 τ )  1 are positive constants. According to Lemmas 2, 3, 5 and (23), one obtains $ $ $$ $ n $  $ ∂Vn (η(t)) $ $ ∂Vn (η(t)) $ $ fi (t, η¯i (t), η¯i (t − d(t))) $ $ $ $ $ $ $ $ ∂η(t) F (t, η(t), η(t − d(t)))$  $ ∂ηi (t) $ $ $ κi L i=1  c˜02 L1−¯γ0

n  i=1

 L

1−¯ γ0

η(t)q i (η(t)r i +τ + η(t − d(t))r i +τ )

(c02 η(t)4lμ ¯02 η(t − d(t))4lμ

+c

),

(24)

where c02 , c¯02 , c˜02 and γ¯0 = min1in {γi1 } are positive constants. Similar to (23), there exist positive  p1 ···pi−1 −1  12 such that constants δ2 and γi2 = [2 + 2 i−1 s=1 ps ···pi−1 τ ] $ $ τ τ $ gi (t, η¯i (t), η¯i (t − d(t))) $ $ $  δ2 L 21 −γi2 η(t)ri + 2 + η(t − d(t))ri + 2 . (25)

$ $ κ L i By Lemmas 2, 3, 5 and (25), one has A B 1 ∂ 2 Vn (η(t)) T Tr G(t, η(t), η(t − d(t))) G (t, η(t), η(t − d(t))) 2 ∂η 2 (t)

Xie X J, et al.

 cˆ03

July 2014 Vol. 57 072202:8

$ n $ 2 $ $ g $  $ ∂ Vn (η(t)) $ $$ gi $ $ j $ $·$ $ (t, η ¯ (t), η ¯ (t − d(t))) (t, η ¯ (t), η ¯ (t − d(t))) · $ $ $ i i j j $ ∂ηi (t)∂ηj (t) $ Lκi κj L i,j=1

 c˜03 L1−˜γ0 L

Sci China Inf Sci

1−˜ γ0



n 

q −rj

i,j=1

η(t) i

r +τ r +τ r +τ r +τ η(t) i 2 + η(t − d(t)) i 2 η(t) j 2 + η(t − d(t)) j 2

¯03 η(t − d(t))4lμ c03 η(t)4lμ ,

+c

(26)

where γ˜0 = min1i,jn {γi2 + γj2 }, c03 , c¯03 , cˆ03 , c˜03 are positive constants. By Assumption 2, Definition 1, (12), (17), (22), (24) and (26), one has ∂Vn (η(t)) ∂Vn (η(t)) LE(η(t)) + F (t, η(t), η(t − d(t))) ∂η(t) ∂η(t) A B 1 ∂ 2 Vn (η(t)) T + Tr G(t, η(t), η(t − d(t))) G (t, η(t), η(t − d(t))) 2 ∂η 2 (t)   1 4lμ + (¯ c02 + c¯03 )L1−γ0 η(t)4lμ − η(t − d(t))

1−γ   c¯02 + c¯03  −c01 Lη(t)4lμ L1−γ0 η(t)4lμ

+ c02 + c03 +

1−γ     c¯02 + c¯03 = −L c01 − c02 + c03 + L−γ0 η(t)4lμ

, 1−γ

LV (η(t)) 

γ0 = min{¯ γ0 , γ˜0 } < 1. Since c01 is a constant independent of c02 , c¯02 , c03 and c¯03 , by choosing ⎧ 1 ⎫ ⎨ c02 + c03 + c¯02 +¯c03 γ0 ⎬ 1−γ L > L∗  max ,1 , ⎩ ⎭ c01

(27)

(28)

c0 (27) becomes LV (η(t))  −c0 η(t)4lμ

, from which and (19), one has LV (η(t))  − c¯ α22 (|η(t)|). By steps 1–3 and Lemma 1, then the system consisting of (4) and (10) has a unique solution on [−d, ∞), η(t) = 0 is GAS in probability and P {limt→∞ |η(t)| = 0} = 1. Step 4: Since (3) is an equivalent transformation, the closed-loop system consisting of (1), u = Lκn+1 v and (10) has the same properties as system (4) and (10).

Remark 4. We emphasize four points: 1) One of the main obstacles in the design and analysis of controller is that the appearance of highorder, time-varying delay, sign function and Hessian term will inevitably produce many more nonlinear terms and inequalities. How to deal with them is not a trivial work. 2) It is not easy to choose a C 2 , positive definite and proper Lyapunov function due to the high-order, time-varying delay, the general form of τ and ri , and the appearance of sign function and Hessian term. 3) By reasonably selecting parameters l and μ in Lemma 7, Lemma 9 effectively avoids the zero-division problem of

∂ 2 [ηi∗ ]μ/ri ∂ηj2

= −αi−1 · · · αj

μ(μ−rj ) [ηj ](μ−2rj )/rj . rj2

We need to further emphasize that the nonzero-

division problem and the locally Lipschitz condition (see step 1 in the proof of Theorem 1) need to be guaranteed simultaneously, which significantly increases the difficulty of this work. 4) The rigorous proof of Lemmas 7–9 and Theorem 1 is difficult.

5

A simulation example

In the simulation, we consider the example (14). 11/3 and choosing V1 (η1 ) = Design Defining ξ1 = η1 of controller:

3 47/3 , 47 |η1 |

+ L[ξ1 ]4 η23 − η2∗ 3 + F1 + G1 , where 1

3 η2∗ = −c11 [ξ1 ] 11 , 4

F1 =

∂V1 f1 , ∂η1

G1 =

56/11

we obtain LV1  −c11 Lξ1

A B 1 ∂ 2 V1 T Tr g1 g . 2 ∂η12 1

Xie X J, et al.

Sci China Inf Sci

30

150 u

x1 x2

25

Control input

20 States

July 2014 Vol. 57 072202:9

15 10 5

100 50 0

0 −5

0.0001 0.01 Time (s) Figure 1

1

−50

100

) V2 (¯ η2 ) = V1 (η1 ) +

and defining ξ2 = [η2 ]



1

100

The responses of the closed-loop system (14), (31).

Choosing

11/4

0.0001 0.01 Time (s)

[η2∗ ]11/4 ,

η2

η2∗

* 11 + 43 11 11 [s] 4 − [η2∗ ] 4 ds,

a direct calculation leads to

LV2  −c11 L|ξ1 | 11 + L[ξ2 ] 11 v 3 + L[ξ1 ]4 (η23 − η2∗ 3 ) + F2 + G2 ) η2 $ 11 $ 32 11 $ 11 43 ∂[η ∗ ] 4 $ 11 − L 2 η23 $[s] 4 − [η2∗ ] 4 $ ds, 11 ∂η1 η2∗ 56

with

43

2  ∂V2 fj F2 = , ∂ηj Lκj j=1

11

2  1 Tr G2 = 2 p,q=1



gp ∂ 2 V2 gqT Lκp ∂ηp ∂ηq Lκq

(29)

@ .

By Lemmas 4–6, one has $ 4 3 $ 56 $[ξ1 ] (η2 − η2∗ 3 )$  l211 |ξ1 | 56 11 + σ 21 |ξ2 | 11 , $ $ ) $ 32 $ $ 43 ∂[η2∗ ] 114 3 η2 $$ 11 11 $ 11 56 56 $ ∗ 4 $− 4 11 11 $ 11 ∂η1 η2 ∗ $[s] − [η2 ] $ ds$$  l212 |ξ1 | + σ22 |ξ2 | , η2

(30)

where σ21 =

3 14



11 14ι1

 11  3

1 12 + 2α111 11

 14 3 ,

σ22 =

6 7



1 7ι2

 17 

43 8 2 11 α1 3

 76 +

9 14



5 14ι3

 59 

8 43 23 α 11 2 11 3 1

 14 9 ,

1/11

2 α1 + ι1 , ι2 + ι3 = l212 , l211 , l212 are positive design constants. l211 = 11 In simulation, we choose c11 = 2, c22 = 1.9025, ι1 = 7/11, ι2 = 1/11, ι3 = 1, l212 = 12/11 to 39/121 11/3 39/121 [[η2 ]11/4 + obtain σ21 = 108.9754, σ22 = 472.1221 and η3∗ = −α2 α1 η1 ] . Substituting (30)   56/11 56/11 43/11 11/3 ∗ 11/3 v + F2 + G2 , where + L[ξ2 ] + c22 |ξ2 | − η3 into (29) leads to LV2  −L c21 |ξ1 |

c21 = c11 − l211 − l212 = 0.08, α2 = (c22 + σ21 + σ22 )11/13 , α2 = 218.8674. By (15), one obtains the following controller: 39 121 39 11 4 11 u = −α2121 L 11 α1 η13 + [η2 ] 4 . (31) The value range of the scaling gain L: Exactly following (22)–(27) in Section 4, we choose c01 = 1.9025, c02 = 2, c¯02 = 1, c03 = 0.8624, c¯03 = 0.0428, γ0 = 1/4, γ = 1/6. Then the critical value ⎧ 1 ⎫ ⎨ c02 + c03 + c¯02 +¯c03 γ0 ⎬ 1−γ , 1 = 21.8604. L∗ = max ⎩ ⎭ c01 In the simulation, we choose L = 22, the initial values x1 (0) = −5, x2 (0) = 5 and the sampling period = 1. Figure 1 demonstrates the effectiveness of state feedback controller.

Xie X J, et al.

6

Sci China Inf Sci

July 2014 Vol. 57 072202:10

Concluding remark

In this paper, a combined homogeneous domination and sign function design approach is used to the state feedback control for stochastic high-order nonlinear systems. There still exist some problems to be investigated: For Assumption 1, when 1 < pi < 3, how to determine the range of τ and design a stabilizing state feedback controller? Another problem is how to find a practical example for system (1) with Assumption 1.

Acknowledgements This work was supported by Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province, National Natural Science Foundation of China (Grant Nos. 61273125, 61104222), Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20103705110002), Shandong Provincial Natural Science Foundation of China (Grant No. ZR2012FM018), and Project of Taishan Scholar of Shandong Province.

References 1 Krsti´ c M, Deng H. Stabilization of uncertain nonlinear systems. New York: Springer, 1998 2 Deng H, Krsti´ c M. Output-feedback stochastic nonlinear stabilization. IEEE Trans Automat Contr, 1999, 44: 328–333 3 Pan Z G, Basar T. Backstepping controller design for nonlinear stochastic systems under a risk-sensitive cost criterion. SIAM J Contr Optimizat, 1999, 37: 957–995 4 Deng H, Krsti´ c M, Williams R J. Stabilization of stochastic nonlinear systems driven by noise of unknown covariance. IEEE Trans Automat Contr, 2001, 46: 1237–1253 5 Liu Y G, Pan Z G, Shi S J. Output feedback control design for strict-feedback stochastic nonlinear systems under a risk-sensitive cost. IEEE Trans Automat Contr, 2003, 48: 509–514 6 Liu Y G, Zhang J F. Practical output-feedback risk-sensitive control for stochastic nonlinear systems with stable zero-dynamics. SIAM J Contr Optimizat, 2006, 45: 885–926 7 Wu Z J, Xie X J, Zhang S Y. Stochastic adaptive backstepping controller design by introducing dynamic signal and changing supply function. Int J Contr, 2006, 79: 1635–1646 8 Wu Z J, Xie X J, Zhang S Y. Adaptive backstepping controller design using stochastic small-gain theorem. Automatica, 2007, 43: 608–620 9 Liu S J, Zhang J F, Jiang Z P. Decentralized adaptive output-feedback stabilization for large-scale stochastic nonlinear systems. Automatica, 2007, 43: 238–251 10 Liu S J, Zhang J F. Output-feedback control of a class of stochastic nonlinear systems with linearly bounded unmeasurable states. Int J Robust Nonlinear Contr, 2008, 18: 665–687 11 Liu S J, Jiang Z P, Zhang J F. Global output-feedback stabilization for a class of stochastic non-minimum phase nonlinear systems. Automatica, 2008, 44: 1944–1957 12 Liu S J, Ge S Z, Zhang J F. Adaptive output-feedback control for a class of uncertain stochastic nonlinear systems with time delays. Int J Contr, 2008, 81: 1210–1220 13 Yu X, Xie X J. Output feedback regulation of stochastic nonlinear systems with stochastic iISS inverse dynamics. IEEE Trans Automat Contr, 2010, 55: 304–320 14 Yu X, Xie X J, Duan N. Small-gain control method for stochastic nonlinear systems with stochastic iISS inverse dynamics. Automatica, 2010, 46: 1790–1798 15 Yu X, Xie X J, Wu Y Q. Further results on output-feedback regulation of stochastic nonlinear systems with SiISS inverse dynamics. Int J Contr, 2010, 83: 2140–2152 16 Chen W S, Jiao L C, Li J, et al. Adaptive NN backstepping output-feedback control for stochastic nonlinear strictfeedback systems with time-varying delays. IEEE Trans Syst Man Cybern Part B-Cybern, 2010, 40: 939–950 17 Duan N, Xie X J. Further results on output-feedback stabilization for a class of stochastic nonlinear systems. IEEE Trans Automat Contr, 2011, 56: 1208–1213 18 Zhao C R, Xie X J. Output feedback stabilization using small-gain method and reduced-order observer for stochastic nonlinear systems. IEEE Trans Automat Contr, 2013, 58: 523–529 19 Qian C J. Global synthesis of nonlinear systems with uncontrollable linearization. Dissertation for the Doctoral Degree. Cleveland: Case Western Reserve University, 2001 20 Qian C J. A homogeneous domination approach for global output feedback stabilization of a class of nonlinear system. In: American Control Conference, Portland, 2005. 4708–4715

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:11

21 Polendo J, Qian C J. A generalized homogeneous domination approach for global stabilization of inherently nonlinear systems via output feedback. Int J Robust Nonlinear Contr, 2007, 17: 605–629 22 Lei H. Universal output feedback control of nonlinear systems. Dissertation for the Doctoral Degree. Cleveland: Case Western Reserve University, 2008 23 Mao X R. Stochastic Differential Equations and Their Applications. Chichester: Horwood Publishing, 2007 24 Xie X J, Tian J. State-feedback stabilization for high-order stochastic nonlinear systems with stochastic inverse dynamics. Int J Robust Nonlinear Contr, 2007, 17: 1343–1362 25 Tian J, Xie X J. Adaptive state-feedback stabilization for high-order stochastic nonlinear systems with uncertain control coefficients. Int J Contr, 2007, 80:1503–1516 26 Xie X J, Tian J. Adaptive state-feedback stabilization of high-order stochastic systems with nonlinear parameterization. Automatica, 2009, 45: 126–133 27 Liu L, Xie X J. State-feedback stabilization for stochastic high-order nonlinear systems with SISS inverse dynamics. Asian J Control, 2012, 14: 207–216 28 Xie X J, Li W Q. Output-feedback control of a class of high-order stochastic nonlinear systems. Int J Contr, 2009, 82: 1692–1705 29 Chen W S, Wu J. State-feedback stabilization for a class of stochastic time-delay nonlinear systems. Int J Robust Nonlinear Contr, 2012, 22: 1921–1937 30 Liu L, Xie X J. Output-feedback stabilization for stochastic high-order nonlinear systems with time-varying delay. Automatica, 2011, 47: 2772–2779 31 Xie X J, Duan N, Yu X. State-feedback control of high-order stochastic nonlinear systems with SiISS inverse dynamics. IEEE Trans Automat Contr, 2011, 56: 1921–1926 32 Li W Q, Xie X J. Inverse optimal stabilization for stochastic nonlinear systems whose linearizations are not stabilizable. Automatica, 2009, 45: 498–503 33 Li W Q, Jing Y W, Zhang S Y. Output-feedback stabilization for stochastic nonlinear systems whose linearizations are not stabilizable. Automatica, 2010, 46: 752–760 34 Li W Q, Xie X J, Zhang S Y. Output-feedback stabilization of stochastic high-order nonlinear systems under weaker conditions. SIAM J Contr Optimizat, 2011, 49: 1262–1282 35 Xie X J, Liu L. Further result on output feedback stabilization for stochastic high-order nonlinear systems with timevarying delay. Automatica, 2012, 48: 2577–2586 36 Xie X J, Liu L. A homogeneous domination approach to state feedback of stochastic high-order nonlinear systems with time-varying delay. IEEE Trans Automat Contr, 2013, 58: 494–499 37 Qian C J, Li J. Global output feedback stabilization of upper-triangular nonlinear systems using a homogeneous domination approach. Int J Robust Nonlinear Contr, 2006, 16: 441–463 38 Lei H, Lin W. Robust control of uncertain systems with polynomial nonlinearity by output feedback. Int J Robust Nonlinear Contr, 2009, 19: 692–723 39 Khalil H K. Nonlinear Systems. Beijing: Publishing House of Electronics Industry, 2007

Appendix A

Proof of Lemma 6

Without loss of generality, we assume x  y  0. When y  x  0, the same conclusion can be reached as well. i) Let f (p) = xp −y p −(x−y)p, then f  (p) = (ln x)xp −(ln y)y p −(ln(x−y))(x−y)p  (ln x)(xp −y p −(x−y)p) = (ln x)f (p). By comparison principle, one has f (p)  e(ln x)(p−1) f (1) = 0, that is, (x − y)p  xp − y p . Let x = m1/p , y = n1/p . Then from i), ii) follows directly. iii) is a direct result of i). iv) By Lemma 4 and x  y  0, we have pxp−1 y + (p − 1)y p  (p − 1)xp + py p  (p − 1)xp +pxy p−1 , that is, xp− p y  p(x − y)(xp−1 + y p−1 ). By Lemma 5, we obtain xp−1 = ((x − y) + y)p−1  max{1, 2p−2 } (x − y)p−1 + y p−1 . Hence, there exists a constant c > 0, such that (xp − y p )  c(x − y)((x − y)p−1 + y p−1 ) holds.

Appendix B

Proof of Lemma 7

We first prove that for any pi ∈ R3 odd and τ ∈ (dM , +∞), there is an l > 1 such that  ri + τ  . (B1) rn + τ > max 2ri , 1in l  1 1 (i) From r1 = 1, ri+1 = (ri + τ )/pi , one has ri = i−1 s=1 ps ···pi−1 τ + p1 ···pi−1 , i = 2, . . . , n. When i = 1, by 2 − p1 ···p1 n−1 d1 =  1 1 + n−1 s=1 ps ···pn−1

Xie X J, et al.

and τ > d1 in Assumption 1, we have  rn + τ >

1+

Sci China Inf Sci

n−1  s=1

1 ps · · · pn−1

July 2014 Vol. 57 072202:12

d1 +

1 = 2r1 . p1 · · · pn−1

(B2)

When i = 2, . . . , n, due to pi  3, one gets 2

i−1  s=1

i−1 i−s ∞ s n−1    1 1 1 1 2 2 =1 di in Assumption 1, one has  n−1 i−1 i−1    1 1 2 2 − τ+ di + rn + τ > 1 + p · · · p p · · · p p · · · p p · · · pn−1 s n−1 s i−1 s i−1 1 s=1 s=1 s=1 =

i−1  s=1

2 2 τ+ = 2ri . ps · · · pi−1 p1 · · · pi−1

(B3)

(ii) For any pi ∈ R3 odd and τ ∈ (dM , +∞), one can choose l > 1 to satisfy rn + τ > max1in {(ri + τ )/l}. Combining (i) and (ii), one has (B1). According to the denseness of real number, there exists μ ∈ R+ odd obviously such that rn + τ  μ > max1in {2ri , (ri + τ )/l} holds for any pi ∈ R3 odd and τ ∈ (dM , ∞).

Appendix C

Proof of Lemma 8

Let f (s) = sgn(s)|s|ν . Then for any ν > 2, it is obvious that f (s) is C 1 on (−∞, 0) and (0, +∞), respectively. For s = 0,   f+ (0) = lim sν−1 = 0 = lim (−s)ν−1 = f− (0),

(C1)

s→0−

s→0+

so f (s) is C 1 for any s ∈ R, and f  (s) =



if s  0 νsν−1 , ν(−s)ν−1 , if s < 0

= ν|s|ν−1 ,

∀s ∈ R.

Similar to (C1), it is straightforward to show that f (s) is twice differentiable and ν(ν − 1)sν−2 , if s  0 = ν(ν − 1)sgn(s)|s|ν−2 , f  (s) = −ν(ν − 1)(−s)ν−2 , if s < 0

(C2)

∀s ∈ R.

(C3)

Obviously, for any ν − 2 > 0, f  (s) is continuous; thus f (s) is C 2 for any ν > 2.

Appendix D

Proof of Lemma 9

We first prove that Vi (¯ ηi ) is C 2 . By (6)–(8), (C2), (C3) and Lemma 8, qi ∂Ui = [ξi ] μ , ∂ηi

μ

∂Ui qi ∂[ηi∗ ] ri =− ∂ηj μ ∂ηj

ηi ηi∗

q  μ μ  i −1  ri μ ∗ ds, [s] − [ηi ] ri  μ

μ−ri qi qi ∂ Ui qi ∂ Ui ∂ 2 Ui qi ∂[ηi∗ ] ri −1 ri μ = |η | |ξ | , = = − |ξi | μ −1 , i i ∂ηi2 ri ∂ηi ∂ηj ∂ηj ∂ηi μ ∂ηj  μ 2 μ q q ηi  ηi  μ μ  i −2 μ  i −1 qi (qi − μ) ∂[ηi∗ ] ri qi ∂ 2 [ηi∗ ] ri ∂ 2 Ui μ  rμi ∗ r ∗ r μ ri i i  [s] = − [η ] ds − − [η ] ds, [s] i i ∂ηj2 μ2 ∂ηj μ ∂ηj2 ηi∗ ηi∗ μ μ

q ηi  μ μ  i −2 ∂[ηi∗ ] ri ∂[ηi∗ ] ri ∂ 2 Ui qi qi ∂ 2 Ui μ [s] ri − [ηi∗ ] ri = = ds, (D1) −1 ∂ηj ∂ηk ∂ηk ∂ηj μ μ ∂ηj ∂ηk η∗

2

2

i

Xie X J, et al.

Sci China Inf Sci

July 2014 Vol. 57 072202:13

 μ μ = 0, j = k, and [ηi∗ ] ri = − αi−1 [ηi−1 ] ri−1

μ

where j, k = 1, . . . , i−1, the last equality is obtained by using μ μ  + αi−1 αi−2 [ηi−2 ] ri−2 + · · · + αi−1 αi−2 · · · α1 [η1 ] r1 . By  ri + τ  , 2ri , 1in l

μ > max

μ

∂[ηi∗ ] ri μ = −αi−1 · · · αj |ηj | ∂ηj rj

∂ 2 [ηi∗ ] ri ∂ηj ∂ηk

μ−rj rj

μ

,

∂ 2 [ηi∗ ] ri μ(μ − rj ) = −αi−1 · · · αj [ηj ] ∂ηj2 rj2

μ−2rj rj

,

one gets qi /μ − 2 > 0, (μ − rj )/rj > 1 and (μ − 2rj )/rj > 0, from which and (D1), we know that Ui (¯ ηi ) is C 2 , 2 and then Vi (¯ ηi ) is also C . Next, we prove that Vi (¯ ηi ) is positive definite and proper in two cases. Case I: When ηi∗  ηi : Subcase i): 0  ηi∗  ηi . By (8), |x − y|p  |xp − y p | of Lemma 6 and μ/ri > 2 in Lemma 7, one gets

ηi  μ

ηi q qi μ  i μ Ui (¯ s ri − ηi∗ ri ηi ) = ds  (s − ηi∗ ) ri ds. ηi∗

ηi∗

Subcase ii): ηi∗  0  ηi . By |x + y|p  2p−1 |xp + y p | of Lemma 5 and |x − y|p  |xp − y p | of Lemma 6, one has

0

ηi  μ

ηi q q qi q qi μ μ  i μ  i μ μ − i Ui (¯ −(−s) ri + (−ηi∗ ) ri s ri + (−ηi∗ ) ri ηi ) = ds + ds  2 μ ri (s − ηi∗ ) ri ds. ηi∗

ηi∗

0

Subcase iii): ηi∗  ηi  0. By |x − y|p  |xp − y p | of Lemma 6, one obtains

ηi

ηi qi μ μ qi Ui (¯ ηi ) = (−(−s) ri + (−ηi∗ ) ri ) μ ds  (s − ηi∗ ) ri ds. ηi∗

ηi∗

Combining Subcases i)–iii), one has ηi )  2 Ui (¯

qi q − ri μ i

ηi ηi∗

(s −

q

i ηi∗ ) ri

qi q − i

qi +ri 2 μ ri r i ds  (ηi − ηi∗ ) ri . qi + ri

(D2)

Case II: When ηi  ηi∗ , using the same analysis method as above, one gets ηi )  2 Ui (¯

qi q − ri μ i

ηi∗ ηi

(ηi∗

− s)

qi ri

qi q − i

qi +ri 2 μ ri r i ∗ ds  (ηi − ηi ) ri . qi + ri

(D3)

qj +rj i i ∗ rj ηi ) = Vi−1 (¯ ηi−1 ) + Ui (¯ ηi ) = U (¯ η )  m |η − η | , Combining (D2) and (D3), we have Vi (¯ j j j j j j=1 j=1 ηi ) is positive definite and proper, where mj > 0 is a constant. which implies that Vi (¯ At last, we prove inequality (9). From Definition 1, (6)–(8) and (D1), it follows that

ηi )  −L LVi (¯

i−1 

ci−1,j |ξj |4l + L[ξi−1 ]

j=1



i  ∂Ui fj + Fi−1 + ∂ηj Lκj j=1 qi

pi  L[ξi ] μ ηi+1 −L

i−1 



qi−1 μ

 +

p

∗pi−1

(ηi i−1 − ηi

)+L

i  ∂Ui pj η ∂ηj j+1 j=1

  i  1 gp ∂ 2 Ui gqT Gi−1 + Tr 2 Lκp ∂ηp ∂ηq Lκq p,q=1

ci−1,j |ξj |4l + Fi + Gi + L[ξi−1 ]

qi−1 μ

p

∗pi−1

(ηi i−1 − ηi

)

j=1



μ

ηi  μ q i−1 μ  i −1 qi  ∂[ηi∗ ] ri pj  ri μ ∗ L ηj+1 ds. [s] − [ηi ] ri  μ j=1 ∂ηj η∗

(D4)

i

We concentrate on the last two terms on the right-hand side of (D4). When (ri pi−1 )/μ  1 and ηi ηi∗  0, using (6) and Lemma 6, one obtains  ⎧  μ  ri pi−1  μ  ri pi−1  ⎪ μ μ  ri  ⎪ ∗ ri ⎪ η if ηi  0 and ηi∗  0 − η  , ⎪ i ⎪  ⎨ i   ∗p  pi−1  − ηi i−1  = ηi   ri pi−1 ri pi−1  ⎪    ⎪ μ  μ  ⎪ μ μ   ∗ ⎪ r r ⎪ + (−ηi ) i  , if ηi  0 and ηi∗  0 ⎩ − (−ηi ) i 

(D5)

Xie X J, et al.

 |ξi |

Sci China Inf Sci

ri pi−1 μ

July 2014 Vol. 57 072202:14

.

(D6)

When ri pi−1 /μ  1 and ηi ηi∗  0, by (6) and Lemma 5, one obtains  ⎧ ri pi−1   μ  ri pi−1  μ  ⎪ μ μ  ri  ⎪ ∗ ri ⎪ + (−ηi )  η  , if ηi  0 and ηi∗  0 ⎪ ⎪  ⎨ i   p ∗p   i−1 − ηi i−1  = ηi   ri pi−1 ⎪   μ  ri pi−1  ⎪ μ  ⎪ μ μ   ∗ ⎪ r r ⎪ + ηi i  , if ηi  0 and ηi∗  0 ⎩  (−ηi ) i  r p 1− i μi−1

 2

|ξi |

ri pi−1 μ

.

(D7)

(D8)

When ri pi−1 /μ  1 and ηi ηi∗  0, by (6), (D5) and Lemma 6, there exist positive constants bi1 and ¯bi1 such that p

∗pi−1

|ηi i−1 − ηi

  ri pi−1 ri pi−1 ri pi−1 ri pi−1 |  c|ξi | |ξi | μ −1 + (αi−1 |ξi−1 |) μ −1  bi1 |ξi−1 | μ + ¯bi1 |ξi | μ .

(D9)

When ri pi−1 /μ  1 and ηi ηi∗  0, by (6), (D7) and Lemma 6, one has p

∗pi−1

|ηi i−1 − ηi

ri pi−1 μ

|  |ξi |

.

(D10)

Combining (D5)–(D10), by Lemma 4, one gets  qi−1  qi−1  ri pi−1 ri pi−1   p ∗p  bi1 |ξi−1 | μ + ˜bi1 |ξi | μ  li,i−1,1 |ξi−1 |4l + σi1 |ξi |4l , (D11) [ξi−1 ] μ ηi i−1 − ηi i−1   |ξi−1 | μ   ri pi−1 where ˜bi1 = max 21− μ , ¯bi1 , li,i−1,1 and σi1 are positive constants. By (6), Lemmas 4–6 and the first integral mean value theorem, one gets   μ

ηi  μ q i−1   ∗ μ  i −1   qi  ∂[ηi ] ri pj  ri ∗ r μ ηj+1 ds [s] − [ηi ] i  −   μ ∂ηj η∗ j=1 i

 d¯

i−1 

|ηj |

μ −1 rj

qi −1

|ηj+1 |pj |ξi | μ

j=1

 d˜

i−1  

r 1− μj

|ξj |

r 1− μj

+ |ξj−1 |

|ηi − ηi∗ |



|ξj+1 |

τ +rj μ

+ |ξj |

τ +rj μ

|ξi |

4lμ−τ −1 μ

j=1



i−1 

lij2 |ξj |4l + σi2 |ξi |4l ,

(D12)

j=1

¯ d, ˜ lij2 (j = 1, . . . , i − 1) where the second inequality is obtained by following the same way as (D5) and (D7). d, and σi2 are positive constants. Choosing cij = ri+1

∗ = −αi μ [ξi ] ηi+1 gets the result.

ri+1 μ

ci−1,j − lij2 > 0,

j = 1, . . . , i − 2,

ci−1,i−1 − li,i−1,1 − li,i−1,2 > 0,

j = i − 1,

μ

, αi = (cii + σi1 + σi2 ) ri+1 pi , cii > 0, and substituting (D11) and (D12) into (D4), one

SCIENCE CHINA Information Sciences

. RESEARCH PAPER .

July 2014, Vol. 57 072203:1–072203:12 doi: 10.1007/s11432-013-4998-z

Sequence memory based on an oscillatory neural network XIA Min1 ∗ , WENG LiGuo1 , WANG ZhiJie2 & FANG JianAn2 1College

of Information and Control Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2College of Information Science and Technology, Donghua University, Shanghai 200051, China Received June 3, 2013; accepted July 15, 2013; published online February 27, 2014

Abstract In the brain, the discrete elements in a temporal order is encoded as a sequence memory. At the neural level, the reproducible sequence order of neural activity is very crucial for many cases. In this paper, a mechanism for oscillation in the network has been proposed to realize the sequence memory. The mechanism for oscillation in the network that cooperates with hetero-association can help the network oscillate between the stored patterns, leading to the sequence memory. Due to the oscillatory mechanism, the firing history will not be sampled, the stability of the sequence is increased, and the evolvement of neurons’ states only depends on the current states. The simulation results show that neural network can effectively achieve sequence memory with our proposed model. Keywords

neural network, hetero-association, sequence memory, oscillation, Hebbian rule

Citation Xia M, Weng L G, Wang Z J, et al. Sequence memory based on an oscillatory neural network. Sci China Inf Sci, 2014, 57: 072203(12), doi: 10.1007/s11432-013-4998-z

1

Introduction

Sequence memory is a kind of order information processing, which is a very important part in the function of brain [1–4]. Sequence memory is the function that encodes the discrete elements in a temporal order. In the brain level, the reproducible sequence order of neural activity is very crucial for many functions, such as sensory information processing [5,6], motor coordination and control [7], and animal communication [8,9]. Recently, robustly generating sequence of neural activity is carried out by many intriguing experiments. Neural networks are frequently used to construct the sequence memory model. The conventional associative memory model evolves to stable steady state [10–15], while neural networks based sequence memory model switches orderly from one pattern to another. This function requires an ability to get out of stable state in a neural network [16]. So far much work has been done based on neural networks [17–27] to realize the sequence memory. Kanter and Sompolinsky [17] proposed a temporal association with asymmetric synapses. A two-layer neural network was proposed by Philip [18], where one layer was composed of sensory neurons, and one layer was composed of principal neurons. Rehn used dynamic depression synapses [19] to model the sequence association. A hierarchical self-organizing map model was proposed by Carpinteiro to realize the temporal association [26]. ∗ Corresponding

author (email: xiamin [email protected])

c Science China Press and Springer-Verlag Berlin Heidelberg 2014 

info.scichina.com

link.springer.com

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:2

For the cognitive neuroscience modeling, an important evaluation standard is the similarity between the real brain and the proposed model. In the real brain function, the stored patterns are always correlative with each other, but in many past neural network models of memory, including Winder’s model, the hetero-association between the stored patterns is ignored. In most traditional memory models, the stored memories would typically tend to be in fixed point attractor states [28–30], which cannot realize the sequence memory. The hetero-association between the stored patterns is crucial in the memory information processing, and is always regarded as a mechanism with an ability to get out of stable state in a neural network [31,32]. The hetero-associations is a very important part in addition to the auto-associative neural network that the temporal association can be archived. In the existing heteroassociations for sequence memory models, the history sampling function samples all firing history steps of neurons such that the sampling result is increased along with the processing of memory, possibly leading to the instability of the sequence [32]. In this work, we propose a mechanism for oscillation in the network to solve this problem. In the real brain, the neurons’ states at time t only relies on the states of time t − 1, if there is no stimulus from outside [4]. In this method, the firing history will not be sampled, and the state of the network at time t is only associative with the state at t − 1. Thus, the proposed model in this paper is more reasonable than existing methods of using history sampling function. Under this mechanism, the oscillation of the network that cooperates with the hetero-association can help the network transfer between the stored pattern, realizing the sequence memory. Meanwhile, the time consumption is less than that of existing methods. Thus, the proposed paper is better than existing methods. Our model gives a new method to model the function of sequence information processing in the real brain. Using the mechanism for oscillation and the hetero-association between the stored patterns, the network can achieve the sequence memory. The remainder of this paper is organized as follows. In Section 2, the temporal association based on oscillatory neural network is described. And the simulation results of proposed model are presented in Section 3. Section 4 concludes this paper. The theoretic analysis for the temporal association is given in Appendix.

2 2.1

Methods Method description

In the neural network, the state of neuron is a binary value {si }N i=1 = {1, −1}, where N is the neurons’ u u u u number in the network. A set of patterns {X = (x1 , x2 , . . . , xN )} are the stored patterns in the neural network, where u = 1, 2, . . . , p is the label of the stored patterns. Each neural state {X u } of the uth pattern is created independently as prob[xui = 1] = 1 − prob[xui = −1] = k, where k is coding level in the network. In order to memorize the p stored patterns, the the Hebbian learning rule is often used to train the synaptic strength wij . In such a rule, connection strengths for network are kept in an N × N weight matrix W , where each weight wij is a real-valued number. The weight is given by wij ∝ (xi − xi )(xj − xj ),

(1)

0 is where · is the node activity. Here, we assume that the rate representations is zero. And the wij defined as ⎧ p  ⎪ ⎨ w0 = 1 xu xu , for i =  j, ij N u=1 i j (2) ⎪ ⎩ 0 wii = 0. 0 In this work, weights on self-connections are not permitted, so wii = 0. The neuron’s state in the neural network is updated as  1, with probability (1 + e−hi (t/T ) )−1 , si (t + 1) = (3) −1, with probability (1 + ehi (t/T ) )−1 ,

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:3

where si (t) is the output for neuron i at time t. T is a temperature parameter; it is 0.1 in our simulations. Updating is done synchronously. hi (t) is the local field of neuron i, hi (t) =

N 

wij (t)sj (t) − θi (t) − βKi (t),

(4)

j=1

where θi (t) is the threshold associated with local field of neuron i leading to the oscillation of network, Ki (t) is a biasing factor that compensates for the unequal numbers of +1 and −1 values among the memory patterns, and β is the scale parameter. Based on the Horn and Usher’s model [33], the factor Ki (t) is given by p  Ki (t) = A · M · xui + 2(M − A), (5) u=1

where A is the average level of activity of all nodes over all memory patterns used, and M is the average activity of the network at a given time step. Unlike traditional fixed-point attractor networks where after learning a nodes threshold is fixed, the threshold values here are changed at each time step t, θi (t) = kr ri (t),

(6)

ri (t − 1) (7) + si (t), c where ri (t) = 0 and c > 1. c is set at 1.5, and kr is the scale parameter for the network’s oscillation. At every time step, when the state of neuron i is +1, the threshold of neuron i can be increased, making it more likely that the state of the neuron will become negative during the next time step. When the neuron’s state i is −1, the threshold can be decreased, making it more likely that the state of the neuron will become positive. Thus, the network state can oscillate and different stored patterns can be explored. In order to present the sequence association during a span task, the connection strengths wij are updated using the weight change rule ri (t) =

0 wij (t) = τ wij −

1 αsi (t)sj (t)(1 − δij ) + mij , N

(8)

where τ (0  τ ) is stability coefficient for auto-association, and δij is Kronecker’s delta (if i = j, then 0 keeps the stability of network in the processing of sequence δij = 1; if i = j, then δij = 0). The part τ wij 0 memory. The initial weight of wij (t) is set at wij (0) = wij . The second part on the right-hand side of this weight change rule (8) follows the Winder’s model [34]. But in this work, only N1 αsi (t − 1)sj (t − 1) can be reduced in the new connection weight. mij of the third part on the right-hand side of this weight change rule (8) is the hetero-association, which is expressed as mij =

p 1  u+1 u x xj . N u=1 i

(9)

The sequence cycle is incorporated by xp+1 = x1i . The hetero-associations between stored patterns is i 0 and the part of mij are necessary for the robust crated by pattern’s interrelation. Both the part of τ wij 0 sequence association. τ wij sets the neural network in a stable equilibrium, and the mij weights trend to drive the network from one stored pattern to another stored pattern. 2.2

Measuring the overlap between network’s states and stored pattern in sequence memory

In order to evaluate the performance of sequence memory, the similarity measure for the stored patterns and neurons’ states should be proposed. Firstly, at time t, the distance between the stored pattern xu and the state of the network s(t) should be computed. In this paper, we use the Hamming distance, which is characterized as follow: N 1 z u (t) = | si (t) − xui | . (10) 2 i=1

Xia M, et al.

Figure 1

Sci China Inf Sci

July 2014 Vol. 57 072203:4

Four stored patterns in neural network, and the networks 100 nodes are described as a 10 × 10 array. Nodes

with state of +1 are marked by a block “■”, and those with state of −1 are in “·”.

The similarity of current state s(t) to the stored pattern X u is then computed as follow: F u (t) = 1 −

z u (t) , N

(11)

which lies in 0.0 and 1.0. If the value of F u (t) is close to 1, then the pattern xu is remembered at t time step, while progressively lower values of F u (t) indicate progressively worse matches. Up to now, there is no unified standard for the threshold of the similarity. In general, the similarity can indicate the recall level for each pattern.

3

Results

In this work, visual interpretation for the networks state is used, the neurons are described as a 10 × 10 array, as illustrated in Figure 1. There are four stored patterns presented in Figure 1, each with 10 × 10 binary pixels for 100 neurons in the neural network. If the state of a neuron is 1, then this neuron is marked by “■”; if the neuron’s state is −1, then this neuron is represented by a dot “·”. According to the Hebb learning rule for synaptic weights in (1), the four patterns in Figure 1 are stored in the neural network. Based on the sequence memory model proposed in Section 2, with appropriate values of τ , α, and kr , the sequence memory can be implemented. Figure 2 shows the effect of parameter τ on the sequence memory with kr = 0.15, α = 0.5, β = 0.2. Simulation of Figure 2(a) indicates that with a small value of τ = 0.25, the sequence association is unstable, the network cannot stay at a stored pattern with small 0 , and the neural network tends to transfer to the next stored pattern in a sequence, causing effect of wij some consecutive patterns to overlap and therefore sequence cannot be realized. Figure 2(b) shows the processing of the sequence memory with parameter values kr = 0.15, α = 0.5, β = 0.2 and τ = 1. Figure 2(b) shows that the transition in the stored patterns is perfect from pattern b. Figure 2(c) gives the simulation result with τ = 1.8. Figure 2(c) shows that the stored patterns cannot be distinguished in the neural network, so that all the patterns are confused, and the network cannot stay at any stored pattern. Thus, if τ has a large value, the sequence memory cannot be realized. Figure 2(d) shows that there is no transition between the stored patterns when τ = 3.5. Under the circumstance of Figure 2(d), if the value of τ is too large, the hetero-association is too small to make the neural network evolve from a stored pattern to the next stored pattern; as a consequence, the neural network always stays at one stored pattern. Figure 3 shows the overlap of the stored pattern with the local field h(t) in the sequence memory in Figure 2. If pattern transition happens from xu to xu+1 at time t, then the h(t − 1)xu+1 must be larger than h(t − 1)xu . Figure 3(a) indicates that some patterns have almost the same overlap between the local field h(t) at the same time, leading to incorrect memory. In contrast, in the simulation of Figure 3(b), only one pattern has the largest overlap to the local field, and it is much larger than others, and so with τ = 1, the network can transfer from this pattern to the next pattern. In Figure 3(c), all the patterns have almost the same overlap with local field h(t) at τ = 1.8, so that the network cannot distinguish the patterns. Figure 3(d) demonstrates that, all the time the pattern b has the largest overlap to the local field. In this case the network is unable to get out of pattern b; as a result, the sequence cannot be realized. Next, we discuss how the parameter α influences the performance of the network. In Figure 4, the parameters kr , τ , and β are with values 0.15, 1, and 0.2 respectively. Figures 4(a) and 4(b) show the

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:5

(a)

(b)

(c)

(d)

Figure 2

The output sequence in the network, starting from 1st time step to 30th time step with parameter values

kr = 0.15, α = 0.5, and β = 0.2. (a) τ = 0.25; (b) τ = 1; (c) τ = 1.8; (d) τ = 3.5.

simulation results for α = 0.25 and α = 1.2 respectively. The ordinal pattern revival cannot be achieved in Figures 4(a) or 4(b), and the network cannot remember any stored pattern several time steps after beginning. According to simulation of Figure 2(b), the sequence can be remembered with α = 0.5, which indicates that the network cannot realize the sequence memory when the value of α is too small or too large. Figure 5 gives the result of overlaps of stored patterns with local field h(t) for Figure 4. From Figure 5(a), it is found that all the patterns have almost the same overlap with local field with α = 0.25. In this condition, the network cannot remember any stored pattern, just like the simulation result of Figure 3(c). When the value of α is too large, the neural network cannot realize the sequence memory either. Figure 5(b) shows the simulation result as α = 1.2, which has the same characteristic as the result of Figure 3(a). In this case, no stored pattern can be recalled. Figure 6 shows why the neural network can transfer from a stored pattern to the next stored pattern with parameter kr . Figure 6 indicates that the value of kr can not be too large or to small. Figure 6 presents the simulation result for overlaps of stored patterns with local field in a sequence memory with parameter values kr = 0.02, kr = 0.3, and kr = 0.6 respectively. Figure 6(a) shows that when kr = 0.02, all the patterns have almost the same overlaps with local field, and the stored patterns cannot be recalled. In this condition, the network always stays at the spurious state. When the value of kr reaches 0.3, Figure 6(b) indicates that not only the stored patterns can be memorized, but the reverse patterns of stored patterns can be memorized. In Figure 6(b), the oscillation is sufficient for making the network evolve to

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:6

1.5 Pattern a

Pattern b

Pattern c

Pattern d

Overlap

1.0 0.5 0 −0.5 −1.0

0

10

20

30

40

50 Time (a)

60

70

80

90

100

1.5 Pattern a

Pattern b

Pattern c

Pattern d

Overlap

1.0 0.5 0 −0.5 0

10

20

30

40

50 Time (b)

60

70

80

90

100

1.5 Overlap

Pattern a

Pattern b

Pattern c

Pattern d

1.0 0.5 0

0

10

20

30

40

50 Time (c)

60

70

80

90

100

3.0 Overlap

2.5 2.0 Pattern a

1.5

Pattern b

Pattern c

Pattern d

1.0 0.5 0

0

10

20

30

40

50 60 70 80 90 100 Time (d) Figure 3 Overlaps of local field h(t) and network’s state s(t) during a run of the network from 1st time step to 100th time step from initial state of pattern b. The parameters kr , α, and β are with values 0.15, 0.5, and 0.2 respectively. (a) τ = 0.25; (b) τ = 1; (c) τ = 1.8; (d) τ = 3.5.

(a)

(b)

Figure 4 The output sequence in the network, starting from 1st time step to 30th time step from initial state of pattern b. The parameters kr , τ , and β are with values 0.15, 1, and 0.2 respectively. (a) α = 0.25; (b) α = 1.2.

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:7

1.5 Overlap

Pattern a

Pattern b

Pattern c

Pattern d

1.0 0.5 0 0

10

20

30

40

50 Time (a)

60

70

80

90

100

1.5 Pattern a

Overlap

1.0

Pattern b

Pattern c

Pattern d

0.5 0 −0.5 −1.0 0

Figure 5

10

20

30

40

50 Time (b)

60

70

80

90

100

The output sequence in the network, starting from 1st time step to 100th time step. τ = 1, kr = 0.15, β = 0.2.

(a) α = 0.25; (b) α = 1.2

1.5 Overlap

Pattern a

Pattern b

Pattern c

Pattern d

1.0 0.5 0

0

10

20

1.5

30

40

Pattern a

50 Time (a)

60

Pattern b

70

80

Pattern c

90

100

Pattern d

Overlap

1.0 0.5 0 −0.5 −1.0

0

10

20

40

Pattern a

1.5 Overlap

30

50 Time (b)

60

Pattern b

70

80

Pattern c

90

100

Pattern d

1.0 0.5 0 −0.5 −1.0

Figure 6

0

10

20

30

40

50 Time (c)

60

70

80

90

100

The output sequence in the network, starting from 1st time step to 100th time step. τ = 1, α = 0.6, β = 0.2.

(a) kr = 0.02; (b) kr = 0.3; (c) kr =0.6.

the reverse patterns. At a value of kr = 0.6 shown in Figure 6(c), if the oscillation is too large for the network to be stable in a stored pattern or a reverse pattern, and the state of neural network is random in space, then the sequence memory cannot be achieved.

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:8

1.1 Pattern1

Overlap

1.0

Pattern2

0.9

Pattern3

0.8

Pattern4

0.7

Pattern5 Pattern6

0.6

Pattern7

0.5

Pattern8 0.4

55

60

65

70

75 Time (a)

80

85

90

95

100

1.1 1.0

Overlap

0.9 0.8 0.7 0.6 0.5 0.4

Figure 7

0

10

20

30

40

50 60 70 80 90 100 Time (b) Overlaps of stored pattern and network’s state s(t) during a run of the network with parameter values kr = 0.15,

α = 0.5, τ = 0.8 and β = 0.2. (a) The number of nodes N = 300, and the number of stored patterns p = 8; (b) the number of nodes N = 400, and the number of stored patterns p = 15.

Based on the simulation results above, in order to realize the sequence memory, the parameter values of kr , τ , and α should be chosen appropriately. With proper parameter values, the oscillation in the network cooperates with the hetero-association, making the network transfer from one stored pattern to the next stored pattern in the sequence, which results in the sequence memory. A comprehensive evaluation is presented for the proposed model. Figure 7 gives more simulation results on 300 neurons and 400 neurons rather than only 100 neurons as well as more stored patterns. In Figure 7, each peak value reflects the recall of a stored pattern. In Figure 7(a), there are 8 stored patterns recalled in a sequence, and different patterns are denoted by different symbols. In Figure 7(b), there are 15 stored patterns recalled in a sequence. At each time step a stored pattern is recalled, and then the next stored pattern is recalled at next time step. The simulation results indicate that the proposed model can realize the sequence memory effectively for different network scales and different stored patterns. In the proposed model, the steady-state period (steady-state period is defined as the time steps when the neural network stays at one stored pattern in a cycle [32]) for each pattern is one time step. However, in the real brain of recognition and information processes in brain systems, the network can stay at a stored pattern for several time steps, and then evolve to the next stored pattern. In order to imitate the brain activities truly and well, the proposed model may be modified as wij (t) =

1.1 + sin(t/3) 0 1 1.1 − sin(t/3) w − αsi (t)sj (t)(1 − δij ) + mij . 1.1 − sin(t/3) ij N 1.1 + sin(t/3)

Figure 8 gives the simulation results for the modified model. In this modified model, the network can stay at a stored pattern for some time steps.

4

Conclusion

Sequence memory is a kind of order information processing, which is a very important part in the function

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:9

Pattern

d c b a 1 Figure 8

50

100

150 Time

200

250

300

The output sequence in the network, starting from 1st time step to 300th time step. The block shows that the

network stays at one stored pattern, and that the states of neural network rotate between four stored patterns.

of brain. This paper proposed a sequence model based on the oscillatory neural network. In the existing hetero-associations for sequence memory models, the history sampling function samples all firing history steps of neurons, possibly leading to the instability of the sequence. This work proposed an oscillatory neural network to simulate the real memory function in the brain. A mechanism for oscillation in the network can drive the neural network from one pattern to the next pattern. In the real brain, the neurons’ states at time t only rely on the states of time t − 1, if there is no stimulus from outside. In this method, the firing history will not be sampled, and the state of the network at time t is only associative with the state at t − 1. Due to the oscillation and the hetero-association, the network can get out of a stable state, and drive the neural network from one state to the next state, leading to the sequence memory. In this work, only the simple sequence is considered for sequence memory, and the extension of multi-sequences model with complex sequence (defined in [23]) will be explored in our future work.

Acknowledgements This work was partially supported by National Natural Science Foundation of China (Grant No. 61105115), and National Department Public Benefit Research Foundation (Grant No. GYHY200806017).

References 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Anderson J R. Learning and Memory. New York: John Wiley & Sons, 1995 Waves L. The wheres and hows of memory. Science, 2004, 27: 1210 Schacter D L, Addis D R. Constructive memory: The ghosts of past and future. Nature, 2007, 445: 27 Branco T, Clark B A, H¨ ausser M. Dendritic discrimination of temporal input sequences in cortical neurons. Science, 2010, 329: 1671–1675 Laurent G. Odor encoding as an active, dynamical process: experiments, computation, and theory. Annu Rev Neurosci, 2001, 24: 263–97 Dupret D, O’Neill J, Bouverie B P, et al. The reorganization and reactivation of hippocampal maps predict spatial memory performance. Nat Neurosci, 2010, 13: 995–1002 Stephanie B, Moriel Z, Ravikumar P, et al. Electrical synapses control hippocampal contributions to fear learning and memory. Science, 2011, 7: 87–91 Hahnloser R H R, Kozhevnikov A A, Fee M S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature, 2002, 419: 65–70 Bird C M, Burgess N. The hippocampus and memory: insights from spatial processing. Nat Rev Neurosci, 2008, 9: 182–194 Bohland J W, Minai A A. Efficient associative memory using small-world architecture. Neurocomputing, 2001, 38: 489–496 Hopfield J J. Neural networks and physical systems with emergent collective computation abilities. Proc Nat Acad Sci USA, 1982, 79: 2445–2558 Juan I, Francisco A, Sergio A. A scale-free neural network for modelling neurogenesis. Physica A, 2006, 371: 71–75 Huang Z K, Wang X H, Sannay M. Self-excitation of neurons leads to multiperiodicity of discrete-time neural networks with distributed delays. Sci China Inf Sci, 2011, 54: 305–317 Amit D J. Attractor neural networks and biological reality: associative memory and learning. Futur Gener Comp Syst, 1990, 6: 111–119

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:10

15 Xia M, Fang J, Yang T, et al. Dynamic depression control of chaotic neural networks for associative memory. Neurocomputing, 2010, 73: 776–783 16 Sandberg A, Lansner A. Synaptic depression as an intrinsic driver of reinstatement dynamics in an attractor network. Neurocomputing, 2002, 44: 615–622 17 Sompolinsky H, Kanter I. Temporal association in asymmetric neural networks. Phys Rev Lett, 1986, 57: 2861–2864 18 Philip S, Tsimring L S, Rabinnovich M I. Dynamics-based sequential memory: winnerless competition of patterns. Phys Rev E, 2003, 67: 011905 19 Rehn M, Lansner A. Sequence memory with dynamical synapses. Neurocomputing, 2004, 58: 271–278 20 Tank D W, Hopfield J J. Neural computation by concentrating information in time. Proc Nat Acad Sci, 1987, 84: 1896–1900 21 Kleinfeld D. Sequential state generation by model neural networks. Proc Nat Acad Sci, 1986, 83: 9469–9473 22 Gutfreund H, Mezard M. Processing of temporal sequences in neural networks. Phys Rev Lett, 1988, 61: 235–238 23 Lawrence M, Trappenberg T, Fine A. Rapid learning and robust recall of long sequences in modular associator networks. Neurocomputing, 2006, 69: 634–641 24 Ram´ on H, Mikhail R. Reproducible sequence generation in random neural ensembles. Phys Rev Lett, 2004, 93: 238104 25 Kleinfeld D, Sompolinsky H. Associative neural network model for the generation of temporal patterns. Theory and application to central pattern generators. Biophys J, 1988, 54: 1039–1051 26 Carpinteiro O A S. A hierarchical self-organizing map model for sequence recognition. Neural Process Lett, 1999, 9: 209–220 27 Xia M, Wang Z, Fang J. Temporal association based on dynamic depression synapses and chaotic neurons. Neurocomputing, 2011, 74: 3242–3247 28 Hopfield J J. Neural networks and physical systems with emergent collective computation abilities. Proc Nat Acad Sci, 1982, 79: 2445–2558 29 Wickramasinghe L K, Alahakoon L D, Smith-Miles K. A novel episodic associative memory model for enhanced classification accuracy. Pattern Recogn Lett, 2007, 28: 1193–1202 30 Amari S. Characteristics of sparsely encoded associative memory. Neural Netw, 1989, 2: 451–457 31 Sompolinsky H, Kanter I. Temporal association in asymmetric neural networks. Phys Rev Lett, 1986, 57: 2861–2864 32 Xia M, Tang Y, Fang J, et al. Efficient multi-sequence memory with controllable steady-state period and high sequence storage capacity. Neural Comput Appl, 2011, 20: 17–24 33 Horn D, Usher M. Parallel activation of memories in an oscillatory neural network. Neural Comput, 1991, 3: 31–43 34 Ransom K W, James A R, Scott A W, et al. An oscillatory hebbian network model of short-term memory. Neural Comput, 2009, 21: 741–761

Appendix A

Analysis of the model

In this work, a basic sequence memory association is tested, and there are p patterns in the sequence. In order to get the successful sequence recall, the loading rate p/N must be less than the sequence storage capacity [19]. In Section 2, the sequence memory model based on an oscillatory neural network is proposed. Here we firstly find the appropriate values of parameters τ , kr and α to get the reliable results. Thus, with appropriate values of kr , τ and α, the neural network will transfer to the next pattern xu+1 only after the neural network has stayed in the xu for one time step. Theorem A1. τ − α − kr < 1 is the necessary condition for the network to realize the sequence memory. Here, we used mathematical induction to prove Theorem A1. The network is firstly assumed to stay at the pattern x1 without loss of generality. And the neural network can transfer from pattern x1 to the second stored pattern x2 (throughout this paper the probative method is the same, so it is not detailedly discussed here). The sequence association can be well realized for all patterns before time t, and the neural network is assumed to stay at stored pattern xu at (t − 1)th time step. For the neural network’s initialization, prob[xui = 1] = 1−prob[xui = −1], and the correlativity between the stored patterns is not allowed. So, we have u−1 

u xm+1 xm j xj = 0, i

m=1

mij sj (t − 1)

=

wij xuj =

=

1 N

u−1  m=1

q 

u xm+1 xm j xj = 0. i

m=u+1 q 1  m+1 m u x xj · xj N m=1 i u xm+1 xm j xj i

+

xu+1 xuj xuj i

+

q  m=u+1

u xm+1 xm j xj i

=

1 u+1 u u x (xj xj ). N i

Xia M, et al.

Sci China Inf Sci

July 2014 Vol. 57 072203:11

Similarly, $w^0_{ij} s_j(t-1)$ is described as

$$w^0_{ij} s_j(t-1) = \frac{1}{N}\sum_{m=1}^{q} x_i^{m} x_j^m x_j^u
= \frac{1}{N}\Bigg(\sum_{m=1}^{u-1} x_i^{m} x_j^m x_j^u + x_i^{u} x_j^u x_j^u + \sum_{m=u+1}^{q} x_i^{m} x_j^m x_j^u\Bigg)
= \frac{1}{N}\, x_i^{u}\,(x_j^u x_j^u).$$

The dynamic threshold satisfies

$$\theta_i(t-1) = k_r r_i(t-1) = k_r\bigg(\frac{r_i(t-2)}{c} + s_i(t-1)\bigg)
= k_r\bigg(\frac{r_i(0)}{c^{t-1}} + \frac{s_i(1)}{c^{t-2}} + \cdots + \frac{s_i(t-3)}{c^{2}} + \frac{s_i(t-2)}{c} + s_i(t-1)\bigg),$$

where $k_r r_i(0)/c^{t-1} = 0$. Up to time step $t-1$, the sequence completes $v = \lfloor (t-1)/p \rfloor$ cycles, so the state history is the stored sequence repeated:

$$\theta_i(t-1) = k_r\bigg(\frac{x_i^{1}}{c^{t-2}} + \frac{x_i^{2}}{c^{t-3}} + \cdots + \frac{x_i^{p}}{c^{t-p-1}} + \frac{x_i^{1}}{c^{t-p-2}} + \frac{x_i^{2}}{c^{t-p-3}} + \cdots + \frac{x_i^{p}}{c^{t-2p-1}} + \cdots + \frac{x_i^{1}}{c^{t-vp-2}} + \frac{x_i^{2}}{c^{t-vp-3}} + \cdots + \frac{x_i^{u}}{c^{0}}\bigg).$$

Grouping the terms by pattern index (patterns $x^1,\ldots,x^u$ occur $v+1$ times and patterns $x^{u+1},\ldots,x^p$ occur $v$ times) and summing each geometric series gives

$$\theta_i(t-1) = k_r\,\frac{1-c^{-p(v+1)}}{1-c^{-p}}\sum_{m=0}^{u-1}\frac{x_i^{u-m}}{c^{m}} \;+\; k_r\,\frac{1-c^{-pv}}{1-c^{-p}}\sum_{m=u}^{p-1}\frac{x_i^{u+p-m}}{c^{m}}.$$
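The grouping into per-pattern geometric series can be verified numerically. The following is a small sketch of our own (not the authors' code; all parameter values are illustrative), comparing the raw decayed state history of one neuron against the closed form:

```python
import numpy as np

# Verify the regrouping of theta_i: raw decayed history vs. the closed
# geometric-series form. x[m] stands for x_i^{m+1} for a single neuron i.
rng = np.random.default_rng(0)
p, u, v, c, kr = 5, 3, 4, 1.3, 0.2        # illustrative values, c > 1
t = v * p + u + 1                          # time t-1 = v full cycles + u patterns

# Raw series: s_i(n) = x^{((n-1) mod p)+1}, discounted by c^{-(t-1-n)}
x = rng.choice([-1.0, 1.0], size=p)
raw = kr * sum(x[(n - 1) % p] / c ** (t - 1 - n) for n in range(1, t))

# Closed form: patterns x^1..x^u occur v+1 times, x^{u+1}..x^p occur v times
g1 = (1 - c ** (-p * (v + 1))) / (1 - c ** (-p))
g2 = (1 - c ** (-p * v)) / (1 - c ** (-p))
closed = kr * (g1 * sum(x[u - m - 1] / c ** m for m in range(u))
               + g2 * sum(x[u + p - m - 1] / c ** m for m in range(u, p)))

assert np.isclose(raw, closed)             # the two expressions agree
```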

From the analysis above, the local field $h_i(t-1)$ can be described as

$$h_i(t-1) = \sum_{j=1}^{N} w_{ij}(t-1)\,s_j(t-1) - \theta_i(t-1) - \beta K_i(t-1)$$
$$= \sum_{j=1}^{N}\bigg(\tau w^0_{ij} - \frac{\alpha}{N}\, s_i(t-1)\,s_j(t-1)\,(1-\delta_{ij}) + m_{ij}\bigg)\,s_j(t-1) - \theta_i(t-1) - \beta K_i(t-1)$$
$$= \sum_{j=1}^{N}\frac{1}{N}\Big(\tau\, x_i^{u}(x_j^u x_j^u) - \alpha\, x_i^{u} x_j^u x_j^u + x_i^{u+1}(x_j^u x_j^u)\Big) - \theta_i(t-1) - \beta K_i(t-1)$$
$$= x_i^{u+1} + (\tau-\alpha)\,x_i^{u} - \theta_i(t-1) - \beta K_i(t-1).$$
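This identity can be spot-checked with random patterns. Below is a hedged sketch of ours (illustrative parameters): for uncorrelated ±1 patterns and large N, the synaptic input Σ_j w_ij(t−1)s_j(t−1) approaches x_i^{u+1} + (τ−α)x_i^u up to finite-size cross-talk. For simplicity the transition weights here do not wrap around the sequence, so u+1 < q is assumed.

```python
import numpy as np

# Spot-check: sum_j w_ij(t-1) s_j(t-1) ~ x_i^{u+1} + (tau - alpha) x_i^u.
rng = np.random.default_rng(1)
N, q, u = 2000, 8, 3                       # neurons, patterns, current index (0-based)
tau, alpha = 1.2, 0.3
x = rng.choice([-1.0, 1.0], size=(q, N))   # row m is stored pattern x^{m+1}

w0 = x.T @ x / N                           # Hopfield weights w^0_ij
m = x[1:].T @ x[:-1] / N                   # transition weights m_ij = (1/N) sum_m x^{m+1}_i x^m_j
s = x[u]                                   # network state s(t-1) = x^u

# w_ij(t-1) = tau w^0_ij - (alpha/N) s_i s_j (1 - delta_ij) + m_ij
w = tau * w0 - (alpha / N) * (np.outer(s, s) - np.eye(N)) + m
field = w @ s

expected = x[u + 1] + (tau - alpha) * x[u]
print(np.abs(field - expected).mean())     # small; shrinks as N grows
```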

The state of the neurons at time $t$ is determined by the local field $h(t-1)$. If the network is to transfer from pattern $x^u$ to pattern $x^{u+1}$ at the $t$th time step, the overlap $h(t-1)x^{u+1}$ must be larger than $h(t-1)x^{u}$. Using the expression for $\theta_i(t-1)$,

$$h(t-1)x^{u} = \sum_{i=1}^{N} h_i(t-1)\,x_i^{u} = \sum_{i=1}^{N}\Big(x_i^{u+1} + (\tau-\alpha)x_i^{u} - \theta_i(t-1) - \beta K_i(t-1)\Big)x_i^{u}$$
$$= \sum_{i=1}^{N} x_i^{u+1}x_i^{u} + (\tau-\alpha)\sum_{i=1}^{N} x_i^{u}x_i^{u}
- k_r\,\frac{1-c^{-p(v+1)}}{1-c^{-p}}\sum_{m=0}^{u-1}\frac{1}{c^{m}}\sum_{i=1}^{N} x_i^{u-m}x_i^{u}
- k_r\,\frac{1-c^{-pv}}{1-c^{-p}}\sum_{m=u}^{p-1}\frac{1}{c^{m}}\sum_{i=1}^{N} x_i^{u+p-m}x_i^{u}
- \beta\sum_{i=1}^{N}\bigg(A M \sum_{k=1}^{p} x_i^{k} + 2(M-A)\bigg)x_i^{u},$$

and

$$h(t-1)x^{u+1} = \sum_{i=1}^{N} h_i(t-1)\,x_i^{u+1} = \sum_{i=1}^{N}\Big(x_i^{u+1} + (\tau-\alpha)x_i^{u} - \theta_i(t-1) - \beta K_i(t-1)\Big)x_i^{u+1}$$
$$= \sum_{i=1}^{N} x_i^{u+1}x_i^{u+1} + (\tau-\alpha)\sum_{i=1}^{N} x_i^{u}x_i^{u+1}
- k_r\,\frac{1-c^{-p(v+1)}}{1-c^{-p}}\sum_{m=0}^{u-1}\frac{1}{c^{m}}\sum_{i=1}^{N} x_i^{u-m}x_i^{u+1}
- k_r\,\frac{1-c^{-pv}}{1-c^{-p}}\sum_{m=u}^{p-1}\frac{1}{c^{m}}\sum_{i=1}^{N} x_i^{u+p-m}x_i^{u+1}
- \beta\sum_{i=1}^{N}\bigg(A M \sum_{k=1}^{p} x_i^{k} + 2(M-A)\bigg)x_i^{u+1}.$$

Since the stored patterns are uncorrelated, $h(t-1)x^{u}$ and $h(t-1)x^{u+1}$ can be approximated as

$$h(t-1)x^{u} \approx \tau N - \alpha N - k_r N\,\frac{1-c^{-p(v+1)}}{1-c^{-p}} - \beta N A M \approx \tau N - \alpha N - k_r N - \beta N A M,$$
$$h(t-1)x^{u+1} \approx N - k_r N\,\frac{1}{c^{p-1}}\cdot\frac{1-c^{-pv}}{1-c^{-p}} - \beta N A M \approx N - \beta N A M.$$

According to the analysis above, for the neural network to transfer from pattern $x^u$ to the next pattern $x^{u+1}$ at the $t$th time step, $h(t-1)x^{u+1}$ must be larger than $h(t-1)x^{u}$, i.e., $\tau - \alpha - k_r < 1$. Thus, $\tau - \alpha - k_r < 1$ is a necessary condition for the network to realize sequence memory.
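As a quick numerical illustration of the theorem, one can evaluate the two approximate overlaps and observe the crossover at τ − α − k_r = 1. This is a sketch of ours with made-up parameter values, not code from the paper:

```python
# Approximate overlaps from the analysis above; parameters are illustrative.
def overlaps(tau, alpha, kr, N=1000, beta=0.01, A=1.0, M=1.0):
    h_xu = tau * N - alpha * N - kr * N - beta * N * A * M  # overlap with x^u
    h_xu1 = N - beta * N * A * M                            # overlap with x^{u+1}
    return h_xu, h_xu1

for tau, alpha, kr in [(1.5, 0.4, 0.3), (2.0, 0.4, 0.3)]:
    h_u, h_u1 = overlaps(tau, alpha, kr)
    print(f"tau - alpha - kr = {tau - alpha - kr:.1f}: advance = {h_u1 > h_u}")
# Prints True for 0.8 (< 1) and False for 1.3 (>= 1): the network advances
# to the next pattern exactly when tau - alpha - kr < 1.
```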

SCIENCE CHINA Information Sciences

RESEARCH PAPER

July 2014, Vol. 57 072204:1–072204:12 doi: 10.1007/s11432-013-4792-y

System lifecycle processes for cyber security in a research reactor facility

PARK JaeKwan∗, PARK JeYun & KIM YoungKi

Korea Atomic Energy Research Institute, Daedeok-daero 989-111, Dukjin-dong, Yuseong-gu, Daejeon, Korea

Received July 19, 2013; accepted October 24, 2013; published online March 6, 2014

∗ Corresponding author (email: [email protected])

Abstract  The digitalization of nuclear facilities has brought many benefits, including high performance and convenient maintainability, in terms of facility operation. However, cyber incidents accompanying the use of digital technologies have increased, and cyber security has become one of the most important issues in the nuclear industry. Several guidelines have been published for nuclear power plants, but it is difficult to apply all of their requirements to research reactor facilities because the characteristics, in terms of facility scale, purpose, and system design, differ from those of power plants. To address this emerging topic, this paper introduces system lifecycle processes for cyber security in a research reactor facility. It addresses the integration of activities for securing systems and guarding a facility safely, using the practices at a research reactor facility.

Keywords

software project management, computer-based safety systems, cyber security, security program

Citation Park J K, Park J Y, Kim Y K. System lifecycle processes for cyber security in a research reactor facility. Sci China Inf Sci, 2014, 57: 072204(12), doi: 10.1007/s11432-013-4792-y

1  Introduction

Instrumentation and control (I&C) systems collect signals from sensors installed in the plant, monitor plant performance and status, and generate signals to control instruments for plant operation and protection. For a long time, analog technology was used to ensure high system reliability as a proven technology. Recently, such analog systems have been replaced with digital systems providing efficient performance, high reliability, and convenient maintainability. However, the use of digital I&C systems can introduce cyber security problems that may compromise important functions such as reactor shutdown or the mitigation of radioactive material release. Therefore, protection from cyber attacks has become one of the key issues in nuclear facilities.

Recently, it has been reported that several plants have been attacked and made to malfunction by outside intruders [1]. In January 2003, the Slammer worm exploited an I&C system vulnerability at the Davis-Besse nuclear power plant, and computer systems and the safety parameter display system were infected. Because of the network traffic generated by the worm, plant personnel could not access the safety parameter display system, which would indicate meltdown conditions of the plant. In August 2006, a shutdown of Unit 3 at the Browns Ferry nuclear power plant showed that even critical reactor components can be disrupted and disabled by a cyber attack: Unit 3 was manually shut down after a failure of controllers with embedded microprocessors and Ethernet communication capabilities. In July 2010, the Stuxnet worm was detected in the Bushehr nuclear power plant; exploiting a vulnerability of Microsoft Windows, it tried to infect systems running Siemens control software. A lesson learned from these incidents is that protection should be built into I&C systems.

To cope with such cyber attacks, various studies have been proposed in the information technology (IT) and plant industries. Ref. [2] suggested a security risk assessment framework in the IT industry; the framework includes processes for security risk and vulnerability assessment. In the plant industry, Ref. [3] presented the outcomes of an information and communication technologies (ICT) security assessment targeting an operational power plant; the results show that the vulnerability of a plant to malicious attacks is severe. Ref. [4] introduced a practice for cyber security risk assessment in power plants, consisting of a target system analysis, asset analysis, threat analysis, vulnerability analysis, risk analysis, and intrusion tests to identify the risks.

In addition to research efforts, national laboratories, utilities, and regulatory bodies have tried to find the best way to cope with not only attacks by outside intruders but also sabotage from inside. Since 2006, regulatory guides (RG) 1.152 [5] and 5.71 [6] for cyber security have been published. RG 1.152 specifies regulatory requirements for safety systems during the development phase, and RG 5.71 describes guidance for the operation and maintenance phases at power plant sites. These guides are mainly considered in the development process of digital systems in nuclear power plants. Even though RG 5.71 applies to the operation and maintenance phases, it is acceptable that its cyber security elements be designed and implemented during the development phase, before site application of the systems, as any later treatment of systems for cyber security may cause unpredicted defects in the systems or may be implemented with less effective security measures. This means that the security controls in RG 5.71 should also be planned, designed, and implemented in the system development phase. This design consideration is incorporated into the cyber security lifecycle process suggested by this paper.

As a specific guide, RG 5.71 includes main elements such as establishing a defensive strategy, applying security controls, and maintaining a cyber security program with continuous monitoring. For a defensive strategy, a layered I&C architecture and policies for each layer are required to protect the safety systems. To prevent cyber attacks, the implementation of about 150 security controls is enforced. A cyber security program including the above is also required for continuous monitoring and assessment. All of these elements are essential for the development and operation of nuclear power plants in compliance with the regulatory guidelines.

The focus of this paper is the digital I&C systems of a research reactor, which has different characteristics from a nuclear power plant. Table 1 shows the characteristics of research reactors in terms of cyber security.
A research reactor has a lower probability of being an attack target for cyber terrorists than a nuclear power plant, in terms of facility importance and purpose. The direct impact of a cyber attack is also relatively low because a research reactor has a far lower power rating and less radioactive material in the reactor core that could be dispersed in a worst-case accident. The economic feasibility of the facility is one of the important considerations in the system design process, mainly because of the use of a research reactor for radioisotope production. As learned from previous design practices, many control and monitoring systems are assigned a relatively lower quality class than those of a power plant, which leads to digital systems with a weaker design. Moreover, a research reactor is generally located near a town for convenience; thus, severe accidents (e.g., a release of radioactive materials) caused by cyber attacks can impact public safety directly and quickly. Therefore, applying a cyber security protection design to a research reactor is one of the key issues. The problem is a lack of studies, practices, and guidelines associated with the cyber security of a research reactor, which makes system designers hesitate over cyber security in their facilities. Intuitively, all requirements prepared for a nuclear power plant could be applied to a research reactor. However, this imposes a very high cost on a research reactor, which is a small-scale facility, and may also result in a blindly excessive design of the research reactor.

Table 1  Comparison of characteristics between research reactor and power plant

                    | Power plant                         | Research reactor                               | Accident probability and impact by cyber security
Purpose             | Electricity generation              | Radioisotope production, R&D using neutron     | Low probability
Power               | ∼3000 MW (thermal)                  | 0∼30 MW (thermal)                              | Low impact
Operation condition | High pressure and high temperature  | Low pressure and low temperature               | Low impact
System design       | Complicated and conservative design | Relatively simple, designed for cost efficiency| High probability
Site                | Far from a town                     | Near to a town                                 | High probability, high impact

Generally, many design requirements for a power plant have been considered in the system design of a research reactor in a graded approach by both system designers and regulatory bodies. Therefore, it is important to propose and discuss various graded approaches for the cyber security of research reactors. Based on a graded approach for the common digital I&C systems of research reactors designed in Korea, this paper proposes a cyber security development framework, an overall proactive plan for implementing protective means to prevent and mitigate cyber attacks. The proposed method covers the overall design lifecycle, security considerations within the system, and cyber security activities for the facility. First, a cyber security lifecycle process is established to define additional cyber security activities for the development and operation phases. Next, security activities for securing digital systems in terms of system features are defined, and security activities for protecting digital systems in terms of facility safeguards are proposed. Finally, the security controls for digital I&C systems are analyzed and assessed qualitatively, and the application of selected security controls is recommended based on the analysis results. To describe the research aim and results, this paper consists of the following sections. Section 2 briefly reviews the current status and history of cyber security requirements for nuclear power plants. Section 3 suggests a cyber security lifecycle process for the integration of the system development activities and the overall security activities. Section 4 proposes an integrated development framework for digital systems. Finally, the last section concludes this paper and discusses some avenues for future work.

2  Cyber security requirements in nuclear power plant

10 CFR (Title 10, Code of Federal Regulations) and the regulatory guidelines published by the United States Nuclear Regulatory Commission (US NRC) are internationally referenced for the design and construction of nuclear plant facilities. For cyber security, the NRC has published several important codes, such as 10 CFR 50.55a(h) [7], 10 CFR 73.1 [8], and 10 CFR 73.54 [9]. In addition, the NRC has provided two specific guides, RG 1.152 and RG 5.71, in compliance with the regulations. Figure 1 shows the history and current state of cyber security requirements in terms of regulation codes, guidelines, and industrial standards.

10 CFR 50.55a(h) requires that the equipment of digital safety systems, including software and hardware, be protected. RG 1.152 [10], published in 2006, describes a method that the NRC staff deems acceptable for complying with the regulations for promoting high functional reliability, design quality, and cyber security in the use of digital computers in the safety systems of nuclear power plants. The regulatory body provides specific guidance concerning safety system security, using the waterfall lifecycle phases as a framework for describing it. The waterfall lifecycle framework consists of nine phases: (1) concepts, (2) requirements, (3) design, (4) implementation, (5) test, (6) installation, checkout, and acceptance testing, (7) operation, (8) maintenance, and (9) retirement. It requires that system features and development activities for cyber security be implemented and performed throughout these phases.

In 2007, the NRC updated the design basis threat (DBT) of 10 CFR 73.1 so that the DBT includes cyber security threats. Thus, 10 CFR 73.1 requires ensuring that digital computers, communication systems, and networks be protected from cyber attacks.


Figure 1  History and current state of cyber security requirements (regulation codes 10 CFR 50.55a(h), 73.1, 73.54, 73.55, and 73.58; guides RG 1.152 Rev. 2 (2006) and Rev. 3 (2011), RG 5.71 (2010), RG 5.74, and NRC interim staff guidance DI&C-ISG-01 (2007–2009); standards IEEE 603, IEEE 7-4.3.2 (2010), and IAEA Series No. 17 (2011)).

In 2009, the NRC revised 10 CFR 73.54 to require plant designers to develop cyber security plans and programs to protect critical assets, including safety systems, from cyber attacks. As specific guidance for 10 CFR 73.54, the NRC published the new regulatory guide 5.71 in January 2010. It describes an approach complying with 10 CFR 73.54 and 73.1 and provides a template for a cyber security plan, including a defensive architecture and a set of security controls. IEEE Std 7-4.3.2-2010 [11], recently updated from the 2003 version, also states that the digital safety system/equipment development process shall address potential security vulnerabilities in the proper phases of the digital safety system lifecycle, and that system security features should be addressed appropriately in the lifecycle phases. The development process is almost the same as that of RG 1.152 (Rev. 02). IAEA Series No. 17-2011 [12] aims to create awareness of the importance of incorporating computer security as a fundamental part of the overall security plan for nuclear facilities. The publication provides guidance specific to nuclear facilities on implementing a computer security program, and its content is similar to that of RG 5.71. In July 2011, the NRC published RG 1.152 (Rev. 03) to provide consistency with RG 5.71. First, the point of view in the guide was changed from cyber security to a secure development and operational environment. Second, the phases from installation to retirement were eliminated because RG 5.71 provides guidance for those phases. Currently, nuclear power plants directly or indirectly regulated by NRC codes and guides are enforced to conform with the above requirements. This paper refers to these requirements as the research basis and develops an application method for a research reactor from the basis established for power plants.

3  A cyber security lifecycle process overview

This paper establishes an integrated cyber security lifecycle process, as shown in Figure 2. The process contains the primary elements of a cyber security plan for a Korean research reactor. The contribution is that it enables a cyber security strategy to be designed and maintained consistently by clarifying the interface activities between the system development framework and the cyber security program. That is, technical security controls required by the cyber security program are implemented within the system development phases, and protective guidelines defined by the program are referenced by the development phases. In addition to these collaborative activities, the system development framework includes activities for a secure development environment (SDE) and a secure operational environment (SOE) to prevent any potential threats (e.g., back doors, viruses) from residing within the digital system during the development process. Furthermore, the proposed processes guide the establishment of a cyber security program containing activities for cyber security in terms of facility safeguards. The cyber security program progresses through establishing and maintaining phases, and consists of several main elements: digital asset identification, a defense-in-depth strategy, security controls application, and continuous monitoring and assessment. The point of distinction of the proposed overall process from the regulatory guidelines is that it integrates and arranges the whole set of activities within the lifecycle process. In detail, alignments between the system development and the cyber security program are defined, and the interfaces between them are guided in this paper.

Figure 2  An integrated cyber security lifecycle process (development phases: concepts, requirements, design, implementation, test, and operation and maintenance; cyber security activities: establishing a cyber security policy and plan, digital assets analysis, designing and implementing a secure development and operational environment, implementing SOE requirements as system features (access, use of system, data communication), implementing SDE requirements as development activities (procedures, etc.), defense-in-depth model and architecture, designing and implementing security controls, cyber security assessment, continuous monitoring, change control, records retention, incorporation into physical security, and cyber security program review).

4  An integrated development framework of digital systems

This paper proposes a development framework for digital systems that includes cyber security considerations as a modification of the well-known waterfall design model, which consists of concept, requirement, design, implementation, test, and operation and maintenance phases. It also includes additional activities that should be considered in the development process to support the cyber security program required in the operation phase. As discussed above, establishing and maintaining the cyber security program is very important to prevent or mitigate cyber attacks. Generally, a cyber security program includes several primary elements: configuration of a cyber security team, critical digital asset identification, a defense-in-depth protective strategy, application of security controls, continuous monitoring and assessment, change control, and cyber security program review. As a practice of a graded approach for a research reactor, this paper proposes a cyber security program by introducing appropriate implementation methods for these primary elements. The elements are described through the system development processes. Our framework incorporates cyber security considerations, such as interface activities with a defense-in-depth strategy for security consistency and the implementation of technical security controls required by a cyber security program, into the development lifecycle. This paper explains the framework by focusing on cyber security activities, because the activities for implementing system functions are the same as in the original waterfall model.

4.1  Concept phase

During the concept phase, functional concepts and conceptual architectures for all I&C systems are defined. This phase includes several cyber security activities. Initially, an analysis of the site operation environment is performed, and the functional concepts required to establish a secure operational environment for digital systems are identified. The identified concept features become inputs for the design requirements in the requirement phase. Also, an assessment is performed to identify potential challenges to maintaining a secure operational environment for the system and a secure development environment during the development process. The results of the analysis are used to establish security requirements for both hardware and software. Furthermore, it is necessary that a cyber security plan be prepared in alignment with the concept phase, because the cyber security scope, policy, team, and implementation schedules are referred to continuously during the entire lifecycle. The cyber security team is also organized in this phase. In the case of a Korean research reactor, two cyber security teams are formed, one for the development phase and one for the operation phase, as they belong to different organizations. During the development phase, there is no dedicated cyber security team; instead, the system development members are responsible for the supervision and implementation of security controls as a cyber security team (CST). The I&C system design manager concurrently holds the position of cyber security manager because most digital systems belong to the I&C area. Cyber security specialists within the CST are responsible for (1) performing cyber security evaluations of digital systems and (2) maintaining expert skill and knowledge in the area of cyber security. System developers are responsible for (1) establishing the cyber security requirements of systems, (2) designing the cyber security items of systems, (3) implementing and testing I&C systems, and (4) applying the cyber security plan and policy to the implementation, testing, and installation of the systems. The cyber security team for the operation phase is organized in the operation and maintenance phase.

4.2  Requirement phase

In this phase, the design requirements of digital systems are established, and the results of the previous phase are carefully addressed for cyber security. System features required to maintain a secure operating environment and ensure reliable system operation are defined as part of the overall system requirements. The system design requirements also include well-known cyber security requirements, such as blocking of external interfaces and communication networks, highly reliable modification procedures, the exclusion of remote access, and access control. Activities to identify critical systems (CSs) and critical digital assets (CDAs) are performed together with this phase. It is useful to consider the functional requirements and the security requirements of systems together: if potential vulnerabilities of digital assets are induced by their functional requirements, the results are fed back to the developers, and the design problems are resolved directly in the same phase. In the same manner, a defensive architecture is also drawn up in this phase.

4.2.1 Critical digital assets identification

The common I&C systems of a Korean research reactor consist of computer-based systems with digital communication networks, as shown in Figure 3. There are two control rooms: the main control room (MCR) and the supplementary control room (SCR). All monitoring and control actions are performed in the MCR, and a reactor shutdown action is performed in the SCR in situations where the MCR is inaccessible. There are two safety grade systems, a reactor protection system (RPS) and a post-accident monitoring system (PAMS). The RPS is a safety control system, the most important system for protecting reactor safety, and the PAMS is a safety monitoring system for monitoring the reactor facility continuously under normal and abnormal conditions. Other monitoring and control systems are classified as non-safety systems. Based on the conceptual design, system developers must identify critical digital assets (CDAs), because not all digital systems can be protected from cyber attacks. However, it is difficult to identify CDAs without first conducting a wider assessment of all of the systems within the facility. Thus, a qualitative consequence analysis of the systems is conducted, as shown in Figure 4.


Figure 3  An example of fully-digitalized systems of a research reactor (operator workstations and large display panels in the main and supplementary control rooms on a non-safety communication network, connected through isolation devices to the safety network of the RPS (A, B, C) and PAMS (A, B), and to the RRS, PICS, APS, IPS, and RMS).

Table 2  Results of CDA identification in a research reactor

Critical systems | Critical digital assets
Reactor protection system | Bistable processor, maintenance and test processor, interface and test processor
Post-accident monitoring system | Signal processor, maintenance and test processor
Safety network | Isolator, optical communication cable

Non-critical systems | Non-critical digital assets
Reactor regulating system | Control computer, maintenance computer
Alternate protection system | Bistable controller, maintenance computer
Information processing system | Processing server
Process instrumentation and control system | Control computer, maintenance computer
Radiation monitoring system | Monitoring computer, maintenance computer
Operator workstation/large display panel | Workstation computer, display control computer
Non-safety network | Network switch, communication cable

Generally, the CST identifies and selects critical systems (CSs), such as digital systems, equipment, communication systems, and networks, that are associated with safety, security, and emergency preparedness (SSEP) functions. The CST also identifies CDAs that have a direct, supporting, or indirect role in the proper functioning of CSs. Additionally, one more condition, the quality class of the system, is used during the decision making process. The reasons are that the quality class is determined by importance to safety during the initial concept phase, and that it helps to distinguish between CDAs and non-CDAs more precisely. Table 2 shows the results of critical system and critical digital asset identification, through the decision making mechanism, for the digital systems of a Korean research reactor. In the analysis results, the digital safety systems and the safety network are determined to be critical systems. That is, a compromise of these systems can result in radiological sabotage (i.e., core damage) and therefore has the potential to adversely impact public health and safety. The other systems are classified as non-critical systems because the reactor facility is assured of a safe status under the failure of these systems.


Figure 4  Decision making mechanism of critical systems and digital assets (a digital system is screened by safety class and then by whether it performs SSEP functions, affects critical systems/functions and pathways, or supports or protects critical systems or assets; systems meeting any of these conditions are critical systems and their assets are critical digital assets, otherwise they are non-critical).
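The decision flow of Figure 4 can be summarized in a few lines of code. The sketch below is our reading of the flowchart (the class and field names are our own assumptions, not from the paper); the safety-class and non-safety-class branches ask the same SSEP-related questions, so the outcome reduces to whether any of them holds:

```python
from dataclasses import dataclass

@dataclass
class DigitalSystem:
    safety_class: bool        # quality/safety class assigned in the concept phase
    performs_ssep: bool       # performs safety/security/emergency-preparedness functions
    affects_critical: bool    # affects critical systems/functions and pathways
    supports_critical: bool   # supports or protects critical systems or assets

def classify(s: DigitalSystem) -> str:
    """Classify a system (and, by the same questions, its assets)."""
    if s.performs_ssep or s.affects_critical or s.supports_critical:
        return "critical system (its digital assets are CDAs)"
    return "non-critical system (its digital assets are non-CDAs)"

rps = DigitalSystem(safety_class=True, performs_ssep=True,
                    affects_critical=True, supports_critical=True)
ows = DigitalSystem(safety_class=False, performs_ssep=False,
                    affects_critical=False, supports_critical=False)
print(classify(rps))   # e.g., the reactor protection system
print(classify(ows))   # e.g., an operator workstation
```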

Figure 5  A cyber security defensive architecture (Isol.: isolation device). Security level 3 contains the safety systems (reactor protection system and post-accident monitoring system); level 2 the non-safety control and monitoring systems (e.g., IPS, RRS, PICS, OWS, LDP); level 1 the support systems (e.g., auxiliary systems); and level 0 the office systems, with isolation devices between levels.

4.2.2 Defense-in-depth strategy

After CDA identification, a defense-in-depth strategy is prepared to establish multiple layers of protection to guard the CDAs safely. Its purpose is that the failure of a single layer should not result in the compromise of CDAs. This paper proposes a strategy composed of a defensive architecture and associated policies for controlling access across the layers. Figure 5 briefly shows a defensive architecture for the Korean research reactor. In this practice, there are four security levels (or layers), and the CDAs are located at the highest level. Control and monitoring systems reside in security level 2, other auxiliary support systems in security level 1, and office systems in security level 0.


In addition to the decomposition of security levels, several policies are defined to protect the CDAs. Basically, only one-way data flow is allowed between levels 3 and 2. In addition, data flow between levels is only allowed through dedicated devices with security checking capability. Also, the initiation of communications from digital assets at lower security levels to digital assets at higher security levels is prohibited. These policies are implemented using both technical controls (e.g., boundary devices) and management controls (e.g., limitations on communication software).
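These inter-level policies can be read as a simple predicate over (source level, destination level) pairs. The sketch below is our illustration of the rules just described; the function and its signature are assumptions, not part of the facility design:

```python
def connection_allowed(src_level: int, dst_level: int,
                       via_isolation_device: bool) -> bool:
    """True if a connection initiated at src_level toward dst_level is
    permitted by the defensive-architecture policies (levels 0..3)."""
    # Rule 1: initiation from a lower level toward a higher level is
    # prohibited, which also enforces one-way (3 -> 2) flow between
    # levels 3 and 2.
    if dst_level > src_level:
        return False
    # Rule 2: data may only cross a level boundary through a dedicated
    # device with security checking capability (an isolation device).
    if dst_level != src_level and not via_isolation_device:
        return False
    return True

assert connection_allowed(3, 2, via_isolation_device=True)       # one-way, isolated
assert not connection_allowed(2, 3, via_isolation_device=True)   # lower -> higher
assert not connection_allowed(3, 2, via_isolation_device=False)  # no isolation device
```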

4.3  Design phase

Based on the requirement documents, a detailed design is produced in this phase. The design features are translated into specific design configuration items. From the SDOE point of view, measures are taken to prevent the introduction of unnecessary design features or functions; for these measures, specific procedures for the development environment are applied. Security controls for the CDAs and the defense-in-depth architecture are also mapped into specific design items as part of the systems in this phase. For example, passwords, session locks, and isolation devices become design items for the access control requirements. To decide whether the design is acceptable, a security assessment to identify potential cyber security vulnerabilities of CDAs is performed using the detailed design documents.

One way to address the potential cyber risks of CDAs at the highest security level is to implement the cyber security controls described in RG 5.71. The guide includes about 150 security controls classified as technical controls, operational controls, and management controls. Technical controls are safeguards or protective measures executed through automatic mechanisms contained within the hardware or software; controls within this class include access controls, accountability, system protection, and so on. Operational controls are protective measures typically performed manually rather than by automated means; controls within this class include activities involving media protection, physical protection, maintenance, and training. Management controls concentrate on risk management and security policy; controls within this class cover activities involving system acquisition, security assessment, and the modification of digital assets. Nuclear power plants have been required to apply all of the cyber security controls to their facilities. A selection of mandatory security controls based on an analysis of the characteristics of a research reactor can be an acceptable graded approach. This paper introduces a practice for the application of security controls to a Korean research reactor. It has been analyzed that about half of the controls can be applied to the research reactor completely. For the remaining controls, the environmental conditions of the research reactor are additionally considered. In a research reactor, the operation and maintenance staff is smaller than at power plants, and the jobs involved in access control are also relatively simple. Furthermore, the digital systems and CDAs are very few in number compared with power plant facilities. This enables the use of manual controls as a substitute for security controls that are hard to implement as automatic mechanisms. Under these considerations, several controls (account management, system use notification, supervision and review, and access control for portable and mobile devices) can be sufficiently implemented manually through procedures. For a similar reason, the security control "separation of functions" can be eliminated. Furthermore, the related controls "access enforcement" and "least privilege" can be integrated into the account management control. Some controls (information flow enforcement, network access control, wireless access restriction, and use of external systems) can be applied partially because attack paths do not exist or only partially exist. In addition to access control, the other types of security controls have been analyzed in a similar manner.
As a result, about 70 security controls are selected through this analysis for application to the Korean research reactor. The selected security controls are summarized in Table 3.

4.4  Implementation phase

During this phase, all design items are transformed into specific hardware and software representations. System developers apply secure development environment procedures to minimize and mitigate any inadvertent or inappropriate alterations of the developed system. The procedures include testing to address undocumented code or functions. Security controls are also implemented and verified in alignment with this phase.

Table 3  A list of selected security controls for a research reactor

Control category | Security controls applied to research reactor
Access control | Access control policy and procedure, account management, information flow enforcement, unsuccessful login attempts, system use notification, previous login notification, session lock, supervision and review (access control), permitted actions without identification or authentication, automated marking/labeling, network access control, open/insecure protocol restrictions, wireless access restriction, insecure rogue connections, access control for portable and mobile devices, proprietary protocol visibility, third party products and controls, use of external systems, publicly accessible content
Audit | Audit and accountability policy and procedure, auditable events, content of audit records, audit review, analysis, and reporting
CDA and communication protection | CDA and communication protection policy and procedure, shared resources, DoS protection, transmission integrity/confidentiality, trusted path, mobile code, fail in known state
Identification and authentication | Identification and authentication policy and procedure, user identification and authentication, password requirements, identifier management, authenticator management
System hardening | Removal of unnecessary services and programs, host intrusion detection system, hardware configuration, installing OS, application, and third-party software updates
Media protection | Media protection policy and procedure, media access, media labeling/marking, media storage, media transport, media sanitation and disposal
Personnel security | Personnel security policy and procedure, personnel termination or transfer
Physical and environmental protection | Physical and environmental protection policy and procedure, third party/escorted access, physical and environmental protection, physical access authorization, physical access control (transmission/display medium), visitor control access records
Incident response | Incident response policy and procedure, incident response training, incident response testing and drills, incident handling, incident monitoring, incident reporting
Configuration management | Configuration management policy and procedure, baseline configuration, configuration change control, security impact analysis of changes and environment
System and service acquisition | System and service acquisition policy and procedure, supply chain protection, trustworthiness, integration of security capabilities, developer security testing, licensee/applicant testing
Security assessment and risk management | Threat and vulnerability management, risk mitigation

4.5  Test phase

During this phase, it is verified that the implementation results meet the design requirements completely. Through tests on the system hardware, software, and communication devices, it is validated that the system is implemented appropriately in terms of system functions, the effects on connected systems, and the SDOE. Additionally, penetration tests are prepared to identify any remaining potential security vulnerabilities. These tests focus only on important assets such as CDAs, CSs, and communication networks.

4.6  Operation and maintenance phase

The cyber security team for the operation phase is responsible for maintaining the cyber security program established in the development phase. The security team conducts (1) continuous monitoring and assessment, (2) change control, and (3) cyber security program review. The team members receive cyber security training in order to manage the defense-in-depth strategy and security controls. In the operation and maintenance phase, a continuous monitoring and assessment strategy is prepared and performed to maintain a high security capability against cyber attacks. The strategy for the Korean research reactor is summarized as (1) an assessment to verify that the security controls remain in place, (2) verification of whether rogue assets are connected to the I&C network infrastructure, and (3) a periodic assessment of the effectiveness of the security controls.
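Strategy item (2) amounts to comparing the approved baseline inventory from configuration management against the devices actually observed on the I&C network. A minimal sketch (all asset names are hypothetical):

```python
# Baseline of approved digital assets (hypothetical identifiers).
APPROVED_ASSETS = {
    "rps-a", "rps-b", "rps-c",      # reactor protection system channels
    "pams-a", "pams-b",             # post-accident monitoring system
    "safety-isolator-1",
}

def find_rogue_assets(observed: set) -> set:
    """Return devices seen on the network but absent from the baseline."""
    return observed - APPROVED_ASSETS

observed_now = {"rps-a", "rps-b", "rps-c", "pams-a", "pams-b",
                "safety-isolator-1", "unknown-laptop-07"}
rogues = find_rogue_assets(observed_now)
if rogues:
    print("rogue assets detected:", sorted(rogues))  # unknown-laptop-07
```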

Figure 6  A cyber security assessment process for a research reactor: risk identification (cyber threat identification, vulnerability identification, and security controls analysis), risk analysis (likelihood analysis, consequence (impact) analysis, and risk assessment), and risk treatment (introduction of security controls and a cyber security assessment report).

In particular, a periodic assessment includes an analysis of the system features, identification of vulnerabilities, and a risk assessment; the overall assessment process is shown in Figure 6. In risk identification, the CST identifies new cyber threats and the vulnerability to such threats, and then analyzes the defense capabilities of the existing controls against these threats. In risk analysis, the CST analyzes the incident likelihood of a cyber threat and the consequences to the facility and public safety, and assesses the total risk by combining the possibility and the impact. Finally, the CST proposes new security controls to prevent or mitigate cyber threats and reports the assessment results to the supervisor.

The change control of the Korean research reactor is exhaustively managed under specific procedures because a change can create a new vulnerability in the existing configuration. Changes to the environment of the CDAs, such as additions, deletions, or modifications, are planned, approved, tested, and documented to ensure that the CDAs remain protected from cyber attacks. The changes are made to the CDAs according to the configuration management procedures. During the retirement phase, the configuration management procedures include safety, reliability, and security engineering activities.

The elements of the cyber security program for the Korean research reactor are periodically reviewed by the CST. The review is performed (1) when a change occurs to the personnel, procedures, equipment, or facilities that can adversely affect security, or (2) as necessary based upon site-specific analyses, assessments, or other performance indicators. The CST documents the results and recommendations of the program reviews, management findings regarding program effectiveness, and any actions taken as a result of recommendations from prior program reviews.
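The risk analysis step, combining incident likelihood with consequence into a total risk, can be illustrated with a simple scoring scheme. The 3-point scales and acceptance threshold below are assumptions for illustration; the paper does not specify the scales used at the facility:

```python
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}
IMPACT = {"low": 1, "medium": 2, "high": 3}     # consequence to facility/public safety

def total_risk(likelihood: str, impact: str) -> int:
    """Combine possibility and impact into a single risk score."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]

def needs_new_controls(likelihood: str, impact: str, threshold: int = 4) -> bool:
    """Risk treatment: propose new security controls above the threshold."""
    return total_risk(likelihood, impact) > threshold

# e.g., a new threat with medium likelihood but high consequence
print(total_risk("medium", "high"))           # 6
print(needs_new_controls("medium", "high"))   # True
```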

5  Conclusion

In this paper, we introduced the cyber security application issues and problems related to a research reactor facility and surveyed the current state of references, which shows a lack of research works, reports, and practices. As a reasonable solution, we proposed an integrated development framework for the cyber security establishment of a research reactor facility, based on the practices experienced in a research reactor. Based on international guides, we clarified a cyber security lifecycle process integrating the activities required at the system and facility levels. Furthermore, we suggested a development framework that incorporates security considerations, the implementation of technical controls and interface activities with a protective strategy, into the development lifecycle phases in terms of system development. As important security considerations, we discussed the identification of critical digital assets, the establishment of a defensive architecture, and the application of security controls within the development lifecycle processes, using practices at a research reactor. We expect the system lifecycle processes of this paper to be useful to researchers and practitioners with an interest in designing and creating cyber security for digital systems in the nuclear area. Our work is on-going and is currently focused on the elements of a cyber security plan. Thus, the incorporation of a cyber security program into a physical protection program, the development of a security assessment model for the feasibility study of security controls, and a method of revising a cyber security program at the site remain as further studies.


References

1 Kesler B. The vulnerability of nuclear facilities to cyber attack. Strategic Insights, 2011, 10: 15–25
2 Saleh Z I, Refai H, Mashhour A. Proposed framework for security risk assessment. J Inf Secur, 2011, 2: 85–90
3 Nai Fovino I, Guidi L, Masera M, et al. Cyber security assessment of a power plant. Elec Power Syst Res, 2011, 81: 518–526
4 Lee C K, Park G Y, Kwon K C, et al. Cyber security design requirements based on a risk assessment. In: Proceedings of the Nuclear Plant Instrumentation, Control, and Human-Machine Interface Technologies, Knoxville, 2009. 1638–1646
5 USNRC. Regulatory Guide 1.152 Revision 3. Criteria for use of computers in safety systems of nuclear power plants, 2011
6 USNRC. Regulatory Guide 5.71. Cyber security programs for nuclear facilities, 2010
7 USNRC. 10 CFR 50.55a(h). Protection and safety systems, 1971
8 USNRC. 10 CFR 73.1. Physical protection of plants and materials, 2007
9 USNRC. 10 CFR 73.54. Protection of digital computer and communication systems and networks, 2009
10 USNRC. Regulatory Guide 1.152 Revision 2. Criteria for use of computers in safety systems of nuclear power plants, 2006
11 IEEE. IEEE Std 7-4.3.2-2010. Criteria for digital computers in safety systems of nuclear power generating stations, 2010
12 IAEA. Nuclear Security Series No. 17. Computer security at nuclear facilities, 2011


