Delay and Communication Tradeoffs for Blockchain Systems

SUBMITTED FOR PUBLICATION (2018)


Delay and Communication Tradeoffs for Blockchain Systems with Lightweight IoT Clients

arXiv:1807.07422v3 [cs.DC] 17 Oct 2018

Pietro Danzi, Student Member, IEEE, Anders E. Kalør, Student Member, IEEE, Čedomir Stefanović, Senior Member, IEEE, Petar Popovski, Fellow, IEEE

Abstract—The emerging blockchain protocols provide a decentralized architecture that is suitable for supporting Internet of Things (IoT) interactions. However, keeping a local copy of the blockchain ledger is infeasible for low-power and memory-constrained devices. For this reason, they are equipped with lightweight software implementations that only download the useful data structures, e.g. state of accounts, from the blockchain network, when they are updated. In this paper, we consider and analyze a novel scheme, implemented by the nodes of the blockchain network, which aggregates the blockchain data in periodic updates and further reduces the communication cost of the connected IoT devices. We show that the aggregation period should be selected based on the channel quality, the offered rate, and the statistics of updates of the useful data structures. The results, obtained for the Ethereum protocol, illustrate the benefits of the aggregation scheme in terms of a reduced duty cycle of the device, particularly for low signal-to-noise ratios, and the overall reduction of the amount of information transmitted in the downlink (e.g., from the wireless base station to the IoT device). A potential application of the proposed scheme is to let the IoT device request more information than actually needed, hence increasing its privacy, while keeping the communication cost constant. In conclusion, our work is the first to provide rigorous guidelines for the design of lightweight blockchain protocols with wireless connectivity.

Index Terms—Internet of Things, data structures, blockchain.

I. INTRODUCTION

Since the advent of the Bitcoin protocol in 2008 [1], a large wave of blockchain protocols has emerged, aiming to support the implementation of decentralized applications, or dApps, that reduce the need for a central authority to supervise the interactions in multi-agent systems [2]. A promising use of dApps is in Internet of Things (IoT) services, e.g. for executing economic transactions in smart grid applications in which devices interact through a blockchain to trade energy [3], [4] or to solve distributed optimizations [5]. Storing the entire blockchain and processing every transaction require a considerable amount of storage memory and computation. This is not feasible for IoT devices, as they are often constrained with respect to memory, computation, communication and power. Instead, the IoT devices may act as lightweight clients, which only store a subset of the blockchain data, possibly encoded [6], needed to verify certain events of interest. The IoT devices, configured as lightweight clients,

Authors are with the Department of Electronic Systems, Aalborg University, Denmark, Email: {pid, aek, cs, petarp}@es.aau.dk. The work was supported in part by the European Research Council (ERC Consolidator Grant no. 648382 WILLOW) within the Horizon 2020 Program.

Fig. 1. Communication architecture for the interaction of IoT devices with a set of blockchain nodes (BNs) via wireless links provided by a base station (BS).

are connected to a set of regular blockchain nodes (BNs), e.g. via a wireless base station, see Fig. 1. These devices are constantly synchronizing with the BNs [7], receiving a minimal amount of information, namely the block headers. In addition, when certain events that are of interest to a specific device occur, e.g. modification of specific accounts' state or transactions involving these accounts, the BNs transmit the updates to the device, including the proofs of their inclusion (PoIs) in the blockchain. While the architecture with lightweight clients reduces the processing and memory requirements, a considerable amount of downlink traffic is still needed in order to maintain synchronization with the global blockchain [7]. This type of operation challenges the common assumption that IoT devices mostly generate uplink traffic [8], [9], and calls for accurate models of blockchain traffic.

Schemes that reduce the amount of traffic exchanged between the lightweight clients and the BNs have previously been proposed, either by modifying the block structure [10], [11], by leveraging the characteristics of account-based blockchains, e.g. Ethereum [12], or by backing the authenticity of the transmitted information with a deposit of credit [13], [14].

This work is motivated by the observation that the blockchain synchronization process can be tailored to the actual requirements of timely information updates to the IoT devices. That is, the ultimate target is not to keep the devices always synchronized, but to synchronize them according to the needs of the underlying dApp. Hence, we replace the legacy scheme, in which the BNs transmit the information to the devices whenever available, with a novel approach in which the information is accumulated and pushed only when needed by the end IoT devices.
Among the multitude of blockchain protocols, we focus on the Ethereum specification [15], but the overall principle of aggregation can be applied to other “account-based” blockchains. The contributions of this work can be summarized as


Fig. 2. Example structure of a blockchain. h(x) is the hash value of node x and ∥ is the concatenation operation. The t transactions included in block b apply modifications to the accounts' states, which are stored in a database. The state tree depicted on the right-hand side, ternary in this example, is built from this database, and its root is included in the block header. Branch nodes are colored in blue and extension nodes in brown. Leaf nodes (there are eight of them in the example) are composed of a key and a value, and are colored in green.

follows:
1) We propose and analyze an aggregation scheme, implemented at the BNs, that reduces the duty cycle of the device and the amount of transferred data, at the cost of an increased information delay at the IoT device. The reduction is achieved when events of interest to the device occur multiple times within an aggregation period, and is mainly caused by avoiding transmission of temporary states, but also by the fact that the size of the proof of inclusion increases sublinearly with the number of events.
2) We extend our previous model [7] by including the possibility for the IoT device to observe multiple accounts, and for the base station to select the transmission rate. The result is a model for lightweight clients that is rich, but simple to analyze. We show its potential application by constructing a set of observed accounts that increases the privacy of the IoT device, while keeping the communication cost low.
3) We study the cost of transmitting the proof of inclusion for the updated data, namely the Merkle-Patricia tree data structures, and provide experimental results obtained for the Ethereum protocol.

The remainder of the paper is organized as follows. Section II provides an introduction to blockchain protocols, focusing on Ethereum, and describes the lightweight protocol variants. The system model is introduced in Section III and analyzed in Section IV. Section V presents the evaluation and Section VI concludes the paper.

II. BLOCKCHAIN PROTOCOL

This section introduces the main components of the Ethereum protocol [15], which is a popular choice for blockchain systems tailored for IoT applications [3], [16], [17]. The traits that Ethereum shares with other blockchain protocols can be found in [18].

The Ethereum blockchain is a database that records the history of the states of accounts in a chain of blocks. An account is a data structure that contains an amount of credit and a general-purpose memory block. The account may also contain a set of predefined procedures that can read and write to the memory; in this case, the account is called a smart contract. The state of an account can be changed by transactions, either directly or through the invocation of a procedure in a smart contract. We shall refer to these modifications of accounts as events. Transactions are signed by devices using an asymmetric cipher, and identified by their hash values¹, as in the Bitcoin specification [1].

The transactions are organized in a chain of blocks. Besides a set of transactions, each block contains cryptographic signatures of the current states of the accounts and a pointer to the preceding block in the chain, which defines a causal relationship between blocks. When a block is appended to the blockchain, the transactions that it includes are considered valid.

The Ethereum database is replicated at multiple nodes that are interconnected by a communication network. Each node can append new blocks to the blockchain and inform the rest of the network about a new state of the database. To avoid uncontrolled generation of blocks, so as to keep the database replicas consistent, the nodes establish a leader election mechanism that (i) reduces the probability that more than one node generates a valid block at the same time and (ii) keeps the block generation rate low relative to the propagation delay of the communication network. Notable examples of leader election mechanisms in blockchains are Proof of Work [1] and Proof of Stake [19]. In order for a block to be considered valid, it must provide information that can be used to verify that it has been generated in accordance with the leader election mechanism.

A. The block data structure

A block is composed of a header and a body, see Fig. 2. The block header has a fixed size, while the rest of the block contains the actual transactions and has a variable size. When the number of transactions in a block is high, the variable-size part takes a dominant portion of the total block size. The information specified in the header includes: the block hash value, an incremental counter, the cryptographic signature of the node that generated it, the proof that the block is valid, e.g., Proof of Work solution, and one or more hash values that represent roots of PoI trees. In this work, we mainly consider the transactions tree and the state tree. The transactions tree, which uniquely binds the modifications of accounts certified by a block with the block header, can be used to prove that specific transactions are included in the block. The state tree, depicted on the right-hand side of Fig. 2, provides a snapshot of the entire collection of account states.²

¹ The hash value of some input data x is the output of a hash function defined by the blockchain protocol, and is indicated as h(x).

² The state tree is a characteristic of the Ethereum specification.

B. Proof of inclusion (PoI) via Merkle-Patricia trees

The Ethereum protocol provides PoIs using Merkle-Patricia trees [15], [20], [21]. A Merkle-Patricia tree has three types of nodes: leaf, extension and branch nodes, see Fig. 2, as in standard Patricia trees [22], and is used to efficiently store and retrieve data structures associated with strings. In the blockchain context, the string is the hash value of the address of an account or transaction, and the data structure to be retrieved is the account/transaction itself. The branch nodes only store the hash value of the list of their child nodes, see Fig. 2. Leaf and extension nodes, also illustrated in Fig. 2, store a key, that is the hash value of the common path shared by all child nodes, and a value. The value stored by extension nodes is the hash value of the list of child nodes, and that of the leaf nodes is the hash value of the data that is to be authenticated (e.g. an account or transaction). The use of a hash function to index the addresses provides strings of equal length, which are equiprobable.

The presence of a specific node in the Merkle-Patricia tree is proven by constructing its PoI. A PoI is a collection of node values that enables generation of the hash value contained in the root node of the tree, e.g. the node labeled 0 in Fig. 2, starting from the specific node to prove. By comparing the generated root hash value with the value stored in the block header, the inclusion in the blockchain of the data structure associated with the specific node can be verified [1]. In practice, the PoI is used to verify that a particular leaf node, i.e. an account or transaction, is present in a state tree. Specifically, a PoI is created by starting from the root of the Merkle-Patricia tree, and descending to the specific node. At each level, all nodes that are siblings of the node on the path from the root to the specific node are collected, as illustrated in Fig. 3(a) and (b). Notice that a PoI, in general, contains far fewer nodes than the complete Merkle-Patricia tree, since most of the branches are not collected during the descent from the root [21].

Fig. 3. Representation of the PoIs (a) of node 4, (b) of node 8 of Fig. 2, and (c) PoMI of 4 and 8. h(x) is the hash value of x and ∥ is the concatenation operation.

A single proof can be constructed to prove multiple data structures by collecting the union of the nodes required to prove each of the data structures. We shall refer to such a proof as a Proof of Multiple Inclusions (PoMI)³. Since the nodes required to prove each data structure are likely to intersect, a PoMI is typically much smaller than the individual PoIs that would otherwise be needed for each data structure. Fig. 3(c) provides an example of this reduction, for the proof of both nodes 4 and 8. If two individual proofs are built, the PoI of node 4 contains nodes {5, 2, 3} and the PoI of node 8 contains {1, 2, 6, 7}, such that seven nodes are needed in total. However, if the proofs are sent together in a PoMI, only nodes {2, 5, 6, 7} are needed, which shows the advantage of using this data structure.

Fig. 4. Information exchanged between a client and a BN using a lightweight protocol during four block periods, (a) without aggregation scheme and (b) with aggregation. Downlink/uplink messages are depicted above/below the time arrow. “Data of interest” includes the accounts' data and relative PoMIs.

C. Synchronization protocols

A blockchain client is updated on modifications of the blockchain database, observed by BNs, by means of a synchronization protocol. In [7] we have presented three possible protocols that can be adopted for this purpose, denoted by P1, P2 and P3. With P1, the client itself stores the entire blockchain, and locally checks the correctness of the transactions. This configuration is not envisaged for IoT devices due to the requirements of storage memory and processing, and will not be considered in this work. In P2, the BNs are notified about the account updates that the client is interested in receiving. Hence, the client receives the block headers from the BNs, by default, and the accounts of interest, only when they are modified. This scheme, referred to as a lightweight protocol, reduces the amount of data communicated in the downlink, as well as the amount of local processing. In fact, the client only verifies that the information sent by BNs is consistent, delegating to them the auditing of the actual validity of the transactions [21].
It follows that, with P2, the client must be connected to at least one honest BN to be able to detect the presence of false information. Finally, in protocol P3 the IoT device is connected to a proxy node that only sends to the device the useful information, without providing any proof that it is included in the blockchain. This protocol is also not considered in this work, as it requires the device to fully trust the proxy node, which is not in line with the envisioned trustless decentralized architecture.

³ In contrast with prior literature [20], we use the terms PoI and PoMI to differentiate the proof from the blockchain-specific data structure that provides it, e.g. Merkle tree (in Bitcoin) or Merkle-Patricia tree (in Ethereum).

In this paper, we assume that the IoT devices have memory and processing limitations and only consider the lightweight protocols of type P2, such as Bitcoin's Simplified Payment Verification (SPV) [1] and the Ethereum Light Client [23], which are designed to allow clients to verify the inclusion of transactions in the blockchain without downloading and storing the entire blockchain. Specifically, clients only download block headers, as well as the data structures (e.g. account or transaction data structures) that are of interest.

Fig. 4 shows the messages exchanged between a client and a BN using a lightweight protocol during three block periods. The red crosses represent the instants at which new blocks are generated. In the basic lightweight protocol, Fig. 4(a), the information is pushed in the downlink from the BS as it becomes available. In the first block period there is no information of interest, and only the block header is sent, while in the second and third periods there are events of interest and the respective data are sent with their PoMI.

The presence of both the transactions and the state tree roots in the Ethereum block headers, see Fig. 2, permits two different approaches to updating the local copy of the account states, as illustrated with the following example. Suppose that an account is updated multiple times during several block periods. The BN can send the last version of the account data, with the corresponding partition of the state tree at the last block. In this case, the IoT device just replaces the local data if the PoI root matches the one included in the last block header, and otherwise rejects it, cf. [12], [20]. Alternatively, the BNs send the whole sequence of transactions that modified the account over the block periods, together with the collection of their PoIs built from the transactions tree. The sequence of transactions is applied, by the device, to its local version of the account state, to finally reconstruct the updated state. In this paper, we only consider the first approach, and discuss the extension to the second one in Sec. V-D.

III. SYSTEM MODEL

The scheme proposed in this paper aggregates the information in order to reduce the communication cost, see Fig. 4(b). Given that the application run by the client can tolerate a delay, information is accumulated at the BN and then periodically released at the subsequent aggregation point. The approach followed by the BN is to always send a proof by means of the state tree, triggering the replacement of the local copy of the client. This makes it possible to send only the latest version of accounts that are modified multiple times during the accumulation, and to merge the PoIs of accounts modified in different blocks into a single PoMI. The scheme is investigated in detail in the rest of the paper.
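The merging of several PoIs into one PoMI, on which the aggregation scheme relies, can be illustrated with a minimal Python sketch. It reproduces the node-counting example of Sec. II-B (PoIs of nodes 4 and 8) on a toy tree. The tree layout is an assumption restricted to the nodes named in that example, the function names `poi` and `pomi` are hypothetical, and nodes are represented by their labels only (no hashing is performed), so this is a counting aid rather than an Ethereum implementation.

```python
# Toy tree (assumed layout, restricted to the nodes named in the
# example of Sec. II-B): node label -> list of child labels.
CHILDREN = {0: [1, 2, 3], 1: [4, 5], 3: [6, 7, 8]}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def path_to_root(node):
    """Return the nodes from `node` up to and including the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def poi(node):
    """Siblings of every node on the path from `node` to the root."""
    return {
        sibling
        for n in path_to_root(node)
        if n in PARENT
        for sibling in CHILDREN[PARENT[n]]
        if sibling != n
    }


def pomi(nodes):
    """Union of the individual PoIs, minus the nodes that the verifier
    can recompute itself because they lie on a path to a proven node."""
    recomputable = {n for target in nodes for n in path_to_root(target)}
    union = set().union(*(poi(target) for target in nodes))
    return union - recomputable


if __name__ == "__main__":
    print(sorted(poi(4)), sorted(poi(8)), sorted(pomi([4, 8])))
```

In this sketch the two individual proofs contain seven node values in total, while the merged proof contains four, which matches the reduction described for Fig. 3(c).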

A. Blockchain network and IoT device

We consider a blockchain network in which new blocks are generated at exponentially distributed intervals with (network-wide) rate $\lambda$. A single IoT device is connected to a set of $N$ BNs via a wireless link through a base station, see Fig. 1. The device subscribes to block headers for all generated blocks as well as state updates for a set $A$ of accounts, which are a subset of the existing accounts. The generic account, indexed as $j \in \mathbb{N}$, is updated independently in a block with probability (or relative frequency) $p_j$. We consider the case where the device is not interested in the full state history of the observed accounts, but merely in their most recent state. That is, the device needs to be informed about only the most recent state of the observed accounts, as well as receive the PoMI that proves the inclusion of the specific account states in the blockchain. The case is representative of a class of problems in which the age of the information, i.e. data freshness, is more valuable than tracking all state changes, and includes environmental monitoring applications and power grid stabilization systems [24].

To simplify the presentation, we assume that a block header and an updated account state take up a fixed number of $l_H$ and $l_a$ bits, respectively. In contrast, the size of the PoMI, with length $l_{\mathrm{PoMI}}$ bits, is random as a result of the PoMI tree data structure. Specifically, as described in Section II, the size of the PoMI is sublinear in the number of accounts.

B. Aggregation protocol

The block headers and the updated observed accounts are aggregated at a BN, termed aggregation BN, selected by the device, and transmitted to the device periodically every $T$ seconds. The value of $T$ depends on the information delay, tolerated by the application, from the instant at which the account is modified to the instant at which the update is delivered to the device. Upon successful reception of the transmission, the IoT device acknowledges the packet. We assume that the device selects the sequence of aggregation BNs, over different aggregation periods, as part of the initial network association procedure, e.g. by means of a seed sequence. Consequently, the execution of the protocol only requires downlink messages, because all the information needed by the BNs is sent by the IoT device in the initialization phase. When no transmission is ongoing, the device is assumed to be in power-saving mode.

C. Wireless link

The wireless downlink from the base station to the IoT device is assumed to be a block Rayleigh-fading channel with constant channel gain over the duration of a transmission and independent channel gains across transmissions. This occurs, for example, in systems based on per-packet frequency hopping (FH). Due to the power constraints of the IoT device, we assume that the base station has no information about the channel and hence performs no power or rate adaptation. As a result, a transmission may fail with probability [25]

$$p_{\mathrm{out}} = 1 - \exp\left(-\frac{2^{R/W} - 1}{\gamma}\right), \tag{1}$$

where $\gamma$ is the average received signal-to-noise ratio (SNR), $R$ is the transmission rate in bits/s, and $W$ is the bandwidth of the channel in Hz. The downlink packet is retransmitted until it has been received successfully by the device.
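As a numerical aid for (1), the following Python sketch evaluates the outage probability for a given rate, bandwidth and average SNR. The function name and the choice of expressing the SNR in dB are assumptions made for illustration, and the values in the usage lines are hypothetical rather than taken from the paper's parameter table.

```python
import math


def outage_probability(rate_bps, bandwidth_hz, mean_snr_db):
    """Outage probability of (1) for a block Rayleigh-fading channel.

    rate_bps     -- transmission rate R in bits/s
    bandwidth_hz -- channel bandwidth W in Hz
    mean_snr_db  -- average received SNR (gamma), given here in dB
    """
    gamma = 10 ** (mean_snr_db / 10)          # dB -> linear scale
    spectral_efficiency = rate_bps / bandwidth_hz
    return 1 - math.exp(-(2 ** spectral_efficiency - 1) / gamma)


if __name__ == "__main__":
    # Hypothetical values: 1 Mbit/s over 1 MHz at two SNR levels.
    for snr_db in (0.0, 30.0):
        print(snr_db, outage_probability(1e6, 1e6, snr_db))
```

The sketch makes the tradeoff in (1) visible: for a fixed bandwidth, raising the rate shortens each transmission but increases the probability that it fails.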

In contrast to the downlink transmissions, we assume that the transmission of the acknowledgment packet in the uplink happens instantaneously and is always received reliably, thanks to power control, performed at the IoT device side, based on the received transmission.

D. Frame structure

The downlink frame, represented in Fig. 5, consists of $F$ bits, and is divided into a fixed number $H$ of header bits, representing the standard communication protocol overhead, and a variable number $D$ of payload bits, corresponding to the blockchain information, i.e. $F = H + D$. Its duration is related to the transmission rate $R$ as

$$T_w = \frac{kF}{R} \quad [\mathrm{s}],$$

where $k \geq 1$ is the number of transmissions, including retransmissions due to outage. If the transmission of the frame takes longer than the transmission period, i.e. $T_w > T$, due to retransmissions, it is halted and considered failed. In this case, which has been analyzed in [7], since the block headers are required in order for the IoT device to stay synchronized to the blockchain, the next frame should include the block headers accumulated in the current frame. In this work, we consider channel and block generation parameters that provide a negligible probability that the frame cannot be received in the current transmission period, so that the phenomenon can be ignored.⁴ The $D$ payload bits are divided into $D_H$ bits for block headers and $D_A$ bits for account updates, i.e. $D = D_H + D_A$. $D_H$ and $D_A$ are random as they depend on the number of generated blocks and account updates.

⁴ Scenarios for which this assumption does not hold can be observed when the probability of outage is rather high, and the transmission rate is low.

Fig. 5. The periodic release of information. Red crosses represent block generations. In the first release, there are two retransmissions of the frame (F), due to failure; in the second release, only one retransmission.

IV. ANALYSIS

In this section, we present an analysis of the aggregation scheme. To this end, we first obtain the distribution of the frame size and frame transmission duration, and then use this result to evaluate the communication cost and latency.

A. Frame size distribution

Recall that the frame is divided into $H$ header bits, $D_H$ bits for block headers and $D_A$ bits for account states. The $H$ header bits are fixed, while $D_H$ depends on the number of generated blocks during the aggregation period $T$, indicated as $B$. Similarly, $D_A$ depends on the number of generated blocks, as it impacts the number of observed accounts that are updated. We indicate the probability distributions of $D_H$ and $D_A$, conditioned on the number of generated blocks, respectively as $p_{D_H|B}$ and $p_{D_A|B}$. As a result, we may factorize the distribution of the total frame size $F$ as

$$p_F(f) = \sum_{b=0}^{\infty} p_B(b) \sum_{i=0}^{f} p_{D_H|B}(i|b)\, p_{D_A|B}(f-i|b), \tag{2}$$

where we have used the fact that $f = D_H + D_A$. The possible sizes are conditioned on the event that $b$ blocks are generated during $T$, whose probability is given by

$$p_B(b) = \frac{(\lambda T)^b \exp(-\lambda T)}{b!}. \tag{3}$$

The formula directly follows from the assumption of exponential waiting time between blocks, which has been shown to hold for blockchains based on Proof of Work [26].

1) Distribution of $D_H$: Since we assumed that the block headers are always received within the current transmission period, i.e. $T_w < T$, the size of $D_H$, when $B$ blocks are generated, can be approximated with the fixed quantity $B \cdot l_H$ bits, yielding $p_{D_H|B}$ in (2). If this is not the case, the number of block headers that need to be transmitted should be modeled as a bulk queue, where blocks arrive according to a Poisson distribution with rate $\lambda$, and are served in bulks of up to $\lceil D / l_H \rceil$, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. However, this makes an accurate analysis intractable and is outside the scope of this work.

2) Distribution of $D_A$: In order to obtain the size of the account updates $D_A$, we first need to characterize the number of accounts $U$ that are updated during an aggregation period $T$. The probability that account $j$, characterized by relative frequency $p_j$⁵, is updated at least once in $b$ blocks accumulated during the aggregation period is given by $q_j = 1 - (1 - p_j)^b$. We denote by $U$ the total number of accounts that are modified at least once in $B$ blocks. Since each of the accounts is updated independently conditioned on $B$, $U$ follows a Poisson binomial distribution parameterized by the account update probabilities $q_1, q_2, \ldots, q_{|A|}$:

$$p_{U|B}(u|b) = \sum_{\mathcal{B} \in F_u} \prod_{j \in \mathcal{B}} q_j \prod_{l \in F_u \setminus \mathcal{B}} (1 - q_l). \tag{4}$$

⁵ This quantity is not assumed but experimentally estimated in Sec. V.

$F_u$ is the set of all subsets of $\{1, 2, \ldots, |A|\}$ with cardinality $u$, $\mathcal{B}$ is an element of $F_u$ and $F_u \setminus \mathcal{B}$ is the complement of $\mathcal{B}$. This distribution is used to find the distribution of $D_A$, conditioned on $B$:

$$p_{D_A|B}(a|b) = \sum_{u=0}^{|A|} p_{D_A|U,B}(a|u,b) \cdot p_{U|B}(u|b). \tag{5}$$
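The ingredients of the frame size distribution can be evaluated numerically with a short Python sketch. It computes the block-count probability of (3), the per-account update probabilities $q_j$, and the Poisson binomial distribution of $U$ given $B$. Note that, instead of enumerating the subsets as in (4), the sketch uses a standard dynamic-programming recursion over the accounts, which yields the same distribution at lower cost. All function names and the numbers in the usage lines are assumptions for illustration.

```python
import math


def block_count_pmf(b, lam, period):
    """Probability of (3) that b blocks are generated in one period T."""
    mean_blocks = lam * period
    return mean_blocks ** b * math.exp(-mean_blocks) / math.factorial(b)


def update_probs(p, b):
    """q_j = 1 - (1 - p_j)^b for every observed account j."""
    return [1 - (1 - pj) ** b for pj in p]


def updated_accounts_pmf(q):
    """Poisson binomial pmf of U given B, returned as a list indexed by u.

    Uses dynamic programming over the accounts rather than the subset
    enumeration of (4); both describe the same distribution.
    """
    pmf = [1.0]                       # zero accounts considered: U = 0
    for qj in q:
        nxt = [0.0] * (len(pmf) + 1)
        for u, prob in enumerate(pmf):
            nxt[u] += prob * (1 - qj)     # account j not updated
            nxt[u + 1] += prob * qj       # account j updated at least once
        pmf = nxt
    return pmf


if __name__ == "__main__":
    # Hypothetical per-block update frequencies of three observed accounts.
    q = update_probs([0.3, 0.1, 0.05], b=4)
    print(updated_accounts_pmf(q))
    print(block_count_pmf(4, lam=1 / 15, period=60))
```

Combining `updated_accounts_pmf` with a model of the payload size for $u$ updated accounts then gives the conditional distribution in (5).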


Notice that $p_{D_A|U,B}(a|u,b)$ only depends on the realization of the number of modified accounts $u$. This permits us to write

$$p_{D_A|U,B}(a|u,b) = p_{D_A|U}(a|u) = \sum_{i=0}^{a} p_{l_{\mathrm{PoMI}}|U}(i|u)\, p_{l_{\mathrm{acc}}|U}(a-i|u). \tag{6}$$

To complete the analysis, we need to characterize the size of the PoMI and of the accounts' information. The size of the accounts' information varies only with the number $U$ of modified accounts; as in our model they have a fixed size of $l_a$, this size is simply $l_{\mathrm{acc}} = U \cdot l_a$ bits. On the other hand, the PoMI length $l_{\mathrm{PoMI}}$ does not only depend on the number of observed accounts, but also on their position in the state tree. For simplicity, we assume that the tree is perfectly balanced at all levels, and that the location of the observed accounts at the last level is uniformly distributed. The approximation is supported by the fact that Patricia trees are generally well-balanced [27]. However, the fact that the number of proofs that each node can be part of is bounded by the number of descendant leaf nodes makes the problem of obtaining the distribution of the number of nodes in the PoMI a hard combinatorial problem; even the expected value of the number of nodes is computationally intractable to obtain. Instead, we approximate the number of nodes by relaxing this condition. The resulting approximation captures the characteristics of the PoMI size as the number of modified accounts grows, and is accurate as long as the number of modified accounts is much lower than the total number of leaf nodes in the tree. This is typically the case, as the set of observed accounts is small. Specifically, the relaxation results in the following recursive approximation of the expected number of nodes in a PoMI for $u$ accounts, when the tree has height $\eta$:

$$\bar{N}_\eta(u) = \sum_{h=1}^{\eta} L\,\bar{N}_{h-1}(u) \left(1 - \frac{1}{L\,\bar{N}_{h-1}(u)}\right)^{u}, \tag{7}$$

with $\bar{N}_0(u) = 1$. The derivation is given in Appendix A.

To obtain the expected number of bits for a PoMI, we assume that the tree does not contain extension nodes, as they have variable size, see [22]. Hence, with this approximation, the internal nodes are only branch nodes. Denoting the size of the output of the hash function by $l_s$, each internal node has a fixed size of $l_s$ bits. Instead, the leaf nodes are composed of a key and a value, see Sec. II, both containing hash values, resulting in a fixed size of $2 \cdot l_s$ bits. In conclusion, we obtain the expected number of bits required for a PoMI of $u$ accounts:

$$\bar{l}_{\mathrm{PoMI}}(u) = l_s \bar{N}_\eta(u) + u (2 \cdot l_s). \tag{8}$$

B. Transmission duration

The total transmission duration $T_w$ depends on $F$ and the number of (re)transmissions that are needed before the packet is successfully received by the IoT device. A frame is transmitted successfully with probability $1 - p_{\mathrm{out}}$, independent of the size of the frame, and hence the number of transmissions is geometrically distributed with the probability mass function

$$\Pr(k \text{ transmissions}) = p_{\mathrm{out}}^{k-1} (1 - p_{\mathrm{out}}). \tag{9}$$
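The geometric model of (9) can be explored with the following Python sketch. Besides the probability mass function, it uses the standard mean of a geometric distribution, $1/(1 - p_{\mathrm{out}})$, to obtain the expected duration of a frame of known size from $T_w = kF/R$. The function names and the numbers in the usage lines are assumptions for illustration, and the expected duration ignores the halting of frames that exceed the period $T$.

```python
def transmissions_pmf(k, p_out):
    """Probability of (9) that exactly k transmissions are needed."""
    return p_out ** (k - 1) * (1 - p_out)


def expected_transmissions(p_out):
    """Mean of the geometric distribution in (9)."""
    return 1 / (1 - p_out)


def expected_duration(frame_bits, rate_bps, p_out):
    """Expected T_w = k F / R in seconds for a frame of known size F.

    Assumes that transmissions are never halted, i.e. that the event
    T_w > T has negligible probability.
    """
    return expected_transmissions(p_out) * frame_bits / rate_bps


if __name__ == "__main__":
    # Hypothetical values: 10 kbit frame, 100 kbit/s, 20 % outage.
    print(transmissions_pmf(1, 0.2), transmissions_pmf(2, 0.2))
    print(expected_duration(10_000, 100_000, 0.2))
```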

Since the rate remains fixed across (re)transmissions, it follows that the probability density function of $T_w$ is

$$p_{T_w}(t) = \sum_{k=1}^{\infty} p_F\left(\frac{tR}{k}\right) \Pr(k \text{ transmissions}) \tag{10}$$

$$= (1 - p_{\mathrm{out}}) \sum_{k=1}^{\infty} p_F\left(\frac{tR}{k}\right) p_{\mathrm{out}}^{k-1}. \tag{11}$$

C. Data savings of the aggregation protocol

Since the states of accounts can be verified from the state tree root contained in the most recent block header, the aggregation scheme provides data savings by (i) sending a single PoMI that certifies only the latest state of the modified accounts, (ii) sending only the most recent copy of the account data structure, and (iii) reducing the amount of frame overhead, $H$. We consider protocol P2 from [7], introduced in Sec. II-C, as the benchmark. Recall that P2 requires the device to download the following information at each block period: the frame overhead $H$, a notification of the new block from each peer, the block header, the PoMI and the account data structures. In addition, with P2, the device receives a notification of a new block, indicated as “Block ready” in Fig. 4(a), from each BN, and consequently selects a BN with an uplink message, indicated as “Block request” in the same figure. However, to establish a fair comparison with the aggregation protocol, we assume that the BN in charge of sending the update is pre-selected via a random seed also in P2, removing the need for these messages. The expected amount of bits downloaded with protocol P2 during a block period is

$$\mathrm{E}\left[F^{(\mathrm{P2})}\right] = H + P = H + l_H + \sum_{a=0}^{\infty} a \cdot p_{D_A|B}(a|1). \tag{12}$$

The expression is based on (5), and on the fact that exactly one block is generated during a block period. The expected number of bits per block period downloaded using the aggregation protocol proposed in this paper is given by averaging (2):

$$E[F] = \frac{1}{\lambda \cdot T} \sum_{f=0}^{\infty} f \cdot p_F(f), \tag{13}$$

where $\lambda \cdot T$ is the expected number of blocks within the aggregation period. We can now express the savings of the aggregation protocol as

$$\Gamma = 1 - \frac{E[F]}{E\big[F^{(\mathrm{P2})}\big]} \tag{14}$$

$$= 1 - \frac{\sum_{f=0}^{\infty} f \cdot p_F(f)}{\lambda \cdot T \cdot \left(H + l_H + \sum_{a=0}^{\infty} a \cdot p_{D_A|B}(a|1)\right)}. \tag{15}$$
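The following Python sketch shows one way to evaluate (12)-(15) numerically when the distributions $p_F$ and $p_{D_A|B}(\cdot|1)$ are available as finite tables. Representing a PMF as a dictionary that maps a size in bits to its probability is an assumption made for illustration, as are the function names and the toy numbers.

```python
from typing import Dict


def expected_value(pmf: Dict[float, float]) -> float:
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


def p2_bits_per_block(H: float, l_H: float, pmf_DA_one_block: Dict[float, float]) -> float:
    """Expected bits per block period with protocol P2, as in (12)."""
    return H + l_H + expected_value(pmf_DA_one_block)


def aggregated_bits_per_block(pmf_F: Dict[float, float], lam: float, T: float) -> float:
    """Expected bits per block period with aggregation, as in (13)."""
    return expected_value(pmf_F) / (lam * T)


def aggregation_gain(pmf_F, pmf_DA_one_block, H, l_H, lam, T) -> float:
    """Savings Gamma of the aggregation protocol, as in (14)-(15)."""
    return 1.0 - (aggregated_bits_per_block(pmf_F, lam, T)
                  / p2_bits_per_block(H, l_H, pmf_DA_one_block))


# Toy distributions (hypothetical numbers, not taken from the evaluation).
toy_pmf_DA = {0: 0.5, 1000: 0.5}      # account data per block, in bits
toy_pmf_F = {2000: 0.5, 4000: 0.5}    # aggregated frame size, in bits
gain = aggregation_gain(toy_pmf_F, toy_pmf_DA, H=100, l_H=400, lam=0.1, T=100)
```

In the toy case, P2 costs 1000 bits per block period and the aggregated frame costs 300 bits per block period, so the gain is 0.7.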

V. EVALUATION

To validate our model and show the performance of the aggregation scheme, we have modified the Python implementation of the Ethereum protocol, PyEthereum [28]. The system, parametrized as listed in Table I, includes a randomly generated blockchain. This is obtained by generating accounts that contain random information, with size $l_a$ bits, and inserting them in a newly initialized blockchain database. For the statistical characterization of account updates, we take as reference the Ethereum main network, as described in the following section.

A. Statistical characterization of account updates

The statistics of account updates play a fundamental role in the design and evaluation of blockchain protocols. We base our evaluation on the Ethereum main network dataset [29], by analyzing the activity during blocks numbered from 5.1 to 6.4 million. Fig. 6 shows the frequency of updates of the $10^4$ most updated accounts, indexed in descending order of their update frequencies. To extract this metric, we do not distinguish between transactions from and to the accounts, nor do we consider whether multiple transactions involve one account in the same block. We model the relative frequency of updates of account $j$ according to the broken power law:

$$p_j = \begin{cases} \alpha_1 j^{\alpha_2} & \text{if } j \le \alpha_3, \\ \alpha_3^{\alpha_2 - \alpha_4} \alpha_1 j^{\alpha_4} & \text{otherwise.} \end{cases}$$

We opt for this function, instead of the plain power law adopted e.g. in [30], [31], because the most frequently updated accounts are associated with web services that provide currency exchanges and are updated at similar rates. The least squares fit gives $\alpha_1 = 0.63$, $\alpha_2 = -0.37$, $\alpha_3 = 21$, $\alpha_4 = -0.79$. This function, also shown in Fig. 6, is used to generate the relative frequencies for our “synthetic” blockchain in the evaluation. In addition to obtaining the account update probabilities, we inspect the accuracy of modelling the number of blocks between two account updates as a geometric distribution, as assumed in our model. We compare the empirical cumulative distribution function (CDF) of an account, $j$, to the CDF of a geometric distribution with parameter $p_j$. Fig. 7 shows the results obtained for some accounts of the data set. The assumption only holds for frequently updated accounts. This behaviour should be taken into account in future works.

B. Validation of Merkle-Patricia proofs length

Since the analysis of the length of a Merkle-Patricia proof is based on the assumption that the tree is perfectly balanced, the analysis is validated by comparing analytical results both to numerical results obtained from a perfectly balanced tree, and to measurements obtained from the Merkle-Patricia tree implementation in PyEthereum, which is in general unbalanced, see Fig. 8. In particular, Fig. 8(a) compares the analytical expression for the average number of nodes that compose a

TABLE I: SYSTEM PARAMETERS

Parameter    Value
Blockchain
$\lambda$    0.1 s$^{-1}$
$H$          1200
$l_a$        320 kb
$l_s$        256 bit
$\eta$       5
$l_H$        4046 bit
$L$          16
Communication channel
$R$          250 kbit/s
$W$          180 kHz
$\gamma$     30 dB

Fig. 6. Relative frequency of updates for the most active accounts, in log-log scale. [Curves: our model; Ethereum main network.]

Fig. 7. Comparison of empirical CDF of accounts (represented with crosses), with index $j$, with the CDF of geometrical distribution (represented with dots). [Curves for $j = 1$, $j = 5$, $j = 50$, $j = 500$; horizontal axis: number of blocks until next update, $x$.]

Fig. 8. Comparison of analytical approximation, numerical and experimental results for the (a) number of nodes needed in the PoMI and (b) its length. [Horizontal axis: number of observed accounts; vertical axes: (a) number of nodes, (b) $l_{\mathrm{PoMI}}$ in bits; experimental, numerical, and analytical curves labelled $10^5$ and $10^6$.]

PoMI with the experimental data obtained from PyEthereum and with numerical results. The results show that the analytical expression fits the numerical results obtained for a balanced

Fig. 9. Probability distribution of the number of nodes in a PoMI, for different numbers of observed events $u$, obtained via experiment. [Horizontal axis: number of nodes $n$; vertical axis: $\Pr(n)$.]

tree; at the same time, it overestimates the average number of nodes needed for a PoMI in the Ethereum system. This follows from the fact that the average depth of leaves in a Patricia tree is greater than in a balanced tree [27], implying that some internal levels might not be completely populated, hence the slight reduction in the number of nodes needed for the PoMI. Fig. 8(b) compares the length of the PoMI. For the numerical and analytical results, each node is represented by the corresponding hash value, while for the experimental data the PoMI data structure is represented with Recursive Length Prefix (RLP) [15]. The RLP also contains information about the structure of the tree, therefore introducing a small overhead. For this reason, the length for the experimental data is slightly larger than that of the numerical and analytical results. The experimental setup also makes it possible to characterize the distribution of the number of nodes in a PoMI, see Fig. 9, in which we show the results obtained for a blockchain with $\eta = 6$ levels, completely filled, therefore containing $L^6 = 16^6$ accounts, where $L$ is the maximum number of children of a node. The relative position of the accounts in the tree clearly impacts the length of their PoMI and, hence, the communication cost of transmitting them. In addition, the results provide insights on the consequence of using the expected length of the PoMI, instead of its distribution, in (6). As the variance of the PoMI distribution increases markedly with the number of included accounts, $u$, the precision of the approximation decreases. On the other hand, its contribution to the total length of the payload, $D_A$, is counterbalanced by the weight of the accounts' data structures, which becomes predominant. This is shown in detail in the following.

C. Performance of the aggregation protocol

We consider a scenario in which the device is connected to BNs via a communication link parameterized as in Table I. We remark that if the device is solely interested in observing accounts that are updated sporadically, the aggregation protocol only provides a reduction of the communication overhead (the frame headers). Therefore, we focus the evaluation on the case where the device observes active accounts. An account, $j$, is considered active if it is updated at least once every $T$ seconds with probability at least $P_A$, i.e.

$$1 - (1 - p_j)^{\lceil T/T_B \rceil} \ge P_A. \tag{16}$$
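The criterion in (16) can be combined with the broken power law of Sec. V-A in a few lines of Python. This is a sketch under stated assumptions: the block period is taken as $T_B = 1/\lambda = 10$ s, the threshold as $P_A = 0.9$, the fit parameters are the rounded values reported in Sec. V-A, and the search is limited to the $10^4$ most updated accounts. The function names are illustrative.

```python
import math

# Rounded least-squares fit parameters reported in Sec. V-A.
ALPHA1, ALPHA2, ALPHA3, ALPHA4 = 0.63, -0.37, 21, -0.79


def update_probability(j: int) -> float:
    """Per-block update probability p_j from the broken power law."""
    if j < 1:
        raise ValueError("account index j starts at 1")
    if j <= ALPHA3:
        return ALPHA1 * j ** ALPHA2
    return ALPHA3 ** (ALPHA2 - ALPHA4) * ALPHA1 * j ** ALPHA4


def is_active(j: int, T: float, T_B: float = 10.0, P_A: float = 0.9) -> bool:
    """Check (16): at least one update within ceil(T / T_B) blocks
    with probability at least P_A. T_B = 10 s is an assumption (1 / lambda)."""
    n_blocks = math.ceil(T / T_B)
    return 1.0 - (1.0 - update_probability(j)) ** n_blocks >= P_A


def count_active(T: float, max_index: int = 10_000) -> int:
    """Number of active accounts among the max_index most updated ones."""
    return sum(is_active(j, T) for j in range(1, max_index + 1))


active_180 = count_active(180.0)
active_1800 = count_active(1800.0)
```

With these rounded parameters, `count_active(180.0)` evaluates to 41. Counts for longer aggregation periods are more sensitive to the rounding of the fitted exponents.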

Fig. 10. Complementary CDF of $T_w$ for two different intervals.

Fig. 11. Duty cycle of the device (a) for different values of SNR and (b) for different rates. [Horizontal axes: (a) SNR in dB, (b) $R$ in bit/s.]

In the rest of the work, we set $P_A = 0.9$. By requesting updates about more active accounts than those of actual interest, the device can increase its privacy, at the cost of downloading unnecessary information. This application is further discussed in Sec. V-E.

1) Duty cycle trade-offs: Fig. 10 reports the complementary CDF of the duration of the transmission, $T_w$, for two deterministic sets of observed accounts: $A_1 = \{1, 2\}$, which contains the two most frequently updated ones, and $A_2 = \{ j \mid 20 < j \le 41 \}$, containing the 20 least active accounts when $T = 180$ s (see footnote 6). The sets are formed to illustrate two interesting limit scenarios. The probability of channel outage, derived from $R$ and $\gamma$ of Table I, is $1.6 \cdot 10^{-3}$. The number of observed accounts, and their statistics, clearly play a central role in shaping the CDF, as the size of the account data structure is much bigger than the size of a block header, see Table I. The study of the duty cycle of the device, reported in Fig. 11, covers several fundamental trade-offs. Fig. 11(a) shows that the duty cycle decreases when the channel quality (SNR) increases, due to the reduced number of retransmissions, and saturates for high values of SNR, as retransmissions are not likely to happen. A possible strategy that can be adopted by the IoT device to reduce its duty cycle is to update less frequently, i.e. to increase $T$, or to reduce the set of observed accounts. Fig. 11(b) reports the duty cycle as a function of the transmission rate of the wireless link. At low rates, the duty cycle of the device is drastically increased. On the other hand, selecting a high rate causes transmission failures and therefore retransmissions, which become dominant when the rate reaches a certain level.

Footnote 6: According to our definition, see (16), there are 41 active accounts for $T = 180$ s and 695 for $T = 1800$ s.

2) Communication cost: The rest of the results focus on how the different parts of the frame contribute to its total length and on the aggregation gain, defined in Sec. IV-C. We construct a set $A_3$ containing $|A_3| = 20$ accounts, by randomly selecting among those that are active during $T = 1800$ s. It should be noted that $A_2$ is a possible realization of $A_3$. Fig. 12 shows the amount of information downloaded during 24 hours of execution, for different values of $T$ and different realizations of accounts in $A_3$. In this scenario, there are no retransmissions; their effect would be a proportional increase in all the quantities. The figure shows that most of the communication cost is due to the size of the account data structures, which is an order of magnitude higher than the size of the PoMI, and two orders of magnitude higher than the size of the block headers and communication protocol headers. The aggregation gain, $\Gamma$, is shown in Fig. 13 for several values of the aggregation period, $T$, and compared with a simulation of the system. The figure shows a good match between the simulation and the analytical expression, and that the gain is remarkable, even for small values of $T$. As $T \to \infty$, the observed accounts are updated almost surely during a period, and will be downloaded in the next transmission. This causes the gain to increase linearly when $T$ is large.

Fig. 12. Amount of information downloaded for $A_3$, during 24 hours, for protocol P2 and for the protocol with aggregation with different values of $T$. [Vertical axis: data amount in kbits; groups: P2, $T = 180$ s, $T = 1800$ s; components: $H$, block headers, PoMI, accounts.]

Fig. 13. Gain of the aggregation protocol, analytical expression and simulation. [Horizontal axis: $T$ in s.]

D. Remarks on possibilities to further reduce the communication cost

We briefly discuss possible directions for a further reduction of the communication cost for IoT lightweight clients, based on the insights provided by the evaluation of the protocol. The size of the accounts' data structures has shown a prominent impact on the amount of transmitted data. This can be reduced with several approaches, for example (i) by keeping their size as small as possible, during the design phase of the contract; (ii) by compressing the account information before sending it; (iii) by only sending the portion of the account structure that has changed. A completely different approach is to send the updates by means of the transactions tree, when the corresponding accounts are rarely updated. In fact, the size of a transaction is typically lower than that of the account. In addition, while the state tree grows with the number of accounts, the transactions tree size is limited by the block size. This option, mentioned in Sec. II-C, has not been considered in this paper, as it does not provide aggregation gain.
Future works can consider this extension by (i) including in the system model the statistics of the number of transactions that modify an account in a single block; (ii) considering also the contribution of the transactions tree to the size of PDA in (5); (iii) finding a strategy to decide whether to send the update in the form of an updated state or as a collection of transactions.

E. Example application: privacy of IoT device

We conclude this section by providing an example application of the aggregation protocol. Consider a scenario in which the IoT device is solely interested in observing one (active) account, indexed as $j^\star$. However, to keep $j^\star$ private, it requests updates about additional accounts, in which it is not interested, from BNs (privacy by obfuscation) [14]. A malicious BN is aware that the IoT device is interested in one account and applies an outlier detection technique to find it. In the presented model, the only feature available to the BN is the relative frequency of update of accounts. For both sides (device and BN), it is reasonable to assume that the set of observed accounts, indicated as $A_4$, only contains active accounts, since non-active accounts would be excluded by the outlier detection. Based on these considerations, the IoT device constructs $A_4$ by adding $j^\star$ and other random active accounts. The construction starts with $A_4 = \{ j^\star \}$, then $|A_4|$ is iteratively incremented. At each iteration, the set of active accounts is split into $|A_4|$ segments and one account is randomly picked from each segment ($j^\star$ is always picked among the accounts in its segment). The iteration is repeated until the tolerated communication cost, expressed by (13), is reached. Finally, $A_4$ is sent to the BN. There is a trade-off between the delay introduced by the aggregation protocol and privacy, i.e. $|A_4|$. This trade-off is shown in Fig. 14, for different tolerated communication costs $E[F]$, and $j^\star = 41$ (which is an active account). A further improvement, not implemented in this paper, is to impose that accounts in $A_4$ should be located in proximity of each other in the state tree. In fact, this provides a shorter PoMI and therefore a lower communication cost, see Fig. 9.

Fig. 14. Communication cost as function of $|A_4|$ and $T$. [Axes: $T$ in s; $E[F]$ in bits.]

VI. CONCLUSION

In this paper, we have investigated the communication cost of sending blockchain information to Ethereum-like lightweight clients. A novel aggregation scheme has been proposed that has the potential to obtain a lower communication cost, at the expense of a higher information delay, or a lower availability of information, at the application layer. The analysis of the scheme provided the probability distributions of the data structures exchanged over the wireless link, and their impact on the total downlink budget. Finally, the results show that, if the statistics of account updates and the channel state are known, the lightweight clients can construct a list of events of interest that provides a predictable average communication cost. The example application illustrated how to apply our findings to improve the privacy of IoT devices. The guidelines presented in this paper can be applied to design more advanced blockchain lightweight protocols.

APPENDIX A
DERIVATION OF THE EXPECTED NUMBER OF NODES IN A POMI

Under the relaxation described in Sec. IV, the probability that an arbitrary node at level $h$ is ancestor to one of $u$ modified leaf nodes (denoted by the binary random variable $X_h$) is $\Pr(X_h = 1 \mid N_{h-1} = n_{h-1}, U = u) = (1 - 1/(L n_{h-1}))^u$, where $N_{h-1}$ is the number of nodes at level $h-1$ that are ancestors to a modified leaf node and $L$ is the branching factor of the tree. Since $X_h$ is a binary random variable, $E_{X_h}[X_h \mid N_{h-1} = n_{h-1}, U = u] = \Pr(X_h = 1 \mid N_{h-1} = n_{h-1}, U = u)$, and the expected total number of ancestor nodes at level $h$ is $E_{N_h}[N_h \mid N_{h-1} = n_{h-1}, U = u] = L n_{h-1} \cdot E_{X_h}[X_h \mid N_{h-1} = n_{h-1}, U = u]$. By the law of total expectation,

$$E_{N_h}[N_h \mid U = u] = E_{N_{h-1}}\big[ E_{N_h}[N_h \mid N_{h-1}, U = u] \,\big|\, U = u \big]$$

$$= E_{N_{h-1}}\big[ L N_{h-1} \cdot E_{X_h}[X_h \mid N_{h-1}, U = u] \,\big|\, U = u \big]$$

$$= E_{N_{h-1}}\left[ L N_{h-1} \cdot \left(1 - \frac{1}{L N_{h-1}}\right)^{u} \,\middle|\, U = u \right].$$

By applying a first-order Taylor expansion at $E_{N_{h-1}}[N_{h-1} \mid U = u]$ we obtain

$$E_{N_h}[N_h \mid U = u] \approx L\, E_{N_{h-1}}[N_{h-1} \mid U = u] \left(1 - \frac{1}{L\, E_{N_{h-1}}[N_{h-1} \mid U = u]}\right)^{u}. \tag{17}$$

Denoting $\bar{N}_h(u) = E_{N_h}[N_h \mid U = u]$ and using the fact that the PoMI will always contain a single root to define $\bar{N}_0(u) = 1$, (17) can be obtained by recursion. The approximated number of nodes for a complete PoMI, in a tree of height $\eta$, is the sum of the nodes required at each level, which yields (7):

$$\bar{N}_\eta(u) = \sum_{h=1}^{\eta} \bar{N}_h(u).$$
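The construction of the observed set $A_4$ described in Sec. V-E can be sketched in Python as follows. This is an illustrative reading of the procedure, not the implementation used in the evaluation. The segments are assumed to be contiguous ranges of the activity ranking. The `cost` callback is a hypothetical stand-in for the expected communication cost in (13), and the loop stops just before the tolerated cost would be exceeded.

```python
import random
from typing import Callable, List, Sequence


def build_observed_set(active: Sequence[int], j_star: int,
                       cost: Callable[[List[int]], float], budget: float,
                       rng: random.Random) -> List[int]:
    """Grow the observed set around j_star, one segment at a time.

    active -- indices of active accounts, ordered by update frequency
    cost   -- hypothetical callback returning the communication cost of a set
    budget -- tolerated communication cost
    """
    if j_star not in active:
        raise ValueError("j_star must be an active account")
    chosen = [j_star]
    n = len(active)
    for size in range(2, n + 1):
        candidate = []
        for s in range(size):
            # Contiguous segment s of the ranking (assumed segmentation).
            segment = active[s * n // size:(s + 1) * n // size]
            # j_star represents its own segment; others are drawn at random.
            candidate.append(j_star if j_star in segment else rng.choice(segment))
        if cost(candidate) > budget:
            break  # adding one more account would exceed the budget
        chosen = candidate
    return chosen


# Toy usage: 41 active accounts, j_star = 41, cost proportional to set size.
observed = build_observed_set(list(range(1, 42)), 41,
                              cost=lambda s: 100.0 * len(s), budget=1000.0,
                              rng=random.Random(0))
```

Because the segments are disjoint, the returned set contains no duplicates, and a larger budget directly translates into a larger $|A_4|$.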

REFERENCES

[1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” [Online]. Available: https://bitcoin.org/bitcoin.pdf, 2008, accessed: 2018-07-17.
[2] K. Christidis and M. Devetsikiotis, “Blockchains and smart contracts for the internet of things,” IEEE Access, vol. 4, pp. 2292–2303, 2016.
[3] P. Danzi, M. Angjelichinoski, Č. Stefanović, and P. Popovski, “Distributed proportional-fairness control in microgrids via blockchain smart contracts,” in 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), Oct 2017, pp. 45–51.
[4] J. Horta, D. Kofman, D. Menga, and A. Silva, “Novel market approach for locally balancing renewable energy production and flexible demand,” in 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), Oct 2017, pp. 533–539.
[5] E. Münsing, J. Mather, and S. Moura, “Blockchains for decentralized optimization of energy resources in microgrid networks,” in Control Technology and Applications (CCTA), 2017 IEEE Conference on. IEEE, 2017, pp. 2164–2171.
[6] D. Perard, J. Lacan, Y. Bachy, and J. Detchart, “Erasure code-based low storage blockchain node,” Preprint available: https://arxiv.org/abs/1805.00860, 2018.
[7] P. Danzi, A. E. Kalor, C. Stefanovic, and P. Popovski, “Analysis of the communication traffic for blockchain synchronization of iot devices,” in 2018 IEEE International Conference on Communications (ICC). IEEE, 2018, pp. 1–7.
[8] M. Z. Shafiq, L. Ji, A. X. Liu, J. Pang, and J. Wang, “A first look at cellular machine-to-machine traffic: large scale measurement and characterization,” ACM SIGMETRICS performance evaluation review, vol. 40, no. 1, pp. 65–76, 2012.
[9] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5g: Physical and mac-layer solutions,” IEEE Communications Magazine, vol. 54, no. 9, pp. 59–65, 2016.
[10] A. Kiayias, A. Miller, and D. Zindros, “Non-interactive proofs of proof-of-work,” Cryptology ePrint Archive, Report 2017/963, 2017, accessed: 2017-10-03.
[11] A. Palai, M. Vora, and A. Shah, “Empowering light nodes in blockchains with block summarization,” in New Technologies, Mobility and Security (NTMS), 2018 9th IFIP International Conference on. IEEE, 2018, pp. 1–5.
[12] “The ethereum-blockchain size will not exceed 1tb anytime soon,” [Online]. Available: https://dev.to/5chdn/the-ethereum-blockchain-size-will-not-exceed-1tb-anytime-soon-58a, accessed: 2018-07-17.
[13] “Slock.it incubed client,” [Online]. Available: https://slock.it/incubed.html, accessed: 2018-07-17.
[14] D. Gruber et al., “Unifying lightweight blockchain client implementations,” NDSS 2018 - Workshop on Decentralized IoT Security and Standards, 2018.
[15] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” [Online]. Available: http://gavwood.com/paper.pdf, 2014, accessed: 2018-07-17.
[16] Y. Zhang, S. Kasahara, Y. Shen, X. Jiang, and J. Wan, “Smart contract-based access control for the internet of things,” IEEE Internet of Things Journal, pp. 1–1, 2018.
[17] O. Alphand et al., “Iotchain: A blockchain security architecture for the internet of things,” in Wireless Communications and Networking Conference (WCNC), 2018 IEEE. IEEE, 2018, pp. 1–6.
[18] A. Narayanan, J. Bonneau, E. Felten, A. Miller, and S. Goldfeder, Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction. Princeton University Press, 2016.
[19] A. Kiayias, A. Russell, B. David, and R. Oliynykov, “Ouroboros: A provably secure proof-of-stake blockchain protocol,” in Annual International Cryptology Conference. Springer, 2017, pp. 357–388.
[20] V. Buterin, “Merkling in ethereum,” [Online]. Available: https://blog.ethereum.org/2015/11/15/merkling-in-ethereum/, accessed: 2018-07-17.
[21] M. Al-Bassam et al., “Fraud proofs: Maximising light client security and scaling blockchains with dishonest majorities,” ArXiv preprint arXiv:1809.09044v1, 2018.
[22] D. R. Morrison, “Patricia - practical algorithm to retrieve information coded in alphanumeric,” Journal of the ACM (JACM), vol. 15, no. 4, pp. 514–534, 1968.
[23] F. Zsolt, “Client side flow control model for the les protocol,” [Online]. Available: https://github.com/zsfelfoldi/go-ethereum/wiki/Client-Side-Flow-Control-model-for-the-LES-protocol, 2016, accessed: 2018-07-17.
[24] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” IEEE Transactions on Information Theory, vol. 63, no. 11, pp. 7492–7508, Nov 2017.
[25] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[26] C. Decker and R. Wattenhofer, “Information propagation in the bitcoin network,” in Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on. IEEE, 2013, pp. 1–10.
[27] B. Rais, P. Jacquet, and W. Szpankowski, “Limiting distribution for the depth in patricia tries,” SIAM Journal on Discrete Mathematics, vol. 6, no. 2, pp. 197–213, 1993.
[28] “Pyethereum,” [Online]. Available: https://github.com/ethereum/pyethereum, accessed: 2018-07-17.
[29] “Google cloud blog,” [Online]. Available: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics, accessed: 2018-09-24.
[30] M. Lischke and B. Fabian, “Analyzing the bitcoin network: The first four years,” Future Internet, vol. 8, no. 1, p. 7, 2016.
[31] S. Somin, G. Gordon, and Y. Altshuler, “Social Signals in the Ethereum Trading Network,” Preprint available: https://arxiv.org/abs/1805.12097, 2018.
