Bitcoin and Blockchain mechanism
Mariem Hammami
GOV-SUR
March 3, 2017
Contents

1 Mechanism of Bitcoin
  1.1 Transactions
  1.2 Timestamping
  1.3 Proof of work
  1.4 Network
  1.5 Fees
  1.6 Root hash/ Merkle tree
  1.7 Splitting values Inputs/ Outputs
  1.8 Privacy
  1.9 Simple Payment Verification SPV
  1.10 Probability analysis of an attack
2 Bitcoins analysis
  2.1 Nodes
    2.1.1 Nodes distribution
  2.2 Price
  2.3 Bitcoin wallet
Bibliography
Introduction

Why is the cryptocurrency Bitcoin considered to be a revolutionary technology? Almost everything we create online and every operation we execute is provided by centralized servers. We depend on these servers for almost everything: they process our payments and track our shopping carts, they send and store our email, they verify our identities for websites and smartphone applications, etc.

Figure 1: Centralized server

Most of these servers are in massive data centers operated by government institutions and corporations (legal entities, separate and distinct from their owners). This means that we do not really own our data; we only visit it from time to time. In order to access our data and manipulate it, we need a machine to transport us from one digital room to another. All these machines were designed to be controlled by a single person or by a hierarchical organization of people who trust each other. All of this makes us very vulnerable.

In addition, commerce on the Internet relies almost exclusively on financial institutions to process electronic payments. Completely non-reversible transactions (exchanges of goods between a buyer and a seller) are still not possible, because financial institutions cannot avoid mediating disputes; the cost of this mediation increases transaction costs and limits the minimum practical transaction size. Because payments can be reversed, sellers must be wary of their customers: they may ask them for more information than they would otherwise need, and fraud is still possible.

What is needed is a cryptocurrency (a medium of exchange that uses cryptography to secure the transactions), in other words an electronic payment system built on cryptographic proof rather than trust, which allows any two willing parties to transact directly with each other, irreversibly and without the need for a trusted third party. That is exactly what Bitcoin does: released as open-source software in 2009, it became the first decentralized cryptocurrency. In addition, routine escrow mechanisms can be implemented to protect the buyers as well. In this document, we introduce how Bitcoin works and study its mechanism, which is also called the blockchain mechanism.
Figure 2: Decentralized network
Chapter 1
Mechanism of Bitcoin

Definition 1.0.1. A peer-to-peer (P2P) system is a decentralized communication model in which all parties have equal capabilities and each of them can initiate a communication, i.e. broadcast a message.

Bitcoin is a peer-to-peer electronic payment system. It allows online payments without going through a financial institution. In this chapter, we explain how this system works.¹ A node is any computer that runs the Bitcoin client software and cooperates in the peer-to-peer network by broadcasting transactions. The P2P model allows each node to operate as both a client and a server at the same time.
Figure 1.1: A pays B with bitcoins: P2P system
¹ This lecture note is based on Bitcoin: A Peer-to-Peer Electronic Cash System by Satoshi Nakamoto, https://bitcoin.org/bitcoin.pdf
1.1 Transactions
An electronic coin is defined as a chain of digital signatures. An owner makes a transaction by transferring an amount of his bitcoins to another user. Transactions are recorded in the following way:
1) We calculate the hash (see Definition 1.1.1) of the previous transaction of the coin together with the public key of the next owner.
2) The owner of the coin signs the output of the hash with his private key.
3) We add the result to the end of the coin.

Definition 1.1.1. A hash function h maps any input k (called key) to a fixed-size value h(k) such that:
(1) For two keys k1 ≠ k2, we have h(k1) ≠ h(k2) with overwhelming probability.
(2) For k1 ≠ k2, knowing h(k1) gives no information about h(k2). For example, if k1 and k2 are exactly the same except for one bit, then every bit of h(k2) differs from the corresponding bit of h(k1) with probability 1/2. This means that knowing the bits of h(k1) does not give any information about the bits of h(k2).
(3) Knowing the hash of an input gives us no information about the input.

The chain of ownership can be verified by checking the signatures against the public keys of the previous owners (see Figure 1.2 [8]).
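Steps 1) to 3) can be written compactly. The notation below is an assumption introduced only for this sketch and does not come from the source: $T_{i-1}$ is the previous transaction of the coin, $pk_{i+1}$ the public key of the next owner, $sk_i$ and $pk_i$ the key pair of the current owner, and $\mathrm{Sign}$, $\mathrm{Verify}$ a generic digital signature scheme.

```latex
\[
\sigma_i = \mathrm{Sign}_{sk_i}\bigl(h(T_{i-1} \,\|\, pk_{i+1})\bigr),
\qquad
\mathrm{Verify}_{pk_i}\bigl(h(T_{i-1} \,\|\, pk_{i+1}),\ \sigma_i\bigr) = \text{true}.
\]
```

A payee checks the second relation for each link of the chain, using the public key $pk_i$ that the preceding transaction named as owner.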
Figure 1.2: Bitcoin: Chain of digital signatures

Problem 1.1.2. A payee cannot verify if one of the owners double-spent the coin.
Solution 1.1.3. Put in place a trusted central authority, or mint, that checks every transaction and ensures that the coins have not been double-spent. After each transaction, the coin is returned to the mint, which issues a new one. Only coins issued directly by the mint are safe and trusted.

Remark 1.1.4. This solution works, but the mint plays a central role: the company that manages the mint controls the entire money system, exactly like a bank, since every transaction must go through it. We definitely do not want that, so we need a better way to help a payee verify that an owner did not double-spend. The solution is the following.

Solution 1.1.5. Since the only way to confirm the absence of a transaction is to be aware of all transactions, we announce all transactions publicly. We establish a system in which the participants have to agree on a single history of the order in which the transactions were received. The earliest transaction is the one that counts, so we do not care about later attempts to double-spend. With this solution, the payee needs proof that, at the time of each transaction, the majority of nodes agreed that it was the first received. For that we use a timestamp server.
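The rule "the earliest transaction is the one that counts" can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: all transactions are already known and sorted in the agreed order, and the `Tx` structure and `mark_double_spends` function are hypothetical names, not part of Bitcoin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified transaction record. */
typedef struct {
    int coin_id;   /* identifier of the coin being spent */
    int payee;     /* identifier of the receiver */
    int accepted;  /* set by mark_double_spends: 1 = counts, 0 = rejected */
} Tx;

/* txs must already be sorted in the agreed order (earliest first).
 * For every coin, only its earliest spend is accepted.
 * Returns the number of rejected (double-spending) transactions. */
size_t mark_double_spends(Tx *txs, size_t n) {
    size_t rejected = 0;
    assert(txs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        txs[i].accepted = 1;
        for (size_t j = 0; j < i; j++) {
            if (txs[j].coin_id == txs[i].coin_id) {
                txs[i].accepted = 0;  /* this coin was already spent earlier */
                rejected++;
                break;
            }
        }
    }
    return rejected;
}
```

The quadratic scan keeps the sketch short; the point is only that agreeing on one order is enough to decide which spend counts.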
1.2 Timestamping
Definition 1.2.1. If you download a bitcoin wallet, you receive a private key (the secret half of a key pair, used to produce the digital signatures that prove you are the owner) and a public key (the corresponding public half). A blockchain is a chain of blocks, where the blocks are strongly linked and each block contains a record of some public keys, meaning the records of a number of transactions. The block size is limited to 1 MB.

The timestamp server takes the hash of a block, timestamps it and publishes it. The timestamp proves that the data must have existed at that time. Each timestamp includes the previous timestamp in its hash, so all of this forms a chain: each additional timestamp reinforces the ones before it (see Figure 1.3).
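The chaining of timestamps can be illustrated with a small sketch. It assumes a toy 64-bit FNV-1a hash as a stand-in for a real cryptographic hash (it is not secure), and the `Stamp` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash: a stand-in for SHA-256, NOT cryptographically secure. */
static uint64_t toy_hash(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

/* Hypothetical timestamp entry. */
typedef struct {
    uint64_t prev_stamp;  /* stamp of the previous entry (0 for the first one) */
    uint64_t time;        /* time value recorded by the server */
    uint64_t block_hash;  /* hash of the block being timestamped */
    uint64_t stamp;       /* hash over the three fields above */
} Stamp;

uint64_t stamp_of(const Stamp *s) {
    uint64_t fields[3] = { s->prev_stamp, s->time, s->block_hash };
    return toy_hash(fields, sizeof fields);
}

/* Fills entry i of the chain; entries 0..i-1 must already be filled. */
void chain_append(Stamp *chain, size_t i, uint64_t time, uint64_t block_hash) {
    chain[i].prev_stamp = (i == 0) ? 0 : chain[i - 1].stamp;
    chain[i].time = time;
    chain[i].block_hash = block_hash;
    chain[i].stamp = stamp_of(&chain[i]);
}

/* Returns 1 if every stamp matches its fields and its predecessor, else 0. */
int chain_valid(const Stamp *chain, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint64_t prev = (i == 0) ? 0 : chain[i - 1].stamp;
        if (chain[i].prev_stamp != prev) return 0;
        if (chain[i].stamp != stamp_of(&chain[i])) return 0;
    }
    return 1;
}
```

Changing an old entry invalidates its own stamp, and recomputing that stamp breaks the link stored in the next entry, which is why every later timestamp reinforces the earlier ones.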
1.3 Proof of work
Definition 1.3.1. A proof of work (POW) system/protocol is built on a computational problem whose solution is a piece of data that is hard (costly) to find and easy to verify.²

In Bitcoin, we use the Hashcash proof of work for block generation. It is a piece of data (the output of a hash function) that satisfies certain given requirements: it is easy for others to verify but very costly to produce. Here is how it works: the proof of work searches for a value that, when hashed with SHA-256 (the Secure Hash Algorithm designed by the National Security Agency, with an output of length 256 bits), gives a hash that begins with a required number of zero bits.

² For more information about this subject, see https://en.wikipedia.org/wiki/Proof-of-work_system
Figure 1.3: Timestamping Authority TSA

In our timestamp network, we implement the proof of work by incrementing a nonce (a number used once) in the block until a value is found that gives the block's hash the required number of zero bits. The average work required is exponential in the number of zero bits, but the found value can be verified easily by computing a single hash. Let us see an example.³

Example 1.3.2. Suppose the given data is the base string "Hello, world!". We search for a variation whose SHA-256 hash begins with '000'. We vary the string by appending an integer value (the nonce) to the end and incrementing it each time. Finding a match took 4251 tries in this example (nonces 0 to 4250).

"Hello, world!0"     1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64
"Hello, world!1"     e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8
"Hello, world!2"     ae37343a357a8297591625e7134cbea22f5928be8ca2a32aa475cf05fd4266b7
---------------------------------------------------------
"Hello, world!4248"  6e110d98b388e77e9c6f042ac6b497cec46660deef75a55ebc7cfdf65cc0b965
"Hello, world!4249"  c004190b822f1669cac8dc37e761cb73652e7832fb814565702245cf26ebb9e6
"Hello, world!4250"  0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9

³ This example is taken from https://en.bitcoin.it/wiki/Proof_of_work
Once the CPU effort has been expended to produce the proof of work, the block cannot be modified without redoing that work (which is costly). In addition, later blocks are chained after it, so the work needed to change a block includes redoing all the blocks that come after it.
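The nonce search of Example 1.3.2 can be reproduced with a short program. The sketch below is not Bitcoin's implementation: it hashes a text string rather than a block header, `sha256_short` is a deliberately restricted SHA-256 that only handles messages of at most 55 bytes (one padded block), and `find_nonce` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* SHA-256 round constants. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* SHA-256 of a short message (at most 55 bytes, so it fits one padded block).
 * The digest is written to h as eight 32-bit words, most significant first. */
void sha256_short(const unsigned char *msg, size_t len, uint32_t h[8]) {
    static const uint32_t init[8] = {
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
    unsigned char block[64] = {0};
    uint32_t w[64], s[8];
    uint64_t bits = (uint64_t)len * 8;
    int i;

    assert(len <= 55);
    memcpy(block, msg, len);
    block[len] = 0x80;                       /* padding starts with a single 1 bit */
    for (i = 0; i < 8; i++)                  /* message length in bits, big-endian */
        block[63 - i] = (unsigned char)(bits >> (8 * i));
    for (i = 0; i < 16; i++)
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (i = 16; i < 64; i++) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    memcpy(s, init, sizeof s);
    for (i = 0; i < 64; i++) {
        uint32_t S1 = rotr(s[4], 6) ^ rotr(s[4], 11) ^ rotr(s[4], 25);
        uint32_t ch = (s[4] & s[5]) ^ (~s[4] & s[6]);
        uint32_t t1 = s[7] + S1 + ch + K[i] + w[i];
        uint32_t S0 = rotr(s[0], 2) ^ rotr(s[0], 13) ^ rotr(s[0], 22);
        uint32_t maj = (s[0] & s[1]) ^ (s[0] & s[2]) ^ (s[1] & s[2]);
        uint32_t t2 = S0 + maj;
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
    }
    for (i = 0; i < 8; i++) h[i] = init[i] + s[i];
}

/* Tries nonces 0, 1, 2, ... appended in decimal to base and returns the first
 * one whose hash starts with zero_bits zero bits (1..32), or -1 if none is
 * found within max_tries or the string gets too long for sha256_short. */
long find_nonce(const char *base, int zero_bits, long max_tries) {
    char buf[56];
    uint32_t h[8];
    assert(zero_bits >= 1 && zero_bits <= 32);
    for (long n = 0; n < max_tries; n++) {
        int len = snprintf(buf, sizeof buf, "%s%ld", base, n);
        if (len < 0 || len >= (int)sizeof buf) return -1;
        sha256_short((const unsigned char *)buf, (size_t)len, h);
        if ((h[0] >> (32 - zero_bits)) == 0) return n;
    }
    return -1;
}
```

Calling `find_nonce("Hello, world!", 12, 100000)` searches for a hash starting with '000' (12 zero bits); if the figures quoted in Example 1.3.2 are right, it stops at nonce 4250. Each extra required zero bit doubles the expected number of tries, while checking a proposed nonce always costs a single hash.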
Figure 1.4: Chain of blocks
1.4 Network
This is how the network runs:
1. New transactions are broadcast to all the nodes.
2. Each node collects new transactions and puts them in a block.
3. Each node searches for a difficult proof of work for its block.
4. When a node finds a proof of work, it broadcasts the block to all the other nodes.
5. The nodes check the block, its proof of work and the validity of all the transactions in it (meaning that none of them is a double spend).
6. If the nodes accept the block, they start working on creating the next block in the chain, using the hash of the accepted block as the previous hash.

Nodes always consider the longest chain (the one with the greatest invested proof of work effort) to be the correct one and keep working to extend it. In the following figure, the correct chain is the blue one.
If two (or more) nodes publish different versions of the next block at the same time, some nodes may receive one or the other first. Each node starts working on the version it received first and saves the other one in case that branch becomes longer. The tie is broken when the next proof of work is found and one branch becomes longer; the nodes that were working on the other branch then switch to the longer one.

It is not a problem if new transactions do not reach all nodes: as long as they reach many nodes, they will get into a block. If a node does not receive a block, it will realize that it missed one when it receives the next block, and it can request the missed block from the other nodes.
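The tie-breaking rule described above can be sketched as a small selection function. It is a simplification that assumes chain length is a fair proxy for invested work; `choose_chain` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* lengths[i] is the number of blocks in candidate chain i and current is the
 * index of the chain the node is working on (the one it received first).
 * The node switches only if another chain is strictly longer; on a tie it
 * keeps its current chain. Returns the index of the chain to extend. */
size_t choose_chain(const unsigned long *lengths, size_t n, size_t current) {
    size_t best = current;
    assert(current < n);
    for (size_t i = 0; i < n; i++)
        if (lengths[i] > lengths[best])
            best = i;
    return best;
}
```

With two competing branches of equal length, each node stays on the branch it saw first; as soon as one branch gains a block, every node that calls the function again moves to it.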
1.5 Fees
In this section we explain how to motivate the nodes to stay honest.
1. The first transaction in every block is a special transaction: it is a reward for the creator of the block. This motivates the nodes to support the network and provides a way to initially distribute bitcoins into circulation (since there is no central authority to issue them).
2. The transaction fees: the user pays a small fee for each transaction. Standard fee = 0.0001 BTC.⁴

Every 210,000 new blocks the reward is cut in half. At first, in 2009, it was 50 BTC; in 2012 it became 25 BTC and in 2016 it became 12.5 BTC. After 64 halving events it will be reduced to almost 0. Hence, the number of bitcoins is limited, and 75% of them have already been created. Once this predetermined number of bitcoins has entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free.

⁴ The website https://bitcoinfees.21.co/#fees gives the current fastest and cheapest transaction fee, together with the delay, i.e. the predicted number of blocks within which the transactions will be confirmed. For example, if transactions are predicted to have a delay of 1-3 blocks, there is a 90% chance that they will be confirmed within that range, meaning an estimated time of around 10 to 30 minutes in this case.
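The halving schedule can be checked with a little integer arithmetic. The sketch below makes assumptions that are not stated in the text: amounts are counted in satoshis (1 BTC = 100,000,000 satoshis) and each halving is an integer right shift, so the reward reaches exactly zero once it drops below one satoshi. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SATOSHI_PER_BTC  100000000LL  /* assumption: smallest unit, 1e-8 BTC */
#define HALVING_INTERVAL 210000LL     /* blocks between two halvings */

/* Block reward in satoshis for the block at the given height. */
int64_t block_reward(int64_t height) {
    int64_t halvings;
    assert(height >= 0);
    halvings = height / HALVING_INTERVAL;
    if (halvings >= 64) return 0;     /* never shift by the full word width */
    return (50 * SATOSHI_PER_BTC) >> halvings;
}

/* Sum of all block rewards that will ever be issued, in satoshis. */
int64_t total_supply(void) {
    int64_t total = 0;
    for (int64_t era = 0; block_reward(era * HALVING_INTERVAL) > 0; era++)
        total += HALVING_INTERVAL * block_reward(era * HALVING_INTERVAL);
    return total;
}
```

Under these assumptions the total comes out just under 21 million BTC, and the first two reward periods (50 BTC and 25 BTC) together account for 15.75 million BTC, which is where the 75% figure comes from.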
1.6 Root hash/ Merkle tree
Definition 1.6.1. The Merkle tree (also called the hashtree) is a binary tree (see Figure 1.5) of hashes in which the leaves are hashes of data blocks and every non-leaf node is labeled with the hash of the labels/values of its child nodes. It allows efficient and secure verification of the contents of large data structures. Given a trusted tophash (the root node of the hashtree), the hashtree can be received from any non-trusted source (since it is a P2P network); the received hashtree is then verified against the trusted tophash. If the received hashtree is fake or damaged, another source is tried until the program finds a hashtree that matches the tophash.
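The construction in Definition 1.6.1 can be sketched as follows. A toy 64-bit FNV-1a hash stands in for the real hash function (it is not secure), the names are hypothetical, and the sketch assumes that an unpaired node at the end of a level is hashed with a copy of itself, a convention the text does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash: a stand-in for SHA-256, NOT cryptographically secure. */
static uint64_t toy_hash(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

/* Label of a non-leaf node: the hash of the labels of its two children. */
uint64_t hash_pair(uint64_t left, uint64_t right) {
    uint64_t pair[2] = { left, right };
    return toy_hash(pair, sizeof pair);
}

/* Computes the tophash from n leaf hashes, overwriting the array level by level.
 * Assumption: an unpaired last node is combined with a copy of itself. */
uint64_t merkle_root(uint64_t *level, size_t n) {
    assert(n > 0);
    while (n > 1) {
        size_t parents = 0;
        for (size_t i = 0; i < n; i += 2) {
            uint64_t right = (i + 1 < n) ? level[i + 1] : level[i];
            level[parents++] = hash_pair(level[i], right);
        }
        n = parents;
    }
    return level[0];
}
```

Any change to a leaf changes the labels of all its ancestors up to the root, which is why comparing a received tree against a trusted tophash is enough to detect a fake or damaged tree.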
Figure 1.5: Binary Tree

Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. How do we reclaim disk space? To make this possible without breaking the block's hash, the transactions are hashed in a Merkle tree and only the root (called Merkle Root or Root Hash) is included in the block's hash (see Figure 1.6). Since the interior hashes do not need to be stored, we compress old blocks by pruning the branches of the tree (see Figure 1.7).

Remark 1.6.2. The main way of identifying a block in the blockchain is via its block header hash, which is calculated by running the block header through SHA-256 twice. A block header with no transactions takes about 80 bytes. If a block header is generated every 10 minutes, then 6 are generated every hour, and per year we have:
80 bytes ⋅ 6 ⋅ 24 ⋅ 365 = 4,204,800 bytes ≈ 4.2 MB
Figure 1.6: Hashtree
Figure 1.7: Compressing the Hashtree: Merkle branch
1.7 Splitting values Inputs/ Outputs
It is possible to deal with coins individually, but it would be inefficient to make a separate transaction for each cent in a transfer. In order to allow a value to be split and combined, transactions contain multiple inputs and outputs. Usually there is either a single input from a larger previous transaction or multiple inputs that combine smaller amounts. There are at most two outputs: one for the payment and, if there is any change, one that returns it to the sender. In Figure 1.8, we give an example of two transactions with different inputs and outputs.
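The way inputs are combined and split into a payment output and a change output can be sketched with plain arithmetic. Amounts are assumed to be in satoshis (1 BTC = 100,000,000 satoshis), the fee is assumed to be whatever the outputs leave unclaimed, and `change_output` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combines n input amounts and returns the value of the change output,
 * i.e. what goes back to the sender after the payment and the fee.
 * Returns -1 if the inputs do not cover payment + fee. */
int64_t change_output(const int64_t *inputs, size_t n, int64_t payment, int64_t fee) {
    int64_t total = 0;
    assert(payment >= 0 && fee >= 0);
    for (size_t i = 0; i < n; i++)
        total += inputs[i];
    if (total < payment + fee)
        return -1;
    return total - payment - fee;
}
```

For example, two inputs of 30,000 and 50,000 satoshis can fund a payment of 60,000 with a fee of 10,000 (0.0001 BTC), leaving 10,000 as change; when the result is 0, the second output is not needed.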
Figure 1.8: Example of inputs and outputs in two transactions
1.8 Privacy
The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party, excluding the public.
Figure 1.9: Traditional privacy model

In the peer-to-peer system, by contrast, all transactions are published, so there is no longer any need for a trusted third party, and privacy is preserved by keeping public keys anonymous. In that way, the public can see that someone sent an amount to someone else, but they cannot link the transaction to anyone, i.e. they can identify neither the sender nor the receiver. To keep transactions from being linked to a common owner, we use a new key pair for each transaction.

Figure 1.10: New P2P privacy model

Problem 1.8.1. With multi-input transactions, some linking is still unavoidable, because these transactions necessarily reveal that their inputs belong to the same owner. If the owner of a key is exposed, then linking could reveal all the other transactions that belong to him.
Solution 1.8.2. One of the most recent solutions to this problem is CoinParty.⁵

1.9 Simple Payment Verification SPV
To verify payments without running a full node, a user only needs to keep a copy of the block headers of the longest proof of work chain. He then obtains the Merkle branch that links the transaction to the block in which it is timestamped. He cannot check the transaction for himself, but by linking it to a place in the chain he can see that the nodes have accepted it, and the blocks added after it further confirm this.

Problem 1.9.1. This verification is reliable as long as honest nodes control the network. If an attacker overpowers the Bitcoin network, users of this simplified method can be fooled by the attacker's fabricated transactions.

Solution 1.9.2. A strategy to protect against this is to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and the alerted transactions. Businesses that receive frequent payments will probably prefer to keep running their own nodes for independent security.
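Checking a Merkle branch against the root stored in a block header can be sketched as follows. As an assumption for illustration only, a toy 64-bit FNV-1a hash replaces the real hash function (it is not secure), and `BranchStep`, `root_from_branch` and `spv_verify` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash: a stand-in for SHA-256, NOT cryptographically secure. */
static uint64_t toy_hash(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

uint64_t hash_pair(uint64_t left, uint64_t right) {
    uint64_t pair[2] = { left, right };
    return toy_hash(pair, sizeof pair);
}

/* One step of a Merkle branch: the sibling hash and the side it sits on. */
typedef struct {
    uint64_t sibling;
    int sibling_is_left;  /* 1 if the sibling is the left child, 0 if the right */
} BranchStep;

/* Recomputes the root by hashing upwards from the transaction hash. */
uint64_t root_from_branch(uint64_t tx_hash, const BranchStep *branch, size_t depth) {
    uint64_t h = tx_hash;
    for (size_t i = 0; i < depth; i++)
        h = branch[i].sibling_is_left ? hash_pair(branch[i].sibling, h)
                                      : hash_pair(h, branch[i].sibling);
    return h;
}

/* Returns 1 if the branch links tx_hash to the root stored in the block header. */
int spv_verify(uint64_t tx_hash, const BranchStep *branch, size_t depth,
               uint64_t header_root) {
    return root_from_branch(tx_hash, branch, depth) == header_root;
}
```

The user needs only the header and a number of sibling hashes proportional to the depth of the tree, not the other transactions of the block.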
1.10 Probability analysis of an attack
Imagine the following scenario: an attacker tries to generate a "fake" chain faster than the honest chain. Even in the worst case, when an attacker manages to accomplish that, it does not make the system vulnerable to arbitrary changes such as creating money out of nowhere or taking money that never belonged to the attacker. Since nodes do not accept an invalid transaction as payment and honest nodes do not accept a block containing one, an attacker can only try to modify his own transactions to take back money he recently spent. The race between the honest chain and an attacker can be considered as a binomial random walk (see Definition 1.10.1).

Definition 1.10.1. The binomial random walk is a mathematical path that consists of a succession of random steps, where, from the current position, the probability of taking a step forward (+1) is a given p and the probability of taking a step back (−1) is 1 − p.

In our case, we have:
(+1): Success event, when the honest chain is extended by one block.
(−1): Failure event, when the attacker's chain is extended by one block.

We can compute the probability that the alternative chain (the attacker's chain) catches up with the honest chain as follows:
p = probability that honest nodes find the next block.
q = probability that an attacker finds the next block.
q_z = probability that an attacker will catch up from z blocks behind.

\[
q_z =
\begin{cases}
1 & \text{if } p \le q \\
(q/p)^z & \text{if } p > q
\end{cases}
\]

By hypothesis p > q (by the distribution of CPU power). When the number z of blocks that the attacker has to catch up increases, the probability q_z decreases exponentially, i.e. very fast:

\[
q_z \longrightarrow 0 \quad (z \to \infty)
\]

⁵ See the research article CoinParty: Secure Multi-Party Mixing of Bitcoins by Jan Henrik Ziegeldorf, Fred Grossmann, Martin Henze, Nicolas Inden, Klaus Wehrle, Communication and Distributed Systems (COMSYS), RWTH Aachen University, Germany
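The closed form for the catch-up probability from z blocks behind is easy to evaluate. The sketch below uses a plain multiplication loop instead of a math library call, and the function name `catch_up_probability` is hypothetical.

```c
#include <assert.h>

/* Probability that an attacker who finds the next block with probability q
 * (honest nodes: p = 1 - q) ever catches up from z blocks behind:
 * 1 if p <= q, otherwise (q/p)^z. */
double catch_up_probability(double q, int z) {
    double p, ratio, result = 1.0;
    assert(q >= 0.0 && q <= 1.0 && z >= 0);
    p = 1.0 - q;
    if (p <= q)
        return 1.0;
    ratio = q / p;
    for (int i = 0; i < z; i++)
        result *= ratio;
    return result;
}
```

For q = 0.25 the ratio q/p is 1/3, so each additional block the attacker has to make up divides his chances by three.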
Now the question is: how many blocks need to be added after a given transaction to be sufficiently sure that the transaction is secure, i.e. that the sender can no longer modify it? With the odds against him, if the attacker does not make a lucky move forward early on, his chances vanish as he falls further behind (q_z → 0 as z → ∞).

Now assume that the sender of a transaction is an attacker who wants to make a recipient (the person who is expecting money) believe for a while that he has been paid, and then take back the money after some time. The receiver will be alerted when that happens, but the attacker hopes that it will be too late. To prevent the sender from preparing an alternative chain of blocks ahead of time, by working on it continuously until he is lucky enough to get far enough ahead, the receiver generates a new key pair and gives the public key to the sender only shortly before signing. Once the transaction is sent, the attacker starts preparing in secret a parallel chain containing an alternative, modified version of his transaction. The recipient waits until the transaction has been added to a block and z blocks have been linked after it. The recipient does not know exactly how much progress the attacker has made, but we assume that the honest blocks took the average expected time per block. Under this hypothesis, the attacker's potential progress follows a Poisson distribution with parameter λ:

\[
\lambda := z \, \frac{q}{p}
\]

So the probability that the attacker can still catch up, after the recipient has waited for z blocks, is:

\[
q_z = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \, p(k)
\]

where p(k) is the probability that the attacker can still catch up at the point k, i.e. after he has managed to create k blocks:

\[
p(k) =
\begin{cases}
(q/p)^{z-k} & \text{if } k \le z \\
1 & \text{if } k > z
\end{cases}
\]

By rearranging, q_z is equal to:

\[
q_z = 1 - \sum_{k=0}^{z} \frac{\lambda^k e^{-\lambda}}{k!} \left( 1 - (q/p)^{z-k} \right)
\]
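The rearrangement is a short step that can be written out explicitly. The shorthand $P_k$ for the Poisson weight is introduced only for this sketch; the step uses nothing beyond the definition of p(k) and the fact that the Poisson weights sum to 1.

```latex
\[
\begin{aligned}
q_z &= \sum_{k=0}^{\infty} P_k \, p(k),
      \qquad P_k := \frac{\lambda^k e^{-\lambda}}{k!} \\
    &= \sum_{k=0}^{z} P_k \, (q/p)^{z-k} + \sum_{k=z+1}^{\infty} P_k \\
    &= \sum_{k=0}^{z} P_k \, (q/p)^{z-k} + 1 - \sum_{k=0}^{z} P_k \\
    &= 1 - \sum_{k=0}^{z} P_k \left( 1 - (q/p)^{z-k} \right)
\end{aligned}
\]
```

The last form contains only a finite sum, which is what makes it convenient to evaluate numerically.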
Example 1.10.2. [8] To apply this result to several examples, we use the following C code.

#include <math.h>
double AttackerSuccessProbability(double q, int z)
{
    double p = 1.0 - q;
    double lambda = z * (q / p);
    double sum = 1.0;
    int i, k;
    for (k = 0; k