An Adaptive Load-balanced Partitioning Module in Cassandra using Rendezvous Hashing

Sally M. Elghamrawy
MISR Higher Institute for Engineering and Technology, Mansoura, Egypt
[email protected], [email protected]

Abstract. With the rapid growth of social networks, Internet technology, and cloud computing, new tools and algorithms are needed to handle the challenges of big data. One of the key steps toward resolving these challenges is the introduction of scalable storage systems. NoSQL databases are efficient big-data storage management systems that provide horizontal scalability, and data partitioning strategies must be implemented in them to ensure that scalability. In this paper, an Adaptive Rendezvous Hashing Partitioning Module (ARHPM) is proposed for the Cassandra NoSQL database. Its main goal is to partition the data in Cassandra using rendezvous hashing, together with a proposed Load Balancing based Rendezvous Hashing (LBRH) algorithm that guarantees load balancing during the partitioning process. To evaluate the proposed module, Cassandra is modified by embedding ARHPM in it, and a number of experiments are conducted with the Yahoo! Cloud Serving Benchmark to validate its load balancing.

Keywords: Cassandra; NoSQL database; Rendezvous hashing; Partitioning; Consistent hashing; Load balancing.

1 Introduction

Over the past decade, information and Internet technology have grown rapidly, resulting in a data explosion that traditional data processing applications can no longer handle. The term Big Data [1] emerged, together with its associated techniques and technologies, to manage these massive datasets, and big data has become relevant to all aspects of human life. Researchers therefore see a vital need to address the challenges of storing big data. Relational database management systems (RDBMS) cannot handle big data. NoSQL [2] (meaning "not only SQL") databases are used for storing big data because they expand easily as the data scale grows. The data in a NoSQL database must be distributed over heterogeneous servers; NoSQL databases provide shared-nothing horizontal scalability, and data partitioning strategies must be implemented to ensure it. Sharding is the horizontal partitioning of data: the database is split into shards, and the data stored in each shard is distributed. Range, random, and hashing partitioning techniques are used in NoSQL databases to partition data across nodes. Some NoSQL databases such as Apache HBase [3] use range partitioning, while the most widely used NoSQL databases such as

Google BigTable [4] and Cassandra [5] (used by Facebook) adopt consistent hashing [6] as their partitioning strategy. The basic consistent hashing used in Cassandra suffers from a load-balancing problem: it ignores the nature of the nodes to which data are assigned and depends only on blind hashing. Considerable effort has gone into adapting consistent hashing to solve this load-balancing problem [7, 8]. Unlike consistent hashing, rendezvous hashing (HRW) distributes database records uniformly over nodes using a specific hash function. In this paper, an Adaptive Rendezvous Hashing Partitioning Module (ARHPM) is proposed for Cassandra, together with an associated Load Balancing based Rendezvous Hashing (LBRH) algorithm that guarantees load balancing in the partitioning process. To speed up partitioning, ARHPM uses the SpookyHash function, which has proved to be twice as fast as the Murmur hash function used by Cassandra's consistent hashing. In addition, using a virtual hierarchical structure with the Highest Random Weight (HRW) [9] algorithm reduces the time that consistent hashing spends on precomputation and token storage. The rest of the paper is organized as follows. Section 2 covers related partitioning studies. Section 3 presents the proposed Adaptive Rendezvous Hashing Partitioning Module together with the LBRH load-balancing algorithm. Section 4 evaluates Cassandra's performance, shows the effect of the load balancer sub-module in ARHPM, and compares the results against recent work.

2 Related Work

Researchers have shown enormous interest in the partitioning strategies of NoSQL databases because of their impact on system performance. Hash, range, and hybrid hash/range partitioning are the most common strategies used by NoSQL systems. Ata Turk et al. [10] summarize the types of data partitioning, showing the advantages and disadvantages of each. A number of researchers have developed methods to improve Cassandra's performance and the partitioning strategies of different NoSQL databases. Ata Turk et al. [10] proposed a data partitioning method based on a hypergraph constructed to correctly model multi-user operations. Abramova et al. [11] analyzed the scalability of Cassandra by testing its replication and data partitioning strategies. Lakshminarayanan [7] proposed an adaptive partitioning scheme for consistent hashing that accounts for the heterogeneity of the system. Huang et al. [12] proposed a dynamic programming algorithm on the consistent hashing ring, based on an imbalance coefficient, to calculate the position of a newly arriving node in a Cassandra cluster. Ramakrishnan et al. [13] proposed a processing pipeline using Cassandra's RandomPartitioner that allows any non-Java executable to use the NoSQL store and supports offline processing for Cassandra. Zhikun et al. [14] proposed a Hybrid Range Consistent Hash (HRCH) partition strategy for NoSQL databases to improve the degree of processing and the data loading speed. Most of these approaches either test the scalability of a NoSQL database or enhance Cassandra's consistent hashing. Cassandra ships with three basic partitioning strategies: RandomPartitioner, ByteOrderedPartitioner, and Murmur3Partitioner.

To the best of our knowledge, this is the first attempt to replace the whole partitioning module of Cassandra with the rendezvous hashing technique and to test its scalability, load balancing, and partitioning time.

3 The Proposed Adaptive Rendezvous Hashing Partitioning Module (ARHPM)

This section presents the proposed Adaptive Rendezvous Hashing Partitioning Module (ARHPM), which partitions Cassandra using the Highest Random Weight (HRW) algorithm built on hash-based virtual hierarchies [9], together with a proposed load-balancing algorithm. ARHPM includes a load-balancing scheme ensuring that any exchange of load or capabilities between nodes ultimately reduces the potential value and leads the system to a more balanced state, as shown in detail below. The implementation of the proposed partitioning module is based on a load balancer and rendezvous-hash partitioning using the SpookyHash function. The module aims to improve Cassandra in the following ways: (1) manage the load of forwarded requests; (2) balance the load among nodes, taking each node's storage capacity into consideration, using the Load Balancing based Rendezvous Hashing (LBRH) algorithm; and (3) speed up hashing by using the SpookyHash function and by using rendezvous hashing, which dispenses with the virtual nodes (virtual servers) of Cassandra's consistent hashing. The proposed ARHPM module consists of three main sub-modules, as shown in Figure 1; a minimal sketch of how these sub-modules might be composed is given after the figure.

Fig. 1: The proposed Adaptive Rendezvous Hashing Partitioning Module (ARHPM). The figure shows the Load Requests Manager (workload analyzer, load detector, request validator), the Load Balancer (operation collector, balance coordinator, balancing algorithm, bottleneck detector), and the Hashing Coordinator (adaptive rendezvous hashing applier, SpookyHash function, node/key assigner, evaluator), all built on top of the Cassandra data model.
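The sketch below is only an illustrative composition of the request flow implied by Figure 1; the class, method, and field names are hypothetical and are not taken from the paper's implementation. The node attributes anticipate Heuristics 1 and 2 described below.

```python
# Illustrative sketch only: hypothetical names, assuming the request flow of Figure 1.
from dataclasses import dataclass

@dataclass
class NodeInfo:
    name: str
    capacity: int        # Heuristic 1: how many keys/queries the node can accept
    active_degree: int   # Heuristic 1: number of finished queries
    dependency: int      # Heuristic 1: times the node asked another node for help
    current_load: float = 0.0  # LD_i^Cur of Heuristic 2 (defined later in the text)

class LoadRequestsManager:
    def validate(self, request: dict) -> bool:
        # Request validator: reject queries with an unrecognizable format.
        return "key" in request

class LoadBalancer:
    def pick_target(self, candidates: list[NodeInfo]) -> NodeInfo:
        # Balance coordinator stand-in: route to the least-loaded of the candidates.
        return min(candidates, key=lambda n: n.current_load)

class HashingCoordinator:
    def candidates(self, key: str, nodes: list[NodeInfo]) -> list[NodeInfo]:
        # Adaptive HRW applier: rank nodes by a per-(key, node) weight.
        # A real implementation would use a stable hash such as SpookyHash.
        return sorted(nodes, key=lambda n: hash((key, n.name)), reverse=True)

class ARHPM:
    """Hypothetical composition of the three sub-modules of Figure 1."""
    def __init__(self, nodes: list[NodeInfo]):
        self.nodes = nodes
        self.manager = LoadRequestsManager()
        self.balancer = LoadBalancer()
        self.hasher = HashingCoordinator()

    def route(self, request: dict) -> NodeInfo:
        if not self.manager.validate(request):
            raise ValueError("unrecognizable request format")
        ranked = self.hasher.candidates(request["key"], self.nodes)
        return self.balancer.pick_target(ranked[:2])  # top HRW candidates, then balance
```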

The Load Requests Manager sub-module: its main goal is to control the requests coming from different nodes. It consists of:

Workload analyzer: analyzes the workload of each node in the system and caches it, and also caches the node locations that appear most frequently in a specific Cassandra cluster. It is based on Heuristic 1.

Heuristic 1: Suppose there is a key K and a set of nodes N = {n1, n2, n3, ..., ni}. Each node n ∈ N is defined in terms of <capacity(n), active degree(n), dependency(n)>, where capacity(n) indicates how many keys and queries the node can accept, active degree(n) indicates the number of finished queries, and dependency(n) indicates how many times the node has asked another node for help.

Load detector: manages load requests and checks whether the node is able to handle the load.

Request validator: validates the requests sent to the system and parses the query, rejecting any unrecognizable format.

The Load Balancer sub-module: Cassandra's current load balancing [20] depends on transferring most of the key requests of the most loaded node, an operation that does not guarantee balanced nodes. As a result, several load balancers have been proposed to sit in front of Cassandra to ensure balancing [7, 8, 12], but such a load balancer can itself become the bottleneck of the system. The proposed module takes a different approach: it keeps the benefits of the current Cassandra load-balancing policy without transferring any data, and manages balancing within the partitioning process without affecting performance.

Operation collector: records the participating nodes, the operations performed in the system, and the load status of each node collected by the previous sub-module. Consistent hashing techniques allocate nodes based on the distances between them on the ring, which results in a non-uniform distribution; with rendezvous hashing (HRW), in contrast, every node has an equal opportunity to receive a key K, which makes the load uniform across nodes. The operation collector's main goal is to let nodes exchange information about one another, with each node associated with its corresponding load.

Balance coordinator: decides each node's role in the balancing procedure according to the list created by the operation collector, implements the proposed Load Balancing based Rendezvous Hashing (LBRH) algorithm shown in Figure 3, and resolves any conflicts that may occur. There are two scenarios for balancing the load when using rendezvous (HRW) hashing, described after the algorithm; an illustrative sketch of the threshold-based load classification used by LBRH follows the listing.

Load Balancing based Rendezvous Hashing (LBRH)
Input: node tuples (Node_i = <CAPACITY(n_i), ACTIVE DEGREE(n_i), DEPENDENCY(n_i)>) and node n_i's neighbors in the same zone according to HRW
Output: a list (LD_DSj, n) of the distributed load assigned to each node
1.  Supervise (Σ_{i=1}^{x} n_i) ∈ C_j
2.  For each n in Σ_{i=1}^{x} n_i
3.      Calculate LD_i^Cur(n_i)
4.      Check load(n_i)
5.      If LD_i^Cur > TH_LD (overload) then INTENSE-LOAD()
6.  INTENSE-LOAD()
7.  {  aa: Register → Pending_Reload_List()
8.     LD_i^Cur = fragment(LD)
9.     If LD_i^Max < TH_LD
10.        Choose specific n(i)
11.        Split n(i)
12.        Partition n(i)
13.        Bid_Bonus(LD_i^Max, LD_{i+1}^Max, LD_{i-1}^Max)
14.    Else go to aa
15.    Combine(n)
16.    Create (LD_DSj, n) LIST }
17. If LD_i^Cur < TL_LD (underload) then LOW_KEY-LOAD()
18. LOW_KEY-LOAD()
19. {  LD_i^Cur = LD_i^Cur − TL_LD
20.    For each CC_LD^{ji} in C_j
21.        Bid_Bonus(LD_i^Cur, LD_{i+1}^Cur, LD_{i-1}^Cur)
22.        Minimize(φ)
23.    End for
24.    Create (LD_DSj, n) LIST }

Fig. 3: The Load Balancing based Rendezvous Hashing (LBRH) algorithm
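The sketch below illustrates only the threshold test of LBRH (lines 5 and 17 of the listing): each node's current load is compared against a high and a low threshold to decide which handler runs. The function names, the fractional thresholds, and the sample values are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of the threshold classification in LBRH (hypothetical names and values).
from dataclasses import dataclass

@dataclass
class NodeLoad:
    node_id: str
    ld_cur: float   # LD_i^Cur: load the node currently holds
    ld_max: float   # LD_i^Max: maximum load the node can hold (Heuristic 2)

TH_LD = 0.8   # high-load threshold (assumed value, as a fraction of LD_max)
TL_LD = 0.2   # low-load threshold (assumed value)

def classify(n: NodeLoad) -> str:
    """Mirror LBRH lines 5 and 17: pick the handler for a node's current load."""
    utilization = n.ld_cur / n.ld_max
    if utilization > TH_LD:
        return "INTENSE-LOAD"    # overloaded: fragment the load and run Bid_Bonus
    if utilization < TL_LD:
        return "LOW_KEY-LOAD"    # underloaded: candidate to absorb load from neighbors
    return "MODERATE-LOAD"       # within thresholds: no rebalancing needed

if __name__ == "__main__":
    cell = [NodeLoad("n1", 95, 100), NodeLoad("n2", 10, 100), NodeLoad("n3", 50, 100)]
    for node in cell:
        print(node.node_id, classify(node))
```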

Scenario 1: the load must be balanced uniformly when a new key request needs to be allocated to a specific node according to the balancing algorithm. Scenario 2: when a node goes down, its load must be distributed uniformly across the other nodes participating in the system. Two points must be taken into consideration: (a) some keys to be distributed may be more important or more popular than others; and (b) the nodes in both scenarios are heterogeneous in many ways, including their ability to process operations, network capabilities, power consumption, and processing time. Randomly distributing the load may therefore lead to bottlenecks or poor performance, so the process of balancing the load while partitioning the database must account for all of the node and key attributes. Heuristic 2 defines some terms used in the LBRH algorithm.

Heuristic 2: The capacity(n) of Heuristic 1 is expressed as <LD_i^Cur(n), LD_i^Max(n)>, where LD_i^Cur is the load that the node currently holds and LD_i^Max is the maximum load node i can hold, calculated from the node's features.

The process has two main phases.

First phase: let n_i ∈ N be the participating nodes in cell C_j of the HRW structure [9], and let CC_ji be the cell coordinator of each cell, responsible for a number of nodes i. Each CC_ji has a load denoted by CC_LD^{ji}. The cell coordinator's main goal is to manage the load among the nodes in its cell based on each node's LD_i^Cur and LD_i^Max, categorizing the nodes into intense load, moderate load, and low-key load, as shown in the LBRH algorithm, using two threshold values: a high threshold TH_LD and a low threshold TL_LD. To allocate a key, a popularity factor (Ԗ) must be assigned to it. While the load is being assigned to a specific node, the load-balancing algorithm is activated to ensure that the allocation stays balanced by minimizing φ, shown in equation (1).

Second phase: each node in the cell has neighbor nodes, and the main goal of the LBRH algorithm is to predict a neighbor node's performance in taking on load, using the bid-bonus algorithm proposed in Figure 4.

\varphi = \sum_{i=1}^{n}\sum_{j=1}^{m} CC_{LD}^{ji} \;-\; \frac{\sum_{j=1}^{m} CC_{LD}^{ji}}{\sum_{i=1}^{n} LD_{i}^{Max}} \qquad (1)

Bottleneck detector: the balance coordinator contacts this sub-module whenever load is distributed across different nodes, sending it the information of the nodes involved in the distribution. The detector then has two main roles. First, it detects any bottlenecks that might occur in the load distribution process. Second, it updates the node list by checking node statuses after the load-balancing algorithms have been applied: if a node was involved in the load-balancing process, its capacity and active degree have changed, so the list must be refreshed. The bid-bonus algorithm used by LBRH is shown in Figure 4; an illustrative sketch of its tie-breaking order follows the listing.

The bid-bonus algorithm used by LBRH
Bid_Bonus()
1.  For each n_i(Cap) in C_i do
2.      CreateBID(n_i)
3.      If bid_i(Cap) = trustybids[i] then
4.          If Expct{bid_i(Cap)} > Highest(Cap) then
5.              bid_i(Cap) → highestbidlist[i]
6.          Else if
7.              Received{bid_i(Cap)} > Highest(Cap) then
8.              bid_i(Cap) → highestbidlist[i]
9.  For each bid_i(Cap) in highestbidlist[i]
10.     If bid_i(Cap) = trustybids[i] then
11.         Sort(bid_i(Cap)) → sortedbidlist[i]
12.         sendnew(bid_i(Cap))
13. For each bid_i(Cap) in sortedbidlist[i]
14.     GetHighestPairBids(bid_i(Cap)) → HighestPair[i]
15.     If HighestPair[i] > TL_LD then
16.         For the first bid_i(Cap, AD, Dep) in HighestPair[i]
17.             Get(bid_i(AD))
18.             If n_i-ActDegree(bid_i(AD)) > n_i-ActDegree(bid_{i+1}(AD))
19.                 then bid_i(AD) → FinalHighestPair[i]
20.             If n_i-ActDegree(bid_i(AD)) < n_i-ActDegree(bid_{i+1}(AD))
21.                 then bid_{i+1}(AD) → FinalHighestPair[i]
22.             Else
23.                 Get(n_i-Dependency(bid_i(Dep)))
24.                 If n_i-Dependency(bid_i(Dep)) < n_i-Dependency(bid_{i+1}(Dep)) then
25.                     bid_i(Dep) → FinalHighestPair[i]
26.                 If n_i-Dependency(bid_i(Dep)) > n_i-Dependency(bid_{i+1}(Dep)) then
27.                     bid_{i+1}(Dep) → FinalHighestPair[i]
28.                 Else
29.                     RandomPair(bid_i(Cap, AD, Dep)) → FinalHighestPair[i]

Fig. 4: The bid-bonus algorithm used by LBRH
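The core of the bid-bonus comparison, as recoverable from Figure 4, is a tie-breaking order over the Heuristic 1 attributes: capacity first, then active degree, then dependency (where lower is better), falling back to a random choice. The sketch below illustrates only that ordering; the function and field names are hypothetical.

```python
# Hedged sketch of the bid-bonus tie-breaking order implied by Figure 4 (hypothetical names).
import random
from dataclasses import dataclass

@dataclass
class Bid:
    node_id: str
    capacity: int       # Cap: higher is better
    active_degree: int  # AD: higher is better (more finished queries)
    dependency: int     # Dep: lower is better (asks other nodes for help less often)

def pick_winner(a: Bid, b: Bid) -> Bid:
    """Choose between a pair of bids: capacity, then active degree, then dependency."""
    if a.capacity != b.capacity:
        return a if a.capacity > b.capacity else b
    if a.active_degree != b.active_degree:
        return a if a.active_degree > b.active_degree else b
    if a.dependency != b.dependency:
        return a if a.dependency < b.dependency else b
    return random.choice([a, b])  # final fallback, as in the RandomPair step

if __name__ == "__main__":
    print(pick_winner(Bid("n1", 100, 40, 3), Bid("n2", 100, 55, 1)).node_id)  # prints n2
```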

Hashing coordinator sub-module: this is the core module of ARHPM, and its main goal is to implement the adaptive HRW hashing algorithm for the partitioning process. It consists of four main sub-modules.

Adaptive rendezvous (HRW) hashing applier: this sub-module uses the virtual-hierarchies skeleton built on HRW [9]. It partitions the keys of the Cassandra database using rendezvous hashing over the nodes, distributing them in a way that guarantees balanced partitioning. The nodes of a Cassandra cluster are divided into a number of Rendezvous Geographic Zones (ℝ𝓖ℤ) [16], and the data are distributed according to the virtual hierarchical design within each zone.

SpookyHash function: the default hash function used by Cassandra's original partitioning module is MurmurHash [17]. ARHPM instead applies the non-cryptographic SpookyHash [18] function to the required nodes and keys in Cassandra. One of the main reasons for using SpookyHash instead of MurmurHash is that the latter has proved to be half the speed of SpookyHash on x86-64.

Node/key assigner: implements the process of assigning each hashed key to the node yielding the highest weight. A minimal sketch of this highest-random-weight selection is given below.

Evaluator: the evaluator runs on every node of the rendezvous hash range to ensure the accurate allocation of keys to nodes.
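The node/key assigner's highest-random-weight rule can be illustrated with the short sketch below. It is only a sketch: the paper's module uses SpookyHash inside Cassandra, while here Python's standard hashlib.blake2b is used as a stand-in hash, and the node names are hypothetical. Note that a plain HRW lookup scores every node per key, which is linear in the cluster size; the virtual hierarchies of [9] are what keep this cost manageable for large clusters.

```python
# Sketch of rendezvous (HRW) key-to-node assignment.
# Assumption: blake2b stands in for the SpookyHash function used by ARHPM.
import hashlib

def hrw_weight(key: str, node: str) -> int:
    """Deterministic per-(key, node) weight; the real module would use SpookyHash."""
    digest = hashlib.blake2b(f"{key}:{node}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def assign_node(key: str, nodes: list[str]) -> str:
    """Node/key assigner: every node computes a weight and the highest weight wins."""
    return max(nodes, key=lambda node: hrw_weight(key, node))

if __name__ == "__main__":
    cluster = ["node-1", "node-2", "node-3", "node-4"]
    for k in ["user:42", "user:43", "order:7"]:
        print(k, "->", assign_node(k, cluster))
    # Removing a node only remaps the keys that were assigned to it; all other
    # keys keep their owners, unlike naive modulo-based hashing.
    print("user:42", "->", assign_node("user:42", cluster[:-1]))
```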

4 Performance Evaluation

A number of experiments are conducted on Cassandra to evaluate the improvement brought by the proposed partitioning module ARHPM. The experiments are divided into two main parts: (1) evaluating the performance of the LBRH algorithm implemented in the load balancer sub-module, and (2) testing the response time of Cassandra with ARHPM under different environments. Apache Cassandra version 3.4 is modified by embedding the ARHPM partitioning module in it, and the standard Yahoo! Cloud Serving Benchmark (YCSB 0.1.4) [15] is installed. The experiments use two clusters of 8 nodes each; the configuration is shown in Table 1. The nodes are created with the VMware vSphere system [19] and run Ubuntu 10.

First experiment: the performance of the load-balancing sub-module embedded in our partitioning module (ARHPM) is evaluated and compared against two partitioners used in the original Cassandra [11, 12]. Workload C of YCSB is used in load mode to upload the data to a cluster of 8 nodes. Figure 5 shows the performance of Cassandra with the ByteOrdered partitioner: the load is concentrated on one or two nodes, resulting in an uneven distribution of data. Figure 6 shows that with the Murmur partitioner the load is spread over the 8 nodes, but not uniformly. Figure 7, however, shows a uniformly balanced distribution across the 8 nodes when the proposed ARHPM is used. This comparison validates the Load Balancing based Rendezvous Hashing (LBRH) algorithm.

Table 1: Cassandra cluster node specifications
Number of Cassandra clusters: 2
Number of nodes per cluster: 8
Node specification: dual-core Intel Core i5-4200U at 2.6 GHz, 8 GB RAM, 200 GB disk, 1 Gb Ethernet
Hadoop 2.6.4 configuration: dfs.replication.max: 512; dfs.blocksize: 128 MB
Cassandra 3.4 configuration: Replication Factor: 1; Heap size: 1 GB
Database: 8 million records

Fig. 5: The performance of Cassandra with the ByteOrdered partitioner

Fig. 6: The performance of Cassandra with Murmur partitioner

Fig. 7: The performance of Cassandra with the ARHPM partitioner

Second experiment: the execution time of assigning records to nodes with ARHPM is compared against the standard Cassandra ByteOrdered, Random, and Murmur3 partitioners [11, 12, 13] using the read-oriented Workload C of YCSB. Figure 8 shows that ARHPM is the fastest partitioner as the number of loaded records varies.

Fig. 8: The execution time of different partitioners with Workload C

5 Conclusions and Future Work

An Adaptive Rendezvous Hashing Partitioning Module (ARHPM) is proposed in this paper to enhance the performance of Cassandra databases. ARHPM partitions the data in a Cassandra database using rendezvous hashing with the SpookyHash function, giving each node the same chance of receiving a key and thereby distributing the load uniformly. In addition, a Load Balancing based Rendezvous Hashing (LBRH) algorithm is proposed to guarantee load balancing across heterogeneous nodes. ARHPM proved able to balance the load over nodes with different storage, capabilities, and load, and it adapted to the load variations of YCSB while maintaining Cassandra's performance. The comparative experiments showed that the load under ARHPM is balanced more uniformly than under the two default partitioners of standard Cassandra, and that ARHPM distributes the load faster than the other partitioners of standard Cassandra. As future work, we intend to use a Bloom filter on every node of the rendezvous hash range to further reduce hashing time.

References

1. Demchenko, Y., Membrey, P., Grosso, P., de Laat, C.: Addressing Big Data Issues in Scientific Data Infrastructure. In: First International Symposium on Big Data and Data Analytics in Collaboration (BDDAC 2013), part of the 2013 Int. Conf. on Collaboration Technologies and Systems (CTS 2013), San Diego, California, USA, May 20-24, 2013.
2. Benzaken, V., Castagna, G., Nguyen, K., Siméon, J.: Static and dynamic semantics of NoSQL languages. SIGPLAN Not. 48(1), 101-114 (Jan. 2013).
3. HBase Development Team: HBase: BigTable-like structured storage for Hadoop HDFS. http://wiki.apache.org/hadoop/Hbase/ (retrieved 2013-03-20).
4. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., et al.: BigTable: A Distributed Storage System for Structured Data. In: Proc. of the 7th OSDI, Seattle, ACM, pp. 205-218 (2006).
5. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. Operating Systems Review 44(2), 35-40 (2010).
6. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., Lewin, D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (STOC '97), ACM, New York, NY, USA, pp. 654-663 (1997).
7. Srinivasan, L., Varma, V.: Adaptive Load-Balancing for Consistent Hashing in Heterogeneous Clusters. In: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
8. Byers, J., Considine, J., Mitzenmacher, M.: Simple load balancing for distributed hash tables. In: Peer-to-Peer Systems II, Lecture Notes in Computer Science, vol. 2735, Springer Berlin Heidelberg, pp. 80-87 (2003).
9. Yao, Z., Ravishankar, C., Tripathi, S.: Hash-Based Virtual Hierarchies for Caching in Hybrid Content-Delivery Networks. CSE Department, University of California, Riverside (May 13, 2001). Retrieved 15 November 2015.
10. Turk, A., Selvitopi, R.O., Ferhatosmanoglu, H., Aykanat, C.: Temporal Workload-Aware Replicated Partitioning for Social Networks. IEEE Transactions on Knowledge and Data Engineering 26(11) (November 2014).
11. Abramova, V., et al.: Testing Cloud Benchmark Scalability with Cassandra. In: 2014 IEEE 10th World Congress on Services.
12. Huang, X., Wang, J., Zhong, Y., Song, S., Yu, P.S.: Optimizing data partition for scaling out NoSQL cluster. Published online 20 September 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3643.
13. Ramakrishnan, et al.: Processing Cassandra Datasets with Hadoop-Streaming Based Approaches. IEEE Transactions on Services Computing (2015).
14. Chen, Z.: Hybrid Range Consistent Hash Partitioning Strategy - A New Data Partition Strategy for NoSQL Database. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.
15. Cooper, B.F., Silberstein, A., Tam, E., et al.: Benchmarking Cloud Serving Systems with YCSB. In: Proc. of SoCC, Indianapolis, ACM (2010).
16. Seada, K., Helmy, A.: Rendezvous Regions: A Scalable Architecture for Service Location and Data-Centric Storage in Large-Scale Wireless Networks. In: Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04).
17. Kurihara, Y.: Digest::MurmurHash. GitHub.com. Retrieved 18 March 2015.
18. Jenkins, B.: SpookyHash: a 128-bit noncryptographic hash. Retrieved Jan 29, 2012.
19. Server Virtualization with VMware vSphere | VMware India. www.vmware.com. Retrieved 2016-03-08.
20. https://datastax.github.io/python-driver/api/cassandra/policies.html. Retrieved 2016-01-04.
