THE QUESTION OF DATABASE TRANSACTION PROCESSING: AN ACID, BASE, NOSQL PRIMER by Charles Roe

©2013 DATAVERSITY Education, LLC. All rights reserved.

Imagine a world where the CAP Theorem is wrong. Database engineers and application designers have absolute control over every transaction, at all times, across the entire planet, no matter how many distributed nodes an enterprise runs. Relational systems are happily married to any and all non-relational designs. It no longer matters that an enterprise wants to employ a range of SQL-centric databases alongside its burgeoning collection of Key-Value, Document, Column-Oriented, and Graph stores – they all fit together like fine wine and aged French cheese. BI applications collect data in a timely fashion from every platform; uptime is 99.999%; nodes are quickly and easily added with barely the click of a button; every transaction is consistent all the time, with no soft states, eventualities, or latencies. If you want strong consistency, with high availability, over a distributed network of 100,000 cheap commodity servers, with total ACID compliance, then you are in luck!

The vision above is a utopian one. Were it entirely true – in all cases – it would make the lives of everyone in the database and application design fields much easier: they could design cheap, perfect systems that just worked, for everyone, all the time. On the downside, it would also make many of their jobs redundant. Luckily for the numerous organizations involved in such industries, the utopian fantasy is far from reality. But how far is it, really? Are there not some platforms that provide complete transactional integrity within non-relational systems? Are there not intercessions between competing models that meet somewhere in between? Many elements of such a vision are, in fact, already working together.

ACID and NoSQL are not the antagonists they were once thought to be. NoSQL works well under a BASE model, but some innovative NoSQL systems also fully conform to ACID requirements. Database engineers have puzzled out how to make non-relational systems work in environments that demand high availability and scalability, with differing levels of recovery and partition tolerance. BASE is still a leading innovation wedded to the NoSQL model, and the two have evolved together harmoniously. But they do not always have to be in partnership; there are several options. So while the opening vision is true in many cases, organizations that need more diverse possibilities can move into the commercial arena and get the specific option that works best for them.


THE TRANSACTION RELIABILITY MODELS

The primary issue discussed within this paper boils down to two disparate database reliability models: ACID (Atomicity, Consistency, Isolation, Durability) and BASE (Basically Available, Soft state, Eventual consistency). The first, ACID, has been around for some 30+ years, is a proven industry standard for SQL-centric and other relational databases, and works remarkably well in the older, yet still extant, world of vertical scaling. The second, BASE, has gained popularity only over the past 10 years or so, especially with the rise of social networking, Big Data, NoSQL, and the other leviathans of the new world of Data Management. BASE requirements rose out of the need for ever-expanding, horizontally scaled distributed networks with non-relational data stores, and the real-time availability constraints of web-based transaction processing.

While there are now more crossovers and negotiations between the two models, they essentially represent two competing teams, with Brewer's CAP Theorem acting as the referee in the middle, forcing tough decisions on each. Relational database systems are almost always ACID compliant because relational indexing is centralized, so there is no advantage in BASE. An ACID database functions as a single unit that is fully consistent, with transactional updates; a BASE system functions as a set of independent units that are eventually consistent, without transactional updates. In the past there was no choice other than relational systems, but the advent of NoSQL now gives organizations that need a different model more options: they can do ACID or BASE, so the variety is much greater. It is even possible to provide ACID constraints and other enterprise features within specific NoSQL systems, if that is what an organization requires to meet its needs.

ACID – A BRIEF OVERVIEW

The primary work on database reliability constraints began in the 1970s with Jim Gray. He formulated the first three elements of the acronym – Atomicity, Consistency, and Durability – in his seminal paper "The Transaction Concept: Virtues and Limitations," published in 1981.[1] The paper looked at transactions in terms of contract law, whereby each transaction had to conform to specific "transformations of a system state." All transactions had to obey the laws defined within the contract parameters: according to Gray, each transaction within a database had to obey its protocols, either happened or didn't happen, and could not be changed once committed.

In 1979, Bruce Lindsay et al. expanded on Gray's preliminary findings with their paper "Notes on Distributed Databases."[2] The paper focused on the essentials for achieving consistency within distributed database management systems: data replication, authorization and access controls, recovery management, two-phase commits, and others. The final foundational element of ACID came in 1983 with the publication of Andreas Reuter and Theo Härder's paper "Principles of Transaction-Oriented Database Recovery."[3] They added the principle of Isolation to the discussion and officially coined the acronym ACID. It has persisted for the past 30 years as the indispensable constraint for achieving reliability within database transactions, and in simple terms means:

• Atomicity: All operations are performed or none of them are. If one part of the transaction fails, then all fail.
• Consistency: The transaction must meet all rules defined by the system at all times; there are never any half-completed transactions.
• Isolation: Each transaction is independent unto itself.
• Durability: Once complete, the transaction cannot be undone.[4]
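These four properties can be seen in miniature with Python's built-in sqlite3 module. The sketch below is illustrative only – the account table, balances, and `transfer` helper are invented for the example, not taken from the paper – but it shows atomicity concretely: a failed transfer rolls back the half-finished debit.

```python
# A minimal sketch of ACID atomicity using Python's built-in sqlite3 module.
# The account names, amounts, and the no-negative-balance rule are
# illustrative assumptions, not part of the original paper.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: either both updates commit, or neither does."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Enforce the consistency rule: no account may go negative.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()      # durability: the change survives once committed
    except Exception:
        conn.rollback()    # atomicity: the half-finished debit is undone
        raise
```

A successful call moves the money; a call that violates the balance rule raises and leaves both accounts exactly as they were.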

ACID constraints have provided transaction processing with a reliable foundation from which to build for decades, and would have continued unchallenged were it not for the advent of the Internet, the growth of distributed data stores, the unprecedented increase in data volumes and variability, the need to document and store unstructured data, and the subsequent need for more flexibility in terms of scaling, design, processing, cost, and disaster recovery. This is not a claim that ACID requirements are no longer essential to transaction processing, because they are. But web-scale applications, non-relational data stores, and the global distribution of data centers required the creation of new alternatives. Nevertheless, with new evolutions in the database field, it is now possible to have a fully functional NoSQL database that conforms to strict ACID compliance – such systems are still rare within the industry at this time, but they do now exist for organizations that need them.

REAL-WORLD DATABASE EXAMPLE 1: MarkLogic's Enterprise NoSQL database, with integrated search and application server, provides a robust NoSQL platform with ACID compliance as one of its central tenets, no need for predefined schemas, different availability configurations, near-linear scaling of hardware, Big Data search, MapReduce capabilities, Hadoop integration, multiple-language application authoring, and without the loss of security often associated with many NoSQL systems.

NOSQL AND THE SHIFTING SANDS OF DATABASE ARCHITECTURES

A paradigm shift began during the late 1990s and the early part of the new millennium. The evolution of the Internet, and its resulting data explosion, forced many of the burgeoning Internet behemoths to reconsider how they dealt with their data. The same could be said for smaller organizations and startups that saw the growth of web-based processing as a goldmine with no end. Thus began a necessary shift from relational data stores to non-relational data stores, collectively now known as NoSQL or "Not Only SQL."

The development of non-relational architectures has been part of the database discussion for decades, but didn't really take root until the late 1990s with the work of Carlo Strozzi and others. Strozzi coined the term "NoSQL"[5] in 1998 with the creation of his new, still relational, database model that no longer used SQL as its programming language (the term was reintroduced at a meetup in 2009 by Eric Evans and Johan Oskarsson and has since stuck). Other major milestones that helped cause what is today a seismic shift in database technologies include:[6]

• 1997 – Microsoft's publication of the MDX Standard
• 2000 – Release of Neo4j
• 2000 – Dr. Eric Brewer's address on CAP Theorem
• 2003 – Development of Memcached
• 2003 – Publication of the paper "The Google File System"
• 2004 – Publication of the paper "MapReduce: Simplified Data Processing on Large Clusters"
• 2005 – CouchDB is released
• 2006 – Publication of the paper "Bigtable: A Distributed Storage System for Structured Data"
• 2006 – XQuery language specification is standardized by the W3C
• 2007 – Publication of the paper "Dynamo: Amazon's Highly Available Key-value Store"

The avalanche had begun; now many more distributed, non-relational data stores such as MongoDB, Cassandra, Project Voldemort, Terrastore, Redis, Riak, HBase, and a plethora of others are available on the market. The events listed above are only an umbrella summary of so much other work going on in the industry at the time; they represent a microcosm of the hundreds, if not thousands, of smaller organizations and thought leaders who worked to move the industry forward to where it stands today. The impetus for such a change came from many factors, not the least of which include:

• Unstructured data doesn't fit neatly into relational tabular structures, and the proliferation of so many different data formats required new storage capabilities.
• Vertical scaling was proving too expensive and could not deal with the voluminous expansion of data occurring.
• Traditional RDBMSs could no longer handle the high throughput that organizations like Google or Amazon required.
• The complexity of object-relational mapping was not needed for many of the tasks web-based applications performed.
• Predetermined, forced schemas were too constraining in a new world where schemas needed to be quickly adaptable.
• The administration overhead of the old paradigm, where servers were run in-house, had become too costly; off-site "clouds" could provide cheaper alternatives and faster "out scaling" of distributed systems.
• ACID requirements were too constraining for many of the Internet's demands; some constraints had to be relaxed.


Application developers needed new innovations outside of the traditional SQL-based, table-oriented database architectures that had governed the industry since its inception. Internet giants such as Google, Amazon, Facebook, LinkedIn, and others built their own designs or altered those freely available on the market, and helped revolutionize the industry in the process. The proliferation of smaller startups created demand for cheaper alternatives, and the open source community stepped in to fill the void with many free or low-cost database designs. The avalanche has since spread to all corners of the industry, and multiple alternatives exist.

REAL-WORLD DATABASE EXAMPLE 2: Couchbase Server is a document store built for high availability, in a distributed NoSQL environment. It provides easy scalability within the Cloud or on standard commodity servers. It provides full consistency, with zero application downtime, an extensive range of query and index support, incremental MapReduce, auto-sharding, cross-cluster replication, and native support for JSON documents.

The list above is in no way exhaustive; it only outlines some of the primary incentives that helped precipitate the rise of non-relational data stores and of the now-popular acronym in the database transaction reliability world: BASE.

CAP AND BASE – THE SHOTS HEARD AROUND THE WORLD

BASE is a clever acronym, especially when paralleled with ACID – data professionals are the chemists of the IT universe. While it is not known for sure who originated the term, most people credit Dr. Eric Brewer with, at minimum, popularizing it. In 2000, his keynote address at the ACM Symposium on Principles of Distributed Computing, titled "Towards Robust Distributed Systems,"[7] proved to be the shining moment when many in the industry nodded their heads and knew, at least in their guts, that momentous changes were on the horizon. BASE is essentially the diametric opposite of ACID, with the limitations outlined by Brewer falling across a spectrum. The BASE acronym entails:

• Basically Available – the system guarantees some level of availability to the data, even in the event of node failures. The data may be stale, but the system will still give and accept responses.


• Soft State – the data is in a constant state of flux; so, while a response may be given, the data returned is not guaranteed to be current.
• Eventual Consistency – the data will eventually become consistent across all nodes and all databases, though not for every transaction at every moment. It will reach some guaranteed state eventually.
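The three BASE traits can be sketched with a toy model: a write is accepted by a single replica (basically available), other replicas serve stale data for a while (soft state), and periodic anti-entropy rounds bring every replica to the same value (eventual consistency). The replica count, version counters, and gossip schedule are illustrative assumptions, not any particular database's mechanism.

```python
# A toy model of BASE: a write lands on one replica and spreads to the
# others through anti-entropy ("gossip") rounds. Until then, reads from
# other replicas are stale. All names here are illustrative assumptions.

replicas = [dict() for _ in range(3)]   # three independent nodes
_versions = {}                          # monotonically increasing per key

def write(key, value, node=0):
    """Accept the write on a single node only."""
    _versions[key] = _versions.get(key, 0) + 1
    replicas[node][key] = (_versions[key], value)

def read(key, node):
    """Read whatever this node currently holds - possibly stale or missing."""
    entry = replicas[node].get(key)
    return entry[1] if entry else None

def gossip_round():
    """Each node adopts any newer version it sees on its peers."""
    for a in replicas:
        for b in replicas:
            for key, (ver, val) in b.items():
                if key not in a or a[key][0] < ver:
                    a[key] = (ver, val)
```

Right after a write, only the accepting node can serve it; after a gossip round, all replicas agree – eventually consistent, exactly as the definition above describes.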

In his address, Brewer presented a simple table that outlined the essential traits of each of the two models:[8]

ACID                                    BASE
• Strong consistency                    • Weak consistency – stale data OK
• Isolation                             • Availability first
• Focus on "commit"                     • Best effort
• Nested transactions                   • Approximate answers OK
• Availability?                         • Aggressive (optimistic)
• Conservative (pessimistic)            • Simpler!
• Difficult evolution (e.g. schema)     • Faster
                                        • Easier evolution

He stressed that the entire balance between the two is a spectrum; database engineers had to choose the specifics of what they needed and wanted versus what they could attain when developing their particular application. The real focus of the opposition between the two competing models is demonstrated with Brewer’s CAP Theorem, which outlines the three major characteristics of database transaction processing, and contends that only two of the characteristics can be met at any given time. The three central elements of CAP (Consistency, Availability, and Partition Tolerance) have since been expanded to include much more detail, along with extensive experimentation by engineers around the world to verify and quantify the real-world results of such a theorem.
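The forced choice CAP describes can be made concrete with a toy two-replica store. The `CP`/`AP` modes, the partition flag, and the write semantics below are illustrative assumptions for the sketch, not any real system's behavior: when the replicas cannot communicate, a CP system refuses the write (sacrificing availability), while an AP system accepts it on one side (sacrificing consistency).

```python
# A sketch of the CAP choice during a network partition, under assumed toy
# semantics: "CP" refuses writes it cannot replicate; "AP" accepts them
# locally and lets the replicas diverge until the partition heals.

class Replicated:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" - an illustrative flag
        self.nodes = [{}, {}]       # two replicas
        self.partitioned = False

    def write(self, key, value, node=0):
        if self.partitioned:
            if self.mode == "CP":
                # Keep consistency, lose availability.
                raise RuntimeError("unavailable: cannot reach peer replica")
            # Keep availability, lose consistency: only one side sees it.
            self.nodes[node][key] = value
        else:
            for n in self.nodes:    # healthy network: replicate everywhere
                n[key] = value

    def read(self, key, node=0):
        return self.nodes[node].get(key)
```

During a partition the CP store raises on writes but both replicas stay in agreement; the AP store keeps accepting writes, and its two replicas return different answers until they are reconciled.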


CONSISTENCY – HOW IS THE DATA PERCEIVED?

ACID constraints provide strong consistency, all the time, no matter what. Such requirements often have repercussions, though, especially regarding availability. If the system must always remain in a consistent state, so that all parties see the same view of the data at the beginning and end of a transaction, then across thousands of nodes that data may not always be available. The same repercussion affects disaster recovery and the loss of nodes – if one part of a distributed database collapses, strong consistency would not allow any further updates until the entire system is realigned. Thus we come to Eventual Consistency and a range of other consistency guarantees:

REAL-WORLD DATABASE EXAMPLE 3: MongoDB is an open source document data store that achieves high availability through built-in replication. It includes dynamic schemas that can evolve as applications evolve, full index and query support, an aggregation framework and MapReduce, strong consistency, advanced security protocols, horizontal scalability for operations of any size, and auto-sharding.

A. Strong (Strict) Consistency – All read operations return the value from the last finalized write operation. It doesn't matter which replica the write completed on; all replicas must be in the same state before the next operation can occur on those values.

B. Eventual Consistency – This has the greatest variability in the values that may be returned. At any given point readers will see some written value, but there is no guarantee that any two readers will see the exact same write. All replicas will eventually have the latest update; it's just a matter of time.

C. Monotonic Read Consistency – Also known as a session guarantee. Reads are similar to eventual consistency in that the data could still be stale, but monotonic read consistency guarantees that over time the client will get a more up-to-date read (or the same read) if they request a read from the same object.

D. Read Your Own Writes (or Read My Writes) – Guarantees that the client always reads their own most recent writes, though others may not see the same updates. It doesn't matter which replica the writes go to; the client always sees their most recent one.


E. Causal Consistency – If one client writes a value (b) after reading a value (a), then any other client that reads (b) must also be able to see (a), since the two values are causally connected. More generally, any writes that are causally related must be seen by all processes in the order they were written.[9]
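The session guarantees above (C and D) hinge on one small piece of client-side state: the highest version the session has already observed. The sketch below invents a versioned replica set and a `Session` class to show the mechanism; none of these names come from a real database's API. The session refuses any replica that is older than what it has seen, which yields both monotonic reads and read-your-writes.

```python
# A sketch of session guarantees over eventually consistent replicas: the
# client remembers the highest version it has observed and only accepts
# answers at least that fresh. The replica layout, version counters, and
# Session API are illustrative assumptions.

replicas = [{"version": 0, "value": None} for _ in range(3)]

def server_write(value):
    """Apply a write to replica 0 only; the other replicas lag behind."""
    replicas[0] = {"version": replicas[0]["version"] + 1, "value": value}
    return replicas[0]["version"]

class Session:
    def __init__(self):
        self.min_version = 0    # highest version this client has observed

    def write(self, value):
        self.min_version = server_write(value)

    def read(self):
        # Accept an answer only from a replica at least as fresh as
        # anything this session has already seen (monotonic reads), which
        # also covers the session's own writes (read-your-writes).
        for r in replicas:
            if r["version"] >= self.min_version:
                self.min_version = r["version"]
                return r["value"]
        return None             # no sufficiently fresh replica reachable
```

A session that has just written always reads its own value back, even though the lagging replicas still hold the old state – exactly the distinction between guarantees B, C, and D above.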

There are many other consistency guarantees with different names, including (but not limited to) causal+, sequential, consistent prefix, entry consistency, release consistency, FIFO consistency, and bounded staleness. The main point is that application programmers must weigh their options when deciding on the consistency requirements of any given transaction; the other two characteristics of the CAP Theorem impose restrictions on what level of consistency can be guaranteed.

AVAILABILITY – HOW ACCESSIBLE IS THE DATA?

REAL-WORLD DATABASE EXAMPLE 4: DataStax Enterprise offers a complete NoSQL Big Data platform through a production-certified Apache Cassandra implementation: robust security protocols not often seen in NoSQL platforms, simple data migration procedures, high availability and scalability, mixed-workload isolation and separation, no need for standard ETL systems, and multiple (tunable) consistency levels that allow the developer to decide on consistency on a per-query basis.

Everyone wants access to their data at all times, no matter the database failures, power outages, node crashes, or other hardware and software complications; read and write operations must continue. Executives want constant access to their critical files; clients need continuous email communication; web-based billing modules need to work 24/7 or sales suffer; corporate reporting databases are needed around the clock for global operations to carry on. Availability is not a luxury; it's a necessity for all modern enterprises. The IEEE 610 definition of availability is "the degree to which a system or component is operational and accessible when required for use by an authorized user."[10] Availability is not guaranteed through a specific piece of software or hardware; rather, it is a set of intentions and strategies used to hold downtime to an intended percentage for software and hardware systems (though many vendors give availability ratings to their products to entice potential customers).


Availability is usually given in a numeric percentage that relates the availability level to the amount of expected downtime per year for the software or hardware system – the goal of course is 100%, but that is unrealistic, so the real target is referred to as the five nines, or 99.999% availability.
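The downtime figures that accompany each availability level follow directly from the percentage itself – a quick sketch of the arithmetic behind the "five nines" (the helper name and the leap-year simplification are this sketch's own choices):

```python
# Convert an availability percentage into expected downtime per year.
# Uses a 365-day year (8,760 hours), ignoring leap years.

HOURS_PER_YEAR = 365 * 24   # 8760

def downtime_hours_per_year(availability_pct):
    """Fraction of the year the system is expected to be down, in hours."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.999, 99.99, 99.9, 99.0):
    print(f"{pct}% available -> {downtime_hours_per_year(pct):.2f} hours down/year")
```

Five nines works out to roughly 5 minutes of downtime per year; each nine removed multiplies the allowance by ten.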

AVAILABILITY LEVEL[11]    ESTIMATED DOWNTIME PER YEAR
99.999%                   5 minutes
99.99%                    52 minutes
99.9%                     8.8 hours
99%                       3.65 days (about 88 hours)

Planned downtimes can often be alleviated through process management planning. If an upgrade is needed, certain systems can be taken offline while others remain live, and then switched; or the most important systems can be taken down only during low-activity periods. Planned downtimes are not the biggest issue here – IT departments are well versed in dealing with planned events; it is unplanned downtime that the availability model seeks to ameliorate. The redundancy built into distributed systems helps to increase the inherent reliability of the entire system. Yet as availability increases, some other aspect of Brewer's Theorem may decrease – the question is whether to lose consistency or partition tolerance, or to use a different alternative.

REAL-WORLD DATABASE EXAMPLE 5: Neo4j is a reliable, fully ACID compliant graph database. It offers massive scalability and high availability when distributed across many systems. It contains a traversal framework for easy and fast graph queries, and has a convenient REST interface and an object-oriented Java API. Its easy-to-implement graph structure allows the simple storage of data in a highly accessible environment.

PARTITION TOLERANCE – FUNCTION WITHOUT COMMUNICATION?

The third and final characteristic of CAP relates to availability in many ways, but looks at the problem of failures and disaster recovery in a different light. Where availability seeks to lessen the impact of downtime on a given system, partition tolerance deals with the distribution of data over many different partitions, in many different database replicas. One of the biggest issues in distributed computing is the fact that data is often stored in multiple partitions, throughout many nodes, across data centers that are not geographically related. A given write can happen on one replica in Virginia while next being written to another replica in India, and these operations must continue. The system must be able to tolerate the possibility that these two replicas could at any given time – temporarily or permanently – be unable to communicate with each other, and that nodes can be added or taken offline without disruption.
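When two such replicas do lose contact and both keep accepting writes, the system must reconcile them once the partition heals. One common approach (a sketch, not any particular database's implementation – the two-replica setup, keys, and timestamps below are invented for illustration) is last-write-wins: each value carries a timestamp, and the merge keeps the newer one per key.

```python
# A sketch of last-write-wins (LWW) reconciliation after a partition heals.
# Each replica maps key -> (timestamp, value); the merge keeps, per key,
# whichever side has the higher timestamp. All data here is illustrative.

def lww_merge(a, b):
    """Merge two replica states, keeping the newer (timestamp, value) per key."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or merged[key][0] < ts:
            merged[key] = (ts, val)
    return merged

# During the partition, each side accepted writes independently:
virginia = {"cart": (100, ["book"]), "name": (90, "Ann")}
india    = {"cart": (120, ["book", "pen"])}

healed = lww_merge(virginia, india)
```

The merge is symmetric, so both data centers converge to the same state; the cost is that the older concurrent write to "cart" is silently discarded, which is one of the consistency concessions this section describes.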

REAL-WORLD DATABASE EXAMPLE 6: The InfiniteGraph graph database offers flexible consistency models from ACID to eventual for various transactions. It provides scalability across large distributed networks, concurrency, straightforward replication and backup, multiple supported APIs, hybrid schema models, predictive language qualification queries, and fully atomic transactions.

Two substantial Cloud-based systems that require partition tolerance, due to extensive, geographically distributed replica locations, are Amazon's SimpleDB and Google's Bigtable; both helped transform the entire industry, and both had to make concessions during development. SimpleDB provides robust fault tolerance, with high availability, scalability, and eventual consistency (it can provide consistent or eventually consistent reads), but not strong consistency across the board. Google's Bigtable is a column-oriented database that allows an effectively unlimited number of columns; it was designed for massive scalability and high availability. It has single-row transactional consistency, so it can assure eventual consistency for updates, and strong consistency for certain transactions.[12] Each system needed to conform to the requirements of considerable geographical distribution, and so had to relax some constraints in favor of others – such are the realities of design.

THE COMPROMISE – CAN YOU HAVE ALL THREE?

The downside to this entire discussion is that in the early years of development no grand utopian vision existed; there were tradeoffs no matter what model was used (and in many systems there still are):


• Consistency and Partition Tolerance forfeited Availability • Availability and Consistency forfeited Partition Tolerance • Availability and Partition Tolerance forfeited Consistency

In his paper, Brewer listed a number of examples of each possibility, their characteristics, and some of the tradeoffs that system designers must contend with. Some are listed below:[13]

SELECTION      CHARACTERISTICS                          EXAMPLES
C + A (no P)   2-phase commits;                         Single-site databases;
               cache validation protocols               cluster databases;
                                                        LDAP; xFS file system
C + P (no A)   Pessimistic locking;                     Distributed databases;
               make minority partitions unavailable     distributed locking;
                                                        majority protocols
A + P (no C)   Expirations/leases;                      Coda; web caching; DNS
               conflict resolution; optimistic

In terms of the modern, horizontally scaled systems prevalent in a web-based environment, the decision often comes down to Consistency versus Availability. Partition Tolerance is a must in a distributed node environment, since a primary element of the entire system is grounded on partitioned data. Developers must decide which operations can relax consistency and which can relax availability. While every system is different and depends on innumerable factors, most large-scale, web-centric systems have chosen to relax consistency in some form and push for high availability.

The original development of Cassandra by Facebook[14] was driven by the need for decentralized control of data, so there would be no single point of failure; multi-data-center replication across a massive distributed system; constant scalability, with no downtime when new machines were added; high availability of data 24/7/365; MapReduce capability; and differing levels of consistency to suit highly variable data types. So they ultimately balanced the entire system with tunable consistency (an essential attribute in Cassandra), massive scalability, and high availability. Facebook released Cassandra as an open source platform in 2008; it is now a principal NoSQL platform throughout the industry.

Some of the latest innovations in the industry are allowing organizations to alter some of the original assumptions along the ACID/BASE/NoSQL continuum, though. As mentioned in a few of the sidebars, there are now platforms that allow varying levels of consistency depending on the transaction – if one transaction needs ACID constraints, that can be programmed into the system, while transactions that can relax consistency are allowed to do so. It is possible to scale a system to literally "hundreds of terabytes of source data while maintaining sub-second query response time,"[15] and still have the support of full ACID properties as well. There doesn't have to be a loss of performance either, and high availability along with a schema-agnostic structure maintains inherent flexibility within the entire system. Such innovations are certainly not commonplace, but they are now available for organizations that require them, with more under development all the time.
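The tunable consistency idea can be sketched in the quorum style Cassandra popularized: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and choosing R + W > N forces the read set to overlap the write set, so the read sees the latest write. The in-memory replicas, version counters, and the deliberately worst-case choice of which replicas ack are illustrative assumptions, not Cassandra's actual implementation.

```python
# A sketch of quorum-style tunable consistency: N replicas, writes ack'd
# by W of them, reads consulting R of them. To make the demonstration
# deterministic, writes go to the FIRST w replicas and reads consult the
# LAST r replicas - the worst case for overlap. Illustrative only.

N = 3
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, w):
    """Apply the write to w replicas; the remaining N - w lag behind."""
    version = max(r["version"] for r in replicas) + 1
    for r in replicas[:w]:
        r.update(version=version, value=value)

def read(r_count):
    """Consult r_count replicas and return the freshest value seen."""
    freshest = max(replicas[-r_count:], key=lambda r: r["version"])
    return freshest["value"]
```

With W = 2 and R = 2 (R + W = 4 > N), the read and write sets must intersect, so the read returns the new value; with R = 1 (R + W = 3 = N) they can miss each other, and the read may return stale data – the knob the developer turns per operation.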

CONCLUSION – BUSINESS CONTINUITY IS ESSENTIAL

The crucial questions of ACID versus BASE, relational versus non-relational data stores, consistency versus availability, all hinge on one primary element: success in the marketplace. It doesn't matter that an organization's developers are fawning over the possible integration of a new NoSQL-based system if it is an unnecessary or untenable addition to a business that still only requires a relational platform. Certainly, development planning for the future must include an IT strategy that focuses on growth, flexibility, security, and cost. The exponential growth of data, and its subsequent collection and analysis, are motivating factors that now push many organizations toward a Big Data solution with some kind of NoSQL platform as its foundation. Yet the exact design of that solution, and its countless considerations, are entirely dependent on the needs of the enterprise: server virtualization, the impact of Cloud-based services, data center utilization and consolidation, more efficient application performance, streamlined Data Governance processes, hardware costs, and a host of other issues ultimately govern the decision to implement a NoSQL solution and what that solution will mean to the organization.

Much of the financial industry still requires strict ACID compliance for absolute transaction integrity in all banking operations, as do many military and other governmental organizations; the same could be said of the health care industry, especially regarding patient records. An online travel service, by contrast, may be able to relax consistency during the search process, through various soft-state protocols and updating mechanisms, so that higher availability can be ensured. A social networking site that starts small but plans on exponential growth would want to implement a system with availability and scalability in mind from the beginning, and might decide to use some sort of eventual or tunable consistency for the user "click traffic" that doesn't require strict consistency.

The multitude of options available in the marketplace makes deciding on a specific NoSQL solution challenging. Do you need a graph database or a document store? A key-value store or a columnar database? What sort of executive support is there for the project? Is it an enterprise-wide implementation, or only a small program to begin with? Will outside consultants be necessary? What about application programmers? Should the solution be developed in-house, or should an off-the-shelf solution be purchased? These questions and innumerable others will decide the fate of a particular solution. But in the end, the need for such a solution distills down to one essential factor: the need to remain innovative and flexible in an ever-changing business environment. ACID and BASE are just clever acronyms for complex problems. Luckily for modern organizations, there are myriad solutions to those problems. Which one works best for you?


NOTES

1. Gray, J. (June 1981). "The Transaction Concept: Virtues and Limitations." Tandem Computers Incorporated. Retrieved from http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf.
2. Lindsay, B. G., Selinger, P. G., Galtieri, C., Gray, J. N., Lorie, R. A., Price, T. G., Putzolu, F., Traiger, I. L., Wade, B. W. (July 1979). "Notes on Distributed Databases." IBM Research Report RJ2571. Retrieved from http://domino.research.ibm.com/library/cyberdig.nsf/papers/A776EC17FC2FCE73852579F100578964/$File/RJ2571.pdf.
3. Reuter, A., Härder, T. (December 1983). "Principles of Transaction-Oriented Database Recovery." ACM Computing Surveys. Retrieved from http://cc.usst.edu.cn/Download/5b953407-339b-46c3-9909-66dfa9c3d52a.pdf.
4. Pritchett, D. (2008). "BASE: An Acid Alternative." ACM Queue. Retrieved from http://queue.acm.org/detail.cfm?id=1394128.
5. Strozzi, C. (1998). NoSQL: A Relational Database Management System. Retrieved from http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page.
6. McCreary, D., McKnight, W. (June 2012). "The CIO's Guide to NoSQL." Retrieved from http://www.dataversity.net/the-cios-guide-to-nosql-3. And Haugen, K. (2010). "A Brief History of NoSQL." Retrieved from http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html.
7. Brewer, E. A. (July 2000). "Towards Robust Distributed Systems." ACM Symposium on the Principles of Distributed Computing. Retrieved from http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
8. Ibid.
9. Terry, D. (October 2011). "Replicated Data Consistency Explained Through Baseball." Microsoft Research. Retrieved from http://research.microsoft.com/apps/pubs/default.aspx?id=157411. And Strauch, C. (2012). "NoSQL Databases." Stuttgart Media University. Retrieved from http://www.christof-strauch.de/nosqldbs.pdf. And Burckhardt, S., Leijen, D., Fähndrich, M., Sagiv, M. (2012). "Eventually Consistent Transactions." Springer-Verlag. Retrieved from http://research.microsoft.com/pubs/158085/ecr-esop2012.pdf. And Wada, H., Fekete, A., Zhao, L., Lee, K., Liu, A. (January 2011). "Data Consistency Properties and the Trade-offs in Commercial Cloud Storages: the Consumers' Perspective." Retrieved from http://www.nicta.com.au/pub?doc=4341.
10. IEEE Std 610 (Institute of Electrical and Electronics Engineers). Retrieved from http://standards.ieee.org/findstds/standard/610-1990.html.
11. "FAQ on High Availability" (July 2012). Fujitsu Technology Solutions. Retrieved from http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp_faq-h-availability.pdf. And "Microsoft High Availability Overview" (January 2008). Retrieved from http://download.microsoft.com/download/3/B/5/3B51A025-7522-4686-AA16-8AE2E536034D/Microsoft%20High%20Availability%20Strategy%20White%20Paper.doc.
12. Ramanathan, S., Goel, S., Alagumalai, S. (November 2011). "Comparison of Cloud Database: Amazon's SimpleDB and Google's Bigtable." International Journal of Computer Science Issues. Retrieved from http://ijcsi.org/papers/IJCSI-8-6-2-243-246.pdf.
13. Brewer, E. A. (July 2000). "Towards Robust Distributed Systems." ACM Symposium on the Principles of Distributed Computing. Retrieved from http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
14. Lakshman, A., Malik, P. (2009). "Cassandra – A Decentralized Structured Storage System." Retrieved from http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf.
15. Hunter, J. (2013). "Inside MarkLogic Server: Its Data Model, Indexing System, Update Model, and Operational Behaviors." Retrieved from http://www.marklogic.com/resources/inside-marklogic-server/.


SPONSOR PAGE

www.marklogic.com | [email protected] | +1 877 992 8885