International Journal of Modern Computer Science (IJMCS) Volume 3, Issue 1, February, 2014
ISSN: 2320-7868 (Online)
NewSQL Databases: Scalable RDBMS for OLTP Needs to Handle Big Data
Rakesh Kumar
Software Engineer, DigiCollect GIS, Bangalore, India
[email protected]
Shilpi Charu
Department of Information Technology, JECRC, Jaipur, India
[email protected]
Abstract: One of the key advances in addressing the 3V (volume, velocity and variety) problem of Big Data has been the emergence of alternative database technologies (SQL-based RDBMS, NoSQL and NewSQL). NewSQL is a class of relational database management systems that provides the same scalable performance as NoSQL (Not Only SQL) systems for OLTP (Online Transaction Processing) workloads while still maintaining the ACID (Atomicity, Consistency, Isolation, Durability) guarantees of a traditional single-node database system. This paper introduces distributed databases, OLTP and RDBMS (relational database management systems). It then introduces NoSQL, the key attributes of NoSQL databases, and scalability and performance with NoSQL. Finally, it covers NewSQL: its categorization, the characteristics of NewSQL solutions, the architectures of NuoDB and VoltDB, and the comparative characteristics of RDBMS, NoSQL and NewSQL databases. The aim of this paper is to show the importance of NewSQL for database management and to point to suitable solutions, including those embedded in an appliance, among the commercial and open-source offerings available.
Keywords: NuoDB, NoSQL, NewSQL, OLTP, RDBMS, VoltDB
I. INTRODUCTION
Today's databases are expected not only to be flexible enough to handle a variety of data formats, but also to deliver extreme performance and to scale easily to handle big data. According to one estimate, the data created in 2010 amounted to approximately 1,200 exabytes and will continue to grow to approximately 8,000 exabytes in 2015, and more than 13,000 exabytes (approximately 5,200 GB of data for every person on Earth) in 2015, with the Internet being the primary data driver. This growth in storage capacity is leading to the emergence of data management systems in which data is stored in a distributed way but accessed and analyzed as if it resided on a single machine.
Figure: 1 Distributed Databases
An online social networking service such as Facebook needs to store approximately 135 billion messages a month, and Twitter has the problem of storing 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. The criticality of different types of data, as well as continuity of data availability, has become more important than ever: data is expected to be available 24×7 and from everywhere. Structured Query Language (SQL) became the standard for data processing because it brings data definition, data manipulation and data querying under one umbrella. RDBMS (relational database management systems) have always been distinguished by the ACID (Atomicity, Consistency, Isolation, Durability) principles, which ensure that data integrity is preserved at all costs. An RDBMS can guarantee performance on the order of thousands of transactions per second, but online transaction processing (OLTP) today, in scenarios such as games, advertising, fraud detection and risk analysis, involves upwards of a million transactions per second, which a traditional RDBMS cannot easily handle. Challenges such as durability and high availability without any single point of failure have created a new wave of database processing solutions that manage data in structured as well as unstructured ways. New types of data management solutions are emerging to handle distributed (relational/non-relational) content on open platforms at the speed of a mouse click.
RES Publication © 2012 http://ijmcs.info
II. NOSQL
A key development in handling the big data problem has been the emergence of NoSQL (Not Only SQL) databases, an alternative database technology designed to meet the scalability requirements of distributed architectures. These data stores do not require fixed table schemas (they are schema-less), avoid join operations and scale horizontally. NoSQL systems were developed mainly in response to the failure of existing suppliers to address the performance, scalability and flexibility requirements of large-scale data processing (cloud computing and web applications).
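The schema-less, join-free style described above can be illustrated with a minimal sketch. The in-memory store, the `put` and `find` helpers and the sample records are all hypothetical and exist only for illustration; they are not the API of any NoSQL product.

```python
# Toy in-memory document store (hypothetical; not a real NoSQL API).
store = {}

def put(key, document):
    """Store a document under a key; no table schema is declared anywhere."""
    store[key] = document

def find(predicate):
    """Return the keys of all documents that satisfy the predicate."""
    return [key for key, doc in store.items() if predicate(doc)]

# Two records with different fields can live side by side (schema-less).
put("user:1", {"name": "Asha", "email": "asha@example.com"})
# Related data is nested inside the document, so reading it needs no join.
put("user:2", {"name": "Ravi", "orders": [{"item": "book", "qty": 2}]})

with_orders = find(lambda doc: "orders" in doc)
total_qty = sum(order["qty"] for order in store["user:2"]["orders"])
```

Embedding the orders inside the user document is the design choice that replaces a join: the application reads one record instead of combining two tables.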
Figure: 2 CAP Theorem

Data model | NoSQL databases | Performance | Scalability | Flexibility | Complexity | Functionality
Key-value store | SimpleDB, Redis, Riak, Dynamo, Voldemort, BerkeleyDB | High | High | High | None | Variable (None)
Column store | Cassandra, DynamoDB, Accumulo, HBase, Big Table, Hypertable, PNUTS | High | High | Moderate | Low | Minimal
Document store | MongoDB, CouchDB | High | Variable (High) | High | Low | Variable (Low)
Graph database | Neo4j | Variable | Variable | High | High | Graph Theory
Table: 1 Key attribute of NoSQL Databases

III. SCALABILITY AND PERFORMANCE WITH NOSQL
A. Auto-sharding A NoSQL (Not Only SQL) database automatically spreads data across servers without requiring applications to participate. Servers can be added to or removed from the data layer without application downtime, with data automatically redistributed across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery. A properly managed NoSQL database system should never need to be taken offline for any reason, supporting 24x365 continuous operation of applications.
B. Distributed Query Support Distributing a relational database across servers can reduce, or even eliminate, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers.
C. Integrated Caching To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory.
NoSQL offers many features and benefits, but it provides neither SQL (Structured Query Language) support nor adherence to the ACID properties. NoSQL can help enterprises manage large distributed data, but enterprises cannot afford to lose the ACID properties. A new type of data management solution is therefore emerging to address large-data OLTP concerns without sacrificing SQL interfaces or ACID properties.
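The auto-sharding and replication behavior described in Section III.A can be sketched as follows. This is a minimal illustration under stated assumptions, not the placement scheme of any particular NoSQL system: the function `placement`, the server names and the replica count are all hypothetical, and simple hash-modulo placement is used for brevity.

```python
import hashlib

def placement(key, servers, replicas=2):
    """Return the servers that hold copies of `key`, primary first.

    Assumption for illustration: the key is hashed, the hash picks a
    primary server, and the following servers in the list hold replicas.
    """
    digest = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)
    start = digest % len(servers)
    copies = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]

# The application only supplies the key; routing happens in the data layer.
servers = ["node-a", "node-b", "node-c"]
before = placement("user:42", servers)

# Adding a server changes the mapping without any change to application code.
servers.append("node-d")
after = placement("user:42", servers)
```

With hash-modulo placement, changing the number of servers remaps many keys at once; real systems commonly use techniques such as consistent hashing to limit how much data moves when servers are added or removed.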
IV. NEWSQL
NewSQL databases are designed mainly to meet the scalability requirements of distributed architectures, or to improve performance to the point that horizontal scalability is no longer a necessity. They include new MySQL storage engines, transparent sharding technologies, software and hardware appliances, and completely new databases. NewSQL databases purport to offer the same performance as NoSQL (Not Only SQL) systems while providing administrators with ACID guarantees. To address big data OLTP (online transaction processing) business scenarios that neither traditional OLTP systems nor NoSQL systems address, a different type of database system has emerged, commonly named NewSQL. NewSQL covers a range of new scalable, high-performance SQL database vendors. These vendors have designed solutions that bring the benefits of the relational model to the distributed architecture, or that improve the performance of relational databases to such an extent that scalability is no longer an issue. NewSQL was designed to retain SQL while addressing the existing issues of traditional OLTP systems, mainly their scalability and performance. These systems are estimated to operate approximately 50 times faster.
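The combination that NewSQL aims to preserve, a SQL interface with ACID transactions, can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module, which is a single-node embedded database and not a NewSQL system; it is chosen here only because it is self-contained. The `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False

ok = transfer(conn, 1, 2, 30)        # succeeds
rejected = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The second transfer credits the destination before the debit fails, yet the final balances show no trace of it: atomicity discards the partial update, which is the guarantee that NoSQL systems typically relax and that NewSQL systems aim to keep at scale.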
V. NEWSQL CATEGORIZATION
NewSQL categorization is based on the different approaches adopted by vendors to preserve the SQL interface while addressing the performance and scalability concerns of traditional OLTP solutions.

New Architecture databases | New MySQL storage engines | Transparent clustering/sharding
VoltDB (open source) | TokuDB (commercial) | dbShards (commercial)
NuoDB (commercial) | InfiniDB | ScaleBase (commercial)
Drizzle (open source) | Xeround | ScalArc
Clustrix (commercial) | GenieDB | Schooner MySQL
MemSQL | Akiban | Continuent Tungsten (open source)
Table: 2 Category of NewSQL Databases

A. New Architecture Databases NewSQL systems of this type are designed from scratch to achieve performance and scalability. One of the keys to improving performance is making memory (non-disk) or new kinds of disks (flash/SSD) the primary data store. Solutions can be software only (VoltDB, NuoDB and Drizzle) or supported as an appliance (Clustrix, Translattice). These are completely new solutions that can support your scalability requirements, but some changes to the code will be required and data migration is still needed. These new architectures can be further categorized into General Purpose Databases and In-Memory Databases.
B. New MySQL Storage Engines MySQL is part of the LAMP stack and is used extensively in OLTP. To overcome MySQL's scalability problems, a set of storage engines has been developed, including Xeround, Akiban, MySQL NDB Cluster, GenieDB, Tokutek, etc.
C. Transparent Clustering/Sharding These solutions retain the OLTP databases in their original format and provide a pluggable feature for either clustering or transparent sharding to improve scalability. Schooner MySQL, Continuent Tungsten and ScalArc follow the former approach, while ScaleBase and dbShards follow the latter. Both approaches allow reuse of existing skill sets and ecosystems, and avoid the need to rewrite code or perform any data migration. ScaleBase, which offers such a solution, lets you get the scalability you need from the database without rewriting it: you can keep using your existing one. Other solutions in the field include dbShards; these systems provide a sharding middleware layer that automatically splits databases across multiple nodes.

VI. CHARACTERISTICS OF NEWSQL SOLUTION
1. NewSQL provides SQL as the primary mechanism for application interaction.
2. NewSQL supports ACID properties for transactions.
3. NewSQL uses a non-locking concurrency control mechanism, so that real-time reads do not conflict with writes.
4. The NewSQL (dbShards) architecture provides much higher per-node performance than is available from traditional RDBMS solutions.
5. NewSQL supports a scale-out, parallel, shared-nothing architecture capable of running on a large number of nodes without suffering bottlenecks.
6. NewSQL systems are approximately 50 times faster than traditional OLTP RDBMS.

VII. NEWSQL DATABASE NUODB
NuoDB is a SQL (Structured Query Language) and ACID (Atomicity, Consistency, Isolation, Durability) compliant distributed database management system. It takes a new distributed, peer-to-peer, asynchronous approach that differs from traditional shared-disk and shared-nothing architectures. NuoDB was not designed with a specific operating system (OS), network backplane or virtualization model in mind; rather, it is a formal piece of software that exploits the resources it is given. NuoDB follows a three-tiered distributed architecture comprising an administrative tier, a transactional tier and a storage tier.
Figure: 3 Architecture of NuoDB
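The non-locking concurrency control listed among the NewSQL characteristics in Section VI, where reads do not conflict with writes, is commonly obtained by keeping multiple versions of each value. The sketch below illustrates that general idea only; the `VersionedStore` class and its methods are hypothetical and do not represent the implementation of any product named in this paper.

```python
class VersionedStore:
    """Toy multi-version store: writers append versions, readers never block."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_timestamp, value), ascending
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that identifies the currently visible state."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

store = VersionedStore()
store.write("stock", 10)
reader_snapshot = store.snapshot()   # a long-running read starts here
store.write("stock", 7)              # a concurrent write arrives
old_view = store.read("stock", reader_snapshot)
new_view = store.read("stock", store.snapshot())
```

The reader that started before the second write still sees its original value, and the writer was never made to wait for it. A production system would additionally need garbage collection of old versions and conflict detection between concurrent writers, which this sketch omits.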
VIII. NEWSQL DATABASE VOLTDB
VoltDB uses NewSQL-style approaches and can execute transactions 45 times faster than a typical relational database system. VoltDB can also scale across 39 servers and handle up to 1.6 million transactions per second across 300 CPU cores. It requires far fewer servers than a typical Hadoop implementation, doing in 20 nodes the same work that would require 1,000 Hadoop nodes. NewSQL is a category of SQL database products that address the scalability and performance issues posed by traditional online transaction processing (OLTP) relational database management systems (RDBMS).

IX. COMPARATIVE CHARACTERISTICS OF RDBMS, NOSQL AND NEWSQL DATABASES

Characteristic | RDBMS | NoSQL | NewSQL
ACID compliance (Data, Transaction integrity) | Yes | No | Yes
OLAP/OLTP | Yes | No | Yes
Data analysis (aggregate, transform, etc.) | Yes | No | Yes
Schema rigidity (Strict mapping of model) | Yes | No | Maybe
Data format flexibility | No | Yes | Maybe
Distributed computing | Yes | Yes | Yes
Scale up (vertical)/Scale out (horizontal) | Yes | Yes | Yes
Performance with growing data | Fast | Fast | Very Fast
Performance overhead | Huge | Moderate | Minimal
Popularity/community Support | Huge | Growing | Slowly growing
Table: 3 Comparative Characteristic of RDBMS, NoSQL, and NewSQL Databases

X. CONCLUSION
A key advance in resolving the big data problem has been the emergence of alternative database technologies. To address big data OLTP business scenarios that neither traditional OLTP systems nor NoSQL systems address, alternative database systems (NewSQL systems) have evolved. NewSQL is a class of modern RDBMS that provides the same scalable performance as NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of a traditional database system. In this paper, we discussed the background of NoSQL and NewSQL database systems, some bottlenecks of large-scale data management systems, and what inspired the NewSQL movement. Because performance is the top priority, NoSQL and NewSQL databases tend to have more security gaps than traditional SQL databases, and these issues need to be researched in depth. NewSQL databases also need to be benchmarked: it is important to evaluate scalability and perform load testing on some of the popular NoSQL and NewSQL databases, and to compare these types of databases with respect to Big Data analytics. To compare performance, we need to simulate the exact conditions as well as the workload for processing small and frequent requests, and also focus on providing fast response times.

ACKNOWLEDGMENT
This work was supported by DigiCollect GIS, Bangalore. We wish to thank Mr. Sudhir Murthy (Chief Technology Officer, DigiCollect GIS, Bangalore) for his valuable suggestions, kind support and encouragement. We also wish to thank our parents and, last but not least, all faculty members of JECRC Foundation, Jaipur, for their timely suggestions and technical support.
AUTHORS' BIOGRAPHIES
Mr. Rakesh Kumar was born in Karah Dih, Nalanda, Bihar, India on 15 April 1993. He is currently working as a Software Engineer at DigiCollect GIS, Bangalore, India, and received his B.Tech. degree from the Department of IT at JECRC, Jaipur, India, which is affiliated to RTU, Kota. He has more than 1 year of industrial experience and has published 15 research papers in various international journals and national conferences. He is a senior member of the IACSIT and a member of SCIEI, UACEE and IAENG. He is a Red Hat Certified System Administrator (RHCSA), Red Hat Certified Engineer (RHCE), Microsoft Certified Professional (MCP) and IBM DB2 Certified. His research interests include Cloud Computing, Big Data, Hadoop, NoSQL and NewSQL.
Ms. Shilpi Charu is a Senior Lecturer in the Department of Information Technology of Jaipur Engineering College & Research Centre, Jaipur. She completed her M.Tech. in Data Structures and Algorithms at Jagannath University, Jaipur, and her B.E. in Computer Engineering at Stani Memorial College of Engineering and Technology, Jaipur. She has more than 4 years of teaching experience and has published 10 research papers in various international journals and national conferences. Her research interests include Data Mining, Cloud Computing, Big Data, Hadoop, NoSQL and NewSQL.