Journal of Advanced Database Management & Systems ISSN: 2393-8730 (online) Volume 2, Issue 2 www.stmjournals.com

Effective Way to Handling Big Data Problems using NoSQL Database (MongoDB)

Rakesh Kumar1*, Shilpi Charu2, Somya Bansal2

1 Software Engineer & Digi Collect GIS, Bangalore, India
2 Department of Information Technology & JECRC, Jaipur, Rajasthan, India

Abstract
NoSQL solutions are meant to solve big data problems for which relational databases are not well suited, are too expensive to use, or require you to implement something that breaks the relational nature of your database anyway. In this paper, we discuss the reasons for the fall of other databases and introduce NoSQL databases: their types, examples, benefits, advantages, and challenges, and when to use which type. We then introduce MongoDB (a NoSQL database), show how to use it for blog posts, describe its features, advantages, and limitations, and finally cover the installation of MongoDB on Ubuntu 14.04 LTS. The aim of this paper is to show the importance of NoSQL databases (MongoDB), which offer an effective way to handle big data problems.

Keywords: ACID, BASE, CAP, CRUD, ETL, MongoDB, NoSQL, SQL

*Author for Correspondence E-mail: [email protected]

INTRODUCTION
Approximately 80–90% of the data that exists today has been created in the past few years alone, and every day we create approximately 2.5 quintillion bytes of data. This data comes from everywhere: posts to social sites, collected weather information, digital images and videos, transaction records, and more. In such a scenario, organizations need a web-based solution that integrates the robust presentation features of a portal, such as user interfaces, collaboration, and secure access, with centralized and enormously scalable data storage as the back end, holding huge amounts of different types of content such as images, audio, video, documents, and metadata [1–4].

Reasons for the Fall of Other Databases
Many databases make you choose between a flexible data model, low latency at scale, and powerful access, but increasingly you need all three at the same time.

A. Rigid Schemas
You should be able to analyze semi-structured, unstructured, and polymorphic data, and it should be easy to add new data. The problem is that these types of data do not belong in relational rows and columns. Relational schemas are hard to change incrementally, especially without impacting performance or taking the database offline.

B. Scaling Problems
Relational databases were mainly designed for single-server configurations, not for horizontal scale-out. They were meant to serve hundreds of operations per second, not hundreds of thousands. Even with a lot of engineering hours, custom sharding layers, and caches, scaling an RDBMS is hard at best and impossible at worst.

C. Takes Too Long
Analyzing data in real time requires a break from the familiar ETL and data warehouse approach. You do not have time for lengthy load schedules, and you also need to run aggregation queries against variably structured data.

NoSQL
A NoSQL (Not only SQL) database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. NoSQL databases are increasingly used in big data and real-time web applications. Many NoSQL data stores compromise

JoADMS (2015) 42-48 © STM Journals 2015. All Rights Reserved


consistency (in the sense of the CAP theorem) in favour of availability and partition tolerance [5–8].

Types of NoSQL
A. Key-Value Storage
Key-value stores, such as Redis, are the simplest NoSQL systems; they are basically a really fancy hash table. You have a value you want to get later, so you assign it a key and put it into the database. You can only query a single object at a time, and only by a single key.

B. Document Storage
Normally, these store objects with a hierarchical structure, such as XML files, JSON files, or any other sort of tree structure, and the values of different nodes on the tree can be indexed. They are much faster than traditional row-based SQL databases on lookups because they sacrifice performance on joins.

C. Columnar Storage
These store data in columns rather than rows, so updating and adding are expensive, but most queries are cheap because each column is essentially implicitly indexed. However, if your query cannot use an index, you are in no better shape with a columnar store than with a regular SQL database.

D. Graph Storage
Graph databases (such as Neo4j) make joins as cheap as possible, because even a simple row query requires many joins to retrieve. A table-scan type query may be slower than in a standard SQL database because of all the extra joins needed to retrieve the data.
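The difference between the first two storage types can be made concrete with a small in-memory sketch. The code below is only an illustration using plain Python dictionaries; it is not the API of Redis or MongoDB, and the names `kv_store`, `documents`, and `build_index` are hypothetical. It shows that a key-value store answers only "give me the value for this key", while a document store can index values nested inside the documents.

```python
# Toy in-memory stand-ins for two of the storage types described above.
# These are illustrative sketches, not the API of Redis or MongoDB.

# Key-value storage: one opaque value per key, fetched only by that key.
kv_store = {}
kv_store["session:42"] = '{"user": "alice", "cart": ["book", "pen"]}'
session_blob = kv_store["session:42"]  # single object, single key

# Document storage: hierarchical records whose inner fields can be indexed.
documents = [
    {"_id": 1, "author": {"name": "alice"}, "tags": ["nosql", "mongodb"]},
    {"_id": 2, "author": {"name": "bob"}, "tags": ["sql"]},
    {"_id": 3, "author": {"name": "alice"}, "tags": ["sql", "nosql"]},
]


def build_index(docs, path):
    """Map each value found at a dotted path (e.g. 'author.name') to document ids."""
    index = {}
    for doc in docs:
        value = doc
        for part in path.split("."):
            value = value[part]
        # A list-valued field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index.setdefault(v, []).append(doc["_id"])
    return index


author_index = build_index(documents, "author.name")
tag_index = build_index(documents, "tags")
print(author_index["alice"])  # ids of documents written by alice
print(tag_index["nosql"])     # ids of documents tagged "nosql"
```

Once the index exists, a lookup by author or tag is a single dictionary access, which is the kind of fast lookup the document-storage description refers to; nothing here attempts a join.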

Table I: Some Examples of NoSQL

Types of NoSQL      Examples
Key-Value Storage   CouchDB, Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS
Document Storage    Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
Columnar Storage    Accumulo, Cassandra, Druid, HBase, Vertica
Graph Storage       Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog

BENEFITS OF NOSQL
Compared with relational databases, NoSQL databases are more scalable and provide better performance, and their data model addresses several issues that the relational model is not designed to address:
• Large volumes of structured, semi-structured, and unstructured data.
• Quick iteration, agile sprints, and frequent code pushes.
• Object-oriented programming that is easy to use and flexible.
• Scale-out architecture instead of expensive, monolithic architecture.

ADVANTAGES OF NOSQL
A. Elastic Scaling
As transaction rates and availability requirements increase, and as databases move into the cloud, the economic advantages of scaling out on commodity hardware become irresistible. Relational databases might not scale out easily on commodity clusters, but the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.

B. Big Data
Today, the volumes of big data that can be handled by NoSQL systems, such as Hadoop, outstrip what can be handled by the biggest RDBMS.

C. Goodbye DBAs
NoSQL databases are designed from the ground up to require less management: automatic repair, data distribution, and simpler data models lead to lower administration requirements.

D. Economics
NoSQL databases mainly use clusters of cheap commodity servers to manage exploding data and transaction volumes, while relational databases tend to rely on expensive proprietary servers and storage systems. The cost per gigabyte or per transaction/second for NoSQL can be many times lower than for an RDBMS, allowing you to store and process more data at a much lower price.


E. Flexible Data Models
NoSQL key-value stores and document databases allow the application to store virtually any structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.
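A minimal sketch of what a flexible data model means in practice is given below. It assumes a "collection" can be modeled as a plain Python list of dictionaries; the field names (`email`, `loyalty_points`) and the helper `total_points` are hypothetical and chosen only for illustration. The point is that a new field can appear on new documents without any change to the existing ones.

```python
# A "collection" here is just a Python list of dicts; this is a stand-in
# for a document store, not a real database client.
customers = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi"},  # no email: missing fields are allowed
]

# A new requirement arrives: store loyalty points. Only new or updated
# documents need the field; existing ones are left untouched.
customers.append({"_id": 3, "name": "Meera", "loyalty_points": 120})


def total_points(collection):
    """Sum loyalty points, treating documents without the field as zero."""
    return sum(doc.get("loyalty_points", 0) for doc in collection)


fields_seen = sorted({key for doc in customers for key in doc})
print(fields_seen)
print(total_points(customers))
```

In a relational schema, the same change would typically require altering the table definition before the first row with the new column could be stored; here the reading code simply tolerates documents that lack the field.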

CHALLENGES OF NOSQL
The promise of NoSQL has generated a lot of enthusiasm, but there are many obstacles to overcome before these databases can appeal to mainstream enterprises. A few of the important challenges are:

A. Maturity
Relational database systems have been around for a long time and are stable and richly functional. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. Most NoSQL alternatives are in pre-production versions with many key features yet to be implemented. Living on the technological leading edge is a demanding prospect for many developers, and enterprises should approach it with extreme caution.

B. Support
Every organization wants the reassurance that if a key system fails, it will be able to get timely and competent support. All relational database management system vendors go to great lengths to provide a high level of enterprise support. Many NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies are often small start-ups without the global reach, support resources, or credibility of a Microsoft, Oracle, or IBM.

C. Business Intelligence and Analytics
NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications and offer only limited facilities for ad-hoc query and analysis. Some relief is provided by the emergence of solutions like Hive or Pig, which can provide easier access to data held in Hadoop clusters and perhaps, eventually, other NoSQL databases.

D. Administration
The main goal for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.

E. Expertise
Almost every NoSQL developer is in a learning mode. This situation will resolve itself naturally over time, but for now it is far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.

CHOOSING NOSQL
• Key-value databases are mainly useful for storing session information, preferences, user profiles, and shopping cart data. Avoid them when you need to query by data or when there are relationships between the data being stored.
• Document databases are mainly useful for content management systems (CMS), blogging platforms, web analytics, e-commerce applications, and real-time analytics. Avoid them for systems that need complex transactions spanning multiple operations.
• Column-family databases are mainly useful for CMS, blogging platforms, expiring usage, maintaining counters, and heavy write volumes such as log aggregation. Avoid them for systems that are in early development or have changing query patterns.
• Graph databases are well suited to problem spaces with connected data, such as spatial data, social networks, routing information for goods and money, and recommendation engines.

MongoDB
MongoDB is a popular open-source NoSQL database. It is written in C++ and is cross-platform, high-performance, and document-oriented. MongoDB uses collections to store data and to represent relationships between data items, and the data is stored as BSON documents. BSON is a binary format in which zero or more key/value pairs are stored as a single entity, i.e., a document. BSON is based on JSON-style documents; JSON (JavaScript Object Notation) is a format that is easy for computers to parse and generate [9–12].
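To illustrate the idea of a document as a set of key/value pairs stored as one entity, the sketch below round-trips a hypothetical document through JSON text using Python's standard `json` module. It shows only the JSON side of the picture: producing actual BSON bytes would require a driver library, which is not shown here, and the document contents are invented for the example.

```python
import json

# A hypothetical document: zero or more key/value pairs stored as one entity.
post = {
    "_id": 1,
    "title": "Hello NoSQL",
    "tags": ["mongodb", "bson"],
    "author": {"name": "A. Writer"},
}

encoded = json.dumps(post, sort_keys=True)   # dict -> JSON text
decoded = json.loads(encoded)                # JSON text -> dict

print(encoded)
print(decoded == post)
```

The nested `author` object and the `tags` array survive the round trip unchanged, which is why a JSON-style document can be stored almost exactly as the program represents it.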


Using MongoDB in a Blog Post
With a document-type database, your data is stored almost exactly as it is represented in your program. Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this:

    {_id: 1,
     Author: {Name: "Rakesh Kumar", email: "[email protected]"},
     Post: "I like you",
     Date: {$date: "2015-03-12 12:47UTC"},
     Location: [-121.2322, 42.1223222],
     Rating: 1.1,
     Comments: [
       {User: "[email protected]", Upvotes: 11, Downvotes: 4, Text: "I agree with you"},
       {User: "[email protected]", Upvotes: 321, Downvotes: 11, Text: "You are a man"}
     ],
     Tags: ["Politics", "Virginia"]}

Features of MongoDB
MongoDB's features include full index support, replication, high availability, and auto-sharding. Some of the important features are:

A. Indexing
MongoDB supports secondary indexes, which make retrieval faster. Unique, compound, and geospatial indexes are also possible.

B. Stored JavaScript
Users can run JavaScript functions and scripts on the server side.

C. Aggregation
MongoDB supports MapReduce, which is a very useful aggregation tool.

D. Horizontal Scaling
MongoDB scales horizontally. It scales out and up easily on a variety of platforms, including in the cloud using services like Amazon EC2 and Rackspace.

E. Sharding
Sharding is a process in which a large database is broken down into smaller partitions (shards) so that it can be processed on multiple machines; in MongoDB, this is automatic. MongoDB's shard keys make balancing your data across large clusters easy and powerful.
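The idea behind hash-based sharding can be sketched in a few lines. The code below is a simplified illustration and not MongoDB's implementation: it assumes three hypothetical shards and uses SHA-256 from Python's standard library to map a shard key to a shard, whereas a real cluster uses its own hash function, chunk ranges, and a balancer.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key):
    """Pick a shard by hashing the shard key (stable across runs)."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def distribute(docs, key_field):
    """Group document ids by the shard their key hashes to."""
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key_field])].append(doc["_id"])
    return placement


# Twenty hypothetical blog posts, sharded on their _id field.
posts = [{"_id": i, "author": f"user{i % 5}"} for i in range(20)]
placement = distribute(posts, "_id")
for name, ids in placement.items():
    print(name, ids)
```

Because the shard is computed from the key alone, any client can route a read or write for a given `_id` to the right machine without consulting the others. The choice of key matters: sharding on a field with few distinct values (for example `author` here) would concentrate documents on fewer shards.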


How MongoDB Makes It Easy
Many organizations use MongoDB for analytics because it lets them store any kind of data, analyze it in real time, and change the schema as they go.

A. New Data
MongoDB's document model enables you to store and process data of any structure: events, time series data, geospatial coordinates, text, binary data, and anything else. You can adapt the structure of a document's schema just by adding new fields, making it simple to bring in new data as it becomes available.

B. Horizontal Scalability
MongoDB's automatic sharding distributes data across fleets of commodity servers, with complete application transparency. With multiple options for scaling, including range-based, hash-based, and location-aware sharding, MongoDB can support thousands of nodes, petabytes of data, and hundreds of thousands of operations per second without requiring you to build custom partitioning and caching layers.

C. Powerful Analytics in Real Time
With rich index and query support, including secondary, geospatial, and text search indexes, as well as the aggregation framework and native MapReduce, MongoDB can run complex ad-hoc analytics and reporting in place.

Advantages of MongoDB
• Schema-less design enables rapid introduction of new CDR (Call Detail Record) types to the system.
• Scale: the BillRun production site already manages several TB in a single table, without being limited when adding new fields.
• Replica sets make it easy to meet regulatory requirements with an easy-to-set-up multi-data-center DRP (disaster recovery plan) and HA (high availability) solution.
• Sharding enables linear, scale-out growth without running out of budget.
• With over approximately 2,000 CDR inserts per second, the MongoDB architecture is well suited to a system that must support a high insert load, and transactions can be guaranteed with findAndModify and two-phase commit.
• Supports developer-oriented queries.
• Location-based data is used to analyze users' usage and to determine where to invest in cellular infrastructure.
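The MapReduce style of aggregation mentioned in this section can be sketched in plain Python. This is an in-process illustration of the technique, not MongoDB's `mapReduce` command; the documents and the helper names `map_post`, `reduce_counts`, and `map_reduce` are hypothetical. It counts posts per tag over documents whose structure varies.

```python
from collections import defaultdict

# Hypothetical blog-post documents with varying structure.
posts = [
    {"_id": 1, "tags": ["Politics", "Virginia"], "rating": 1.1},
    {"_id": 2, "tags": ["Politics"], "rating": 4.0},
    {"_id": 3, "tags": ["Sports"]},  # no rating field at all
]


def map_post(doc):
    """Map step: emit one (tag, 1) pair per tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


tag_counts = map_reduce(posts, map_post, reduce_counts)
print(tag_counts)
```

The map and reduce steps never assume that every document has the same fields, which is what makes this style of aggregation a natural fit for variably structured data.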

Disadvantages of MongoDB
• MongoDB locks the current database while writing to it, so concurrent writes to the same database are not allowed.
• MongoDB reports scalability constraints when the data exceeds hundreds of GB.

Install MongoDB on Ubuntu 14.04 LTS
MongoDB is a NoSQL database intended for storing large amounts of data in document-oriented storage with dynamic schemas [13–15]. NoSQL refers to a database with a data model other than the tabular format used in relational databases such as MySQL, PostgreSQL, and Microsoft SQL Server. The following steps install MongoDB on Ubuntu 14.04 LTS:

Step #1: Set Up the Package Database
First, import the MongoDB public key used by the package management system:
    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

Create a list file for MongoDB:
    echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

Reload the local package database:
    sudo apt-get update

Step #2: Install the Latest Stable Version of MongoDB
Install the latest stable version of MongoDB:
    sudo apt-get install -y mongodb-org

Alternatively, install a specific release (2.6.1 in this example):
    sudo apt-get install -y mongodb-org=2.6.1 mongodb-org-server=2.6.1 mongodb-org-shell=2.6.1 mongodb-org-mongos=2.6.1 mongodb-org-tools=2.6.1

If you would like MongoDB to auto-update with apt-get, then you are done with the installation. However, it is possible to 'pin' the version of MongoDB you just installed to prevent apt-get from auto-updating:
    echo "mongodb-org hold" | sudo dpkg --set-selections
    echo "mongodb-org-server hold" | sudo dpkg --set-selections
    echo "mongodb-org-shell hold" | sudo dpkg --set-selections
    echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
    echo "mongodb-org-tools hold" | sudo dpkg --set-selections

Step #3: Get MongoDB Running
Start MongoDB:
    sudo service mongod start

Check the MongoDB service status:
    sudo service mongod status

Summary list of status statistics (continuous):
    mongostat

Summary list of status statistics (5 rows, summarized every 2 seconds):
    mongostat --rowcount 5 2

Enter the MongoDB command line:
    mongo

By default, running this command will look for a MongoDB server listening on port 27017 on the localhost interface. If you would like to connect to a MongoDB server running on a different port, use the --port option. For example, to connect to a local MongoDB server listening on port 22222, issue the following command:
    mongo --port 22222

Shut down MongoDB:
    sudo service mongod stop

Restart MongoDB:
    sudo service mongod restart

Step #4: Verify the MongoDB Installation
Check the installed MongoDB version:
    $ mongo --version

Connect to MongoDB from the command line and execute some test commands to check that it is working properly:
    $ mongo
    > db.test.save({tecadmin: 100})
    > db.test.find()
    { "_id" : ObjectId("52b0dc8285f8a8071cbb5daf"), "tecadmin" : 100 }

CONCLUSION
NoSQL databases are becoming an increasingly important part of the database landscape and, when used appropriately, can offer real benefits. Enterprises should proceed with caution, with full awareness of the legitimate limitations and issues associated with these databases [16–18].

REFERENCES
1. K. Rakesh, C. Shilpi. An Importance of Using Virtualization Technology in Cloud Computing. Global Journal of Computers & Technology. ISSN: 2394-501X. Feb 2015; 1(2): 56–60p.
2. K. Rakesh, C. Shilpi. Comparison between Cloud Computing, Grid Computing, Cluster Computing and Virtualization. International Journal of Modern Computer Science and Applications. ISSN: 2321-2632. Jan 2015; 3(1): 42–47p.
3. K. Rakesh. OpenStack Juno Release Includes Features of NFV, Big Data. International Journal of Modern Embedded System (IJMES). ISSN: 2320-9003 (Online). Dec 2014; 2(6): 11–13p.
4. K. Rakesh, G. Sakshi. Open Source Infrastructure for Cloud Computing Platform Using Eucalyptus. Global Journal of Computers & Technology. ISSN: 2394-501X. Dec 2014; 1(2): 44–50p.
5. Jain et al. A Comparative Study for Cloud Computing Platform on Open Source Software. ABHIYANTRIKI: An International Journal of Engineering & Technology (AIJET). Dec 2014; 1(2): 28–35p.
6. K. Rakesh, P. Bhanu Bhushan. Dynamic Resource Allocation and Management Using OpenStack. National Conference on Emerging Technologies in Computer Engineering (NCETCE), Nov 2014. Supported by: Computer Society Chapter, IEEE Delhi Section.
7. Kumar et al. Open Source Virtualization Management Using Ganeti Platform. National Conference on Emerging Technologies in Computer Engineering (NCETCE), 2014. Supported by: Computer Society Chapter, IEEE Delhi Section.
8. Jain et al. An Analysis of Security and Privacy Issues, Challenges with Possible Solution in Cloud Computing. National Conference on Computational and Mathematical Sciences (COMPUTATIA-IV). Technically Sponsored by: ISITA and RAOPS, Jaipur. Nov 2014.
9. Kumar et al. OpenNebula: Open Source IaaS Cloud Computing Software Platforms. National Conference on Computational and Mathematical Sciences (COMPUTATIA-IV). Technically Sponsored by: ISITA and RAOPS, Jaipur. Nov 2014.
10. Kumar et al. Apache Hadoop, NoSQL and NewSQL Solutions of Big Data. International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE). Oct 2014; 1(6): 28–36p.
11. Kumar et al. Extremely effective CRM Solution Using Sales force. Journal of Emerging Technologies and Innovative Research (JETIR). Oct 2014; 1(5): 278–282p.
12. Kumar et al. Comparison of SQL with HiveQL. International Journal for Research in Technological Studies. ISSN (Online): 2348-1439. Aug 2014; 1(9): 28–30p.
13. Kumar et al. Apache CloudStack: Open Source Infrastructure as a Service Cloud Computing Platform. IJAETMAS. ISSN: 2349-3224. Jul 2014; 1(2): 111–116p.
14. Kumar et al. Critical Analysis of Database Management Using NewSQL. IJCSMC. ISSN 2320–088X. May 2014; 3(5): 434–438p.
15. Kumar et al. Open Source Solution for Cloud Computing Platform Using OpenStack. IJCSMC. ISSN 2320–088X. May 2014; 3(5): 89–98p.


16. Kumar et al. Manage Big Data through NewSQL. National Conference on Innovation in Wireless Communication and Networking Technology, April 2014. Association with The Institution of Engineers (India).
17. Kumar et al. Architectural Paradigms of Big Data. National Conference on Innovation in Wireless Communication and Networking Technology, April 2014. Association with The Institution of Engineers (India).
18. MongoDB. URL: http://www.mongodb.org/

Cite this Article Kumar Rakesh, Charu Shilpi, Bansal Somya. Effective Way to Handling Big Data Problems using NoSQL Database (MongoDB). Journal of Advanced Database Management & Systems. 2015; 2(2): 42–48p.
