
CONFIGURATION BRIEF Big Data

Lenovo Big Data Configuration for Cloudera Enterprise with Apache Spark

Big Data with Volume, Velocity and Variety

HIGHLIGHTS

• Up to 100X faster than Hadoop with MapReduce for in-memory workloads
• Deploy with an existing Hadoop cluster or standalone
• No restrictions on scale-out capability
• Reduced time to value

The Growth of Big Data

In 2009, the world generated 800 billion GB of data, a level that is expected to increase to 40 trillion GB by 2020. Indeed, 90% of the data in the world today was created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals. This data is big data, and it spans the following dimensions:

• Volume: Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
• Velocity: Often time-sensitive, big data must be used as it streams into the enterprise to maximize its value to the business.
• Variety: Big data extends beyond structured data to unstructured data of all varieties, such as text, audio, video, click streams, and log files.

Big Data Opportunity

Big data is more than a challenge; it is an opportunity to find insight into new and emerging types of data to make your business more agile, and to answer questions that were previously beyond reach. Until recently, there was no effective way to harvest this opportunity. Today, Cloudera uses the latest big data technologies, such as the massive MapReduce scale-out capabilities of Apache Hadoop, to open the door to a world of possibilities.

Apache Hadoop has changed how big data is managed, processed and analyzed. It allows companies to store and process very large amounts of data at very low cost. At the heart of Apache Hadoop is the MapReduce processing framework. With its array of tools, MapReduce is a great platform for handling a broad range of batch processing requirements. Unfortunately, MapReduce writes intermediate results to persistent storage and processes data in a single pass per job, so an iterative algorithm must re-read its input from disk on every iteration. These traits make MapReduce a poor fit for workloads that require low latency or iterative computation. Fortunately, a newer platform is available for handling these types of workloads: Apache Spark.
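The cost of the one-pass model for iterative work can be illustrated with a toy, single-process sketch. It fits one parameter by gradient descent, which needs a full pass over the data on every iteration. `CountingStorage`, `fit_one_pass_jobs` and `fit_cached` are hypothetical names invented for this illustration; they are not Hadoop or Spark APIs, and the storage class only counts reads rather than touching a real disk.

```python
from typing import List


class CountingStorage:
    """Hypothetical stand-in for persistent storage; it only counts full reads."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self.reads = 0

    def read_all(self) -> List[float]:
        self.reads += 1
        return list(self._values)


def gradient_step(w: float, data: List[float], lr: float) -> float:
    # One full pass over the data: gradient of mean((w - x)^2) with respect to w.
    grad = sum(2.0 * (w - x) for x in data) / len(data)
    return w - lr * grad


def fit_one_pass_jobs(storage: CountingStorage, iterations: int, lr: float) -> float:
    # MapReduce-style: each iteration is a separate job that re-reads its input.
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, storage.read_all(), lr)
    return w


def fit_cached(storage: CountingStorage, iterations: int, lr: float) -> float:
    # Spark-style: read once, keep the data in memory, iterate over the cached copy.
    data = storage.read_all()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, data, lr)
    return w
```

Both functions converge to the same estimate (the mean of the data); only the number of storage reads differs, one per iteration versus one in total. In Spark, the analogous step is calling `cache()` or `persist()` on a dataset that is reused across iterations.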

Rack deployment of Lenovo Big Data Configuration for Cloudera Enterprise with Apache Spark.

WWW.LENOVO.COM


Apache Spark Accelerates Big Data

What makes Apache Spark so different from Hadoop with MapReduce? Flexibility and a rich tool set. Apache Spark is gaining popularity as a big data processing framework because it can run different types of big data and analytics applications, including batch processing, stream processing and machine learning, on a common framework. Deploying these workloads as Spark applications on a single cluster effectively addresses the challenge of maintaining different codebases running on different clusters.
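The benefit of a single codebase for batch and streaming can be sketched in plain Python. The same word-count logic is applied once to a whole data set (batch) and repeatedly to small groups of arriving records (micro-batches, the model Spark Streaming uses). The function names here are hypothetical and this is not the Spark API; it only shows why shared logic removes the need for two separate implementations.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def word_counts(lines: Iterable[str]) -> Counter:
    """Shared business logic: count words across a collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines: List[str]) -> Counter:
    # Batch job: apply the shared logic once to the whole data set.
    return word_counts(lines)


def run_micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[Counter]:
    # Streaming job: group incoming records into small batches, apply the
    # same logic to each batch, and emit the running total after each one.
    running: Counter = Counter()
    batch: List[str] = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            running += word_counts(batch)
            yield Counter(running)
            batch = []
    if batch:
        running += word_counts(batch)
        yield Counter(running)
```

Once the whole stream has been consumed, the final running total equals the batch result, because both paths call the same `word_counts` function.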

Cloudera Provides Big Data with Options

Cloudera brings the power of Apache Hadoop and Spark to the enterprise. Apache Hadoop and Spark are open source software frameworks used to reliably manage large volumes of structured and unstructured data. Cloudera enhances this technology to withstand the demands of your enterprise, adding administrative, workflow, provisioning, and security features. The result is a more developer- and user-friendly solution for complex, large-scale analytics.

Cloudera allows organizations to run large-scale, distributed analytics jobs on clusters of cost-effective server hardware. This infrastructure tackles large data sets by breaking the data into “chunks” and coordinating their processing across a massively parallel environment. After the raw data is stored across the nodes of a distributed cluster, queries and analysis can be handled efficiently, with the structure of the data interpreted at read time rather than when it is loaded. The bottom line: businesses can finally get their arms around massive amounts of untapped data and mine it for valuable insights in a more efficient, optimized, and scalable way.
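The chunk-and-coordinate pattern, together with interpreting raw data only at read time, can be sketched on a single machine. In this minimal sketch, threads stand in for cluster nodes, and the comma-separated "region,amount" record format and all function names are hypothetical choices for illustration; a real cluster distributes chunks across nodes and storage rather than threads in one process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def split_into_chunks(records: List[str], chunk_size: int) -> List[List[str]]:
    # Break the raw data set into fixed-size chunks.
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk: List[str]) -> Dict[str, float]:
    # Interpret the raw text only now, at read time: parse "region,amount"
    # and compute a partial total per region for this chunk.
    totals: Dict[str, float] = {}
    for line in chunk:
        region, amount = line.split(",")
        totals[region] = totals.get(region, 0.0) + float(amount)
    return totals


def merge(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    # Coordination step: combine the partial results from every chunk.
    combined: Dict[str, float] = {}
    for partial in partials:
        for key, value in partial.items():
            combined[key] = combined.get(key, 0.0) + value
    return combined


def parallel_totals(records: List[str], chunk_size: int = 2, workers: int = 4) -> Dict[str, float]:
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return merge(partials)
```

Because each chunk is processed independently and only the small partial results are merged, adding workers (or, on a cluster, nodes) increases the amount of data that can be processed at once.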



The Apache Spark Stack

The Spark architecture enables a single framework to be used for multiple projects. Typical big data deployments to date have used the Hadoop stack for batch processing, a separate framework for stream processing, and yet another for advanced analytics such as machine learning. Apache Spark combines these capabilities in a common architecture, which simplifies management of the big data code stack and enables reuse of a common data repository. The Spark architecture can run in a variety of environments: alongside the Hadoop stack, using Hadoop YARN for cluster management; on Apache Mesos; or under its own simple cluster manager, the Standalone Scheduler.
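In practice, the choice of cluster manager shows up in the `--master` URL passed to `spark-submit`: `yarn` for Hadoop YARN, `mesos://HOST:PORT` for Apache Mesos, and `spark://HOST:PORT` for Spark's standalone manager. The helper below is a hypothetical sketch that only builds the command line; the host names in the usage are placeholders, and 7077 and 5050 are the usual default master ports for standalone Spark and Mesos.

```python
from typing import List, Optional

# Usual default master ports; override them if your cluster uses different ones.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: Optional[str] = None, port: Optional[int] = None) -> str:
    if manager == "yarn":
        # On YARN, Spark finds the ResourceManager through the Hadoop client
        # configuration (HADOOP_CONF_DIR or YARN_CONF_DIR), so no host is needed.
        return "yarn"
    if manager in DEFAULT_PORTS:
        if host is None:
            raise ValueError(f"{manager} requires a master host")
        scheme = "spark" if manager == "standalone" else "mesos"
        chosen_port = port if port is not None else DEFAULT_PORTS[manager]
        return f"{scheme}://{host}:{chosen_port}"
    raise ValueError(f"unknown cluster manager: {manager}")


def submit_command(manager: str, app: str, host: Optional[str] = None,
                   port: Optional[int] = None, deploy_mode: str = "client") -> List[str]:
    # Build (but do not run) a spark-submit command line for the chosen manager.
    return ["spark-submit", "--master", master_url(manager, host, port),
            "--deploy-mode", deploy_mode, app]
```

For example, `submit_command("yarn", "app.py")` targets a YARN cluster, while `submit_command("standalone", "app.py", host="spark-master")` targets a standalone master. The same application code runs unchanged under each manager; only the master URL differs.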

Lenovo and Cloudera Enable Spark Value

This configuration is an extension of the Lenovo Big Data Reference Architecture for Cloudera Distribution for Hadoop. The predefined configuration provides a baseline for a big data solution that can be modified to meet specific customer requirements, such as lower cost, improved performance, or increased reliability. Deploying Cloudera on Lenovo System x servers with Lenovo networking components yields a solution with superior performance, reliability, and scalability. This configuration supports deployment models ranging from entry-level to high-end architectures, and can scale easily as the use of big data grows. A choice of infrastructure components provides the flexibility to meet a variety of big data analytic requirements.


Lenovo System x3550 M5 and x3650 M5 High Performance Rack Servers form the foundation of this configuration


High Performance and Value with Lenovo x3650/x3550 M5 servers and Cloudera CDH with Apache Spark

Why Lenovo System x Servers for Cloudera, Apache Hadoop and Apache Spark

Lenovo offers a wide range of servers and options. The Lenovo reference configurations for Cloudera Enterprise with Apache Spark bring together the right mix of technology and software. This configuration integrates the latest powerful Lenovo System x rack and enterprise servers, robust Lenovo Networking options, and the big data capabilities of Cloudera, Apache Hadoop and Apache Spark.

Why Lenovo

Lenovo is a leading provider of x86 servers for the data center. Featuring rack, tower, blade, dense and converged systems, the Lenovo server portfolio provides excellent performance, reliability and security. Lenovo also offers a full range of networking, storage, software, solutions, and comprehensive services supporting business needs throughout the IT lifecycle. With options for planning, deployment, and support, Lenovo offers the expertise and services needed to deliver better service-level agreements and generate greater end-user satisfaction.

For More Information

To learn more about the Lenovo Big Data Configuration for Cloudera Enterprise with Apache Spark solution, contact your Lenovo Business Partner or visit: http://shop.lenovo.com/us/en/systems/solutions/big-data/

© 2016 Lenovo. All rights reserved. Availability: Offers, prices, specifications and availability may change without notice. Lenovo is not responsible for photographic or typographical errors. Warranty: For a copy of applicable warranties, write to: Lenovo Warranty Information, 1009 Think Place, Morrisville, NC, 27560. Lenovo makes no representation or warranty regarding third-party products or services. Trademarks: Lenovo, the Lenovo logo, System x, ThinkServer are trademarks or registered trademarks of Lenovo. Microsoft and Windows are registered trademarks of Microsoft Corporation. Intel, the Intel logo, Xeon and Xeon Inside are registered trademarks of Intel Corporation in the U.S. and other countries. Other company, product, and service names may be trademarks or service marks of others. CRN: BDACLDSPK62 06/2016
