Lenovo Big Data Reference Architectures

0 downloads 180 Views 531KB Size Report
MapR Converged Data Platform for Apache Hadoop and Apache Spark ... Hadoop Distributed File System (HDFS), HCatalog, Pig
ARCHITECTURE BRIEF Big Data

Lenovo Big Data Reference Architectures Easy-to-implement hardware, software and services for analyzing data at rest and data in motion

Lenovo Big Data Solutions Support:  Apache Hadoop and Apache Spark distributions  Batch and real-time analytics  Industry-standard interfaces  Various data types and databases  Variety of client interfaces  High Performance and great value with Lenovo x3650 M5 and x3550 M5 servers

Big data is here—in a big way. Mind-boggling volumes of data are created every second of every day. What’s more, the number and types of sources generating data is growing exponentially from both people and electronic devices. More data should lead to better decisions. But because data growth is so fast, many enterprises are analyzing a smaller percentage of data than before. So what is the answer? You don’t have the time, resources or capital to experiment with what works in big data analytics and what doesn’t. The challenge you face is to manage the volume, variety and velocity of data, apply analytics for better insight and use this insight for making better business decisions faster than the competition. Lenovo reference architecture solutions for big data analytics are easy to order and easy to implement. They include hardware, software and services along with a standardized blueprint. You can quickly deploy these cost-effective solutions to analyze stored data at rest such as databases. Additionally, these solutions may be used to analyze data in motion, like streaming data. The solutions are built around powerful, affordable, scalable Lenovo M5 servers and Lenovo networking options so you can deploy proven solutions quickly.

WWW.LENOVO.COM

ARCHITECTURE BRIEF Lenovo Big Data Reference Architectures

Benefits of deploying a Lenovo Reference Architecture Lenovo offers Big Data Reference Architectures based on distributions from:    

Cloudera Hortonworks IBM MapR

Big data projects are not all alike. What works for one company may not work for another. Instead of re-inventing the wheel to apply business analytics to big data, it makes business sense to take advantage of what has already been proven. This is especially true when considering a reference architecture. Reference architectures for big data analytics provide technical blueprints that include a well-defined scope, a complete listing of requirements, and architectural decisions proven in the field including Lenovo Scalable Infrastructure Services that help make complex solutions simple. A reference architecture provides the value of synergy amongst each of the solution building blocks, with the flexibility needed to meet your requirements. Take a look at some of the advantages of using a Lenovo reference architecture for big data: Ease of integration: Deployment using a reference architecture helps ensure the new solution works with what you already have in place and are likely to add later, such as data warehouse, stream compute engines, internal and external storage devices and more. Flexibility and simplicity: A reference architecture helps strike a cost-effective balance between requirements of the solution and time-to-value, offering flexibility for your enterprise to adapt as your big data requirements evolve. Simplify complex solutions with Lenovo Scalable Infrastructure Services. Lenovo Scalable Infrastructure Services can reduce the complexity of deployment with preintegrated, delivered and fully supported solutions that match best-in-industry components with optimized solution design. With Lenovo Scalable Infrastructure Services, you can focus your efforts on maximizing business value, instead of consuming valuable resources to design, optimize, install and support the infrastructure required to meet business demands. Lenovo has reference architectures for the following Big Data offerings:      

Cloudera CDH for Apache Hadoop and Apache Spark Hortonworks Data Platform for Apache Hadoop IBM BigInsights for Apache Hadoop Lenovo Big Data Configuration for Apache Spark MapR Converged Data Platform for Apache Hadoop and Apache Spark

2

ARCHITECTURE BRIEF Lenovo Big Data Reference Architectures

Cloudera Enterprise Distribution The Lenovo Big Data Reference Architecture for Cloudera Distribution for Hadoop is certified by Cloudera and provides a thoroughly tested and integrated solution that combines the benefits of leading-edge technologies with mature, enterprise-ready features. Starting with a preconfigured hardware platform that is Cloudera-certified helps your team to be up and running analytics quickly. Supporting both Apache Hadoop and Apache Spark, Cloudera allows organizations to run largescale, distributed analytics jobs on clusters of cost-effective server hardware. This infrastructure can be leveraged to tackle very large data sets by breaking up the data into smaller sets and coordinating the processing of the data across a massively parallel environment.

Hortonworks Data Platform This Lenovo Big Data Reference Architecture for Hortonworks Data Platform is certified by Hortonworks and provides a thoroughly tested and integrated solution that combines the benefits of leading-edge technologies with mature, enterprise-ready features. Starting with a preconfigured and Hortonworks certified hardware platform helps your team to be up and running analytics quickly. The Hortonworks Data Platform, powered by Apache Hadoop, is a highly scalable and fully open source platform for storing, processing and analyzing large volumes of data. It is designed to deal with data from many sources and formats in a very quick, easy and cost-effective manner. The Hortonworks Data Platform consists of the essential set of Apache Hadoop projects including MapReduce, Hadoop Distributed File System (HDFS), HCatalog, Pig, Spark, Hive, HBase, ZooKeeper and Ambari. These projects have been integrated and tested as part of the Hortonworks Data Platform release process and installation and configuration tools have also been included.

IBM BigInsights The Lenovo Big Data Reference Architecture for IBM BigInsights brings the power of Apache Hadoop to the enterprise to reliably manage large volumes of structured and unstructured data. IBM BigInsights enhances Hadoop technology to withstand the demands of your enterprise, adding administrative, workflow, provisioning, and security features. This reference architecture is built around powerful, affordable, scalable Lenovo server, storage and networking options. Combined with IBM BigInsights software, this architecture enables you to quickly deploy a proven design. IBM BigInsights combines the best of open source software with enterprise-grade capabilities to help meet your critical business requirements. It provides a thoroughly tested solution that combines the benefits of leading-edge technologies with mature, enterprise ready features.

MapR Distribution The Lenovo Big Data Reference Architecture for MapR Converged Data Platform for Apache Hadoop and Apache Spark leverages the MapR container based storage architecture to provide a reliable file system service distributed across the entire cluster. It uses the Hadoop MapReduce framework and the Apache Spark architecture to tackle a wide variety of applications including very large data sets spread across many nodes. This Lenovo reference architecture is certified by MapR and provides a thoroughly tested and integrated solution that combines the benefits of leading-edge technologies with mature, enterprise-ready features. Starting with a preconfigured hardware platform helps your team to be up and running quickly. MapR software deployed on Lenovo servers and networking components provides extremely high performance, reliability and scalability. This reference architecture supports entry through high-end configurations and the ability to easily scale as the use of big data grows. A choice of infrastructure components provides flexibility in meeting varying big data analytics requirements.

3

ARCHITECTURE BRIEF Lenovo Big Data Reference Architectures

Delivering Apache Hadoop and Apache Spark

Lenovo x3650 M5 2U rack server

Lenovo x3550 M5 1U rack server

Lenovo G8272 1U 10 GbE RackSwitch

While the power of Hadoop is easy to understand, it can be difficult to implement due to the wide variety of workload characteristics that must be identified before the IT infrastructure can be designed—variables such as workload type, size of MapReduce worker and task, compression, scaling requirements and many more. Relying on reference architecture can speed time-to-value with faster deployment and less guesswork. The Lenovo M5 rack servers, like the powerful two-socket x3650 M5 and x3550 M5, enhance performance and reduce power consumption of big data clusters. With a design architecture well suited for big data workloads, the 2U two-socket x3650 M5 rack server supports industry leading data storage capacity, the latest Intel Xeon E5-2600 high performance compute processing power, flash storage options, and energy efficient features. All three Big Data reference architectures leverage this server as a data node for scale-out clusters. Lenovo M5 rack servers are intelligent, industry-leading enterprise x86 systems designed to reduce costs and complexity. They are the result of a design that combines proven innovation and industry standard components. Solutions range from feature-rich, scalable enterprise x86 servers to rack optimized servers designed for business productivity to entry level tower servers. They have been created to achieve three primary objectives:  Maximize performance: Lenovo M5 servers are designed to extract the most compute power possible from x86 CPUs and deliver it to your workloads.  Minimize cost: Lenovo M5 servers help lower acquisition costs of enterprise servers, enabling you to deploy small clusters while preserving the ability to greatly expand clusters if and when you need to grow your infrastructure.  Simplify deployment: With single points of management and remote administration, the Lenovo M5 servers are built for simpler qualification, faster deployment and easier administration. As mentioned, Lenovo M5 servers are designed to reduce costs and simplify your IT infrastructure. Managing energy in the data center is a growing concern due to the increasing numbers of servers, the incremental heat they generate and ultimately, the rising cost of energy. System x servers not only help increase performance per watt, but also help you budget, plan and control power usage. The data network is a private data interconnect network between the nodes used for data access, moving data across nodes within a cluster and ingesting data into the cluster. While a 1 Gb Ethernet switch may be sufficient for some workloads, a 10 Gb Ethernet switch can provide extra I/O bandwidth for added performance. The recommended 10 Gb Ethernet switch is the Lenovo RackSwitch G8272. Hardware management is supported with the Lenovo XClarity™ Administrator, which runs on the dedicated systems management node. As a centralized resource management solution it reduces complexity, speeds up response, and enhances the availability of Lenovo® server systems and solutions. Lenovo XClarity Administrator provides foundational systems management for Lenovo servers and networking equipment. Through an uncluttered, dashboard-driven GUI, XClarity provides automated discovery, monitoring, firmware updates, pattern-based configuration management, and installation of operating systems to bare metal servers.

4

ARCHITECTURE BRIEF Lenovo Big Data Reference Architectures

Why Lenovo Lenovo is a leading provider of x86 servers for the data center. The server portfolio includes rack, tower, blade, dense and converged systems, and provides excellent performance, reliability and security. Lenovo also offers a full range of networking, storage, software, solutions, and comprehensive services supporting business needs throughout the IT lifecycle (such as planning, deployment, and support). Lenovo offers expertise and services needed to deliver better service-level agreements, and generate greater end-user satisfaction. Download the Lenovo Big Data Reference Architecture for Cloudera Distribution for Hadoop Download the Lenovo Big Data Reference Architecture for Hortonworks Data Platform Download the Lenovo Big Data Reference Architecture for IBM BigInsights brings the power of Apache Hadoop Download the Lenovo Big Data Reference Architecture for MapR Converged Data Platform Download the Lenovo Big Data Configuration for Apache Spark

For More Information To learn more about the Lenovo Big Data Reference Architectures, contact your Lenovo Business Partner or visit: http://shop.lenovo.com/us/en/systems/solutions/big-data/

© 2017 Lenovo. All rights reserved. Availability: Offers, prices, specifications and availability may change without notice. Lenovo is not responsible for photographic or typographical errors. Warranty: For a copy of applicable warranties, write to: Lenovo Warranty Information, 1009 Think Place, Morrisville, NC, 27560, Lenovo makes no representation or warranty regarding thirdparty products or services. Trademarks: Lenovo, the Lenovo logo, System x, ThinkServer are trademarks or registered trademarks of Lenovo. Microsoft and Windows are registered trademarks of Microsoft Corporation. Intel, the Intel logo, Xeon and Xeon Inside are registered trademarks of Intel Corporation in the U.S. and other countries. Other company, product, and service names may be trademarks or service marks of others. CRN: BDACLDSPK62 04/2017