Cloudera Developer Training - SoftSource Solutions Pte Ltd

EXCLUSIVE

TRAINING BUNDLE INCLUDES:
- Instructor-led Cloudera Developer training
- Certificate of Training Completion
- 6 months' access to unlimited attempts at the practice tests
- CCDH exam voucher (optional)

CONTACT US NOW

This course is designed for people with programming experience in languages such as PHP, Python or C#. A background in Java is preferred.

Best for Developers, Programmers, Engineers & Architects

Cloudera’s four-day developer training course for Apache Hadoop delivers the key concepts and expertise needed to build robust data processing applications with Apache Hadoop. Through lectures and interactive hands-on exercises led by SoftSource Solutions’ Cloudera-certified trainers, attendees will navigate the Hadoop ecosystem, covering topics such as:
• The core technologies of Hadoop: HDFS and MapReduce
• How to develop and test MapReduce applications
• How to use MapReduce combiners, partitioners, and the distributed cache
• Best practices for developing and debugging MapReduce applications
• How to implement data input and output in MapReduce applications
• Algorithms for common MapReduce tasks
• How to join data sets in MapReduce
• How Hadoop integrates into the data center
• How to use Mahout’s machine learning algorithms
• The Hadoop ecosystem: Hive, Pig, Sqoop, Flume and Oozie

The Cloudera Certified Developer for Apache Hadoop (CCDH) certification validates your technical knowledge, skill and ability to write, maintain and optimize Apache Hadoop development projects. The exam can be demanding and will test your fluency with the concepts and terminology covered in the training.

SoftSource Solutions Pte Ltd http://www.softsource.com.sg | 6746 5355 | [email protected] ©2013 SoftSource Solutions Pte Ltd. All rights reserved. SoftSource and the SoftSource logo are trademarks or registered trademarks of SoftSource Solutions Pte Ltd. All other trademarks are the property of their respective companies. Information is subject to change without notice.

Introduction

The Motivation for Hadoop
• Problems with Traditional Large-Scale Systems
• Requirements for a New Approach
• Introducing Hadoop

Hadoop: Basic Concepts
• The Hadoop Project and Hadoop Components
• The Hadoop Distributed File System
• Hands-On Exercise: Using HDFS
• How MapReduce Works
• Hands-On Exercise: Running a MapReduce Job
• How a Hadoop Cluster Operates
• Other Hadoop Ecosystem Projects

Writing a MapReduce Program
• The MapReduce Flow
• Basic MapReduce API Concepts
• Writing MapReduce Drivers, Mappers and Reducers in Java
• Writing Mappers and Reducers in Other Languages Using the Streaming API
• Speeding Up Hadoop Development by Using Eclipse
• Hands-On Exercise: Writing a MapReduce Program
• Differences Between the Old and New MapReduce APIs

Unit Testing MapReduce Programs
• Unit Testing
• The JUnit and MRUnit Testing Frameworks
• Writing Unit Tests with MRUnit
• Hands-On Exercise: Writing Unit Tests with the MRUnit Framework

Delving Deeper into the Hadoop API
• Using the ToolRunner Class
• Hands-On Exercise: Writing and Implementing a Combiner
• Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
• Writing Custom Partitioners for Better Load Balancing
• Optional Hands-On Exercise: Writing a Partitioner
• Accessing HDFS Programmatically
• Using the Distributed Cache
• Using the Hadoop API’s Library of Mappers, Reducers and Partitioners

Practical Development Tips and Techniques
• Strategies for Debugging MapReduce Code
• Testing MapReduce Code Locally by Using LocalJobRunner
• Writing and Viewing Log Files
• Retrieving Job Information with Counters
• Determining the Optimal Number of Reducers for a Job
• Creating Map-Only MapReduce Jobs
• Hands-On Exercise: Using Counters and a Map-Only Job

Data Input and Output
• Creating Custom Writable and WritableComparable Implementations
• Saving Binary Data Using SequenceFile and Avro Data Files
• Implementing Custom Input Formats and Output Formats
• Issues to Consider When Using File Compression
• Hands-On Exercise: Using SequenceFiles and File Compression

Common MapReduce Algorithms
• Sorting and Searching Large Data Sets
• Performing a Secondary Sort
• Indexing Data
• Hands-On Exercise: Creating an Inverted Index
• Computing Term Frequency-Inverse Document Frequency (TF-IDF)
• Calculating Word Co-Occurrence
• Hands-On Exercise: Calculating Word Co-Occurrence
• Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable

Joining Data Sets in MapReduce Jobs
• Writing a Map-Side Join
• Writing a Reduce-Side Join

Integrating Hadoop into the Enterprise Workflow
• Integrating Hadoop into an Existing Enterprise
• Loading Data from an RDBMS into HDFS by Using Sqoop
• Hands-On Exercise: Importing Data with Sqoop
• Managing Real-Time Data Using Flume
• Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

Machine Learning and Mahout
• Introduction to Machine Learning
• Using Mahout
• Hands-On Exercise: Using a Mahout Recommender

An Introduction to Hive and Pig
• The Motivation for Hive and Pig
• Hive Basics
• Hands-On Exercise: Manipulating Data with Hive
• Pig Basics
• Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
• Choosing Between Hive and Pig

An Introduction to Oozie
• Introduction to Oozie
• Creating Oozie Workflows
• Hands-On Exercise: Running an Oozie Workflow

Conclusion
