Cloudera Developer

CloudLabs

Projects

Assignment

24x7 Support

Lifetime Access

Course Overview

The Cloudera Developer training course delivers the key concepts and expertise developers need to build high-performance parallel applications with Apache Hadoop. Participants learn how to use Spark SQL to query structured data and Spark Streaming to process streaming data from a variety of sources in real time.

At the end of the training, participants will be able to:

Prerequisites

  1. Programming experience
  2. Knowledge of Java

Duration

5 days

Course Outline

  1. Problems with traditional large-scale systems
  2. Requirements for a new approach

  1. An Overview of Hadoop
  2. The Hadoop Distributed File System
  3. Hands-On Exercise
  4. How MapReduce Works
  5. Hands-On Exercise
  6. Anatomy of a Hadoop Cluster
  7. Other Hadoop Ecosystem Components

  1. The MapReduce Flow
  2. Examining a Sample MapReduce Program
  3. Basic MapReduce API Concepts
  4. The Driver Code
  5. The Mapper
  6. The Reducer
  7. Hadoop’s Streaming API
  8. Using Eclipse for Rapid Development

  1. Relational Database Management Systems
  2. Storage Systems
  3. Creating workflows with Oozie
  4. Importing Data from RDBMSs with Sqoop
  5. Hands-On Exercise
  6. Importing Real-Time Data with Flume
  7. Accessing HDFS Using FuseDFS and Hoop

  1. Using Combiners
  2. Using LocalJobRunner Mode for Faster Development
  3. Reducing Intermediate Data with Combiners
  4. The configure and close methods for MapReduce Setup and Teardown
  5. Writing Partitioners for Better Load Balancing
  6. Directly Accessing HDFS
  7. Using the Distributed Cache

  1. Hive Basics
  2. Pig Basics

  1. Sorting and Searching
  2. Indexing
  3. Machine Learning with Mahout
  4. Term Frequency – Inverse Document Frequency
  5. Word Co-Occurrence

  1. Testing with MRUnit
  2. Debugging MapReduce Code
  3. Using LocalJobRunner Mode for Easier Debugging
  4. Eclipse development techniques
  5. Retrieving Job Information with Counters
  6. Logging
  7. Splittable File Formats
  8. Determining the Optimal Number of Reducers
  9. Map-Only MapReduce Jobs
  10. Implementing Multiple Mappers using ChainMapper

  1. Custom Writables and WritableComparables
  2. Saving Binary Data using SequenceFiles and Avro Files
  3. Creating InputFormats and OutputFormats

  1. Map-Side Joins
  2. The Secondary Sort
  3. Reduce-Side Joins

  1. Introduction to graph techniques
  2. Representing Graphs in Hadoop
  3. Implementing a sample algorithm: Single Source Shortest Path

  1. The Motivation for Oozie
  2. Oozie’s Workflow Definition Format