Cloudera Developer

CloudLabs

Projects

Assignment

24x7 Support

Lifetime Access

Course Overview

The Cloudera Developer training course delivers the key concepts and expertise developers need to use Apache Hadoop to develop high-performance parallel applications. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources.

At the end of the training, participants will be able to:

Prerequisites

Programming experience
Knowledge of Java

Duration

5 days

Course Outline

The Motivation For Hadoop

Problems with traditional large-scale systems
Requirements for a new approach

Hadoop Basic Concepts

An Overview of Hadoop
The Hadoop Distributed File System
Hands-On Exercise
How MapReduce Works
Hands-On Exercise
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components

Writing a MapReduce Program

The MapReduce Flow
Examining a Sample MapReduce Program
Basic MapReduce API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop’s Streaming API
Using Eclipse for Rapid Development

Integrating Hadoop Into The Workflow

Relational Database Management Systems
Storage Systems
Creating Workflows with Oozie
Importing Data from RDBMSs with Sqoop
Hands-On Exercise
Importing Real-Time Data with Flume
Accessing HDFS Using FuseDFS and Hoop

Delving Deeper Into The Hadoop API

Using Combiners
Using LocalJobRunner Mode for Faster Development
Reducing Intermediate Data with Combiners
The configure and close methods for MapReduce Setup and Teardown
Writing Partitioners for Better Load Balancing
Directly Accessing HDFS
Using The Distributed Cache

Using Hive and Pig

Hive Basics
Pig Basics

Common MapReduce Algorithms

Sorting and Searching
Indexing
Machine Learning with Mahout
Term Frequency – Inverse Document Frequency
Word Co-Occurrence

Practical Development Tips and Techniques

Testing with MRUnit
Debugging MapReduce Code
Using LocalJobRunner Mode for Easier Debugging
Eclipse Development Techniques
Retrieving Job Information with Counters
Logging
Splittable File Formats
Determining the Optimal Number of Reducers
Map-Only MapReduce Jobs
Implementing Multiple Mappers using ChainMapper

More Advanced MapReduce Programming

Custom Writables and WritableComparables
Saving Binary Data using SequenceFiles and Avro Files
Creating InputFormats and OutputFormats

Joining Data Sets in MapReduce Jobs

Map-Side Joins
The Secondary Sort
Reduce-Side Joins

Graph Manipulation in Hadoop

Introduction to Graph Techniques
Representing Graphs in Hadoop
Implementing a Sample Algorithm: Single-Source Shortest Path

Creating Workflows with Oozie

The Motivation for Oozie
Oozie’s Workflow Definition Format

+91-81029 35454

info@greaterinsights.in

GREATERINSIGHTS LLP

LOCATION

768, 14th Cross Rd, 2nd Stage, Kumaraswamy Layout, Bengaluru, Karnataka 560078

© Copyright 2025 by GREATERINSIGHTS LLP. All rights Reserved