Apache Hive

CloudLabs

Projects

Assignment

24x7 Support

Lifetime Access

Course Overview

Master the core concepts of the Hadoop Distributed File System (HDFS), and learn Apache Pig and advanced Apache Hive programming from our certified experts. Learn how to use HDFS commands and HCatalog, and how to join datasets in Apache Hive. Gain practical experience importing RDBMS data into HDFS and exporting HDFS data to an RDBMS, analyzing clickstream data, and analyzing stock market data using quantiles. With our CloudLabs, get hands-on experience starting an HDP cluster, running a YARN application, programming in Apache Hive, analyzing big data with Apache Hive, and joining datasets with Apache Pig.

At the end of the training, participants will be able to:

  1. Explain Hadoop and the Hadoop Distributed File System (HDFS)
  2. Interpret Common HDFS Commands
  3. Export Data to a Table
  4. Distinguish Between Relational Databases and Hadoop
  5. Explain the Purpose of NameNodes and DataNodes, MapReduce, and the Map and Reduce Phases
  6. Differentiate Between Pig Latin Relation Names and Field Names
  7. Explain Programming Concepts Using Pig and Hive
  8. Perform Inner, Outer and Replicated Joins
  9. Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
  10. Explain the Lifecycle of a YARN Application
  11. List Common Use Cases of Spark
  12. Load Data and Perform a Word Count
  13. Perform SQL Queries
  14. Perform DataFrame Operations
  15. Submit an Apache Oozie Workflow 

Pre-requisite

  1. Familiarity with programming principles and experience in software development.
  2. SQL knowledge is also helpful.
  3. No prior Hadoop knowledge is required.

Duration

2 days

Course Outline

  1. List the Three “V”s of Big Data
  2. List the Six Key Hadoop Data Types
  3. Describe Hadoop, YARN and Use Cases for Hadoop
  4. Describe Hadoop Ecosystem Tools and Frameworks
  5. Describe the Differences Between Relational Databases and Hadoop
  6. Describe What is New in Hadoop 2.x
  7. Describe the Hadoop Distributed File System (HDFS)
  8. Describe the Differences Between HDFS and an RDBMS
  9. Describe the Purpose of NameNodes and DataNodes
  10. List Common HDFS Commands
  11. Describe HDFS File Permissions
  12. List Options for Data Input
  13. Describe WebHDFS
  14. Describe the Purpose of Sqoop and Flume
  15. Describe How to Export to a Table
  16. Describe the Purpose of MapReduce
  17. Define Key/Value Pairs in MapReduce
  18. Describe the Map and Reduce Phases
  19. Describe Hadoop Streaming
  20. Starting an HDP Cluster (Lab)
  21. Demonstration: Understanding Block Storage (Lab)
  22. Using HDFS Commands (Lab)
  23. Importing RDBMS Data into HDFS (Lab)
  24. Exporting HDFS Data to an RDBMS (Lab)
  25. Importing Log Data into HDFS Using Flume (Lab)
  26. Demonstration: Understanding MapReduce (Lab)
  27. Running a MapReduce Job (Lab)

  1. Describe the Purpose of Apache Pig
  2. Describe the Purpose of Pig Latin
  3. Demonstrate the Use of the Grunt Shell
  4. List Pig Latin Relation Names and Field Names
  5. List Pig Data Types
  6. Define a Schema
  7. Describe the Purpose of the GROUP Operator
  8. Describe Common Pig Operators (ORDER BY, CASE, DISTINCT, PARALLEL, FLATTEN, FOREACH)
  9. Perform an Inner, Outer and Replicated Join
  10. Describe the Purpose of the DataFu Library
  11. Demonstration: Understanding Apache Pig (Lab)
  12. Getting Started with Apache Pig (Lab)
  13. Exploring Data with Apache Pig (Lab)
  14. Splitting a Dataset (Lab)
  15. Joining Datasets with Apache Pig (Lab)
  16. Preparing Data for Apache Hive (Lab)
  17. Demonstration: Computing Page Rank (Lab)
  18. Analyzing Clickstream Data (Lab)
  19. Analyzing Stock Market Data Using Quantiles (Lab)

  1. Describe the Purpose of Apache Hive
  2. Describe the Differences Between Apache Hive and SQL
  3. Describe the Apache Hive Architecture
  4. Demonstrate How to Submit Hive Queries
  5. Describe How to Define Tables
  6. Describe How to Load Data into Hive
  7. Define Hive Partitions, Buckets and Skew
  8. Describe How to Sort Data
  9. List Hive Join Strategies
  10. Describe the Purpose of HCatalog
  11. Describe the HCatalog Ecosystem
  12. Define a New Schema
  13. Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
  14. Perform a Multi-table/File Insert
  15. Describe the Purpose of Views
  16. Describe the Purpose of the OVER Clause
  17. Describe the Purpose of Windows
  18. List Hive Analytics Functions
  19. List Hive File Formats
  20. Describe the Purpose of Hive SerDe
  21. Understanding Hive Tables (Lab)
  22. Understanding Partition and Skew (Lab)
  23. Analyzing Big Data with Apache Hive (Lab)
  24. Demonstration: Computing NGrams (Lab)
  25. Joining Datasets in Apache Hive (Lab)
  26. Computing NGrams of Emails in Avro Format (Lab)
  27. Using HCatalog with Apache Pig (Lab)

  1. Describe the Purpose of HDFS Federation
  2. Describe the Purpose of HDFS High Availability (HA)
  3. Describe the Purpose of the Quorum Journal Manager
  4. Demonstrate How to Configure Automatic Failover
  5. Describe the Purpose of YARN
  6. List the Components of YARN
  7. Describe the Lifecycle of a YARN Application
  8. Describe the Purpose of a Cluster View
  9. Describe the Purpose of Apache Slider
  10. Describe the Origin and Purpose of Apache Spark
  11. List Common Spark Use Cases
  12. Describe the Differences Between Apache Spark and MapReduce
  13. Demonstrate the Use of the Spark Shell
  14. Describe the Purpose of a Resilient Distributed Dataset (RDD)
  15. Demonstrate How to Load Data and Perform a Word Count
  16. Define Lazy Evaluation
  17. Describe How to Load Multiple Types of Data
  18. Demonstrate How to Perform SQL Queries
  19. Demonstrate How to Perform DataFrame Operations
  20. Describe the Purpose of the Optimization Engine
  21. Describe the Purpose of Apache Oozie
  22. Describe Apache Pig Actions
  23. Describe Apache Hive Actions
  24. Describe MapReduce Actions
  25. Describe How to Submit an Apache Oozie Workflow
  26. Define an Oozie Coordinator Job
  27. Advanced Apache Hive Programming (Lab)
  28. Running a YARN Application (Lab)
  29. Getting Started with Apache Spark (Lab)
  30. Exploring Apache Spark SQL (Lab)
  31. Defining an Apache Oozie Workflow (Lab)