Data Science

Course Overview

This Data Science training teaches participants how to apply statistical and machine learning techniques to analyze and interpret complex data sets. It covers data cleaning, data exploration, data visualization, and model building and evaluation, using tools such as Python and R to work with data and build predictive models. The goal is to give participants the skills and knowledge needed to work as data scientists, who use data to gain insights and make data-driven decisions.

At the end of the training, participants will be able to:

  1. Explain what Data Science means
  2. Explore Data Manipulation using R
  3. Extract and Interpret Data using R

Pre-requisite

There is no specific pre-requisite for the course; however, exposure to core Java and statistics will be beneficial.

Duration

2 days

Course Outline

  1. Introduction to Statistics
  2. Types of Statistics
  3. Measures of Central Tendency
  4. Measures of Dispersion
  5. Visualization Techniques
  6. Bias, Skew, Percentiles and Ranges
  7. Probability
  8. Bayes' Theorem and Decision Trees

  1. R Packages
  2. Understanding Vectors in R
  3. Data Manipulation Techniques
  4. R Functions

  1. Basic YARN and MapReduce
  2. Pig
  3. Hive

  1. Core Spark
  2. Resilient Distributed Datasets (RDDs)
  3. RDD Operations
  4. Spark Cluster Managers
  5. Spark Abstractions
