Data Science & Big Data Analytics

Course Overview

Master the core concepts of big data analytics and apply the Data Analytics Lifecycle to big data analytics projects.

Learn how to communicate analytic insights to business sponsors and analytic audiences, and gain practical experience with tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions through the EMC Data Science training.

The EMC Data Science and Big Data Analytics training gives candidates hands-on experience with advanced SQL and MADlib for in-database analytics, and with advanced analytics methods such as K-means clustering, association rules, linear regression, and logistic regression.

At the end of the training, participants will be able to:

  1. Address big data analytics projects
  2. Apply appropriate analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
  3. Select appropriate data visualizations to clearly communicate analytic insights to business sponsors and analytic audiences
  4. Work with R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
  5. Pass the EMC Data Science and Big Data Analytics certification

Prerequisites

  1. A strong quantitative background with a solid understanding of basic statistics
  2. Experience with a programming or scripting language, such as Java, Perl, Python, or R. Candidates who want to update their programming skills can look at the SpringPeople certification training courses
  3. Experience with SQL

Duration

2 days

Course Outline

  1. Big Data Overview
  2. State of the Practice in Analytics
  3. The Data Scientist
  4. Big Data Analytics in Industry Verticals

  1. Discovery
  2. Data Preparation
  3. Model Planning
  4. Model Building
  5. Communicating Results
  6. Operationalizing

  1. Using R to Look at Data – Introduction to R
  2. Analyzing and Exploring the Data
  3. Statistics for Model Building and Evaluation

  1. K Means Clustering
  2. Association Rules
  3. Linear and Logistic Regression
  4. Naive Bayesian Classifier
  5. Decision Trees
  6. Time Series Analysis
  7. Text Analysis

  1. Analytics for Unstructured Data – MapReduce and Hadoop
  2. The Hadoop Ecosystem
  3. In-database Analytics – SQL Essentials
  4. Advanced SQL and MADlib for In-database Analytics

  1. Operationalizing an Analytics Project
  2. Creating the Final Deliverables
  3. Data Visualization Techniques
  4. Final Lab Exercise on Big Data Analytics
