Machine Learning with Mahout

Course Overview

With our Mahout Machine Learning course, master the skills you need to implement ML algorithms that process large enterprise data sets. Gain a deep understanding of the core Apache Mahout algorithms, the supporting infrastructure (including input/output tools), and the integration points with other libraries.

In SpringPeople’s Mahout machine learning certification course, you will gain mastery over the three core focus areas of Apache Mahout (collaborative filtering, clustering and classification) and their real-life applications in enterprises.

In this Mahout machine learning course, you will learn to use the standard Mahout libraries to build blazing-fast sequential classifiers capable of online learning in demanding settings, such as processing a huge database of documents. You will also learn to implement recommendation mining, which finds items users might like based on their past behavior.

With cloud labs, gain hands-on experience deploying Mahout on AWS with Amazon EMR to process large data sets in the cloud. Master the use of sequential and parallel implementations of the classic ML algorithms designed to model real-world business processes.

At the end of the training, participants will be able to:

  1. Understand the “3 Cs” of Mahout (collaborative filtering, clustering and classification) and the relationship between Hadoop and Mahout
  2. Set up Mahout on Hadoop
  3. Implement supervised and unsupervised algorithms in Mahout
  4. Implement different types of recommender systems, choose suitable similarity measures and optimize recommender performance
  5. Vectorize input data and deploy complex clustering algorithms
  6. Develop, train and evaluate classification systems using algorithms such as naive Bayes and random forest
  7. Run Mahout on Amazon EMR to process data on Amazon EC2 instances

Prerequisites

A fundamental understanding of AI and machine learning is required.

Duration

3 days

Course Outline

  1. ML Fundamentals
  2. Apache Mahout Basics
  3. History of Mahout
  4. Supervised and Unsupervised Learning techniques
  5. Mahout and Hadoop
  6. Introduction to Clustering and Classification
  1. Setting up Mahout on Apache Hadoop
  2. Mahout and Myrrix
  1. Recommendations using Mahout
  2. Introduction to Recommendation systems
  3. Content-Based and Collaborative Filtering
  4. User-Based (Nearest N Users, Threshold) and Item-Based Recommenders
  5. Mahout Optimizations
  1. User-Based Recommendation
  2. User Neighborhood
  3. Item-Based Recommendation
  4. Implementing a Recommender using MapReduce
  5. Similarity Measures: Manhattan Distance
  6. Euclidean Distance, Cosine Similarity
  7. Pearson’s Correlation Similarity
  8. Log-likelihood Similarity
  9. Tanimoto Coefficient Similarity
  10. Evaluating Recommendation Engines (Online and Offline)
  11. Recommenders in Production
  1. Common Clustering Algorithms
  2. K-means
  3. Canopy Clustering
  4. Fuzzy K-means, Mean Shift and Others
  5. Representing Data
  6. Feature Selection
  7. Vectorization
  8. Representing Vectors
  9. Clustering Documents: A Worked Example
  10. TF-IDF, Implementing Clustering on Hadoop
  1. Examples, Basics
  2. Predictor variables and Target variables
  3. Common Algorithms
  4. SGD, SVM
  5. Naive Bayes
  6. Random Forests
  7. Training and evaluating a Classifier
  8. Developing a Classifier
  1. Mahout on Amazon EMR
  2. Mahout vs. R
  3. Introduction to Tools such as Weka
  4. Octave
  5. MATLAB, SAS
