Data Science with Python

Course Overview

Data Science with Python Training introduces participants to the field of data science and to the Python programming language. It covers statistical analysis, machine learning, data visualization, and data manipulation. The course is intended for people who want to learn how to use Python for data analysis, and for those who already know Python and want to learn more about data science. Sessions may include lectures, hands-on exercises, and projects to help participants develop their skills and knowledge.

At the end of the training, participants will be able to:

1. Describe the field of data science and the Python programming language, including the key concepts, tools, and techniques used in data science.
2. Use Python for data analysis and manipulation, including importing, cleaning, and manipulating data.
3. Describe common statistical analysis and machine learning techniques and apply them using Python.

Prerequisite

Some Programming Experience

Duration

3 days

Course Outline

Introduction to Data Science

What is analytics and data science?
Common terms in analytics
Different Sectors Using Data Science
Purpose and Components of Python

Python Essentials

What is Python?
Features of Python
Why Python?
Interpreter and types
Applications of Python
“Hello World” program
Variables
Types of variable datatypes
Example programs with each type
Operators
Types of operators
Basic programs
Operator overloading
Define control statements
Types of control statements
Why are looping statements used?
Types of looping statements
Range function
Functions
Types of functions
Global and local variables
Modules
Types of modules and use
What are files?
Types of files
File Access Mode
Handling I/O
OOP concepts
Collection
Collection module and types
Types of error
Exception handling
Concept of Packages/Libraries – Important packages (NumPy, Pandas, Matplotlib)

NumPy

What is NumPy?
What is an ndarray?
Data types in NumPy
Mathematical Functions
Array manipulation
NumPy array visualization
Broadcasting
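
As an illustration of the broadcasting topic above, here is a minimal sketch. The sample array and variable names are made up for this example and are not taken from the course materials. It subtracts per-column means from a small 2-D array, relying on NumPy to stretch the 1-D array of means across every row.

```python
import numpy as np

# Hypothetical sample data: 3 observations (rows) x 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,). Broadcasting stretches them across
# all 3 rows, so no explicit loop or tiling is needed.
col_means = data.mean(axis=0)
centered = data - col_means

print(data.dtype)      # float64
print(centered.shape)  # (3, 2)
print(centered)
```

After centering, each column of the result averages to zero, which is a common first step before many of the modelling techniques listed later in the outline.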

Pandas for Data Manipulation

What is Pandas?
Concepts of Pandas
Why and how pandas is used for data manipulation
Cleansing Data with Python
Data Manipulation
Data manipulation tools
Python Built-in Functions (Text, numeric, date, utility functions)
Python User Defined Functions
Stripping out extraneous information
Normalizing data

Matplotlib for Data Visualization

Creating graphs: bar, pie, line chart, histogram, boxplot, scatter, density, etc.
Data Analytics Conclusion or Predictions
Data Analytics Communication

Importing and Exporting Data Using Python Modules

Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
Connecting to database
Viewing Data objects – subsetting, methods
Exporting Data to various formats

Statistics

Basic Statistics – Measures of Central Tendency and Variance
Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
Inferential Statistics – Sampling – Concept of Hypothesis Testing
Statistical Methods – Z/t-tests (one-sample, independent, paired), ANOVA, Correlations, and Chi-square
Introduction to exploratory data analysis
Descriptive statistics, Frequency Tables
Univariate Analysis (Distribution of data & Graphical Analysis)
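
To make the descriptive statistics topics above concrete, the following is a minimal sketch using only Python's standard library. The sample values are invented for illustration. It computes the usual measures of central tendency, the sample variance, and a simple frequency table.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily sales counts for one week.
sample = [12, 15, 15, 18, 20, 22, 45]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
stdev = statistics.stdev(sample)        # square root of the sample variance

# Frequency table: how many times each value occurs.
frequency = Counter(sample)

print(mean, median, mode)
print(round(variance, 2), round(stdev, 2))
print(frequency.most_common(3))
```

Note how the single large value (45) pulls the mean above the median, which is the kind of pattern univariate analysis is meant to surface.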

Predictive Modelling and Data Exploration

Concept of a model in analytics and how it is used
Common terminology used
Popular modelling algorithms
Types of Business problems – Mapping of Techniques
Different Phases of Predictive Modelling
EDA for exploring the data and identifying any problems with the data
Identify missing data
Identify outliers
Visualize the data trends and patterns

Solving Regression Problems

What is regression?
Applications of regression
Types of regression
Fitting the regression line
Simple linear regression
Simple linear regression in Python
Polynomial regression
Polynomial regression in Python
Gradient Descent
Cost function
Regularization
Ridge and Lasso regression
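
The regression topics above (fitting the line, cost function, gradient descent) can be sketched in plain Python as follows. This is an illustrative example under stated assumptions, not the course's own material: the function names, the learning rate, the step count, and the toy data are all hypothetical. It fits a line once with the closed-form least-squares formulas and once by gradient descent on the mean squared error, so the two results can be compared.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse_cost(xs, ys, slope, intercept):
    """Mean squared error of the line on the data (the cost function)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the MSE cost by repeatedly stepping against its gradient."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

exact = fit_line(xs, ys)
approx = gradient_descent(xs, ys)
print(exact)
print(approx, mse_cost(xs, ys, *approx))
```

On noise-free data both approaches should recover nearly the same slope and intercept. The learning rate is chosen for this toy data; a larger value can make gradient descent diverge.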

Solving Classification Problems

How is classification used?
Applications of classification
Logistic Regression, Sigmoid function
Decision tree
K-Nearest Neighbors (K-NN)
SVM
Naive Bayes
Confusion Matrix
Precision, Recall
F1-score
ROC, AUC
n-fold cross validation
Measuring classifier performance
Factors affecting classifier performance
Overfitting
Ensemble Learning
Bagging and Boosting
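
As a small illustration of the evaluation topics above (confusion matrix, precision, recall, F1-score), here is a minimal plain-Python sketch. The helper names and the labels are hypothetical and chosen for this example only. It counts the four cells of a binary confusion matrix and derives the three metrics from them.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)
print(precision, recall, f1)
```

Precision answers "of the predicted positives, how many were right?", while recall answers "of the actual positives, how many were found?". F1 is their harmonic mean.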

Solving Clustering Problems

Applications of unsupervised learning, with examples
Clustering
Hierarchical Clustering in Python, Agglomerative and Divisive techniques
Measuring the distance between two clusters
k-means algorithm
Limitations of K-means clustering
SSE and Distortion measurements
Demo: Agglomerative Hierarchical clustering
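
The k-means and SSE topics above can be illustrated with a minimal one-dimensional sketch in plain Python. The function name, the fixed starting centroids, and the data are assumptions made for this example. A real exercise would more likely use a library implementation on multi-dimensional data.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny 1-D k-means: assign points to the nearest centroid, then update."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    # SSE: total squared distance from each point to its cluster centroid.
    sse = sum((p - centroids[i]) ** 2 for i, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse


# Hypothetical data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters, sse = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
print(sse)
```

A lower SSE means tighter clusters, but SSE always falls as the number of clusters grows, which is one of the limitations of k-means noted above.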

Solving Forecasting Problems

Time Series Forecasting
Introduction – Applications
Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
Classification of Techniques (Pattern-based – Pattern-less)
Basic Techniques – Averages, Smoothing, etc.
Advanced Techniques – AR Models, ARIMA, etc.
Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc.
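
To show how the accuracy measures named above are computed, here is a minimal plain-Python sketch, together with a simple moving average as an example of a basic averaging technique. The function names and the numbers are hypothetical. MAD is taken here to mean the mean absolute deviation of the forecast errors.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values in the series."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]


def forecast_errors(actual, forecast):
    """Return MAD, MSE and MAPE (in percent) for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    # MAPE divides by the actual values, so they must all be non-zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAD": mad, "MSE": mse, "MAPE": mape}


# Hypothetical demand figures and forecasts.
actual = [100.0, 200.0, 400.0, 500.0]
forecast = [110.0, 190.0, 380.0, 525.0]

metrics = forecast_errors(actual, forecast)
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(metrics)
print(smoothed)
```

MSE penalizes large misses more heavily than MAD, while MAPE expresses the error relative to the size of the actual values, which makes it easier to compare across series.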

Web Scraping with Beautiful Soup

Web Scraping and Parsing
Understanding and Searching the Tree
Navigating options
Modifying the Tree
Parsing and Printing the Document

+91-81029 35454

info@greaterinsights.in

GREATERINSIGHTS LLP