Data Science with Python
Course Overview
Data Science with Python is a training course that introduces participants to the field of data science and the Python programming language. It covers statistical analysis, machine learning, data visualization, and data manipulation. It is intended both for people who want to learn how to use Python for data analysis and for those who already know Python and want to learn more about data science. The course may include lectures, hands-on exercises, and projects to help participants build their skills and knowledge.
At the end of the training, participants will be able to:
- 1. Describe the field of data science and the Python programming language, including the key concepts, tools, and techniques used in data science.
- 2. Use Python for data analysis and manipulation, including importing, cleaning, and manipulating data.
- 3. Outline statistical analysis and machine learning techniques and apply them using Python.
Prerequisite
Some programming experience
Duration
3 days
Course Outline
- What is analytics and data science?
- Common terms in analytics
- Different Sectors Using Data Science
- Purpose and Components of Python
- What is Python?
- Features of Python
- Why Python?
- Interpreter and types
- Applications of Python
- “Hello World” program
- Variables
- Variable data types
- Example programs with each type
- Operators
- Types of operators
- Basic programs
- Operator overloading
- Define control statements
- Types of control statements
- Why are looping statements used?
- Types of looping statements
- Range function
- Functions
- Types of functions
- Global and local variables
- Modules
- Types of modules and use
- What are files?
- Types of files
- File access modes
- Handling I/O
- OOP concepts
- Collections
- The collections module and its types
- Types of error
- Exception handling
- Concept of Packages/Libraries – Important packages (NumPy, Pandas, Matplotlib)
- What is NumPy?
- What is an ndarray?
- Data types in NumPy
- Mathematical Functions
- Array manipulation
- NumPy array visualization
- Broadcasting
- What is Pandas?
- Concepts of Pandas
- Why and how pandas is used for data manipulation
- Cleansing Data with Python
- Data Manipulation
- Data manipulation tools
- Python Built-in Functions (Text, numeric, date, utility functions)
- Python User Defined Functions
- Stripping out extraneous information
- Normalizing data
- Creating graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)
- Data Analytics Conclusion or Predictions
- Data Analytics Communication
- Importing data from various sources (CSV, TXT, Excel, Access, etc.)
- Connecting to database
- Viewing data objects – subsetting, methods
- Exporting Data to various formats
- Basic Statistics – Measures of Central Tendency and Variance
- Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
- Inferential Statistics – Sampling – Concept of Hypothesis Testing
- Statistical Methods – Z/t-tests (one sample, independent, paired), ANOVA, correlations, and chi-square
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Concept of a model in analytics and how it is used
- Common terminology used
- Popular modelling algorithms
- Types of Business problems – Mapping of Techniques
- Different Phases of Predictive Modelling
- EDA for exploring the data and identifying any problems with the data
- Identify missing data
- Identify outliers
- Visualize the data trends and patterns
- What is regression?
- Applications of regression
- Types of regression
- Fitting the regression line
- Simple linear regression
- Simple linear regression in Python
- Polynomial regression
- Polynomial regression in Python
- Gradient Descent
- Cost function
- Regularization
- Ridge and Lasso regression
- How is classification used?
- Applications of classification
- Logistic Regression, Sigmoid function
- Decision tree
- K-Nearest Neighbors (K-NN)
- SVM
- Naive Bayes
- Confusion Matrix
- Precision, Recall
- F1-score
- ROC, AUC
- n-fold cross-validation
- Measuring classifier performance
- Factors affecting classifier performance
- Overfitting
- Ensemble Learning
- Bagging and Boosting
- Unsupervised learning: examples and applications
- Clustering
- Hierarchical Clustering in Python, Agglomerative and Divisive techniques
- Measuring the distance between two clusters
- k-means algorithm
- Limitations of K-means clustering
- SSE and Distortion measurements
- Demo: Agglomerative Hierarchical clustering
- Time Series Forecasting
- Introduction – Applications
- Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
- Classification of Techniques (pattern-based – patternless)
- Basic Techniques – Averages, Smoothing, etc.
- Advanced Techniques – AR Models, ARIMA, etc.
- Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc.
- Web Scraping and Parsing
- Understanding and Searching the Tree
- Navigating options
- Modifying the Tree
- Parsing and Printing the Document
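
As a rough illustration of the final topics above (parsing a document, searching and navigating its tree, modifying it, and printing it), here is a minimal sketch. The outline does not name a parsing library, so this example assumes the standard-library `xml.etree.ElementTree` module as a stand-in, and the page content is hypothetical. Real-world HTML is often not well-formed and would likely need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """<html>
  <body>
    <h1>Course list</h1>
    <ul>
      <li class="course">Data Science with Python</li>
      <li class="course">Statistics Basics</li>
    </ul>
  </body>
</html>"""

# Parsing: build an element tree from the markup string.
root = ET.fromstring(PAGE)

# Searching: collect the text of every <li> with class="course".
courses = [li.text for li in root.iter("li") if li.get("class") == "course"]

# Navigating: step from the root to <body>, then to its first child element.
body = root.find("body")
first_child_tag = body[0].tag

# Modifying: append a new <li> element to the existing list.
ul = body.find("ul")
new_item = ET.SubElement(ul, "li", {"class": "course"})
new_item.text = "Machine Learning"

# Printing: serialize the modified tree back to text.
output = ET.tostring(root, encoding="unicode")
print(courses)
print(first_child_tag)
print(output)
```

The same four steps (parse, search, modify, print) carry over to dedicated scraping libraries, although their function names differ.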