
Data Science & ML Syllabus

This document outlines a course on special programming classes and Python for data science. The special programming classes module covers basic programming concepts like variables, operators, and control structures over 10 hours across 5 chapters. The Python for data science module covers Python programming basics, data types, functions, modules, file I/O, exceptions, NumPy, Pandas, Matplotlib and Seaborn over 32 hours across 8 chapters. It includes assignments on Python basics, data structures, functions, NumPy, Pandas and case studies. The second module covers statistics and machine learning fundamentals over 24 hours in 2 chapters on probability, descriptive statistics, linear algebra and matrices.

Uploaded by

Aditya
Copyright
© All Rights Reserved
Term 1
MODULE 0 : SPECIAL PROGRAMMING CLASSES | 10 hours
Basic

Chapter 1: Introduction to Programming (3 hrs)
What is a programming language?
Source code Vs bytecode Vs machine code
Compiler Vs Interpreter
C/C++, Java Vs Python

Chapter 2: Jupyter notebook basics (1 hr)
Different types of code editors in Python
Introduction to Anaconda and Jupyter notebook
Flavours of Python

Chapter 3: Python Programming Basics (2 hrs)
Variable Vs identifiers Vs strings
Operators Vs operand
Procedure oriented Vs modular programming

Chapter 4: Statistics basics (2 hrs)
Introduction to statistics
Mean, median, mode, Standard deviation, Average
Introduction to probability, permutations and combinations
Introduction to linear Algebra

Chapter 5: Git and GitHub (2 hrs)
Learn the key concepts of the Git source control system
Step through the entire basic Git workflow
Configure SSH for authentication
Create and use a remote repository on GitHub
Git Overview
Set up & configuration
Working with git locally

[NOTE]
This module 0 is for those who are from a non-technical background like Mechanical, BBA, MBA, B.Com, M.Com, etc., or for those who work in Non-IT sectors, to get in-depth knowledge of programming and how to use it in Data Science.

click to Whatsapp @learnvista pvt. ltd. www.learnbay.co


Term 1
MODULE 1 : PYTHON FOR DATA SCIENCE | 40 hours
Python

1. Programming Basics & Environment Setup
Installing Anaconda, Anaconda Basics and Introduction
Get familiar with version control, Git and GitHub.
Basic GitHub Commands.
Introduction to Jupyter Notebook environment. Basic Jupyter notebook Commands.
Programming language basics.

2. Python Programming Overview
Python Overview
Python 2.7 vs Python 3
Writing your First Python Program
Lines and Indentation, Python Identifiers
Various Operators and Operator Precedence
Getting input from User, Comments, Multi-line Comments.

3. Strings, Decisions And Loop Control
Working With Numbers, Booleans and Strings, String types and formatting, String operations
Simple if Statement, if-else Statement, if-elif Statement.
Introduction to while Loops.
Introduction to for Loops, Using continue and break.
Class hands-on :
6 programs/coding exercises on string, loop and conditions in classroom

4. Python Data Types
List, Tuples, Dictionaries
Python Lists, Tuples, Dictionaries
Accessing Values, Basic Operations
Indexing, Slicing, and Matrixes
Built-in Functions & Methods
Exercises on List, Tuples And Dictionary
Class hands-on :
Program to convert tuple to dictionary
Remove Duplicate from Lists
Python program to reverse a tuple
Program to add all elements in list.
+ 3 more programs to be covered in class
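
The four class hands-on programs named under Python Data Types could be sketched as follows. This is only an illustration of one possible approach; the function names and sample values are hypothetical and not taken from the course material.

```python
# Illustrative sketches of the four hands-on exercises listed above.
# All function names and sample inputs are hypothetical.

def tuple_to_dict(pairs):
    """Convert a tuple of (key, value) pairs into a dictionary."""
    return dict(pairs)


def remove_duplicates(items):
    """Remove duplicates from a list, keeping the first occurrence of each item."""
    # dict preserves insertion order, so the original ordering survives.
    return list(dict.fromkeys(items))


def reverse_tuple(values):
    """Return a new tuple with the elements in reverse order."""
    return values[::-1]


def add_all(numbers):
    """Add all elements of a list using an explicit loop."""
    total = 0
    for n in numbers:
        total += n
    return total


print(tuple_to_dict((("a", 1), ("b", 2))))
print(remove_duplicates([3, 1, 3, 2, 1]))
print(reverse_tuple((1, 2, 3)))
print(add_all([1, 2, 3, 4]))
```

The built-in sum() would replace the loop in add_all in everyday code; the loop is kept here because the chapter before it introduces for loops.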

5. Functions And Modules
Introduction To Functions – Why
Defining Functions
Calling Functions
Functions With Multiple Arguments.
Anonymous Functions - Lambda
Using Built-In Modules, User-Defined Modules, Module Namespaces
Iterators And Generators
Class hands-on :
8+ Programs to be covered in class from functions, Lambda, modules, Generators and Packages.

6. File I/O, Exception Handling and Regular Expression
Opening and Closing Files
open Function, file Object Attributes
close() Method, Read, write, seek.
Exception Handling, try-finally Clause
Raising an Exception, User-Defined Exceptions
Regular Expression - Search and Replace
Regular Expression Modifiers
Regular Expression Patterns, re module
Class hands-on :
10+ Programs to be covered in class from File IO, Reg-ex and exception handling.
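
A minimal sketch of how the File I/O, exception handling and regular expression topics fit together is given below. The file content, the EmptyFileError class and the helper names are assumptions made for illustration only.

```python
import os
import re
import tempfile


class EmptyFileError(Exception):
    """Hypothetical user-defined exception: raised when a file has no content."""


def read_text(path):
    """Open a file, read it from the start, and always close it (try-finally)."""
    f = open(path, "r", encoding="utf-8")
    try:
        f.seek(0)          # move to the beginning of the file
        text = f.read()
    finally:
        f.close()          # runs whether or not read() raised
    if not text:
        raise EmptyFileError(path)
    return text


def mask_digits(text):
    """Regular expression search and replace: hide every digit."""
    return re.sub(r"\d", "#", text)


# Write a small demo file, read it back, and mask the digits.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding="utf-8") as out:
    out.write("Call 555-0100 today")

masked = mask_digits(read_text(demo_path))
print(masked)
os.remove(demo_path)
```

In everyday code a with block replaces the explicit try-finally; the longer form is shown because the syllabus lists the try-finally clause as its own topic.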



Term 1
MODULE 1 : PYTHON FOR DATA SCIENCE | 32 hours
Python

7. Data Analysis Using Numpy And Pandas
Introduction to Numpy. Array Creation, Printing Arrays, Basic Operation - Indexing, Slicing and Iterating, Shape Manipulation - Changing shape, stacking and splitting of array
Vector stacking, Broadcasting with Numpy, Numpy for Statistical Operation.
Pandas : Introduction to Pandas
Importing data into Python
Pandas Data Frames, Indexing Data Frames, Basic Operations With Data frame, Renaming Columns, Subsetting and filtering a data frame.

8. Data Visualisation using Python: Matplotlib and Seaborn
Matplotlib:
Introduction, plot(), Controlling Line Properties, Subplot with Functional Method, Multiple Plot, Working with Multiple Figures, Histograms
Seaborn :
Intro to Seaborn And Visualizing statistical relationships, Import and Prepare data. Plotting with categorical data and Visualizing linear relationships
Seaborn Exercise

Real time Use cases in Python to be Covered in Class
3 Case Studies on Numpy, Pandas, Matplotlib
1 Case Study on Pandas And Seaborn
Assessment Test in Python : 2 hours of Assessment Test in Python (Coding & Objective Questions)

Assignment 1 (Week 1):
10 Coding exercises on Python Basics - Variables, Operators, Strings, Loops

Assignment 2 (Week 2):
10 Python Programs and practice set on List, Tuples, Dictionaries & matrices operations

Assignment 3 (Week 3):
10 Coding exercises on Functions, File And Regular Expression

Assignment 4 (Week 4):
15 Programs and Practice set Questions on Numpy and Pandas

Assignment 5 (Week 5):
2 Case Studies using Numpy, Pandas and Matplotlib.



Term 2
MODULE 2 : STATISTICS FOR DATA SCIENCE | 24 hours
Stats & ML

1. Fundamentals of Math and Probability
Basic understanding of linear algebra, Matrices, vectors
Addition and Multiplication of matrices
Fundamentals of Probability
Probability distribution function and cumulative distribution function.
Class Hands-on
Problem solving using R for vector manipulation
Problem solving for probability assignments

2. Descriptive Statistics
Describe or summarise a set of data
Measure of central tendency and measure of dispersion.
The mean, median, mode, kurtosis and skewness
Computing Standard deviation and Variance.
Types of distribution.
Class Hands-on:
5 Point summary BoxPlot
Histogram and Bar Chart
Exploratory analytics R Methods
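
The descriptive statistics listed above (central tendency, dispersion and the 5 point summary) can be illustrated with a short sketch. The syllabus names R for these hands-on sessions; Python's standard statistics module is used here only to stay consistent with Module 1, and the sample data and function name are made up for illustration.

```python
import statistics


def five_point_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population,
    # so the quartiles stay within the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

print("mean    :", statistics.mean(scores))
print("median  :", statistics.median(scores))
print("mode    :", statistics.mode(scores))
print("variance:", statistics.pvariance(scores))   # population variance
print("std dev :", statistics.pstdev(scores))      # population standard deviation
print("5 point :", five_point_summary(scores))
```

The five values returned by five_point_summary are the ones a box plot draws: the box spans Q1 to Q3 with a line at the median. Quartile conventions differ between tools, so other software may report slightly different Q1 and Q3 for the same data.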

3. Inferential Statistics
What is inferential statistics
Different types of Sampling techniques
Central Limit Theorem
Point estimate and Interval estimate
Creating confidence interval for population parameter
Characteristics of Z-distribution and T-Distribution
Basics of Hypothesis Testing
Type of test and rejection region
Type of errors in Hypothesis testing: Type-I error and Type-II errors
P-Value and Z-Score Method
T-Test, Analysis of variance (ANOVA) and Analysis of Covariance (ANCOVA)
Regression analysis in ANOVA
Class Hands-on:
Problem solving for C.L.T
Problem solving Hypothesis Testing
Problem solving for T-test, Z-score test
Case study and model run for ANOVA, ANCOVA

4. Hypothesis Testing
Hypothesis Testing
Basics of Hypothesis Testing
Type of test and Rejection Region
Type of errors - Type 1 Errors, Type 2 Errors
P value method, Z score Method.
The Chi-Square Test of Independence
Regression
Factorial Analysis of Variance
Pearson Correlation Coefficients in Depth
Statistical Significance, Effect Size, and Confidence Intervals

5. Data Processing & Exploratory Data Analysis
Introduction to Data Cleaning
Data Pre-processing
What is Data Wrangling?
How to Restructure the data?
What is Data Integration?
Data Transformation
EDA : Finding and Dealing with Missing Values. What are Outliers? Using Z-scores to Find Outliers. Introduction to Bivariate Analysis, Scatter Plots and Heatmaps. Introduction to Multivariate Analysis
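
The "Using Z-scores to Find Outliers" topic can be sketched in a few lines. A z-score is (value - mean) / standard deviation, and values whose absolute z-score exceeds a chosen threshold are flagged. The default threshold of 3, the sample readings and the function name are assumptions for illustration, not values taken from the course.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)      # population standard deviation
    if sd == 0:
        return []                       # all values identical: nothing to flag
    return [v for v in values if abs((v - mean) / sd) > threshold]


readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]   # hypothetical data

# With only nine points a single extreme value cannot reach |z| > 3,
# so a lower threshold is used for this small sample.
print(zscore_outliers(readings, threshold=2.0))
```

In a sample of n points the largest possible population z-score is the square root of n - 1, so the threshold has to be chosen with the sample size in mind.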



Term 2
MODULE 3 : MACHINE LEARNING ALGORITHMS | 48 hours
Stats & ML

Introduction To Machine Learning
What is Machine Learning?
Introduction to Supervised and Unsupervised Learning
Introduction to SKLEARN (Classification, Regression, Clustering, Dimensionality reduction, Model selection, Preprocessing)
What is Reinforcement Learning?
Machine Learning applications
Difference between Machine Learning and Deep Learning

1. Supervised Learning
Support Vector Machines
Linear regression
Logistic regression
Naive Bayes
Linear discriminant analysis
Decision tree
k-nearest neighbor algorithm
Neural Networks (Multilayer perceptron)
Similarity learning

2. Linear Regression
Introduction to Linear Regression
Linear Regression with Multiple Variables
Disadvantage of Linear Models
Interpretation of Model Outputs
Understanding Covariance and Collinearity
Understanding Heteroscedasticity
Case Study – Application of Linear Regression for Housing Price Prediction

3. Logistic Regression
Introduction to Logistic Regression – Why Logistic Regression.
Introduce the notion of classification
Cost function for logistic regression
Application of logistic regression to multi-class classification.
Confusion Matrix, Odds Ratio And ROC Curve
Advantages And Disadvantages of Logistic Regression.
Case Study: To classify an email as spam or not spam using logistic Regression.
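
Two of the logistic regression topics above, the cost function and the confusion matrix, can be made concrete with a small sketch. It uses the standard cross-entropy (log loss) cost and a plain-Python confusion matrix; the sample labels, probabilities and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, y_prob):
    """Average cross-entropy cost for binary labels and predicted probabilities."""
    eps = 1e-12                               # guard against log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


labels = [1, 0, 1, 1, 0]                      # hypothetical: 1 = spam, 0 = not spam
probabilities = [0.9, 0.2, 0.4, 0.8, 0.6]     # hypothetical model outputs
predictions = [1 if p >= 0.5 else 0 for p in probabilities]   # assumed 0.5 cut-off

print("cost:", log_loss(labels, probabilities))
print("confusion (tp, fp, fn, tn):", confusion_matrix(labels, predictions))
```

Moving the 0.5 cut-off up or down trades false positives against false negatives, which is the trade-off an ROC curve plots.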

4. Decision Trees
Decision Tree – data set
How to build decision tree?
Understanding CART Model
Classification Rules - Overfitting Problem
Stopping Criteria And Pruning
How to Find final size of Trees?
Model A decision Tree.
Naive Bayes
Random Forests and Support Vector Machines
Interpretation of Model Outputs

Case Study:
1 Business Case Study for CART Model
2 Business Case Study for Random Forest
3 Business Case Study for SVM


5. Unsupervised Learning
Hierarchical Clustering
k-Means algorithm for clustering – groupings of unlabeled data points.
Principal Component Analysis (PCA) - Data
Independent components analysis (ICA)
Anomaly Detection
Recommender System - collaborative filtering algorithm
Case Study – Recommendation Engine for e-commerce/retail chain

6. Natural Language Processing
Introduction to Natural Language Processing (NLP).
Word Frequency Algorithms for NLP
Sentiment Analysis
Case Study :
Twitter data analysis using NLP

7. Introduction to Time Series Forecasting
Basics of Time Series Analysis and Forecasting, Method Selection in Forecasting
Moving Average (MA) Forecast Example, Different Components of Time Series Data, Log Based Differencing, Linear Regression For Detrending

8. ARIMA and Multivariate Time Series Analysis
Introduction to ARIMA Models, ARIMA Model Calculations, Manual ARIMA Parameter Selection, ARIMA with Explanatory Variables
Understanding Multivariate Time Series and Their Structure, Checking for Stationarity and Differencing the MTS
Case Study : Performing Time Series Analysis on Stock Prices
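
The moving average forecast and the differencing step named in the time series chapters can be sketched as below. This is a minimal illustration under simple assumptions: the forecast is the plain mean of the last few observations, the differencing is first-order on raw values (not the log-based variant), and the price series is invented.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def difference(series, lag=1):
    """Differencing: subtract the value `lag` steps earlier from each value."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


prices = [100, 102, 101, 105, 107]          # hypothetical closing prices

print("3-day MA forecast:", moving_average_forecast(prices, window=3))
print("first differences:", difference(prices))
```

Differencing removes a steady trend from the series, which is why it appears next to stationarity checks in the ARIMA chapter.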

Important Note :
All Machine Learning Algorithms are covered in depth with Real time case studies for each Algorithm.
Once 60% of ML is completed, the Capstone Project will be released for the batch.

Assignments :
Statistics Assignments : Total 4 practice set and Assignments from Statistics
Machine Learning Assignments : Total 3 Practice Set And 2 Real time use case as Assignments

Assessment Test For Term2 :


Duration : 3 hours
Question Type : Objective & ML Case Studies



Term 3
MODULE 4 : TENSORFLOW & DEEP LEARNING | 16 hours
Deep Learning & NLP

1. Introduction to Deep Learning And TensorFlow
Neural Network
Understanding Neural Network Model
Installing TensorFlow
Simple Computation, Constants And Variables
Types of file formats in TensorFlow
Creating A Graph – Graph Visualization
Creating a Model – Logistic Regression Model Building
TensorFlow Classification Examples

2. Introduction to TensorFlow
Installing TensorFlow
Simple Computation, Constants And Variables
Types of file formats in TensorFlow
Creating A Graph - Graph Visualization
Creating a Model - Logistic Regression Model Building
TensorFlow Classification Examples using TensorFlow

3. Understanding Neural Networks With TensorFlow
Basic Neural Network
Single Hidden Layer Model
Multiple Hidden Layer Model
Backpropagation – Learning Algorithm and visual representation
Understand Backpropagation – Using Neural Network Example
TensorBoard
Project on backpropagation

4. Convolutional Neural Network (CNN)
Convolutional Layer Motivation
Convolutional Layer Application
Architecture of a CNN
Pooling Layer Application
Deep CNN
Understanding and Visualizing a CNN
Project : Building a CNN for Image Classification



Term 3
MODULE 5 : NATURAL LANGUAGE PROCESSING | 20 hours
Deep Learning & NLP

[Figure: NLP and its applications - Information Extraction, Machine Translation, Information Retrieval, Sentiment Analysis, Question Answering]

1. Introduction to NLP & Text Analytics
Introduction to Text Analytics
Introduction to NLP
What is Natural Language Processing?
What Can Developers Use NLP Algorithms For?
NLP Libraries
Need of Textual Analytics
Applications of Natural Language Processing
Word Frequency Algorithms for NLP
Sentiment Analysis

2. Text Pre Processing Techniques
Need of Pre-Processing
Various methods to Process the Text data
Tokenization, Challenges in Tokenization
Stopping, Stop Word Removal
Stemming - Errors in Stemming
Types of Stemming Algorithms - Table lookup Approach, N-Gram Stemmers

3. Distance Algorithms used in Text Analytics
String Similarity
Cosine Similarity Mechanism - Similarity between Two text documents
Levenshtein distance - measuring the difference between two sequences
Applications of Levenshtein distance
LCS (Longest Common Subsequence) Problems and solutions, LCS Algorithms

4. Information Retrieval Systems
Information Retrieval - Precision, Recall, F-score
TF-IDF
KNN for document retrieval
K-Means for document retrieval
Clustering for document retrieval

5. Projects And Case Studies
a. Sentiment analysis for twitter, web articles
b. Movie Review Prediction
c. Summarization of Restaurant Reviews
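
Two of the distance measures listed under "Distance Algorithms used in Text Analytics" can be sketched in plain Python. The Levenshtein function uses the usual dynamic programming recurrence; the cosine similarity assumes a very simple representation (lower-cased, whitespace-split word counts), which is an illustrative choice rather than the course's own preprocessing.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the word-count vectors of two documents."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)     # Counter returns 0 for missing words
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))
print(cosine_similarity("data science with python", "python for data science"))
```

Levenshtein distance works on characters and suits spelling-level comparison, while cosine similarity on word counts compares whole documents regardless of word order.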



Term 4
MODULE 6 : SQL & MONGODB | 14 hours
Value Added Skillset

1. RDBMS And SQL Operations :
Introduction To RDBMS
Single Table Queries - SELECT, WHERE, ORDER BY, Distinct, And, OR
Multiple Table Queries: INNER, SELF, CROSS, and OUTER, Join, Left Join, Right Join, Full Join, Union
Advance SQL Operations:
Data Aggregations and summarizing the data
Ranking Functions: Top-N Analysis
Advanced SQL Queries for Analytics

2. NoSQL Databases :
Topics - What is HBase?
HBase Architecture, HBase Components,
Storage Model of HBase,
HBase vs RDBMS
Introduction to Mongo DB, CRUD
Advantages of MongoDB over RDBMS
Use cases
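
The SQL operations listed in chapter 1 (a single-table filter, a multi-table join, aggregation and a Top-N query) can be tried out with Python's built-in sqlite3 module. The syllabus does not name a database product, so SQLite is only a stand-in here, and the tables, columns and rows are invented for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0);
""")

# Single-table query: SELECT with WHERE and ORDER BY.
large_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 200 ORDER BY amount DESC"
).fetchall()

# Multi-table query: LEFT JOIN keeps customers with no orders,
# SUM aggregates per customer, ORDER BY + LIMIT gives a Top-N result.
top_customers = conn.execute("""
SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
LIMIT 2
""").fetchall()

print(large_orders)
print(top_customers)
conn.close()
```

ORDER BY with LIMIT is the simplest form of Top-N analysis; databases that support window functions can also express it with RANK() or ROW_NUMBER(), which additionally handles ties and per-group rankings.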

3. Programming with SQL :
Mathematical Functions
Variables
Conditional Logic
Loops
Custom Functions
Grouping and Ordering
Partitioning
Filtering Data
Subqueries

4. MongoDB Overview :
Where MongoDB is used?
MongoDB Structures
MongoDB Shell vs MongoDB Server
Data Formats in MongoDB
MongoDB Aggregation Framework
Aggregating Documents
What are MongoDB Drivers?

5. Basics and CRUD Operation :
Databases, Collection & Documents
Shell & MongoDB drivers
What is JSON Data
Create, Read, Update, Delete
Finding, Deleting, Updating, Inserting Elements
Working with Arrays
Understanding Schemas and Relations

6. Introduction to MongoDB :
What is MongoDB?
Characteristics and Features
MongoDB Ecosystem
Installation process
Connecting to MongoDB database
Introduction to NoSQL
Introduction of MongoDB module
What are ObjectIds in MongoDB



Term 4
MODULE 7 : TABLEAU AND POWER BI | 16 hours
Value Added Skillset

1. Introduction to Tableau :
Connecting to data source
Creating dashboard pages
How to create calculated columns
Different charts
Hands-on :
Hands on connecting data source and data cleansing
Hands on various charts

2. Visual Analytics :
Getting Started With Visual Analytics
Sorting and grouping
Working with sets, set action
Filters: Ways to filter, Interactive Filters
Forecasting and Clustering
Hands-on :
Hands on deployment of Predictive model in visualization

3. Dashboard and Stories :
Working in Views with Dashboards and Stories
Working with Sheets
Fitting Sheets
Legends and Quick Filters
Tiled and Floating Layout
Floating Objects

4. Mapping :
Coordinate points
Plotting Latitude and Longitude
Custom Geocoding
Polygon Maps
WMS and Background Image

5. Getting Started With Power BI :
Installing Power BI Desktop and Connecting to Data
Overview of the Workflow in Power BI Desktop
Introducing the Different Views of the Data Model
Query Editor Interface
Working on Data Model

6. Programming with Power BI :
Working with Timeseries
Understanding aggregation and granularity
Filters and Slicers in Power BI
Maps, Scatterplots and BI Reports
Connecting Dataset with Power BI
Creating a Customer Segmentation Dashboard
Analyzing the Customer Segmentation Dashboard



Term 4
MODULE 8 : BIG DATA AND SPARK ANALYTICS | 12 hours
Value Added Skillset

1. Introduction To Hadoop :
Distributed Architecture - A Brief Overview
Understanding Big Data
Introduction To Hadoop, Hadoop Architecture
HDFS, Overview of MapReduce Framework
Hadoop Master – Slave Architecture
MapReduce Architecture
Use cases of MapReduce

2. Apache Spark Analytics :
What is Spark
Introduction to Spark RDD
Introduction to Spark SQL and Dataframes
Using R-Spark for machine learning
Hands-on:
Installation and configuration of Spark
Using R-Spark for machine learning programming

3. Apache Spark Analytics :
Getting to know PySpark
PySpark Introduction
PySpark Environment Setup
PySpark - Spark context
RDD, Broadcast and Accumulator
SparkConf and SparkFiles
Spark MLlib Overview, Algorithms and utilities in Spark MLlib

Hands-on:
Map reduce Use Case 1 : Youtube data analysis
Map reduce Use Case 2 : Uber Data Analytics
Hands-on:
Spark RDD programming
Hands-on:
Spark SQL and Dataframe programming



Term 4
MODULE 9 : R PROGRAMMING | 12 hours
Value Added Skillset

1. Introduction To R :
Installation Setup
Quick guide to RStudio User Interface
RStudio's GUI3
Changing the appearance in RStudio
Installing packages in R and using the library
Development Environment Overview
Introduction to R basics
Building blocks of R
Core programming principles
Fundamentals of R

2. Programming with R :
Creating an object
Data types in R
Coercion rules in R
Functions and arguments
Matrices
Data Frame
Data Inputs and Outputs with R
Vectors and Vector operation
Advanced Visualization
Using the script vs. using the console

3. Manipulating Data :
Data transformation with R - the Dplyr package - Part
Data transformation with R - the Dplyr package - Part
Sampling data with the Dplyr package
Using the pipe operator in R
Tidying data in R - gather() and separate()
Tidying data in R - unite() and spread()

4. Visualizing Data :
Intro to data visualization
Introduction to ggplot2
Building a histogram with ggplot2
Building a bar chart with ggplot2
Building a box and whiskers plot with ggplot2
Building a scatterplot with ggplot2

MODULE 10 : TRAINING AND DEPLOYING MACHINE LEARNING MODEL USING GCP | 8 hours

1. Introduction To GCP Cloud ML Engine :
Introduction to Google CloudML Engine
CloudML Engine in Machine Learning WorkFlow
Components of Cloud ML Engine - Google Cloud Platform Console, gcloud command-line tool and REST API

2. Training Machine Learning Model :
Developing a training application
Packaging a training application
Running and monitoring a training job
Using hyperparameter tuning
Using GPUs for training models in the cloud
