Edyoda: Data Scientist Program


EdYoda

Data Scientist Program

Program Curriculum

www.edyoda.com hello@edyoda.com
Learning outcomes:

 Build a strong understanding of programming using Python

 Learn to analyze data using Power BI

 Build a strong understanding of data wrangling and machine learning

 Learn to build machine learning models using scikit-learn

Python Programming
1. Introduction to Python
 Useful Python Resources
 Python Tools and Utilities
 Python Features

2. Python Environment
 Local Environment Setup
 Downloads and Installations
 Setting up Environment Path

3. Executing Python
 Interactive Mode
 Scripting Mode
 Integrated Development Environment

4. Python Basic Syntax


 Python Identifiers
 Reserved Words
 Lines and Indentation

5. Python Variable Types


 Assigning Values to Variables
 Multiple Assignment
 Standard Data Types
 Data Type Conversion

6. Python Basic Operators


 Arithmetic Operators
 Comparison Operators
 Assignment Operators

 Bitwise Operators
 Logical Operators
 Membership Operators
 Identity Operators
 Operators Precedence

7. Python Decision Making


 IF statements
 IF...ELIF...ELSE Statements
 Nested IF statements

8. Python Loops
 While loop
 For loop
 Nested loop
 Break control statement
 Continue statement
 Pass statement

9. Python Numbers
 Number type conversion
 Mathematical function
 Random number function
 Trigonometric function

10. Python Strings


 String special operators
 String formatting operator
 Built-in string methods

11. Python Lists


 Basic list operations
 Indexing and slicing
 Built-in functions and methods

12. Python Tuples


 Basic tuple operations
 Indexing and slicing
 Built-in functions

13. Python Dictionary

 Basic Dictionary operations
 Built-in Functions and Methods
 Use cases

14. Python Functions


 Pass by reference and value
 Function Arguments
 Scope of variables
 Default Argument Values
 Keyword Arguments
 Arbitrary Argument Lists
 Unpacking Argument Lists
 Lambda Expressions
 Documentation Strings

15. Python Modules


 Importing Modules
 Namespaces and scoping
 Packages

16. Python Files I/O


 Writing and Parsing Text Files
 Parsing Text Using Regular Expressions
 Writing and Parsing XML Files
 Writing and Parsing JSON Files
 Writing and Parsing CSV Files

17. Python Exceptions


 The except clause with multiple exceptions
 The try-finally clause
 Argument of an Exception
 Raising an exception
 User-Defined Exceptions

18. Python Classes and Objects


 Creating Classes
 Creating instance objects
 Destroying Objects (Garbage Collection)
 Custom Classes
 Attributes and Methods
 Inheritance and Polymorphism
 Using Properties to Control Attribute Access

19. Functional Programming
 Lambda
 Filter
 Map
 Functools

20. Iterators and Generators


 Itertools
 Generators
 Decorators

21. Collections
 Deque
 Counter
 OrderedDict
 ChainMap
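
As a brief illustration of the four containers listed above, the following is a minimal sketch using only the standard-library `collections` module. The variable names and sample data are made up for illustration and are not taken from the course material.

```python
from collections import ChainMap, Counter, OrderedDict, deque

# deque: fast appends and pops at both ends; maxlen makes it a sliding window
window = deque([1, 2, 3], maxlen=3)
window.append(4)  # the oldest item (1) is dropped automatically

# Counter: tally how often each hashable item occurs
word_counts = Counter("the cat and the hat".split())

# OrderedDict: a dict with extra order-aware methods such as move_to_end
cache = OrderedDict(a=1, b=2, c=3)
cache.move_to_end("a")  # "a" is now the last key

# ChainMap: look up keys across several dicts; the first match wins
defaults = {"color": "blue", "size": "M"}
overrides = {"color": "red"}
settings = ChainMap(overrides, defaults)

print(list(window), word_counts["the"], list(cache), settings["color"])
```

Here `settings["size"]` falls through to `defaults` because `overrides` does not define that key.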

23. Debugging, Testing


 Pdb
 breakpoints

24. Regular Expressions


 Characters and Character Classes
 Quantifiers
 Grouping and Capturing
 Assertions and Flags
 The Regular Expression Module

25. Deploying Python Applications


 Pip
 Virtualenv
 The __init__.py files
 The setup.py file
 Installing the package
 Software deployment in Python

Data Analysis

1. Data Quality
 Introduction to Data Quality
 Handling different Data Quality Issues

2. Phases of Data Analysis


 Understanding different phases of a typical Data Analytics Project

3. Understanding of Data
 Intro to types of data
 Derived Facts/Dimensions
 Building dimensions from Facts (Binning)
 Granularity of Data

4. Understanding Data Operations


 Select and Filter
 Simple vs Complex
 Sort
 Group and Aggregate
 Merge
 Pivot
 Unpivot
 Windowing

5. Data Modeling
 Understanding: Unique Keys, Key References, Cardinality, ER Diagram
 Introduction to Data Quality
 The Six Dimensions of Data Quality

6. Excel Refresher
 Frequently used Excel Functions
 Useful Shortcuts for Faster Excel Analysis
 Tables in Excel
 Data Formatting in Excel
 Visualization with Excel

7. Power Query Essentials


 Data Ingestion in PowerQuery
 Data Quality Checks
 Text Processing

 Data Transformations in PowerQuery

8. Power BI Essentials
 Overview of Power BI Tools
 Handling Data Types and Formats
 Handling Special Data Category
 Creating Hierarchical Dimensions
 KPI Cards
 Bar Charts / Column Charts
 Filters (Simple vs Complex)
 Slicers
 Formatting & Aesthetics
 Publishing and Sharing your Dashboard
 Exploring different Chart Options
 Understanding Important Terms in a Given Visual
 Pivot/Matrix Tables
 Creating Drilldown Reports
 Introduction to DAX
 Commonly used DAX Functions
 Applications of DAX Concepts
 Exploring different types of visuals
 Publishing Modified Dashboard

9. Probability Theory
 Types of Events
 Idea of Random Events
 Understanding via Example Datasets
 Discrete vs Continuous Random Variables
 Nominal, Ordinal, Ratio/Interval Data
 Basic Probability Theory
 Idea of MECE events
 Idea of Conditional events / Independent Events
 Idea of Bayes Theorem
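
The Bayes' theorem topic above can be made concrete with a small worked example. This is a sketch only: the function name `bayes_posterior` and the numbers (a 1% prior, 90% true-positive rate, 5% false-positive rate) are hypothetical and chosen purely for illustration.

```python
def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H).

    Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E), where
    P(E) = P(E | H) * P(H) + P(E | not H) * (1 - P(H)).
    """
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability")
    return true_positive_rate * prior / evidence


# Hypothetical numbers: a rare condition and an imperfect test
posterior = bayes_posterior(prior=0.01,
                            true_positive_rate=0.90,
                            false_positive_rate=0.05)
print(round(posterior, 4))
```

With these assumed inputs the posterior is roughly 15%, far below the 90% true-positive rate, because the condition is rare to begin with.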

10. Descriptive Statistics


 Different Types of Distributions
 Understanding the Normal distribution
 Parameters defining a Normal distribution
 What is a standard normal distribution?
 The Central limit theorem
 The techniques of data summarization in Statistics
 Measures of central tendency for univariate data

 Mean, Median, Mode, Variance, Covariance, Standard Deviation etc.
 Skewness & Kurtosis of a distribution
 Meaning of left, right skewed data

11. Visualizing univariate data


 Histograms, Box-and-whiskers plot, Violin plots, Frequency distributions
 Bi-variate analysis
 Visualizing bi-variate data

12. Inferential Statistics


 Sampling - Why & How
 Understanding confidence interval and p-value
 Null & Alternate Hypothesis
 Tests of Significance
 ANOVA
 Chi-Square Test
 The Bayes Theorem
 Decision Tree - Why & How in Excel
 Multi-variate Analysis
 Applying Concepts of Stats in Regression analysis
 One-tailed vs two-tailed tests
 Understanding R-Squared
 A/B Testing

Data Wrangling

1. Black Box Introduction to Machine Learning

 What is not Machine Learning


 What is Machine Learning
 Types of ML - Supervised, Unsupervised
 Supervised - Classification, Regression
 Unsupervised - Clustering, Association
 Machine Learning Pipeline

2. Essential NumPy

 Introduction to NumPy
 Creation
 Access
 Stacking and Splitting
 Methods
 Broadcasting

3. Pandas for Machine Learning

 Introduction to Pandas
 Understanding Series & DataFrames
 Loading CSV, JSON
 Connecting to databases
 Descriptive Statistics
 Accessing subsets of data - Rows, Columns, Filters
 Handling Missing Data
 Dropping rows & columns
 Handling Duplicates
 Function Application - map, apply, groupby, rolling, str
 Merge, Join & Concatenate
 Stacking, Unstacking & Melting
 Pivot-tables

 Normalizing JSON
 Application - EDA on Employee data, sales data

4. Understanding Visualization:

 Introduction to matplotlib & seaborn


 Basic Plotting
 Title, Labels, Legends, Grid, colormap, xticks, yticks
 Color, linewidth
 Sub Plotting
 Scatter plot
 Histogram
 Bar Graphs
 Plotting distributions
 Plotting 3D data
 Fundamentals of Tableau

Machine Learning

1. Linear Models for Classification & Regression

 Simple Linear Regression using Ordinary Least Squares


 Gradient Descent Algorithm
 Regularized Regression Methods - Ridge, Lasso, Elastic Net
 Logistic Regression for Classification
 Online Learning Methods - Stochastic Gradient Descent & Passive Aggressive
 Robust Regression - Dealing with outliers & Model errors
 Polynomial Regression
 Bias-Variance Tradeoff
 Application - House Price, Cancer Prediction, Insurance Prediction

2. Preprocessing for Machine Learning

 Introduction to Preprocessing

 StandardScaler
 MinMaxScaler
 RobustScaler
 Normalization
 Binarization
 Encoding Categorical (Ordinal & Nominal) Features
 Imputation
 Polynomial Features
 Custom Transformer
 Text Processing
 CountVectorizer
 TfIdf
 HashingVectorizer
 Image Processing using skimage

3. Decision Trees

 Introduction to Decision Trees


 The Decision Tree Algorithms
 Decision Tree for Classification
 Decision Tree for Regression
 Advantages & Limitations of Decision Trees
 Application - Cloth Prediction

4. Naive Bayes

 Introduction to Bayes' Theorem


 Naive Bayes Classifier
 Gaussian Naive Bayes
 Multinomial Naive Bayes
 Bernoulli Naive Bayes
 Naive Bayes for out-of-core
 Application - Text Classification, Sentiment Analysis and Spam & Non-spam classification

5. Composite Estimators using Pipelines & FeatureUnions

 Introduction to Composite Estimators


 Pipelines
 Transformed Target Regressor
 FeatureUnions
 ColumnTransformer
 GridSearch on pipeline
 Application - Author classification

6. Model Selection & Evaluation

 Cross Validation
 Hyperparameter Tuning
 Model Evaluation
 Model Persistence
 Validation Curves
 Learning Curves

7. Feature Selection & Dimensionality Reduction

 Introduction to Feature Selection


 Variance Threshold
 Chi-squared stats
 ANOVA using f_classif
 Univariate Linear Regression Tests using f_regression
 F-score vs Mutual Information
 Mutual Information for discrete values
 Mutual Information for continuous values
 SelectKBest
 SelectPercentile
 SelectFromModel
 Recursive Feature Elimination
 PCA
 SVD

 Application - Credit Risk Prediction

8. Nearest Neighbors

 Fundamentals of Nearest Neighbor Algorithm


 Unsupervised Nearest Neighbors
 Nearest Neighbors for Classification
 Nearest Neighbors for Regression
 Nearest Centroid Classifier
 Application - Nearest neighbour for face inpainting

9. Clustering Techniques

 Introduction to Unsupervised Learning


 Clustering
 Similarity or Distance Calculation
 Clustering as an Optimization Function
 Types of Clustering Methods
 Partitioning Clustering - KMeans & Meanshift
 Hierarchical Clustering - Agglomerative
 Density Based Clustering - DBSCAN
 Measuring Performance of Clusters
 Comparing all clustering methods
 Application - Grouping similar customers

10. Anomaly Detection

 What are Outliers?


 Statistical Methods for Univariate Data
 Using Gaussian Mixture Models
 Fitting an elliptic envelope
 Isolation Forest
 Local Outlier Factor
 Using clustering method like DBSCAN
 Application - Anomaly detection for credit risk prediction

11. Support Vector Machines

 Introduction to Support Vector Machines


 Maximal Margin Classifier
 Soft Margin Classifier
 SVM Algorithm for Classification
 SVM for Regression
 Hyper-parameters in SVM
 Application - Face recognition and breast cancer classification

12. Dealing with Imbalanced Classes

 What are imbalanced classes & their impact?


 OverSampling
 UnderSampling
 Connecting Sampler to pipelines
 Making classification algorithm aware of Imbalance
 Anomaly Detection
 Application - Fraud detection

13. Ensemble Methods

 Introduction to Ensemble Methods


 RandomForest
 AdaBoost
 Gradient Boosting Tree
 VotingClassifier
 XGBoost
 Application - Malicious data detection

14. Recommendation Engine

 Understanding distance vector calculation - cosine, euclidean, manhattan


 Types of Recommendation Engines
 Recommendation based on similarity
 Application - Grouping videos based on description, user rating prediction

15. Time Series Modeling

 Simple Average & Moving Average


 Single Exponential Smoothing
 Holt’s linear trend method
 Holt-Winters seasonal method
 ARIMA
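
The two simplest techniques in this list, the moving average and single exponential smoothing, can be sketched in plain Python as follows. The function names and sample series are illustrative assumptions. The sketch initialises the smoothed series with the first observation, which is one common convention among several.

```python
def moving_average(series, window):
    """Return the mean of each consecutive `window`-sized slice of `series`."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def single_exponential_smoothing(series, alpha):
    """Smooth `series` with s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = [series[0]]  # assumption: seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))
print(single_exponential_smoothing([10, 20, 30], alpha=0.5))
```

A larger `alpha` weights recent observations more heavily, while a smaller one produces a smoother, slower-reacting series.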

16. Packaging & Deployment

 Creating Python Package


 Deploy trained model behind REST interface
 Deploy model behind API call
 Deploy on AWS cloud (optional)

Mindset for Problem Solving

1. Mathematical Aptitude
 Percentages
 Profit and Loss
 Simple Interest and Compound Interest
 Work and Time
 Probability
 Permutation and Combination
 Time & Speed
 Ratios and Proportions
 Data Interpretation

2. Art of Learning Anything


 What is Intelligence
 Relation of success with intelligence
 Illusion of Learning
 Focused Mode vs Diffuse Mode

 Procrastination
 Improving Recall
 Creating Brain Links
 Visual memory & Data Memory
 Slow Thinking

3. Computational Thinking
 Thinking before Doing/Coding
 Problem Identification
 Decomposition
 Pattern Recognition
 Abstraction
 Algorithm Design
 Computational Thinking Use Case 1
 Computational Thinking Use Case 2

4. Technical Puzzles
 Why are Puzzles part of interviews?
 The Art of solving puzzles
 Approach more important than the solution
 Puzzles for Vertical Thinking
 Puzzles for Horizontal Thinking

Productivity and Decision Making

1. Art of being Super Productive


 Start with Why to make objectives clear
 Thinking Limitless
 The magic of computing returns
 Deciding what to work on
 Time Management Skills
 Measuring what matters

 Wisely choosing habits to inculcate

2. Effective Decision Making


 Why is decision making a key skill?
 Components of Decision Making
 Understanding common biases
 Not letting emotions clutter decision making
 Difference between quick decision making & slow decision making

Professional Communication

1. Reading comprehension & Short writing


 Building vocabulary
 Extracting insights from the textual information
 Drawing inferences from multiple stories
 Writing your inferences for others to understand

2. Book Reading & Writing Reviews
 Reading 10 books during the entire course & writing book reviews
 2 Biographies
 2 Fictions
 6 Non-Fictions

3. Effective Understanding & Articulation


 Watching 20 movies from our suggested list
 Writing a 1000-word essay on those movies
 Writing a summary of the movies

4. Group Discussion for decision making

 Understanding why GD is so important in personal & professional life
 The objective of GD - Collectively making the right decision
 5 GDs on various topics

5. Writing Professional chat/E-mail


 Writing as the most common method of professional communication
 Factors to keep in mind before starting to write
 Points to consider while writing
 Activities after writing
 Difference between chat writing & email writing

6. Making Impressive Presentations


 Why making a presentation is a professional job
 The objective of the presentation
 Attributes of a good presentation
 Why research is key to the presentation
 Making a presentation interactive
 Doing 10 video/live presentations

Computer Fundamentals

1. Operating System Concepts


 Operating System Architecture
 Processes and Process Management
 Threads and Concurrency control
 Scheduling
 Memory Management
 Inter-Process Communication
 Synchronization Constructs
 I/O Management
 Resource Virtualization

 Remote Services
 Distributed Systems
 Introduction to Data Center Technologies

2. Linux Administration
 Introduction to Linux Operating Systems
 Basic Linux Commands
 File Management and Security
 The directory structure of Unix
 User Management
 Groups
 Shell types and basic commands
 Permissions
 sudo
 Systemd Services Start and Stop
 Resource Management with systemctl
 Process Management (top, ps)
 Package Management (yum, apt, rpm)
 Managing disks (lsblk, df, mount, umount, du)
 File systems

3. Data Structures and Algorithms


 Built-in Data Type
o Integers
o Boolean
o Floating
o Character and Strings
 Derived Data Type
o List
o Array
o Stack
o Queue
 Linked List
o Singly Linked List
o Doubly Linked List
o Circular Linked List

 Array
 Stack
 Queue
 Tree
 Basic Operations
o Traversing
o Searching
o Sorting
o Hashing
o Insertion
o Deletion
o Merging
 Searching techniques
o Binary search
o Linear search
 Recursion
 Fibonacci series
 Sorting Algorithm
o Bubble sort
o Insertion sort
o Selection sort
o Quick sort
o Merge sort
o Bucket sort
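
The binary search entry under searching techniques above can be sketched as follows. This is a minimal iterative version that assumes the input list is already sorted in ascending order. The function name and the `-1` "not found" convention are illustrative choices, not taken from the course material.

```python
def binary_search(items, target):
    """Return the index of `target` in the sorted list `items`, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2      # middle of the current search range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1            # target can only be in the right half
        else:
            high = mid - 1           # target can only be in the left half
    return -1                        # range is empty: target is absent


print(binary_search([1, 3, 5, 7, 9], 7))
```

Each iteration halves the search range, so the lookup takes O(log n) comparisons, compared with O(n) for a linear search.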

4. Database concepts
 Introduction to Databases
 Entity Relationship Model
 Relational Model
 Relational Algebra
 Normalization
 Transactions and Concurrency Control
 DBMS Architecture: 2-level, 3-level
 Data Abstraction and Data Independence
 Database Objects
 Entity-Relationship Model
 Generalization

 Specialization
 Aggregation
 Entity Relationship Diagrams
 Keys in Relational Model
 Candidate key
 Super key
 Primary key
 Alternate key
 Foreign key
 Strategies for Schema design
 Schema Integration
 Data modelling
 Star Schema in Data Warehouse modelling
 Data Warehouse Modeling

5. Basic SQL - Syntax


 Data Types
 Operators
 Expressions
 Create Database
 Drop Database
 Select Queries
 Create Table
 Drop Table
 Other Table Operations
 Insert Query
 Where Clause
 AND & OR Clauses
 Update operations
 Delete operations
 Order By clause
 Group By Clause
 Sorting operations

 SQL Constraints
 Type of Joins
 UNION Clause
 NULL Values
 Indexing
 Views

6. Software Engineering
 Software Engineering Overview
 Features of Good Software:
o Operational Features
o Transitional Features
o Maintenance Features
 Software Development:
o Requirement Gathering
o Software Design
o Programming
 Software Design
o Design
o Maintenance
o Programming
 Programming:
o Coding
o Testing
o Integration
 Software Development Life Cycle
o Requirement Gathering
o System Analysis
o Software Design
o Coding
o Testing
o Integration
o Deployment
o Operation and Maintenance
 Types of SDLC
o Waterfall model
o Iterative Model
o Spiral model
o V Model

 Agile Concepts
 DevOps Concepts
 Microservices Architecture
 Features of Microservices Architecture
 Software Requirements
 Software Design Basics
 Analysis & Design Tools
o Data Flow Diagram
o Flow Chart
 Design Strategies
o Function-Oriented Design
o Object-Oriented Design
 User Interface Design
o Command Line Interface (CLI)
o Graphical User Interface (GUI)
 Design Complexity
 Software Testing Overview
o Manual Vs Automated Testing
o Testing Approaches
o Black-box testing
o White-box testing
o Unit Testing
o Integration Testing
o Functionality testing
o Acceptance Testing
o Regression Testing
 Quality Control
 Deployment Methods
o Blue-Green Deployment
o Rolling Deployment
 Software Monitoring
 Software Maintenance

7. Tools
 Git
o What is Git?
o Installing Git

o First-Time Git Setup
o Git Basics
o Getting a Git Repository
o Recording Changes to the Repository
o Viewing the Commit History
o Undoing Things
o Working with Remotes
o Tagging
o Git Branching
o Basic Branching and Merging
o Branch Management
o Branching Workflows
o Remote Branches
o Rebasing

 PuTTY
o Installation
o Types of connections
o Connecting to a remote server
o Using Auth keys
o Customizing PuTTY

 Vim
o Vim Basics
o Insert Mode
o Visual Mode
o Command Mode
o Create and Edit a file
o Search and replace in Vim
o Vim diff
o Copy operations
o .vimrc file
o Vim Commands
