
NIT ML SUGG (SUVECCHHA PAUL)

1. Eigenvalues and Eigenvectors (in Machine Learning)

• Eigenvectors represent the directions in which data varies the most.
• Eigenvalues show how much the data varies along those directions.
• In PCA (Principal Component Analysis), eigenvectors define the new axes, and eigenvalues tell us how much information (variance) each axis captures (see the sketch below).
• Why important: they help reduce data dimensions and focus on the most useful features.
• Keywords: direction, variance, PCA, dimensionality reduction, important features.
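A minimal NumPy sketch of this idea (the data values are invented for illustration): it centers a tiny 2-D dataset, builds its covariance matrix, and reads off the eigenvalues and eigenvectors that PCA would use as its new axes.

```python
import numpy as np

# toy 2-D data (illustrative values only)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

X_centered = X - X.mean(axis=0)          # center the data first
cov = np.cov(X_centered, rowvar=False)   # 2x2 covariance matrix

# eigh is used because the covariance matrix is symmetric
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print("eigenvalues :", eigenvalues)                  # variance along each new axis
print("eigenvectors:\n", eigenvectors)               # the directions (columns)
print("variance captured:", eigenvalues / eigenvalues.sum())
```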

2. Decision Boundaries (in Classification)

• A decision boundary is a line or surface that separates different classes.
• It is formed by models like SVM, Logistic Regression, or the Perceptron.
• For example, it separates "spam" from "not spam" based on features.
• It helps the model decide the class of new data points (see the sketch below).
• Keywords: boundary, separation, classifier, line, margin.
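A small sketch, assuming scikit-learn is available and using made-up 2-D points: a fitted Logistic Regression gives a linear decision boundary where w1·x1 + w2·x2 + b = 0, and new points are classified by which side of that line they fall on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy 2-D data: one low cluster (class 0) and one high cluster (class 1)
X = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# the points where w1*x1 + w2*x2 + b = 0 form the decision boundary (a line)
print(f"boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
print(clf.predict([[2, 2], [6, 6]]))   # which side each new point falls on
```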


3. Logistic vs Linear Regression

• Linear Regression is used for predicting numbers (e.g., salary, temperature).
• Logistic Regression is used for predicting categories (e.g., yes/no, pass/fail).
• Logistic regression uses a sigmoid function to give results between 0 and 1 (as probabilities).
• When to use:
o Linear: for continuous values.
o Logistic: for classification problems (see the sketch below).
• Keywords: classification, regression, sigmoid, prediction, probability.
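A hedged side-by-side sketch with scikit-learn (the numbers are invented for illustration): linear regression outputs a number, while logistic regression outputs a probability between 0 and 1 via the sigmoid.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: predict a number (e.g., salary from years of experience)
years = np.array([[1], [2], [3], [4], [5]])
salary = np.array([30, 35, 41, 47, 52])          # illustrative values
lin = LinearRegression().fit(years, salary)
print("predicted salary for 6 years:", lin.predict([[6]]))

# Logistic regression: predict a category (e.g., pass/fail from hours studied)
hours = np.array([[1], [2], [3], [4], [5], [6]])
passed = np.array([0, 0, 0, 1, 1, 1])
log = LogisticRegression().fit(hours, passed)
print("P(pass | 3.5 hours):", log.predict_proba([[3.5]])[0, 1])  # sigmoid output in (0, 1)
print("predicted class:", log.predict([[3.5]]))
```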

4. Principal Component Analysis (PCA)

• A technique to reduce the number of features while keeping the important information.
• It finds new axes (principal components) using eigenvalues and eigenvectors.
• Reduces complexity, increases speed, and helps avoid overfitting.
• Used in image processing, text data, etc. (see the sketch below).
• Keywords: feature reduction, dimensionality reduction, variance, eigenvalues, eigenvectors.
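A minimal scikit-learn sketch on random demo data: PCA reduces 5 features to 2 and reports how much variance each new axis keeps (the explained variance ratio comes from the eigenvalues).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))            # 100 samples, 5 features (random demo data)

pca = PCA(n_components=2)                # keep only 2 principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 2) - fewer features, same samples
print(pca.explained_variance_ratio_)     # share of variance kept by each component
```
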
5. Perceptron Update Rule

• A learning rule that updates the model weights whenever a prediction is wrong.
• Formula: w = w + learning_rate × (true − predicted) × input
• It helps draw the best possible decision boundary.
• The update is repeated until a line that separates the classes correctly is found (this works when the data is linearly separable); see the sketch below.
• Keywords: weights, learning, prediction, update, decision boundary.
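A short NumPy sketch of the rule above, on a tiny made-up AND-style dataset: whenever a prediction is wrong, the weights (and, by the same rule, the bias) are nudged in the direction of the error.

```python
import numpy as np

# tiny linearly separable dataset (logical AND), labels 0/1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)          # weights
b = 0.0                  # bias term
lr = 0.1                 # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        predicted = 1 if np.dot(w, xi) + b > 0 else 0
        error = target - predicted
        w += lr * error * xi       # w = w + lr * (true - predicted) * input
        b += lr * error            # same rule applied to the bias

print("weights:", w, "bias:", b)
print("predictions:", [(1 if np.dot(w, xi) + b > 0 else 0) for xi in X])
```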

6. Measuring Cluster Quality

• Silhouette Score: measures how well a point fits its own cluster versus the other clusters.
• Inertia: sum of squared distances of points to their cluster center (lower is better).
• The Dunn Index and Davies-Bouldin Index are other measures.
• Good clustering = similar points grouped together, far from other clusters (see the sketch below).
• Keywords: cluster quality, distance, similarity, evaluation.
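A small scikit-learn sketch on synthetic blob data: K-Means is fitted and the two measures named above, silhouette score and inertia, are printed.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data with 3 well-separated blobs (demo only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print("silhouette score:", silhouette_score(X, km.labels_))  # closer to 1 is better
print("inertia:", km.inertia_)                               # lower is better
```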

7. KNN Advantages & Disadvantages

• ✅ Simple and easy to implement.
• ✅ No training phase (a "lazy learner").
• ❌ Slow at prediction time with large datasets.
• ❌ Affected by irrelevant features and noise.
• Works well with properly scaled and cleaned data.
• Keywords: neighbors, voting, distance, simplicity, sensitive.


8. Maximum Margin Classification (SVM)

• SVM tries to draw a line (or plane) that maximizes the gap (margin) between classes.
• A larger margin = better generalization and accuracy.
• Support Vectors: the data points closest to the decision boundary; they lie on the margin and define it.
• This helps the model work well even on new, unseen data (see the sketch below).
• Keywords: SVM, margin, support vectors, separation.
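A brief scikit-learn sketch on made-up 2-D points: a linear SVM is fitted and its support vectors, the points that define the margin, are printed.

```python
import numpy as np
from sklearn.svm import SVC

# toy 2-D data: two separable groups (illustrative values)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# a large C keeps the margin "hard", so the maximum-margin idea is easy to see
svm = SVC(kernel="linear", C=1e3).fit(X, y)

print("support vectors:\n", svm.support_vectors_)   # points closest to the boundary
print("prediction for [3, 3]:", svm.predict([[3, 3]]))
```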


9. KNN (Working)

• For classification: looks at the 'K' nearest neighbors and predicts the majority class among them.
• For regression: takes the average value of the 'K' nearest neighbors.
• Depends on a good distance measure (e.g., Euclidean distance).
• No training needed; the model simply memorizes the data (see the sketch below).
• Keywords: K neighbors, vote, average, distance, prediction.
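A minimal scikit-learn sketch with toy numbers showing both uses: majority vote for classification and averaging for regression.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]      # one feature, toy values

# classification: the 3 nearest neighbors vote on the class
y_class = [0, 0, 0, 1, 1, 1]
knn_c = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(knn_c.predict([[2.5], [10.5]]))      # -> [0 1]

# regression: the values of the 3 nearest neighbors are averaged
y_reg = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
knn_r = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(knn_r.predict([[2.5], [10.5]]))      # -> [2. 11.]
```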

10. Bagging vs Boosting

• Bagging (Bootstrap Aggregating):
o Builds many independent models in parallel and averages them (like Random Forest).
o Reduces variance.
• Boosting:
o Builds models one by one, each correcting the previous errors (like AdaBoost).
o Reduces bias (see the comparison sketch below).
• Keywords: ensemble, bagging = parallel, boosting = sequential, Random Forest, AdaBoost.
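A hedged comparison sketch with scikit-learn on a synthetic dataset: Random Forest (a bagging-style ensemble of parallel trees) next to AdaBoost (a boosting ensemble of sequential weak learners).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many independent trees trained in parallel, predictions averaged
bagging = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting: weak learners added one by one, each focusing on the previous errors
boosting = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("Random Forest accuracy:", bagging.score(X_te, y_te))
print("AdaBoost accuracy     :", boosting.score(X_te, y_te))
```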

11. Handling Outliers & Missing Values

• Outliers:
o Can be removed or replaced (e.g., with the mean/median).
o Use the Z-score or the IQR rule to detect them.
• Missing values:
o Fill them with the mean/median/mode (imputation).
o Or remove rows/columns with too many missing values (see the sketch below).
• Keywords: clean data, missing data, imputation, outlier removal.
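A small pandas sketch on an invented toy column: IQR-based outlier detection followed by median imputation of the outlier and the missing value.

```python
import pandas as pd
import numpy as np

# toy column with one obvious outlier and one missing value
df = pd.DataFrame({"age": [22, 25, 24, 23, 26, 120, np.nan, 24]})

# detect outliers with the IQR rule
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
print("outlier rows:\n", df[outliers])

# replace outliers with the median, then impute missing values with the median
median = df.loc[~outliers, "age"].median()
df.loc[outliers, "age"] = median
df["age"] = df["age"].fillna(median)
print(df)
```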

12. Logistic Regression Applications

• Predicts the probability that a sample belongs to a class.
• Used for binary classification: disease/no disease, spam/not spam.
• The output is sigmoid shaped (between 0 and 1).
• Better suited than linear regression for classification tasks.
• Keywords: probability, classification, sigmoid, logistic model.

13. DBSCAN Clustering

• Density-Based Spatial Clustering of Applications with Noise.
• Groups data based on density (points packed close together form a cluster).
• Detects outliers naturally (points in low-density regions are labeled as noise).
• Better than K-Means for irregularly shaped clusters and noisy data (see the sketch below).
• Keywords: density, clustering, core points, noise, outliers.
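A minimal scikit-learn sketch on synthetic "two moons" data, a non-convex shape K-Means handles poorly: DBSCAN groups points by density and labels sparse points as noise (label -1). The eps and min_samples values here are illustrative choices, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# two interleaving half-circles - a shape K-Means struggles with
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                      # cluster ids; -1 means "noise"/outlier
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points  :", np.sum(labels == -1))
```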

14. Hierarchical Divisive Clustering

• Start with all the data as one cluster.
• Divide it recursively until small groups (or single points) are formed.
• Builds a tree-like diagram (dendrogram).
• The opposite of agglomerative (bottom-up) clustering: divisive works top-down (see the sketch below).
• Keywords: top-down, split, tree, hierarchical clustering.
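Scikit-learn does not ship a ready-made divisive hierarchical routine, so this is only a rough top-down sketch: it repeatedly bisects the current cluster with 2-means (a "bisecting" style split). The `bisect` helper and the depth limit are my own illustrative choices, not part of the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def bisect(points, depth=0, max_depth=2):
    """Recursively split a cluster in two (top-down / divisive)."""
    if depth == max_depth or len(points) < 4:
        print("  " * depth, f"leaf cluster with {len(points)} points")
        return
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    print("  " * depth, f"split {len(points)} points into two groups")
    bisect(points[labels == 0], depth + 1, max_depth)
    bisect(points[labels == 1], depth + 1, max_depth)

X, _ = make_blobs(n_samples=200, centers=4, random_state=7)
bisect(X)   # start with everything in one cluster and divide recursively
```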


15. Spam Detection (Supervised Learning)

• Collect labeled data: spam or not spam.
• Extract features such as keywords, links, and sender info.
• Use classification models like Logistic Regression or Naive Bayes.
• Train the model on past data and test it on new emails (see the sketch below).
• Keywords: supervised learning, labels, features, classification.
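A toy end-to-end sketch with scikit-learn: the example messages are invented, the features are simple bag-of-words counts, and the classifier is Naive Bayes as suggested above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tiny labeled dataset (messages invented for illustration)
messages = ["win a free prize now", "limited offer click here",
            "meeting at 10 tomorrow", "please review the attached report",
            "free money guaranteed", "lunch with the team today"]
labels = [1, 1, 0, 0, 1, 0]              # 1 = spam, 0 = not spam

# bag-of-words features + Naive Bayes classifier in one pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize offer", "see you at the meeting"]))  # expected: [1 0]
```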

16. Reinforcement Learning

• Learning by trial and error.
• An agent interacts with an environment and receives rewards or penalties.
• It learns which actions give the best long-term reward.
• Used in games (e.g., Chess), robotics, and self-driving cars (see the sketch below).
• Keywords: agent, action, reward, environment, learning loop.
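A tiny tabular Q-learning sketch used as an illustration of the learning loop (the environment, a 1-D line of 5 states with a goal on the right, is invented here): the agent tries actions, receives a reward at the goal, and its Q-table gradually encodes which action gives the best long-term reward in each state.

```python
import numpy as np

n_states, n_actions = 5, 2      # states 0..4, actions: 0 = left, 1 = right
goal = 4                        # reaching state 4 gives a reward of +1
alpha, gamma = 0.5, 0.9         # learning rate and discount factor
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    for _ in range(100):
        # explore with random actions; Q-learning is off-policy, so the
        # greedy (best-action) policy is still learned from this experience
        action = int(rng.integers(n_actions))
        next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == goal else 0.0
        # Q-learning update: nudge Q towards reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if state == goal:
            break

print(Q.argmax(axis=1)[:goal])  # best action for states 0..3; expected: all 1 (move right)
```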


17. Overfitting and Underfitting

• Overfitting: the model memorizes the training data and fails on new data.
• Underfitting: the model is too simple and cannot learn enough from the data.
• Solutions:
o Use a simpler model (to fix overfitting) or a more complex model (to fix underfitting).
o Add more training data.
o Use regularization (see the sketch below).
• Keywords: generalization, performance, model fit, complexity.
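A hedged sketch using polynomial regression on noisy made-up data: a degree-1 model typically underfits, a moderate degree fits well, and a very high degree typically overfits (high train score but a lower test score).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)   # noisy sine curve

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):          # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: train R2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R2 = {model.score(X_te, y_te):.2f}")
```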

18. Regularization in Regression

• Adds a penalty for large coefficients to reduce overfitting.
• Two types:
o L1 (Lasso): can drive some weights exactly to zero (acts as feature selection).
o L2 (Ridge): shrinks all weights towards zero but does not make them exactly zero.
• Helps the model stay simple and generalize better (see the sketch below).
• Keywords: penalty, L1, L2, overfitting, shrink weights.
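A short scikit-learn sketch on synthetic data with many irrelevant features: Lasso (L1) zeroes out many coefficients while Ridge (L2) only shrinks them. The alpha values are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# 50 features, but only 5 actually influence the target
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all weights, none exactly zero
lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives many weights exactly to zero

print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
```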
