Ml_unit_1
Machine Learning (ML) is a way to teach computers to learn from data instead of giving
them step-by-step instructions.
Imagine you're teaching a kid to tell the difference between cats and dogs. You don’t give them a
big list of rules — you just show them a lot of pictures of cats and dogs, and they start to notice
patterns. For example, cats usually have pointy ears and dogs often have longer snouts.
In the same way, in machine learning, we give a computer a bunch of examples (called data),
and it learns from those examples to make predictions or decisions.
You give a machine learning program a lot of emails, some labeled as spam and some as
not spam.
The program looks at the words and patterns in those emails.
After learning, it can look at a new email and guess if it’s spam or not — even though
you didn’t tell it exact rules.
In short:
Machine learning = Computers learning from experience (data) to do things like recognize
faces, understand speech, predict prices, or recommend movies.
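The spam example above can be sketched in a few lines of Python. This is only a toy illustration, not a real spam filter: the four training emails are invented, and the "learning" is just counting which words show up in spam versus non-spam messages.

```python
from collections import Counter

# Hypothetical toy training set: (email text, label) pairs invented for illustration.
training_emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]

def train(emails):
    """Count how often each word appears in spam and in non-spam emails."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label a new email by the class in which its words were seen more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"

word_counts = train(training_emails)
print(predict(word_counts, "claim your free money"))        # words seen mostly in spam
print(predict(word_counts, "agenda for the team meeting"))  # words seen mostly in normal mail
```

Notice that nobody wrote a rule like "the word free means spam". The program picked that up from the labeled examples.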
Example:
Let’s say we want a computer to tell the difference between an apple and a banana.
Computers were super basic, but people started dreaming about teaching machines.
People realized: manually writing all the rules is super hard and doesn’t scale.
New techniques like Support Vector Machines and boosting gain popularity.
6. 2020s: Everyday AI
ML is in your phone, your car, your fridge, everywhere.
Chatbots, voice assistants, self-driving cars, and smart healthcare.
o Safer AI
o AI that explains itself (Explainable AI)
o Combining reasoning + learning
Supervised Learning
What it means:
The machine learns from labeled data — you give it examples with the right answers.
Goal:
Learn to predict the answer for new, unseen data.
Examples:
o Email: Spam or Not Spam
o Predicting house prices
o Handwriting recognition
Key algorithms:
Linear regression, Decision trees, Neural networks, SVM
Unsupervised Learning
What it means:
The machine gets no labels — just raw data. It tries to find patterns or structure on its
own.
Goal:
Discover hidden patterns, groupings, or features.
Examples:
o Grouping similar customers (clustering)
o Reducing data size for compression (dimensionality reduction)
o Topic discovery in documents
Key algorithms:
K-means clustering, PCA, Hierarchical clustering, Autoencoders
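To make the clustering idea concrete, here is a minimal sketch of K-means on one-dimensional data. The customer spending numbers are invented, and starting the centers evenly between the smallest and largest value is an assumption made to keep the example deterministic (real implementations usually pick starting centers randomly).

```python
def k_means_1d(values, k=2, iterations=10):
    """Very small K-means for one-dimensional data (assumes k >= 2)."""
    # Assumption: spread the initial centers evenly between min and max.
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spending of six customers (no labels given).
spending = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means_1d(spending)
print(centers)
print(clusters)
```

The algorithm is never told which customers are "low spenders" or "high spenders"; the two groups come out of the data.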
Semi-Supervised Learning
What it means:
A mix of both: you give some labeled data, and a lot of unlabeled data.
Goal:
Use the small amount of labeled data to make better use of the big pile of unlabeled data.
Examples:
o Labeling images when labeling everything is too expensive
Reinforcement Learning
What it means:
The machine learns by doing — it interacts with an environment, gets rewards or
penalties, and tries to improve.
Goal:
Learn a strategy or sequence of actions to maximize long-term rewards.
Examples:
o Teaching a robot to walk
o Self-driving cars
o AI playing video games or board games
Key algorithms:
Q-learning, Deep Q Networks (DQN), Policy Gradient methods
Learning by Rote
In human learning, rote learning means memorizing things exactly as they are, without really
understanding them.
Example: Memorizing math formulas or vocabulary definitions without knowing how or why
they work.
In machine learning, rote learning means the model is just memorizing the training data,
instead of learning real patterns or general rules.
High accuracy on training data, but low accuracy on new (test) data.
It behaves like a parrot: repeating answers it saw before but getting confused with
anything new.
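A rough way to see the parrot behaviour in code: a rote learner is just a lookup table of the training examples. The fruit data and the hand-written shape rule below are hypothetical, and the rule only stands in for what a model that generalizes might learn.

```python
# Hypothetical fruit data: (color, shape) -> label, invented for illustration.
training_data = {
    ("red", "round"): "apple",
    ("yellow", "long"): "banana",
}

def rote_predict(example):
    """Return the memorized answer, or None when the example was never seen."""
    return training_data.get(example)

def rule_predict(example):
    """A general rule based on shape only: round -> apple, otherwise banana."""
    _color, shape = example
    return "apple" if shape == "round" else "banana"

new_fruit = ("green", "round")            # not in the training data
print(rote_predict(("red", "round")))     # seen before, so the lookup works
print(rote_predict(new_fruit))            # never seen, so the lookup has no answer
print(rule_predict(new_fruit))            # the general rule still gives an answer
```

The lookup table is perfect on the training data and useless on anything else, which is the pattern of high training accuracy and low test accuracy described above.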
Learning By Induction
Inductive learning means learning general rules or patterns from specific examples.
In simple terms:
The machine sees examples, finds patterns, and then makes predictions about new, unseen data based
on those patterns.
Color: Green
Shape: Round
The model hasn't seen this exact fruit before, but it uses the rule it learned and says:
Decision Trees
Logistic Regression
Neural Networks
DEPARTMENT OF CSE & AIML
MACHINE LEARNING
Naive Bayes
Support Vector Machines
They all try to learn a function from input → output based on training examples, and then
apply it to new data.
Reinforcement Learning
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by
interacting with an environment, trying things out, and getting rewards or penalties.
In simple terms:
The machine tries things, learns from successes and mistakes, and improves over time.
Real-Life Analogy
Key Parts of RL
Over time, the agent gets better at choosing actions that lead to higher rewards.
Real-World Examples of RL
Games:
o AlphaGo (beat humans at Go)
o AI playing Chess or Dota 2
Robotics:
o Robots learning to walk, grab things, or navigate rooms
Self-driving cars:
o Learning to avoid obstacles, stay in lanes, or park
Recommendations:
o Learning which movie/show a user might like next
Finance:
o Algorithms that learn how to trade stocks better over time
🧠 Popular RL Algorithms
Algorithm: Description
Q-learning: Agent learns a table of "what’s the best thing to do" in each state
Deep Q Networks (DQN): Uses deep learning (neural networks) instead of tables; great for complex environments
Policy Gradient: Agent learns the policy directly, not just values
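Here is a minimal sketch of tabular Q-learning, assuming a made-up environment: five cells in a row, the agent starts in cell 0, and it gets a reward of 1 for reaching cell 4. The learning rate, discount factor, episode count, and the purely random exploration are all illustrative choices, not tuned values.

```python
import random

N_STATES = 5            # cells 0..4 in a row; cell 4 is the goal (hypothetical setup)
ACTIONS = [0, 1]        # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)

def step(state, action):
    """Move one cell left or right; reaching the goal gives reward 1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

random.seed(0)
q_table = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):                           # 500 training episodes
    state = 0
    while state != N_STATES - 1:
        action = random.choice(ACTIONS)        # explore by acting randomly
        next_state, reward = step(state, action)
        best_next = max(q_table[next_state])   # value of the best next action
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q_table[state][action] += ALPHA * (
            reward + GAMMA * best_next - q_table[state][action]
        )
        state = next_state

# Greedy policy: in each non-goal cell, pick the action with the highest Q-value.
policy = [q.index(max(q)) for q in q_table[:-1]]
print(policy)
```

After training, the greedy policy should prefer "move right" (action 1) in every cell, because that is the shortest way to the reward. The agent was never told this; it worked it out from the rewards.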
In machine learning, data is the core fuel that powers the entire learning process. The quality, type, and
quantity of data determine how well the model will learn and make predictions.
1. Structured Data 📋
Definition:
Data that is organized in tables with rows and columns (like in Excel or databases).
Examples:
o Customer info: name, age, country, purchase history
o Stock prices
o Student test scores
Use Case:
Common in traditional ML models like regression, decision trees, etc.
2. Unstructured Data 🌀
Definition:
Data without a clear format or structure — hard for machines to process directly without
preprocessing.
Examples:
o Text (emails, tweets, books)
o Images (photos, scans)
o Audio (music, voice)
o Videos (YouTube clips, security footage)
Use Case:
Requires techniques like NLP (natural language processing), computer vision, or audio analysis.
3. Semi-Structured Data 🧾
Definition:
Not fully structured, but still has some organization (like tags or markers).
Examples:
o JSON or XML files
o HTML documents
o NoSQL databases (e.g., MongoDB)
Use Case:
Can be converted into structured form with parsing and cleaning.
Besides format, we also look at how the data is used in machine learning:
Time-Series Data
Definition:
Data collected over time, where order matters.
Examples:
o Stock prices every hour
o Weather readings over days
o Heart rate over time
Use Case:
Forecasting, anomaly detection, financial modeling
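As a small illustration of why order matters, here is a sketch of a very simple forecast: predict the next value as the average of the last few readings. The window size of 3 is an arbitrary assumption, and the prices are a short illustrative series.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the most recent `window` readings."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Illustrative stock prices over 5 days, oldest first.
stock_prices = [120, 125, 130, 128, 132]
forecast = moving_average_forecast(stock_prices)
print(forecast)  # average of 130, 128 and 132
```

If the same five numbers were shuffled, the "last three readings" would be different, so the forecast would change. That is what it means for order to matter.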
Data Matching
Data matching in machine learning refers to the process of comparing and aligning data from different
sources or records to find corresponding or similar entries.
It’s an essential task when dealing with data that may have inconsistencies, redundancies, or need to be
merged.
Data Integration: Combining datasets from different sources (e.g., merging customer info from
two databases).
Anomaly Detection: Identifying mismatched or unusual data patterns.
Data Cleansing: Ensuring consistency and resolving duplicates in datasets.
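A minimal sketch of data matching, using Python's standard `difflib` module to compare names from two sources. The customer names, the two "databases", and the 0.85 similarity threshold are all hypothetical; real systems usually compare several fields, not just the name.

```python
from difflib import SequenceMatcher

# Hypothetical customer names from two different systems.
crm_customers = ["Jon Smith", "Maria Garcia", "Wei Chen"]
billing_customers = ["John Smith", "maria garcia ", "Ahmed Khan"]

def normalize(name):
    """Lowercase the name and collapse extra whitespace."""
    return " ".join(name.lower().split())

def similarity(a, b):
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_records(left, right, threshold=0.85):
    """Pair each left record with its most similar right record, if similar enough."""
    matches = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            matches.append((a, best))
    return matches

matches = match_records(crm_customers, billing_customers)
for crm_name, billing_name in matches:
    print(repr(crm_name), "<->", repr(billing_name))
```

Here "Jon Smith" and "John Smith" are treated as the same customer despite the spelling difference, while "Wei Chen" has no close enough counterpart and stays unmatched.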
The process of machine learning typically involves several key stages. Here's a simplified breakdown of
the stages:
1. Problem Definition
Goal: Define the problem you want to solve and the desired outcome (e.g., classification,
regression, clustering).
Example: Predict house prices based on features like size and location.
2. Data Collection
Goal: Gather the data needed to train the model. The quality and quantity of data are crucial.
Example: Collect historical house prices, location details, and house features.
3. Data Preprocessing
Goal: Clean the data and prepare it for analysis. This includes handling missing values,
normalizing features, encoding categorical variables, and dealing with outliers.
Example: Remove rows with missing values, scale numerical features, and convert categorical
data into numeric form.
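The three preprocessing steps in this example can be sketched as follows. The house rows are invented, min-max scaling is one of several possible scaling choices, and one-hot encoding is one common way to turn categories into numbers.

```python
# Hypothetical raw rows: (size in sq ft, location, price); None marks a missing value.
raw_rows = [
    (1500, "city", 300000),
    (2000, "suburb", 400000),
    (None, "city", 250000),
    (1000, "suburb", 200000),
]

# 1. Remove rows with missing values.
clean_rows = [row for row in raw_rows if None not in row]

# 2. Scale the numerical feature to the range [0, 1] (min-max scaling).
sizes = [row[0] for row in clean_rows]
min_size, max_size = min(sizes), max(sizes)
scaled_sizes = [(s - min_size) / (max_size - min_size) for s in sizes]

# 3. Convert the categorical feature into numbers (one-hot encoding).
categories = sorted({row[1] for row in clean_rows})   # ['city', 'suburb']
encoded = [[1 if row[1] == c else 0 for c in categories] for row in clean_rows]

# Final feature rows: [scaled size, is_city, is_suburb]
features = [[s] + e for s, e in zip(scaled_sizes, encoded)]
print(features)
```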
4. Feature Engineering
Goal: Create new features or modify existing ones to improve the model’s performance.
Example: Combining "house size" and "number of rooms" into a new feature like "house
density."
5. Model Selection
Goal: Choose the right machine learning algorithm for the task (e.g., decision trees, linear
regression, neural networks).
Example: For predicting house prices, you might start with linear regression or a decision tree
model.
6. Model Training
Goal: Train the model on the data using the selected algorithm. The model learns patterns from
the data during this phase.
Example: The model uses the historical data to understand relationships between features and
target (house prices).
7. Model Evaluation
Goal: Assess the model's performance using evaluation metrics like accuracy, precision, recall,
F1 score (for classification), or mean squared error (for regression).
Example: Test the model using a separate test set (data the model hasn’t seen) to check its
prediction accuracy.
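The metrics named in this step can be computed by hand, as in the sketch below. The true and predicted labels are invented (1 = positive class, 0 = negative class), and so are the house prices used for mean squared error.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test-set labels and model predictions.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(true_labels, pred_labels))

# Hypothetical true and predicted house prices.
print(mean_squared_error([300000, 400000], [310000, 380000]))
```

In this toy case the model gets 6 of 8 labels right, with one false positive and one false negative, so accuracy, precision, recall, and F1 all come out to 0.75.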
8. Hyperparameter Tuning
Goal: Optimize the model by adjusting hyperparameters (parameters set before training), such
as the learning rate or tree depth.
Example: Tuning the depth of a decision tree to improve accuracy without overfitting.
9. Model Deployment
Goal: Deploy the trained model into a production environment where it can make predictions
on new, unseen data.
Example: Deploying a house price prediction model to a website where users can input house
details and get price estimates.
10. Model Monitoring and Maintenance
Goal: Continuously monitor the model’s performance over time. Update and retrain it as new
data becomes available.
Example: If the house market changes, the model may need retraining with new data to
maintain accuracy.
1. Problem Definition
2. Data Collection
3. Data Preprocessing
4. Feature Engineering
5. Model Selection
6. Model Training
7. Model Evaluation
8. Hyperparameter Tuning
9. Model Deployment
10. Model Monitoring and Maintenance
A small and simple example of the machine learning process, using a classic problem:
🏠 Problem: Predict the price of a house from its size.
🔢 Step-by-Step:
1. Problem Definition
→ Predict house price (regression problem)
2. Data Collection
→ Example data:
Size (sq ft)   Price ($)
1500           300,000
2000           400,000
3. Data Preprocessing
→ No missing values, so we’re good to go!
4. Model Selection
→ Use Linear Regression (simple model for numerical prediction)
5. Model Training
→ The model learns:
Price = 200 × Size
6. Model Evaluation
→ Test with a new size: 1800 sq ft
→ Model predicts:
Price = 200 × 1800 = 360,000
7. Prediction
→ Input: Size = 1800
→ Output: Price = $360,000 ✅
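The walkthrough above can be reproduced in a few lines. This sketch assumes the simplest possible form of linear regression, Price = w × Size with no intercept term, to match the formula in step 5; a full linear regression would normally also learn an intercept.

```python
# The two training examples from the walkthrough: size in sq ft, price in $.
sizes = [1500, 2000]
prices = [300_000, 400_000]

# Least-squares fit of Price = w * Size (assumption: no intercept term).
w = sum(s * p for s, p in zip(sizes, prices)) / sum(s * s for s in sizes)

def predict_price(size):
    """Predict the price of a house from its size using the learned weight."""
    return w * size

print(w)                    # learned price per square foot
print(predict_price(1800))  # prediction for the new 1800 sq ft house
```

The learned weight is 200, so the prediction for 1800 sq ft is 360,000, the same result as in steps 5 to 7.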
Data acquisition is the process of gathering or collecting data from various sources to be used for
training and testing machine learning models.
Feature Engineering
🧠 In simple terms: You take raw data and turn it into something more useful for the model.
The right features lead to better model accuracy. Even a simple model can perform well with strong features.
3. Normalization / Scaling
Data representation in machine learning means transforming real-world data into numerical form
(numbers, vectors, matrices) that algorithms can work with.
🧠 Computers can’t understand text, images, or sound directly — so we turn them into numbers!
Example:
32 55,000 No
Represented using:
o Bag of Words
o TF-IDF
o Word Embeddings (e.g., Word2Vec, GloVe, BERT)
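Of the three options, Bag of Words is the easiest to sketch: each sentence becomes a vector of word counts. The three-sentence corpus below is invented, and the vocabulary order (alphabetical) is just a choice made for this example.

```python
def build_vocabulary(sentences):
    """Map every distinct word in the corpus to a position in the vector."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    return {word: i for i, word in enumerate(words)}

def bag_of_words(sentence, vocabulary):
    """Count how many times each vocabulary word occurs in the sentence."""
    vector = [0] * len(vocabulary)
    for w in sentence.lower().split():
        if w in vocabulary:
            vector[vocabulary[w]] += 1
    return vector

# Hypothetical mini corpus.
corpus = ["I love cats", "I love dogs", "cats love cats"]
vocab = build_vocabulary(corpus)          # positions: cats, dogs, i, love
print(vocab)
print(bag_of_words("I love cats", vocab))
print(bag_of_words("cats love cats", vocab))
```

Unlike word embeddings, these vectors only record counts. They carry no notion that "cats" and "dogs" are related words.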
Example:
"I love cats" → [0.2, 0.4, 0.9] ← vector representing the sentence
3. Image Data
4. Audio Data
Example:
[120, 125, 130, 128, 132] ← stock price over 5 days
Model selection is the process of choosing the best machine learning algorithm for your specific task
and dataset.
Model Evaluation
Model evaluation is the process of testing how well your machine learning model performs.
It helps you understand:
Model prediction is when a trained machine learning model takes new input data and gives a
prediction or output based on what it has learned.
Let’s say you want to predict the price of a house based on its size.
Now you give the model new data it hasn’t seen before:
💰 Price = $360,000
The model uses what it learned to guess the price of the new house.
Searching in machine learning refers to the process of finding the best solution or model by exploring
different possibilities. It's like searching for the best route on a map.
Example:
Imagine you want to predict the price of a house. You might try several different models (e.g., Linear
Regression, Decision Trees, etc.) and search for the one that gives the best predictions. This search helps
you find the model that works best for your specific problem.
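A minimal sketch of this kind of search: train a few candidate models on the same data, score each one on held-out validation data, and keep the best. The house data is invented, and the two candidates (a "predict the average price" baseline and a no-intercept linear model) are deliberately simple stand-ins for real algorithms.

```python
# Hypothetical (size in sq ft, price in $) examples.
train_data = [(1000, 200000), (1500, 300000), (2000, 400000)]
validation_data = [(1200, 240000), (1800, 360000)]

def fit_mean_model(data):
    """Baseline: always predict the average training price."""
    avg = sum(p for _, p in data) / len(data)
    return lambda size: avg

def fit_linear_model(data):
    """Least-squares fit of Price = w * Size (assumption: no intercept)."""
    w = sum(s * p for s, p in data) / sum(s * s for s, _ in data)
    return lambda size: w * size

def mse(model, data):
    """Mean squared error of a model on (size, price) pairs."""
    return sum((model(s) - p) ** 2 for s, p in data) / len(data)

candidates = {
    "mean baseline": fit_mean_model,
    "linear regression": fit_linear_model,
}

# Search: train every candidate, then compare them on data they have not seen.
scores = {name: mse(fit(train_data), validation_data)
          for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
print(scores)
print(best_name)
```

The candidate with the lowest validation error wins. On this toy data that is the linear model, because price grows in proportion to size.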
Learning is when a machine (model) studies data and gets better over time. It’s like teaching a machine
to recognize patterns or trends from past examples so it can make predictions on new data.
Example:
You give the model examples of houses with size and price.
The model learns that larger houses tend to have higher prices.
In general, the more representative examples (data) you give it, the better the model becomes at predicting prices.
🔑 Key Difference:
Searching is about finding the best approach (model or solution) to solve the problem.
Learning is about the model improving its ability to make predictions by studying data.
Summary:
A dataset is a collection of data that is used to train and test a machine learning model. It consists of
many examples (rows) and features (columns).
Examples are the individual data points (like a house, an image, or a customer).
Features are the characteristics or properties of each example (like house size, age, color, etc.).
🧩 Simple Analogy:
Imagine you're teaching a robot to recognize different fruits. You’ll give it a dataset of fruits where each
fruit has features like:
Color
Weight
Size
Types of Datasets:
1. Training Dataset: This is the dataset used to teach the model. It contains examples from which
the model learns.
2. Test Dataset: This is a separate dataset used to test the model's performance after it has been
trained. It helps ensure the model can generalize well to new, unseen data.
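Splitting one dataset into training and test parts can be sketched like this. The fruit examples are invented, and the 25% test fraction and the fixed random seed are illustrative assumptions.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=42):
    """Shuffle the examples and hold out a fraction of them for testing."""
    shuffled = examples[:]                    # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

# Hypothetical fruit dataset: (color, weight in grams, label).
fruits = [
    ("red", 150, "apple"), ("green", 160, "apple"),
    ("red", 140, "apple"), ("green", 155, "apple"),
    ("yellow", 120, "banana"), ("yellow", 125, "banana"),
    ("green", 118, "banana"), ("yellow", 130, "banana"),
]

train_set, test_set = train_test_split(fruits)
print(len(train_set), "training examples")
print(len(test_set), "test examples")
```

The model only ever sees `train_set` while learning. `test_set` is kept aside so that the evaluation measures performance on examples the model has not seen.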
Public datasets: Free datasets available online (like from Kaggle, UCI, etc.).
Company datasets: Internal datasets (e.g., customer data, sales data).
APIs or Web Scraping: You can collect data from APIs or websites.
✅ Summary: