Ml_unit_1

Machine Learning (ML) is a method for teaching computers to learn from data rather than following explicit instructions, allowing them to make predictions or decisions based on patterns. The evolution of ML spans several decades, from early rule-based systems to the current era of deep learning and everyday AI applications. Key paradigms of ML include supervised, unsupervised, semi-supervised, and reinforcement learning, each with distinct approaches to data and learning processes.

Uploaded by

geethasri2k1
Copyright
© All Rights Reserved

MACHINE LEARNING

Machine Learning (ML) is a way to teach computers to learn from data instead of giving
them step-by-step instructions.

Imagine you're teaching a kid to tell the difference between cats and dogs. You don’t give them a
big list of rules — you just show them a lot of pictures of cats and dogs, and they start to notice
patterns. For example, cats usually have pointy ears and dogs often have longer snouts.

In the same way, in machine learning, we give a computer a bunch of examples (called data),
and it learns from those examples to make predictions or decisions.

Here's a real-world example:

 You give a machine learning program a lot of emails, some labeled as spam and some as
not spam.
 The program looks at the words and patterns in those emails.
 After learning, it can look at a new email and guess if it’s spam or not — even though
you didn’t tell it exact rules.
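Here is a minimal sketch of the spam idea in Python. It assumes a tiny made-up set of four emails and a naive-Bayes-style word-count score; the notes do not say which technique a real spam filter uses, so the function names `train` and `classify`, the label "ham" (meaning not spam), and the data are all illustrative assumptions.

```python
import math
from collections import Counter

# Tiny made-up training set (hypothetical emails, for illustration only).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]

def train(examples):
    """Count how often each word appears in spam and in non-spam (ham) emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose word counts explain the email best (add-one smoothing)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label, words in counts.items():
        total = sum(words.values())
        # Sum of log-probabilities; unseen words get a small non-zero probability.
        scores[label] = sum(
            math.log((words[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
        )
    return max(scores, key=scores.get)

counts = train(TRAIN)
print(classify(counts, "claim your free money"))   # spam
print(classify(counts, "team meeting tomorrow"))   # ham
```

Nobody wrote a rule like "free means spam"; the program only counted words in labeled examples. This sketch ignores how common spam is overall, which a fuller version would also take into account.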

In short:

Machine learning = Computers learning from experience (data) to do things like recognize
faces, understand speech, predict prices, or recommend movies.

Example:

Let’s say we want a computer to tell the difference between an apple and a banana.

1. We give it lots of pictures of apples and bananas.


2. We label them so it knows:
o This is an apple, this is a banana.
3. It starts learning:
o "Apples are usually round and red."
o "Bananas are long and yellow."
4. Then we show it a new fruit picture it hasn’t seen.
5. It says: "Based on what I’ve learned… that looks like a banana!"

💡 The more data it sees, the better it gets!

DEPARTMENT OF CSE & AIML



The Evolution of Machine Learning

1. 1950s–1960s: The Birth of the Idea


 Alan Turing asked: “Can machines think?” 💭 (The famous Turing Test)
 First ML programs were built to play games like checkers and solve math problems.

 Computers were super basic, but people started dreaming about teaching machines.

2. 1970s–1980s: Rule-Based Systems


 Early "AI" was built on if-this-then-that rules (called expert systems).
 No learning yet – just hard-coded logic.

 People realized: manually writing all the rules is super hard and doesn’t scale.

3. 1990s: The Rise of Learning Algorithms


 Real machine learning starts taking off!
 Algorithms like decision trees and neural networks come into the spotlight.

 Computers get faster + more data becomes available.


 First real use cases: handwriting recognition, spam filters, basic speech recognition.

4. 2000s: The Era of Big Data


 Internet explodes = tons of data everywhere 📶
 Google, Amazon, Netflix start using ML for search, recommendations, ads.

 New techniques like Support Vector Machines and boosting gain popularity.

5. 2010s: Deep Learning Revolution


 Boom! 💥 Neural networks become really deep (many layers = “deep learning”).
 GPUs (graphics cards) help crunch data faster.

 Machines start doing things like:


o Recognizing faces in photos
o Translating languages
o Beating humans in games like Go (AlphaGo by DeepMind in 2016)

6. 2020s: Everyday AI
 ML is in your phone, your car, your fridge, everywhere.
 Chatbots, voice assistants, self-driving cars, and smart healthcare.

 ML models become huge, like GPT, DALL·E, and BERT.


7. 2024+ and Beyond: Smarter, Faster, More Human-like


 Focus is shifting to:
o Smaller, efficient models (so they can run on phones)

o Safer AI
o AI that explains itself (Explainable AI)
o Combining reasoning + learning

Paradigms of Machine Learning

4 Main Paradigms of Machine Learning:

1. Supervised Learning (labeled data)

2. Unsupervised Learning (unlabeled data)

3. Semi-Supervised Learning (mix of labeled and unlabeled)

4. Reinforcement Learning (learning from experience)


1. Supervised Learning (👩‍🏫 Teacher + Student Style)

 What it means:
The machine learns from labeled data — you give it examples with the right answers.
 Goal:
Learn to predict the answer for new, unseen data.
 Examples:
o Email: Spam or Not Spam
o Predicting house prices
o Handwriting recognition
 Key algorithms:
Linear regression, Decision trees, Neural networks, SVM

2. Unsupervised Learning (🧩 Puzzle Solver Style)

 What it means:
The machine gets no labels — just raw data. It tries to find patterns or structure on its
own.
 Goal:
Discover hidden patterns, groupings, or features.
 Examples:
o Grouping similar customers (clustering)
o Reducing data size for compression (dimensionality reduction)
o Topic discovery in documents
 Key algorithms:
K-means clustering, PCA, Hierarchical clustering, Autoencoders
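As a rough illustration of clustering, K-means can be sketched in a few lines for one-dimensional data. The spending numbers are made up, and starting from the first k values is a simplifying assumption for this sketch (real implementations usually pick starting centers more carefully).

```python
def k_means_1d(values, k, iterations=10):
    """Group numbers into k clusters by repeatedly updating the cluster centers."""
    centers = list(values[:k])   # assumption: the first k values are the starting centers
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spending of six customers; no labels are given.
spending = [10, 12, 11, 95, 100, 98]
centers, clusters = k_means_1d(spending, k=2)
print(clusters)   # [[10, 12, 11], [95, 100, 98]]
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups come out of the data alone.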

3. Semi-Supervised Learning (🧠 A little help)

 What it means:
A mix of both: you give some labeled data, and a lot of unlabeled data.
 Goal:
Use the small amount of labeled data to make better use of the big pile of unlabeled data.
 Examples:
o Labeling images when labeling everything is too expensive


o Voice assistants improving with small corrections


 Why it’s useful:
Real-world data is often mostly unlabeled — labeling is hard and expensive!

4. Reinforcement Learning (🎮 Trial-and-Error Style)

 What it means:
The machine learns by doing — it interacts with an environment, gets rewards or
penalties, and tries to improve.
 Goal:
Learn a strategy or sequence of actions to maximize long-term rewards.
 Examples:
o Teaching a robot to walk
o Self-driving cars
o AI playing video games or board games
 Key algorithms:
Q-learning, Deep Q Networks (DQN), Policy Gradient methods


Learning by Rote

What is "Learning by Rote"?

In human learning, rote learning means memorizing things exactly as they are, without really
understanding them.
Example: Memorizing math formulas or vocabulary definitions without knowing how or why
they work.

🤔 What is "Rote Learning" in Machine Learning?

In machine learning, rote learning means the model is just memorizing the training data,
instead of learning real patterns or general rules.

This can be a bad thing, because:

 It works only for examples it has seen before.


 It fails on new or slightly different data.
 It means the model hasn't actually "learned", just copied.

📉 Example of Rote Learning in ML:

Let’s say you're training a model to recognize animals from pictures.

 You show it 1,000 pictures of cats and dogs.


 Instead of learning what features make a cat vs. a dog, it just memorizes each image.
 If you show it a new cat picture it hasn't seen before, it might fail — because it doesn't
generalize.

This problem is called overfitting.


🔍 How Do We Know a Model Is Rote Learning?

 High accuracy on training data, but low accuracy on new (test) data.
 It behaves like a parrot: repeating answers it saw before but getting confused with
anything new.
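The train-versus-test gap can be shown with a toy comparison. This is only a sketch: the numbers, the "small"/"large" labels, and both models (`fit_memorizer` and `fit_threshold`) are invented for illustration.

```python
TRAIN = [(10, "small"), (20, "small"), (60, "large"), (80, "large")]
TEST = [(15, "small"), (70, "large"), (55, "large"), (30, "small")]

def fit_memorizer(examples):
    """Rote learner: stores every training example in a lookup table."""
    table = dict(examples)
    # For an input it has never seen, it can only fall back to a fixed guess.
    return lambda x: table.get(x, "small")

def fit_threshold(examples):
    """Generalizing learner: finds a cut-off between the two groups."""
    biggest_small = max(x for x, y in examples if y == "small")
    smallest_large = min(x for x, y in examples if y == "large")
    cut = (biggest_small + smallest_large) / 2
    return lambda x: "large" if x >= cut else "small"

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

memorizer = fit_memorizer(TRAIN)
rule = fit_threshold(TRAIN)
print(accuracy(memorizer, TRAIN), accuracy(memorizer, TEST))   # 1.0 0.5
print(accuracy(rule, TRAIN), accuracy(rule, TEST))             # 1.0 1.0
```

Both models look perfect on the training data. Only the test data reveals that the memorizer is the parrot.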

✅ The Goal: Real Learning (Generalization)

We want machine learning models to:

 Learn underlying patterns


 Make good predictions on new, unseen data
 Be flexible, not just memorizing examples

Learning By Induction

What is Learning by Induction?

Inductive learning means learning general rules or patterns from specific examples.

In simple terms:

The machine sees examples, finds patterns, and then makes predictions about new, unseen data based
on those patterns.

Inductive Learning Example: Fruit Classifier

Imagine you want to teach a computer to tell if a fruit is an apple or a banana.


You give it a few training examples like this:


Now, the machine starts looking for patterns in the examples:

 Apples are usually red or green and round.


 Bananas are yellow and long-ish.

It builds a general rule like:

If the fruit is round and not yellow, it's probably an apple.

🆕 Then you show it a new fruit:

 Color: Green
 Shape: Round

The model hasn't seen this exact fruit before, but it uses the rule it learned and says:

“Based on what I’ve seen, this is likely an apple.”

That’s inductive learning:


It generalized from the examples and applied the logic to something new.
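One possible way to sketch this kind of induction in Python: remember which label goes with each feature value, then let the matching rules vote. The four training fruits are made up for this sketch, and `induce_rules` and `predict` are hypothetical helper names, not a standard algorithm from a library.

```python
from collections import Counter, defaultdict

# Made-up training examples, used only for this illustration.
EXAMPLES = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "yellow", "shape": "long"}, "banana"),
]

def induce_rules(examples):
    """For every (feature, value) pair, remember the most common label."""
    seen = defaultdict(Counter)
    for features, label in examples:
        for pair in features.items():
            seen[pair][label] += 1
    return {pair: labels.most_common(1)[0][0] for pair, labels in seen.items()}

def predict(rules, features):
    """Let every matching rule vote; unseen feature values simply do not vote."""
    votes = Counter(rules[pair] for pair in features.items() if pair in rules)
    return votes.most_common(1)[0][0] if votes else None

rules = induce_rules(EXAMPLES)
# A green fruit never appeared in training, but "round" did.
print(predict(rules, {"color": "green", "shape": "round"}))   # apple
```

The general rule "round means apple" was learned from specific examples and then applied to a fruit the model had never seen.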

Machine Learning Models That Use Induction:

Pretty much all classic ML models use induction to some extent:

 Decision Trees
 Logistic Regression
 Neural Networks

 Naive Bayes
 Support Vector Machines

They all try to learn a function from input → output based on training examples, and then
apply it to new data.

Reinforcement Learning

Reinforcement Learning is a type of machine learning where an agent learns to make decisions by
interacting with an environment, trying things out, and getting rewards or penalties.

In simple terms:
The machine tries things, learns from successes and mistakes, and improves over time.

Real-Life Analogy

Think about how you learn to ride a bike:

 You try to balance → you fall → you learn.


 You try again → you stay up longer → yay! 🎉
 You keep practicing → eventually, you get really good.

That’s reinforcement learning: trial, feedback, improvement.

Key Parts of RL

Agent, environment, state, action, reward, and policy.


How it Works (Step by Step)


1. Agent observes the current state of the environment.
2. Agent chooses an action based on its current policy.
3. The action changes the state of the environment.
4. The environment gives the agent a reward (positive or negative).
5. The agent learns from the result and updates its policy.
6. Repeat again and again!

Over time, the agent gets better at choosing actions that lead to higher rewards.
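The loop above can be sketched with tabular Q-learning on a toy problem. Everything specific here is an assumption for illustration: a corridor of 5 cells with a reward of 1 at the right end, the values of `alpha`, `gamma`, and `epsilon`, and the cap of 100 steps per episode.

```python
import random

N_STATES = 5                  # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment: move one cell; reward 1 only when the goal cell is reached."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Steps 1 and 2: observe the state and choose an action.
            # Epsilon-greedy policy: mostly the best-known action, sometimes a random one.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            # Steps 3 and 4: the environment changes state and gives a reward.
            nxt, reward = step(state, action)
            # Step 5: nudge the estimate for (state, action) toward the observed target.
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if state == N_STATES - 1:
                break
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the learned policy should prefer "right" in every cell, because that is the only direction that ever leads to a reward.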

Real-World Examples of RL
 Games:
o AlphaGo (beat humans at Go)
o AI playing Chess or Dota 2
 Robotics:
o Robots learning to walk, grab things, or navigate rooms
 Self-driving cars:
o Learning to avoid obstacles, stay in lanes, or park


 Recommendations:
o Learning which movie/show a user might like next
 Finance:
o Algorithms that learn how to trade stocks better over time

🧠 Popular RL Algorithms
Algorithm               Description

Q-learning              Agent learns a table of "what’s the best thing to do" in each state

Deep Q Networks (DQN)   Uses deep learning (neural networks) instead of tables; great for complex environments

Policy Gradient         Agent learns the policy directly, not just values

Actor-Critic            Combines the best of Q-learning and Policy Gradients

Data in Machine Learning

In machine learning, data is the core fuel that powers the entire learning process. The quality, type, and
quantity of data determine how well the model will learn and make predictions.

1. Structured: Organized in tables (e.g., spreadsheets, databases).


2. Unstructured: Raw data like text, images, audio.
3. Semi-structured: Some organization (e.g., JSON, XML).

1. Structured Data 📋

 Definition:
Data that is organized in tables with rows and columns (like in Excel or databases).
 Examples:
o Customer info: name, age, country, purchase history
o Stock prices
o Student test scores


 Use Case:
Common in traditional ML models like regression, decision trees, etc.

2. Unstructured Data 🌀

 Definition:
Data without a clear format or structure — hard for machines to process directly without
preprocessing.
 Examples:
o Text (emails, tweets, books)
o Images (photos, scans)
o Audio (music, voice)
o Videos (YouTube clips, security footage)
 Use Case:
Requires techniques like NLP (natural language processing), computer vision, or audio analysis.

3. Semi-Structured Data 🧾

 Definition:
Not fully structured, but still has some organization (like tags or markers).
 Examples:
o JSON or XML files
o HTML documents
o NoSQL databases (MongoDB)
 Use Case:
Can be converted into structured form with parsing and cleaning.

Other Ways to Classify Data

Besides format, we also look at how the data is used in machine learning:

4. Labeled Data ✅ (Used in Supervised Learning)

 Has input AND the correct output.


 Example:


o A picture of a cat, labeled as “cat” 🐱


o A house with its sale price
 📘 Think of it as: "Training with answers."

5. Unlabeled Data ❓ (Used in Unsupervised Learning)

 Has input, but no correct output.


 The model tries to find patterns on its own.
 Example:
o Grouping similar products based on reviews
o Clustering customer behavior

6. Time Series Data

 Definition:
Data collected over time, where order matters.
 Examples:
o Stock prices every hour
o Weather readings over days
o Heart rate over time
 Use Case:
Forecasting, anomaly detection, financial modeling

7. Image, Audio, and Video Data 🎨🎧🎬

 Visual Data (images, video):


o Used in computer vision (face recognition, object detection)

 Audio Data (music, speech):


o Used in voice assistants, music genre classification


Data Matching

Data matching in machine learning refers to the process of comparing and aligning data from different
sources or records to find corresponding or similar entries.


It’s an essential task when dealing with data that may have inconsistencies, redundancies, or need to be
merged.

Key Concepts of Data Matching:

1. Record Linkage (or Entity Resolution):


o Matching different records that refer to the same entity (e.g., two records for the same
customer from different systems).
o Example: Finding two records with the same person’s name but slight variations in
spelling.
2. Duplicate Detection:
o Identifying and merging duplicate entries in a dataset.
o Example: Two listings for the same product on an e-commerce website with different
IDs.
3. Fuzzy Matching:
o Matching data that is not exact but is close in some way (e.g., "John Smith" vs "Jon
Smith").
o Commonly used in string matching, using algorithms like Levenshtein distance (edit
distance).
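Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a small sketch of the standard dynamic-programming version; the helper `is_fuzzy_match` and its `max_edits=2` threshold are assumptions for illustration, not a recommended setting.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn string a into string b."""
    previous = list(range(len(b) + 1))        # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                         # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,              # delete char_a
                current[j - 1] + 1,           # insert char_b
                previous[j - 1] + cost,       # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def is_fuzzy_match(a, b, max_edits=2):
    """Hypothetical matching rule: treat two names as the same if few edits apart."""
    return levenshtein(a.lower(), b.lower()) <= max_edits

print(levenshtein("John Smith", "Jon Smith"))      # 1
print(is_fuzzy_match("John Smith", "Jon Smith"))   # True
```

In real record linkage the threshold usually depends on string length, and names are often compared together with other fields such as address or date of birth.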

Applications of Data Matching in ML:

 Data Integration: Combining datasets from different sources (e.g., merging customer info from
two databases).
 Anomaly Detection: Identifying mismatched or unusual data patterns.
 Data Cleansing: Ensuring consistency and resolving duplicates in datasets.


Stages in Machine Learning

The process of machine learning typically involves several key stages. Here's a simplified breakdown of
the stages:

1. Problem Definition

 Goal: Define the problem you want to solve and the desired outcome (e.g., classification,
regression, clustering).
 Example: Predict house prices based on features like size and location.

2. Data Collection

 Goal: Gather the data needed to train the model. The quality and quantity of data are crucial.
 Example: Collect historical house prices, location details, and house features.

3. Data Preprocessing

 Goal: Clean the data and prepare it for analysis. This includes handling missing values,
normalizing features, encoding categorical variables, and dealing with outliers.
 Example: Remove rows with missing values, scale numerical features, and convert categorical
data into numeric form.

4. Feature Engineering

 Goal: Create new features or modify existing ones to improve the model’s performance.
 Example: Combining "house size" and "number of rooms" into a new feature like "house
density."

5. Model Selection

 Goal: Choose the right machine learning algorithm for the task (e.g., decision trees, linear
regression, neural networks).
 Example: For predicting house prices, you might start with linear regression or a decision tree
model.


6. Model Training

 Goal: Train the model on the data using the selected algorithm. The model learns patterns from
the data during this phase.
 Example: The model uses the historical data to understand relationships between features and
target (house prices).

7. Model Evaluation

 Goal: Assess the model's performance using evaluation metrics like accuracy, precision, recall,
F1 score (for classification), or mean squared error (for regression).
 Example: Test the model using a separate test set (data the model hasn’t seen) to check its
prediction accuracy.

8. Hyperparameter Tuning

 Goal: Optimize the model by adjusting hyperparameters (parameters set before training), such
as the learning rate or tree depth.
 Example: Tuning the depth of a decision tree to improve accuracy without overfitting.

9. Model Deployment

 Goal: Deploy the trained model into a production environment where it can make predictions
on new, unseen data.
 Example: Deploying a house price prediction model to a website where users can input house
details and get price estimates.

10. Model Monitoring and Maintenance

 Goal: Continuously monitor the model’s performance over time. Update and retrain it as new
data becomes available.
 Example: If the house market changes, the model may need retraining with new data to
maintain accuracy.


Summary of Stages in Machine Learning:

1. Problem Definition
2. Data Collection
3. Data Preprocessing
4. Feature Engineering
5. Model Selection
6. Model Training
7. Model Evaluation
8. Hyperparameter Tuning
9. Model Deployment
10. Model Monitoring and Maintenance

A small and simple example of the machine learning process, using a classic problem:

🎯 Predicting House Prices

🏠 Problem:

We want to predict the price of a house based on its size.

🔢 Step-by-Step:

1. Problem Definition
→ Predict house price (regression problem)
2. Data Collection
→ Example data:

Size (sq ft)    Price ($)
1000            200,000
1500            300,000
2000            400,000

3. Data Preprocessing
→ No missing values, so we’re good to go!
4. Model Selection
→ Use Linear Regression (simple model for numerical prediction)


5. Model Training
→ The model learns:
Price = 200 × Size

6. Model Evaluation
→ Test with a new size: 1800 sq ft
→ Model predicts:
Price = 200 × 1800 = 360,000

7. Prediction
→ Input: Size = 1800
→ Output: Price = $360,000 ✅
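The training step can be reproduced with ordinary least squares for a single feature. This is a minimal sketch using the three houses from the table; `fit_line` is a hypothetical helper name, and in practice a library would normally do this step.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

sizes = [1000, 1500, 2000]              # sq ft, from the table
prices = [200_000, 300_000, 400_000]    # dollars, from the table

slope, intercept = fit_line(sizes, prices)
print(slope, intercept)                 # 200.0 0.0
print(slope * 1800 + intercept)         # 360000.0
```

The fit recovers Price = 200 × Size exactly because the three points lie on a straight line. Real data is noisier, so the fitted line would only approximate the prices.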

Data Acquisition (also called data collection)

Data acquisition is the process of gathering or collecting data from various sources to be used for
training and testing machine learning models.

📡 Sources of Data in Machine Learning:


1. Manual Data Entry
o Surveys, forms, or spreadsheets.

o ✅ Easy for small datasets


o ❌ Time-consuming and error-prone
2. Public Datasets
o Available online for free.
o Examples:
 Kaggle (https://www.kaggle.com/)
 UCI ML Repository (https://archive.ics.uci.edu/)
 Google Dataset Search
3. APIs
o Connect to external services to fetch live data.
o Example: Twitter API, Weather API, Google Maps API
4. Web Scraping
o Collect data from websites using tools like Python’s BeautifulSoup or Scrapy.
o ⚠️ Be careful with terms of service.


5. Sensors and IoT Devices


o For real-time physical data (e.g., temperature, motion, GPS).
6. Databases and Data Warehouses
o Internal company data stored in SQL or NoSQL databases.
7. Logs and Transactions
o Website clicks, purchase history, or system logs.

Once you have the data, you usually:

 Clean it (remove errors, fill missing values)


 Format it (normalize, encode, transform)
 Split it (into training, validation, and test sets)
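The last step, splitting, might look like the sketch below. The 70/15/15 percentages and the fixed seed are assumptions chosen for illustration; the notes do not prescribe particular proportions.

```python
import random

def split_dataset(rows, train_pct=70, validation_pct=15, seed=42):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)    # fixed seed so the split is repeatable
    n_train = len(rows) * train_pct // 100
    n_validation = len(rows) * validation_pct // 100
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_validation]
    test = rows[n_train + n_validation:]  # whatever is left over
    return train, validation, test

train, validation, test = split_dataset(range(20))
print(len(train), len(validation), len(test))   # 14 3 3
```

Shuffling first matters when the data is stored in some order (for example, sorted by date or by label). For time series data you would usually keep the order instead.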

Feature Engineering

Feature Engineering is the process of creating, transforming, or selecting input variables (features) that help a machine learning model learn better and perform well.

🧠 In simple terms: You take raw data and turn it into something more useful for the model.

Right features = better model accuracy. Even a simple model can perform well with strong features.


Common Feature Engineering Techniques:

1. Creating New Features

 Combine or extract useful information.


 Example:
o From "date of birth" → create "age"
o From "first name" + "last name" → create "full name"

2. Encoding Categorical Variables

 Convert text categories into numbers.


 Examples:
o Label Encoding:
Red = 0, Blue = 1, Green = 2
o One-Hot Encoding:
Red = [1, 0, 0], Blue = [0, 1, 0], Green = [0, 0, 1]
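Both encodings are easy to sketch in plain Python. The function names are hypothetical; the category order Red, Blue, Green follows the example above.

```python
CATEGORIES = ["Red", "Blue", "Green"]

def label_encode(values, categories):
    """Replace each category with its position in the category list."""
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values]

def one_hot_encode(values, categories):
    """Replace each category with a 0/1 vector that has a single 1."""
    return [[1 if v == category else 0 for category in categories] for v in values]

colors = ["Blue", "Red", "Green"]
print(label_encode(colors, CATEGORIES))     # [1, 0, 2]
print(one_hot_encode(colors, CATEGORIES))   # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Label encoding is compact, but it suggests an order (Green > Blue > Red) that does not really exist. One-hot encoding avoids that at the cost of more columns.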

3. Normalization / Scaling

 Make numerical features fall within a similar range.


 Example:
o Scale heights from centimeters (150–200) → 0 to 1 range
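Min-max scaling is one common way to do this: subtract the smallest value and divide by the range. A minimal sketch, with made-up heights and a hypothetical function name:

```python
def min_max_scale(values):
    """Rescale numbers so the smallest becomes 0 and the largest becomes 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]     # all values equal: avoid dividing by zero
    return [(v - low) / (high - low) for v in values]

heights_cm = [150, 175, 200]
print(min_max_scale(heights_cm))         # [0.0, 0.5, 1.0]
```

In a real project, the minimum and maximum should come from the training set only and then be reused for new data.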

4. Handling Missing Data

 Fill in missing values with:


o Mean/Median

o Most frequent value


o Or drop the row/column
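These options can be sketched with Python's built-in `statistics` module. Missing values are assumed to be stored as `None`, and the names `fill_missing` and `drop_missing` are illustrative.

```python
import statistics

def fill_missing(values, strategy="mean"):
    """Replace None entries using the chosen strategy."""
    known = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(known)
    elif strategy == "median":
        fill = statistics.median(known)
    elif strategy == "most_frequent":
        fill = statistics.mode(known)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def drop_missing(rows):
    """Alternative: keep only the rows that have no missing entries."""
    return [row for row in rows if None not in row]

ages = [25, None, 35, 30]
print(fill_missing(ages, "mean"))
print(drop_missing([(25, "Yes"), (None, "No"), (35, "Yes")]))
```

Filling keeps every row but invents a value. Dropping keeps only real values but loses data. Which one is better depends on how much data is missing.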


What is Data Representation?

Data representation in machine learning means transforming real-world data into numerical form
(numbers, vectors, matrices) that algorithms can work with.

🧠 Computers can’t understand text, images, or sound directly — so we turn them into numbers!

🧾 Common Forms of Data Representation

1. Tabular Data (Rows & Columns)

 Most common in structured datasets.


 Each row = a data point
 Each column = a feature (input variable)


Example:

Age    Salary    Bought Product
25     40,000    Yes
32     55,000    No

 Represented as a matrix of numbers (features)


 Target/output column used for supervised learning

2. Text Data (NLP)

 Represented using:
o Bag of Words

o TF-IDF
o Word Embeddings (e.g., Word2Vec, GloVe, BERT)
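Bag of Words is the simplest of these to sketch: build a vocabulary, then count how often each vocabulary word appears in a sentence. The two sentences and the function names are assumptions for illustration. Word embeddings produce dense vectors of real numbers instead, which is not what this sketch does.

```python
def build_vocabulary(sentences):
    """Collect every distinct word, in alphabetical order."""
    return sorted({word for s in sentences for word in s.lower().split()})

def bag_of_words(sentence, vocabulary):
    """Count how many times each vocabulary word occurs in the sentence."""
    words = sentence.lower().split()
    return [words.count(term) for term in vocabulary]

sentences = ["I love cats", "I love dogs"]
vocabulary = build_vocabulary(sentences)
print(vocabulary)                                # ['cats', 'dogs', 'i', 'love']
print(bag_of_words("I love cats", vocabulary))   # [1, 0, 1, 1]
```

Word order is lost in this representation, which is why it is called a "bag" of words.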

Example:

"I love cats" → [0.2, 0.4, 0.9] ← vector representing the sentence

3. Image Data

 Images are stored as a grid of pixels.


 Each pixel = a number (grayscale) or RGB values (3 numbers).
 Represented as a 3D matrix (Height × Width × Channels)

Example: A 28×28 grayscale image = matrix of 784 numbers


4. Audio Data

 Converted into numerical signals.


 Often represented as waveforms, spectrograms, or MFCCs.

5. Time Series Data

 Data collected over time.


 Represented as a sequence of values (e.g., daily stock prices)

Example:
[120, 125, 130, 128, 132] ← stock price over 5 days

What is Model Selection?

Model selection is the process of choosing the best machine learning algorithm for your specific task
and dataset.

Process for Model Selection


1. Understand the problem and data
2. Try multiple models (start simple)
3. Evaluate them using metrics
4. Tune hyperparameters (optional)
5. Pick the best-performing model
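A rough sketch of this process: fit two candidate models on the training data, score each on held-out data, and keep the one with the lower error. The two held-out houses are hypothetical, and the candidates (a mean baseline and a straight-line fit) were picked only to keep the example short.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the average training price."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_squared_error(model, xs, ys):
    """Average squared gap between predictions and true values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1000, 1500, 2000], [200_000, 300_000, 400_000]
valid_x, valid_y = [1200, 1800], [240_000, 360_000]   # hypothetical held-out houses

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: mean_squared_error(fit(train_x, train_y), valid_x, valid_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)   # linear regression
```

The models are compared on data they were not trained on. Comparing them on the training data would reward memorization.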

Common ML Models by Task:

🔸 For Classification (predicting categories)

 Logistic Regression — simple, fast, interpretable


 Decision Trees / Random Forests — great for tabular data


 Support Vector Machine (SVM) — good for small/medium datasets


 K-Nearest Neighbors (KNN) — simple, works well with clean data
 Neural Networks — powerful, but need more data and tuning

🔹 For Regression (predicting numbers)

 Linear Regression — good starting point


 Decision Tree Regressor
 Random Forest Regressor
 Gradient Boosting (XGBoost, LightGBM)
 Neural Networks (for complex problems)

🔹 For Clustering (grouping similar items)

 K-Means — most popular and simple


 DBSCAN — finds clusters of different shapes
 Hierarchical Clustering — builds a tree of clusters

Model Evaluation

Model evaluation is the process of testing how well your machine learning model performs.
It helps you understand:

✅ Is the model making accurate predictions?


❌ Is it making too many mistakes?
⚖️ Is it balanced or biased?

Common Evaluation Metrics:

🔹 For Classification (e.g., Spam vs Not Spam): accuracy, precision, recall, and F1 score
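The usual classification metrics (accuracy, precision, recall, F1 score) can all be computed from counts of correct and incorrect predictions. The sketch below uses eight made-up emails, with 1 meaning spam and 0 meaning not spam; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)   # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)   # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)   # spam missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # made-up ground truth
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # made-up model output
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.75, while precision and recall are both 2/3. Precision asks "of the emails flagged as spam, how many really were?" and recall asks "of the real spam, how much was caught?".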


What is Model Prediction?

Model prediction is when a trained machine learning model takes new input data and gives a
prediction or output based on what it has learned.

Example: Predicting House Prices

Let’s say you want to predict the price of a house based on its size.

✅ Step 1: Training Data (What the model learns from)


Size (sq ft)    Price ($)
1000            200,000
1500            300,000
2000            400,000

The model learns:


📈 Bigger house = Higher price

✅ Step 2: Make a Prediction

Now you give the model new data it hasn’t seen before:

🏡 House Size = 1800 sq ft

📣 The Model Predicts:

💰 Price = $360,000

The model uses what it learned to guess the price of the new house.

Searching and Learning in Machine Learning

Searching in Machine Learning:

Searching in machine learning refers to the process of finding the best solution or model by exploring
different possibilities. It's like searching for the best route on a map.

Example:

Imagine you want to predict the price of a house. You might try several different models (e.g., Linear
Regression, Decision Trees, etc.) and search for the one that gives the best predictions. This search helps
you find the model that works best for your specific problem.


📚 Learning in Machine Learning:

Learning is when a machine (model) studies data and gets better over time. It’s like teaching a machine
to recognize patterns or trends from past examples so it can make predictions on new data.

Example:
 You give the model examples of houses with size and price.
 The model learns that larger houses tend to have higher prices.

 The more examples (data) you give it, the smarter the model becomes at predicting prices.

🔑 Key Difference:

 Searching is about finding the best approach (model or solution) to solve the problem.
 Learning is about the model improving its ability to make predictions by studying data.

Summary:

Concept     Simple Meaning
Searching   Looking for the best solution or model.
Learning    The model improving its predictions by studying data.

Datasets in Machine Learning

What is a Dataset in Machine Learning?

A dataset is a collection of data that is used to train and test a machine learning model. It consists of
many examples (rows) and features (columns).

 Examples are the individual data points (like a house, an image, or a customer).
 Features are the characteristics or properties of each example (like house size, age, color, etc.).

🧩 Simple Analogy:

Imagine you're teaching a robot to recognize different fruits. You’ll give it a dataset of fruits where each
fruit has features like:

 Color
 Weight


 Size

The dataset could look like this:

Types of Datasets:

1. Training Dataset: This is the dataset used to teach the model. It contains examples from which
the model learns.
2. Test Dataset: This is a separate dataset used to test the model's performance after it has been
trained. It helps ensure the model can generalize well to new, unseen data.

Where Do Datasets Come From?

 Public datasets: Free datasets available online (like from Kaggle, UCI, etc.).
 Company datasets: Internal datasets (e.g., customer data, sales data).
 APIs or Web Scraping: You can collect data from APIs or websites.

✅ Summary:

 A dataset is a collection of data used to train or test a machine learning model.


 It’s made up of examples (rows) and features (columns).
 Good datasets = better predictions!
