The document provides an overview of statistics, emphasizing its role in data collection, analysis, and interpretation, particularly in AI/ML applications. It covers basic statistical concepts, probability distributions, regression analysis, and key metrics for model evaluation, highlighting their importance in making informed decisions and predictions. Additionally, it outlines the stages in the statistical process, from problem definition to deployment.

Uploaded by ghiblified0
Copyright © All Rights Reserved

1. Introduction to Statistics

Definition: Statistics is the method of collecting, analysing, interpreting and presenting data. In AI/ML it is used to understand data and make better decisions.

Example: To analyse students' marks, we calculate their average (mean), highest (max) and lowest (min) marks.
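Here is a minimal Python sketch of this example. It uses the standard-library `statistics` module and built-ins, and the marks list is hypothetical.

```python
import statistics

# Hypothetical marks for five students
marks = [72, 85, 64, 90, 58]

average = statistics.mean(marks)  # (72 + 85 + 64 + 90 + 58) / 5
highest = max(marks)
lowest = min(marks)

print(average, highest, lowest)  # → 73.8 90 58
```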

2. Importance of Statistics in AI/ML

Identifies patterns and outliers.

Helps choose features (feature selection).

Forms the basis of probabilistic models such as Bayesian classifiers.

Used in fraud detection and recommendation systems.

Example: Netflix's recommendations use statistics to suggest shows that match your interests.

3. Role of Statistics in Data Analysis

Handles data sampling, summaries (mean, median) and pattern detection.

Generalises from a sample to a larger population using hypothesis testing and confidence intervals.

Evaluates models using accuracy, precision, recall, etc.

Example: Suppose you are building a model that predicts students' marks. You would use statistical metrics to check whether the model is predicting correctly.
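Accuracy, precision and recall are classification metrics. This sketch therefore assumes the marks model has been turned into a hypothetical pass/fail classifier (1 = pass, 0 = fail). For numeric mark predictions, regression metrics such as RMSE or R² would be used instead. The labels below are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted pass, actually passed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted fail, actually failed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted pass, actually failed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted fail, actually passed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical actual vs predicted results (1 = pass, 0 = fail)
actual = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]

print(classification_metrics(actual, predicted))  # → (0.625, 0.6, 0.75)
```

Precision asks "of the students predicted to pass, how many actually did?". Recall asks "of the students who actually passed, how many did the model catch?".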

4. Basic Statistical Concepts for AI/ML

📌 Measures of Central Tendency

Mean: The average of all values.

Example: [10, 20, 30] → Mean = (10+20+30)/3 = 20

Median: The middle value once the data is sorted (for an even number of values, the average of the two middle values).

Example: [10, 20, 30] → Median = 20

Mode: The most frequent value.

Example: [10, 10, 20] → Mode = 10
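The same examples can be checked with Python's standard-library `statistics` module. The last line uses an extra even-length list to show that the median then averages the two middle values.

```python
import statistics

print(statistics.mean([10, 20, 30]))        # → 20
print(statistics.median([10, 20, 30]))      # → 20
print(statistics.mode([10, 10, 20]))        # → 10
print(statistics.median([10, 20, 30, 40]))  # → 25.0 (average of 20 and 30)
```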

📌 Variance & Standard Deviation

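Variance: How spread out the data is around the mean (the average of the squared deviations from the mean).

Standard Deviation: The square root of the variance. It is in the same units as the data and shows how consistent the values are.

Example: If all students' marks are similar, the SD will be low (consistent performance).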
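This short sketch compares two hypothetical sets of marks that share the same mean but differ in spread. It uses `statistics.pvariance` and `statistics.pstdev`, which apply the population formulas (dividing by n). Their counterparts `statistics.variance` and `statistics.stdev` divide by n − 1 and are the usual choice when the data is a sample.

```python
import statistics

# Hypothetical marks: both groups have the same mean (70)
consistent = [70, 72, 68, 71, 69]
spread_out = [40, 95, 60, 88, 67]

for name, marks in [("consistent", consistent), ("spread_out", spread_out)]:
    var = statistics.pvariance(marks)  # average squared distance from the mean
    sd = statistics.pstdev(marks)      # square root of the variance, in marks
    print(name, statistics.mean(marks), round(var, 1), round(sd, 2))
```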

🔢 5. Probability Distributions – Detailed Notes

Definition:

A probability distribution tells you how likely each possible event (or outcome) is. It is a mathematical function that gives the probability of every possible outcome. For continuous variables it gives a probability density instead, and probabilities are areas under the curve.

📌 1. Normal Distribution (Gaussian Distribution)

Definition:

It is a bell-shaped curve. Most of the data lies around the centre, and extreme values (very high or very low) are rare.

Key Features:

Symmetrical curve (the left and right halves mirror each other)

Mean = Median = Mode

Real-life data such as height, weight and test scores often approximately follow it

Example:

If you plot the test marks of a class, most students will be close to the average, with a few scoring very high and a few very low. Together these form a bell curve.
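This sketch uses the standard-library `statistics.NormalDist` (Python 3.8+). It assumes a hypothetical class whose marks follow a normal distribution with mean 70 and standard deviation 10. The code checks the symmetry property, the Mean = Median property, and the well-known result that about 68% of values fall within one standard deviation of the mean.

```python
from statistics import NormalDist

# Hypothetical marks distribution: mean 70, standard deviation 10
marks = NormalDist(mu=70, sigma=10)

# Symmetry: the density 10 marks below the mean equals the density 10 marks above it
print(round(marks.pdf(60), 5), round(marks.pdf(80), 5))

# Probability that a student scores between 60 and 80 (within one SD of the mean)
within_one_sd = marks.cdf(80) - marks.cdf(60)
print(round(within_one_sd, 4))  # → 0.6827

# Mean = median: half of the distribution lies below 70
print(marks.cdf(70))  # → 0.5
```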

📌 2. Uniform Distribution

Definition:

In this distribution, every possible outcome has an equal chance.

Key Features:

Every outcome has the same probability

The graph looks like a rectangle (flat)

Example:

When you roll a fair die, each number from 1 to 6 has a 1/6 chance of coming up. That is, 1, 2, 3, 4, 5 and 6 all have the same probability = 1/6.
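This sketch models the fair-die example. It builds the exact probabilities with `fractions.Fraction`, then runs a seeded simulation in which the observed frequencies settle near 1/6. The seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Discrete uniform distribution: every face gets the same probability
pmf = {face: Fraction(1, len(faces)) for face in faces}
print(pmf[3])             # → 1/6
print(sum(pmf.values()))  # → 1

# Expected value of one roll: sum of face * probability
expected = sum(face * p for face, p in pmf.items())
print(expected)           # → 7/2

# Simulation: over many rolls, each face shows up roughly 1/6 of the time
random.seed(42)
rolls = [random.randint(1, 6) for _ in range(60_000)]
for face in faces:
    print(face, round(rolls.count(face) / len(rolls), 3))
```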

📌 3. Poisson Distribution

Definition:

The Poisson distribution models the number of events that happen within a fixed interval of time or space, when the events occur randomly at a constant average rate.

Key Features:

Events are counted within a fixed interval (e.g. per hour)

Mean = Variance (both equal the average rate λ)

Events are independent of each other

Example:

How many customers will come into a store in 1 hour?

How many people will visit a website in 1 minute?

If a website gets an average of 5 visitors per minute, the Poisson distribution gives the probability of 0, 1, 2, 3, ... visitors arriving in any given minute.
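This sketch applies the Poisson formula P(X = k) = λᵏ e^(−λ) / k! directly to the website example, with λ = 5 visitors per minute. It also checks numerically that the mean and the variance both come out equal to λ.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

rate = 5  # average visitors per minute

for k in range(4):
    print(k, round(poisson_pmf(k, rate), 4))
# prints: 0 0.0067 / 1 0.0337 / 2 0.0842 / 3 0.1404

# Mean and variance from the distribution (terms beyond k = 60 are negligible here)
ks = range(61)
mean = sum(k * poisson_pmf(k, rate) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, rate) for k in ks)
print(round(mean, 6), round(variance, 6))  # → 5.0 5.0
```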

📌 4. Binomial Distribution

Definition:

When an experiment has only 2 possible outcomes (success or failure) and you repeat it a fixed number of times, the number of successes follows a binomial distribution.

Key Features:

Fixed number of trials (n)

Each trial's outcome: success or failure

The probability of success (p) is the same for every trial

Trials are independent of each other

Example:

You toss a coin 10 times. Each toss results in heads (success) or tails (failure). You might then ask:

What is the probability of getting exactly 6 heads in 10 tosses?
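The coin question can be answered with the binomial formula P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. This sketch uses `math.comb` (Python 3.8+) and assumes a fair coin, p = 0.5.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

prob = binomial_pmf(6, 10, 0.5)
print(prob)  # → 0.205078125  (= 210 / 1024)
```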

📌 5. Exponential Distribution

Definition:

It describes how long you will wait until the next event happens. It is mostly used together with a Poisson process: if events follow a Poisson process with rate λ, the waiting time between them is exponential with mean 1/λ.

Key Features:

Measures "waiting time"

Memoryless property: how long you have already waited does not affect how much longer you will wait

Example:

How many seconds until the next customer comes into the store?

How long until a machine fails?
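This sketch illustrates the waiting-time idea. It assumes, hypothetically, that customers arrive at an average of 5 per minute, which gives a rate of 5/60 per second. For an exponential distribution, P(T > t) = e^(−λt) and the mean wait is 1/λ. The last lines check the memoryless property numerically.

```python
import math

def prob_wait_longer(t, rate):
    """P(T > t) for an exponential waiting time with the given rate (events per unit time)."""
    return math.exp(-rate * t)

rate = 5 / 60  # hypothetical: 5 customers per minute = 5/60 per second

print(round(1 / rate, 1))                    # mean wait in seconds → 12.0
print(round(prob_wait_longer(30, rate), 4))  # P(wait > 30 s) → 0.0821

# Memoryless check: having already waited s seconds does not change the odds
s, t = 20, 30
conditional = prob_wait_longer(s + t, rate) / prob_wait_longer(s, rate)  # P(T > s + t | T > s)
print(math.isclose(conditional, prob_wait_longer(t, rate)))               # → True
```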

🔷 6. Correlation and Covariance

✅ Correlation:

Definition:

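Correlation tells you whether two variables are related and how strong that relationship is. The usual (Pearson) correlation measures linear relationships.

Range: -1 to +1

+1 → Perfect positive relationship (values near +1 mean a strong positive relationship)

0 → No linear relationship

-1 → Perfect negative relationship (values near -1 mean a strong negative relationship)

Example:

Study time and exam marks: if your marks go up as you study more → positive correlation.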
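This is a from-scratch sketch of Pearson correlation (covariance divided by the product of the standard deviations), run on hypothetical study-hours and marks data. Python 3.10+ also offers `statistics.correlation`, which should give the same value.

```python
import math

def pearson_correlation(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel, so plain sums are enough
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours studied vs exam marks
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 65, 70, 79]

print(round(pearson_correlation(hours, marks), 3))          # close to +1: strong positive
print(round(pearson_correlation([1, 2, 3], [6, 4, 2]), 3))  # → -1.0
```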

✅ Covariance:

Definition:

Covariance tells you in which direction two variables change together.

Positive covariance → both variables tend to increase together

Negative covariance → as one increases, the other tends to decrease

Example:

If ice cream sales rise as the temperature rises → positive covariance.
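This from-scratch sketch computes sample covariance (dividing by n − 1) on hypothetical temperature and sales figures. `hot_drink_sales` is an invented series added to show a negative covariance. The last line shows why covariance alone does not measure strength: rescaling one variable changes its size, which is why correlation normalises it.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: daily temperature (°C) and sales (units)
temperature = [20, 25, 30, 35]
ice_cream_sales = [100, 150, 200, 260]
hot_drink_sales = [90, 70, 50, 30]  # invented series that falls as temperature rises

print(sample_covariance(temperature, ice_cream_sales))  # positive: they rise together
print(sample_covariance(temperature, hot_drink_sales))  # negative: they move in opposite directions

# Same relationship with every sales value multiplied by 10: covariance is 10x larger
print(sample_covariance(temperature, [10 * s for s in ice_cream_sales]))
```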

🔷 7. Regression Analysis

✅ Linear Regression

Definition:

A simple model that predicts one variable (Y) from another (X) using a straight line.

Formula:

Y = b₀ + b₁X + e

(Where b₀ = intercept, b₁ = slope, e = error)

Example:

Predicting a house's price (Y) from its size (X).
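This is a minimal ordinary-least-squares sketch for Y = b₀ + b₁X. It uses the closed-form estimates b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄. The house data is hypothetical and lies exactly on a line, so the fitted values are easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for Y = b0 + b1*X; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1

# Hypothetical data: house size (sq ft) vs price
sizes = [1000, 1500, 2000, 2500]
prices = [150_000, 200_000, 250_000, 300_000]

b0, b1 = fit_simple_linear_regression(sizes, prices)
print(b0, b1)          # → 50000.0 100.0
print(b0 + b1 * 1800)  # predicted price for an 1800 sq ft house → 230000.0
```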

✅ Logistic Regression

Definition:

It is a classification model. It predicts a probability between 0 and 1, which is then converted into a yes/no (1/0) output, usually with a 0.5 threshold.

Example:

Is an email spam or not? (1 = spam, 0 = not spam)
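This sketch shows how logistic regression turns a linear score into a probability using the sigmoid function, and then into a 0/1 label. The single feature (`num_suspicious_words`) and the coefficients are hypothetical. A real model would learn its coefficients from labelled emails.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_spam(num_suspicious_words, b0=-3.0, b1=1.5, threshold=0.5):
    """Return (probability of spam, 0/1 label); b0 and b1 are hypothetical coefficients."""
    p = sigmoid(b0 + b1 * num_suspicious_words)
    return p, int(p >= threshold)

for count in [0, 2, 4]:
    print(count, predict_spam(count))
```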


✅ Polynomial Regression

Definition:

When the relationship between X and Y is non-linear, polynomial regression is used to fit a curved line.

Example:

Speed and fuel efficiency have a curved relationship: as speed increases, fuel efficiency first rises and then falls.
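This is a dependency-free sketch of polynomial regression. It builds the least-squares normal equations for y ≈ c₀ + c₁x + c₂x² and solves them with Gaussian elimination. The speed and fuel-efficiency numbers are hypothetical, chosen to lie exactly on a curve that rises and then falls. In practice a library routine such as `numpy.polyfit` would typically be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares fit; returns [c0, c1, ..., c_degree] for y ≈ c0 + c1*x + c2*x**2 + ..."""
    k = degree + 1
    # Normal equations (A^T A) c = A^T y, where A has columns 1, x, x^2, ...
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(matrix, rhs)

# Hypothetical data: speed in units of 20 km/h, fuel efficiency in km/l
speeds = [1, 2, 3, 4, 5, 6]
efficiency = [11, 16, 19, 20, 19, 16]

coeffs = polyfit(speeds, efficiency, degree=2)
print([round(c, 6) for c in coeffs])  # → [4.0, 8.0, -1.0]
```

The negative x² coefficient is what bends the curve down after its peak (here at speed 4 in these units).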

✅ Ridge & Lasso Regression

Definition:

These are regularization techniques that help protect the model from overfitting by adding a penalty on the size of the coefficients.

Ridge (L2 penalty): Shrinks all coefficients towards zero

Lasso (L1 penalty): Can shrink some coefficients to exactly zero → also performs feature selection

Example:

If a model has become too complex (overfitting), Lasso or Ridge can be used to simplify it.
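This toy sketch shows the difference between the two penalties. It assumes a single feature with no intercept, so each penalised fit has a closed form. Ridge (L2) gives b = Σxy / (Σx² + λ). Lasso (L1, with the objective ½Σ(y − bx)² + λ|b|) gives a soft-thresholded b. The data and λ values are hypothetical. As λ grows, Ridge shrinks the coefficient but never reaches zero, while Lasso sets it exactly to zero. The two objectives are scaled differently, so the same λ is not directly comparable between them.

```python
def ridge_coef(xs, ys, lam):
    """Minimises sum((y - b*x)**2) + lam * b**2 for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_coef(xs, ys, lam):
    """Minimises 0.5 * sum((y - b*x)**2) + lam * abs(b) for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Soft-thresholding: move sxy towards zero by lam, stopping at exactly zero
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0

# Hypothetical data where the unpenalised slope is exactly 2
xs = [1, 2, 3]
ys = [2, 4, 6]

for lam in [0, 14, 30]:
    print(lam, round(ridge_coef(xs, ys, lam), 3), round(lasso_coef(xs, ys, lam), 3))
# prints: 0 2.0 2.0 / 14 1.0 1.0 / 30 0.636 0.0
```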

🔷 8. Bayesian Statistics

✅ Definition:

Bayesian statistics updates an existing (prior) probability when new evidence arrives, producing an updated (posterior) probability.
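The update rule is Bayes' theorem: P(H | E) = P(E | H) · P(H) / P(E). This worked sketch uses made-up numbers. Suppose 20% of emails are spam, and the word "free" appears in 50% of spam emails but only 5% of non-spam emails. A new email contains "free".

```python
from fractions import Fraction

# Hypothetical prior and likelihoods
p_spam = Fraction(1, 5)             # P(spam) before looking at the email
p_free_given_spam = Fraction(1, 2)  # P("free" | spam)
p_free_given_ham = Fraction(1, 20)  # P("free" | not spam)

# Total probability of seeing "free" in any email
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: updated (posterior) probability of spam given the evidence
posterior = p_free_given_spam * p_spam / p_free
print(posterior, float(posterior))  # → 5/7 0.7142857142857143
```

Seeing "free" raises the spam probability from 20% to about 71%.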

✅ Naive Bayes Classifier:

A machine-learning algorithm that uses probabilities for classification. Its simplifying ("naive") assumption is that the features are independent of each other, given the class.

Example:

Email spam detection. Based on each word in the email, the model calculates the probability that the email is spam.
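This toy Naive Bayes spam filter is sketched from scratch. It multiplies the class prior by per-word probabilities, adding them as logarithms to avoid underflow. It also uses add-one (Laplace) smoothing so that unseen words do not zero out a class. The four training emails are invented, and splitting on whitespace is a simplifying assumption.

```python
import math
from collections import Counter

# Hypothetical toy training data: (email text, label)
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]

def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {word for text, _ in docs for word in text.split()}
    return class_counts, word_counts, vocab

def predict_label(text, class_counts, word_counts, vocab):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words never give probability 0
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAIN)
print(predict_label("claim your free prize", *model))   # → spam
print(predict_label("project meeting monday", *model))  # → ham
```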

🔷 9. Confidence Intervals & Hypothesis Testing

✅ Confidence Interval

Definition:

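It gives a range in which a population parameter (such as the mean) is estimated to lie, at a stated confidence level (such as 95%).

Example:

Suppose the sample mean = 70 and the 95% confidence interval = [65, 75]. This means that if we repeated the sampling many times and built an interval the same way each time, about 95% of those intervals would contain the true mean. Informally, we are 95% confident that the true mean lies between 65 and 75.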
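This sketch computes a 95% confidence interval for a mean using the normal approximation: mean ± z · s / √n, with z ≈ 1.96 from `statistics.NormalDist().inv_cdf`. The sample of marks is hypothetical. For a sample this small, a t critical value (for example from `scipy.stats.t.ppf`) would give a slightly wider and more appropriate interval.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean: sample mean ± z * (sample SD / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical sample of marks (mean = 70)
marks = [62, 75, 68, 80, 66, 71, 74, 64]
low, high = mean_confidence_interval(marks)
print(round(low, 1), round(high, 1))
```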

✅ Hypothesis Testing

Definition:

It checks whether an observed difference (for example, between two datasets) is real or could have arisen by random chance.

✅ Common Tests:

Chi-Square Test:

Checks whether two categorical variables are associated.

Example: Gender vs. Product Preference

ANOVA (Analysis of Variance):

Compares the means of 3 or more groups.

Example: Comparing the effect of 3 different teaching methods.
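This sketch computes the two test statistics from scratch on hypothetical data. It computes a p-value only for the 2×2 chi-square case, using the fact that for 1 degree of freedom P(χ² > x) = erfc(√(x/2)). Library routines such as `scipy.stats.chi2_contingency` and `scipy.stats.f_oneway` return p-values directly. Note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic would differ from the one below.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected if independent
            stat += (obs - expected) ** 2 / expected
    return stat

def anova_f_statistic(groups):
    """One-way ANOVA: between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical 2x2 table: rows = gender, columns = preferred product (A, B)
table = [[30, 20],
         [20, 30]]
chi2 = chi_square_statistic(table)
p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for 1 degree of freedom only
print(chi2, round(p_value, 4))            # → 4.0 0.0455

# Hypothetical marks under 3 teaching methods
method_a = [70, 72, 74]
method_b = [80, 82, 84]
method_c = [60, 62, 64]
print(anova_f_statistic([method_a, method_b, method_c]))  # → 75.0
```

A p-value below 0.05 (as here) is conventionally taken as evidence of an association; a large F means the group means differ much more than the spread within groups would explain.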

🔷 10. Regression Performance Metrics

✅ R-squared (R²):

Definition:

It tells you how much of the variation in the data the model explains.

Value: usually 0 to 1 → 1 means a perfect fit. On new data it can even be negative, if the model does worse than always predicting the mean.

Example:

R² = 0.9 means the model explains 90% of the variation.

✅ MSE (Mean Squared Error) & RMSE (Root MSE):

Definition:

MSE: The average of the squared prediction errors

RMSE: The square root of MSE → gives an idea of the typical error in the original units

Lower value = better model
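This from-scratch sketch computes the three metrics on hypothetical true and predicted values. R² uses the standard definition: 1 − (sum of squared errors) / (total sum of squares around the mean).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R², MSE, RMSE) for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return r2, mse, rmse

# Hypothetical true values and model predictions
actual = [3, 5, 7, 9]
predicted = [2.5, 5.0, 7.5, 9.0]

r2, mse, rmse = regression_metrics(actual, predicted)
print(round(r2, 3), mse, round(rmse, 3))  # → 0.975 0.125 0.354
```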

🔷 11. Stages in the Statistical Process

Problem Definition

What do you want to predict or analyse?

Example: Forecasting sales.

Data Collection

Gathering data via surveys, sensors, APIs, etc.

Data Cleaning

Handling missing values and removing errors

Example: Replacing NULL marks with the average

Exploratory Data Analysis (EDA)

Understanding patterns through graphs and summaries

Example: Pie charts, histograms

Model Building

Applying regression or classification algorithms

Model Evaluation

Checking the model using accuracy, RMSE, R²

Insights & Deployment

Delivering decisions to the business or application

Example: Building and launching a recommendation system
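The data-cleaning example (replacing NULL marks with the average) is known as mean imputation. Here is a minimal sketch, with `None` standing in for NULL and hypothetical marks. The median is a common alternative fill value when the data has outliers.

```python
import statistics

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical marks column where None stands in for NULL
marks = [72, None, 85, 64, None, 79]
print(impute_with_mean(marks))
```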
