2 notes
Introduction to Statistics
Definition: Statistics is the practice of collecting, analysing, interpreting, and presenting data. In AI/ML
it is used to understand data and make better decisions.
Example: To analyse students' marks, we calculate the average (mean), highest (max), and lowest (min)
marks.
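The marks example can be sketched in a few lines of Python. The marks below are made-up values used only for illustration.

```python
from statistics import mean

# Hypothetical marks for five students (made-up data)
marks = [72, 85, 64, 90, 79]

average = mean(marks)   # arithmetic mean of the marks
highest = max(marks)    # best score
lowest = min(marks)     # worst score

print(f"mean={average}, max={highest}, min={lowest}")
```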
Example: Netflix uses statistics for its recommendations, so that it can suggest shows based on your
interests.
Example: Suppose you are building a model that predicts students' marks. You would use statistical
metrics to check whether the model is predicting correctly.
Example: If all students' marks are similar, the standard deviation (SD) will be low (consistent performance).
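A small sketch of the SD idea, assuming two made-up classes: one with similar marks and one with widely spread marks. It uses the population standard deviation from Python's `statistics` module.

```python
from statistics import pstdev

# Hypothetical marks (made-up data)
consistent = [70, 71, 69, 70, 70]     # everyone scores about the same
spread_out = [40, 95, 60, 100, 55]    # scores vary a lot

sd_consistent = pstdev(consistent)    # population standard deviation
sd_spread = pstdev(spread_out)

print(f"consistent class SD: {sd_consistent:.2f}")
print(f"spread-out class SD: {sd_spread:.2f}")
```

The class with similar marks should show the much smaller SD.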
Definition:
A probability distribution tells you how likely each event (or outcome) is. It is a mathematical
function that assigns a probability to every possible outcome.
Definition:
The normal distribution is a bell-shaped curve. Most of the data lies around the centre, and extremes
(very high or very low values) are rare.
Key Features:
Real-life data such as height, weight, and test scores often fits it approximately
Example:
If you plot the test marks of a class, most students will be near the average mark, with a few very
high and a few very low scorers. This produces a bell curve.
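To see the "most near the centre, extremes rare" idea in numbers, here is a rough sketch. It assumes marks follow a normal distribution with mean 70 and SD 10; both values are invented for illustration.

```python
from statistics import NormalDist

# Assumed (hypothetical) distribution of marks: mean 70, SD 10
marks_dist = NormalDist(mu=70, sigma=10)

# Share of students within one SD of the average (60 to 80)
near_average = marks_dist.cdf(80) - marks_dist.cdf(60)

# Share of students more than three SDs above the average (over 100)
very_high = 1 - marks_dist.cdf(100)

print(f"within 1 SD of the mean: {near_average:.1%}")
print(f"more than 3 SD above:    {very_high:.2%}")
```

Roughly two thirds of the values land within one SD of the mean, while values three SDs away are very rare.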
📌 2. Uniform Distribution
Definition:
Key Features:
Example:
You roll a fair die: each number (1 to 6) has a 1/6 chance of coming up. In other words, 1, 2, 3, 4, 5,
and 6 all have the same probability = 1/6.
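The die example can be written out exactly with fractions. This is only a sketch; the variable names are my own.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Uniform distribution: every face gets the same probability, 1/6
prob = {face: Fraction(1, len(faces)) for face in faces}

total = sum(prob.values())                                   # must add up to 1
expected_value = sum(face * p for face, p in prob.items())   # long-run average roll

print(prob[3], total, expected_value)
```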
📌 3. Poisson Distribution
Definition:
The Poisson distribution models the number of events that occur within a fixed interval of time or
space, when the events happen randomly.
Key Features:
Mean = Variance (both equal the average rate λ)
Events are independent of one another
Example:
If a website gets 5 visitors per minute on average, the Poisson distribution gives the probability that
0, 1, 2, 3, ... visitors arrive in any randomly chosen minute.
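The website example can be computed with the standard Poisson formula P(k) = λ^k · e^(−λ) / k!. The helper name `poisson_pmf` below is my own, not from any library.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

rate = 5  # average visitors per minute, as in the example above

# Chance of seeing 0, 1, 2 or 3 visitors in a given minute
probs = {k: poisson_pmf(k, rate) for k in range(4)}

for k, p in probs.items():
    print(f"P({k} visitors) = {p:.4f}")
```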
📌 4. Binomial Distribution
Definition:
When an experiment has only 2 possible outcomes (Success or Failure) and you repeat it multiple
times, independently and with the same success probability each time, the number of successes
follows a binomial distribution.
Key Features:
Example:
You toss a coin 10 times. Each toss results in Heads (Success) or Tails (Failure). You might then
ask questions such as:
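The notes do not list the questions here, so the two below (exactly 6 heads, and at least 8 heads) are my own illustrative picks. The sketch uses the binomial formula C(n, k) · p^k · (1 − p)^(n − k) and assumes a fair coin.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_tosses = 10
p_heads = 0.5  # assumes a fair coin

# Illustrative question 1: exactly 6 heads
p_exactly_6 = binomial_pmf(6, n_tosses, p_heads)

# Illustrative question 2: at least 8 heads (8, 9 or 10)
p_at_least_8 = sum(binomial_pmf(k, n_tosses, p_heads) for k in range(8, 11))

print(f"P(exactly 6 heads)  = {p_exactly_6:.4f}")
print(f"P(at least 8 heads) = {p_at_least_8:.4f}")
```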
📌 5. Exponential Distribution
Definition:
It describes how much time will pass before the next event occurs. It is mostly used together with a
Poisson process.
Key Features:
Example:
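The original example is missing from these notes, so the following is only a hedged sketch of my own. It assumes events arrive at a rate of 5 per minute (a made-up number) and uses two standard facts about the exponential distribution: the average wait is 1/rate, and P(wait > t) = e^(−rate · t).

```python
import math

rate = 5.0  # hypothetical: 5 events per minute on average

# Average waiting time until the next event, in minutes
mean_wait = 1 / rate

def prob_wait_longer_than(t, rate):
    """Probability that the next event takes more than t time units."""
    return math.exp(-rate * t)

# Chance of waiting more than half a minute for the next event
p_over_half_minute = prob_wait_longer_than(0.5, rate)

print(f"mean wait: {mean_wait} min, P(wait > 0.5 min) = {p_over_half_minute:.4f}")
```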
✅ Correlation:
Definition:
Correlation tells you what kind of relationship exists between two variables and how strong it is.
Range: -1 to +1
0 → No linear relationship
Example:
The relationship between study time and exam marks. If your marks go up as you study more →
positive correlation.
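A minimal sketch of the Pearson correlation coefficient for the study-time example. The hours and marks are made-up data, and `pearson_r` is my own helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical data: hours studied and exam marks
study_hours = [1, 2, 3, 4, 5]
exam_marks = [52, 58, 65, 70, 78]

r = pearson_r(study_hours, exam_marks)
print(f"correlation = {r:.3f}")  # close to +1 means strong positive relationship
```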
✅ Covariance:
Definition:
Covariance tells you in which direction two variables change together.
Example:
If ice cream sales go up as the temperature rises → positive covariance.
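A small sketch of sample covariance for the temperature and ice cream example. The numbers are invented, and `sample_covariance` is my own helper name.

```python
def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: temperature in °C and ice creams sold
temperature_c = [20, 25, 30, 35]
ice_cream_sales = [100, 150, 200, 250]

cov = sample_covariance(temperature_c, ice_cream_sales)
print(f"covariance = {cov:.2f}")  # positive: both rise together
```

Unlike correlation, covariance is not limited to the range -1 to +1; its size depends on the units of the two variables, so mainly the sign is easy to interpret.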
🔷 7. Regression Analysis
✅ Linear Regression
Definition:
This is a simple model in which one variable (X) is used to predict another (Y) using a straight
line.
Formula:
Y = b₀ + b₁X + e
Example:
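The original example is missing here. As a hedged sketch, the coefficients in Y = b₀ + b₁X can be estimated by ordinary least squares. The data is made up so that it lies exactly on Y = 1 + 2X (so the error term e is zero), and `fit_line` is my own helper name.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept b0 and slope b1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data lying exactly on Y = 1 + 2X
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_line(xs, ys)
prediction = b0 + b1 * 6   # predict Y for a new X = 6

print(f"b0={b0:.2f}, b1={b1:.2f}, prediction for X=6: {prediction:.2f}")
```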
✅ Logistic Regression
Definition:
This is a classification model in which the output is only 0 or 1, i.e. yes/no.
Example:
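The original example is missing here. The sketch below only shows how a logistic regression model turns an input into a 0/1 answer: it passes b₀ + b₁X through the sigmoid function and applies a 0.5 threshold. The coefficients are hypothetical (not fitted to any data), and the "hours studied → pass/fail" framing is my own assumption.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict(x, b0, b1, threshold=0.5):
    """Return (probability of class 1, predicted label 0 or 1)."""
    p = sigmoid(b0 + b1 * x)
    return p, int(p >= threshold)

# Hypothetical coefficients, chosen by hand for illustration
b0, b1 = -4.0, 1.0

p_low, label_low = predict(2, b0, b1)     # e.g. 2 hours studied
p_high, label_high = predict(6, b0, b1)   # e.g. 6 hours studied

print(f"x=2: p={p_low:.3f}, label={label_low}")
print(f"x=6: p={p_high:.3f}, label={label_high}")
```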
Definition:
When the relationship between X and Y is non-linear, polynomial regression is used to fit a curved
line.
Example:
Speed and fuel efficiency have a curved relationship: as speed increases, fuel efficiency first
rises and then falls.
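A rough sketch of fitting a degree-2 polynomial Y = b₀ + b₁X + b₂X² by least squares, using only the standard library. The speed and efficiency numbers are invented, and both helper functions are my own; in practice a library routine would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree=2):
    """Least-squares coefficients [b0, b1, ..., b_degree] via normal equations."""
    powers = range(degree + 1)
    ata = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve_linear_system(ata, aty)

# Hypothetical data: speed in units of 10 km/h, fuel efficiency in km per litre
speeds = [2, 4, 6, 8, 10]
efficiency = [12, 18, 20, 18, 12]

b0, b1, b2 = fit_polynomial(speeds, efficiency)
best_speed = -b1 / (2 * b2)   # peak of the fitted parabola

print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}, peak at speed {best_speed:.1f}")
```

A negative b₂ gives the "rises, then falls" shape described in the example.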
Definition:
Lasso: Shrinks some coefficients to exactly zero → it also performs feature selection
Example:
If a model has become too complex (overfitting), Lasso or Ridge can be used to simplify it.
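To make the difference concrete, here is a simplified sketch with a single feature and centred data (so there is no intercept). It assumes these penalty conventions, which are my choice: Ridge minimises Σ(y − bx)² + λb², and Lasso minimises ½Σ(y − bx)² + λ|b|. Under those assumptions both have closed-form slopes. The data is made up.

```python
def ols_slope(xs, ys):
    """Plain least-squares slope for centred data (no penalty)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge_slope(xs, ys, lam):
    """Ridge: shrinks the slope towards zero but never exactly to zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Lasso: soft-thresholding, can set the slope to exactly zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

# Hypothetical centred data (both lists have mean 0)
xs = [-2, -1, 0, 1, 2]
ys = [-1.0, -0.4, 0.1, 0.5, 0.8]

print("OLS:  ", ols_slope(xs, ys))
print("Ridge:", ridge_slope(xs, ys, lam=5))
print("Lasso:", lasso_slope(xs, ys, lam=2), lasso_slope(xs, ys, lam=5))
```

With a large enough penalty the Lasso slope becomes exactly zero, which is the feature-selection effect; Ridge only makes the slope smaller.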
🔷 8. Bayesian Statistics
✅ Definition:
Bayesian statistics updates an earlier (prior) probability as new evidence arrives.
Naive Bayes is an ML algorithm that uses probabilities for classification. Its simplifying assumption is
that the features are independent of one another given the class.
Example:
Email spam detection. The model uses each word to work out the probability that an email is spam or not.
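A toy sketch of the spam example. All probabilities below are invented for illustration; a real model would estimate them from labelled emails. The function multiplies the per-word probabilities (the independence assumption) and then applies Bayes' rule to get P(spam | words).

```python
import math

# Hypothetical numbers: prior chance of spam and per-word likelihoods
p_spam = 0.4
p_word_given_spam = {"free": 0.30, "win": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.02, "win": 0.01, "meeting": 0.10}

def spam_probability(words):
    """Posterior probability that an email with these words is spam."""
    # Work with logs so that many small probabilities do not underflow
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for word in words:
        if word in p_word_given_spam:   # words outside the toy vocabulary are ignored
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])
    spam_score = math.exp(log_spam)
    ham_score = math.exp(log_ham)
    return spam_score / (spam_score + ham_score)   # Bayes' rule, normalised

print(f"'free win' -> {spam_probability(['free', 'win']):.3f}")
print(f"'meeting'  -> {spam_probability(['meeting']):.3f}")
```

With no known words, the function simply returns the prior (0.4); each word then updates that belief up or down.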
🔷 9. Confidence Intervals & Hypothesis Testing
✅ Confidence Interval
Definition:
It gives the range in which a population parameter (such as the mean) is estimated to lie, with a
stated level of confidence (such as 95%).
Example:
If the sample mean = 70 and the 95% confidence interval = [65, 75], we are 95% confident that the
true mean lies between 65 and 75. Strictly, this means that intervals constructed this way capture
the true mean in about 95% of repeated samples.
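A minimal sketch of computing a confidence interval for a mean, as mean ± z × (SD / √n). It uses the normal approximation; for a small sample like this one a t-distribution would usually be preferred, so treat it as illustrative only. The sample is made up, and `mean_confidence_interval` is my own helper name.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # z is about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical sample of ten students' marks
sample = [62, 75, 68, 71, 80, 66, 73, 69, 77, 59]

low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample)}, 95% CI = [{low:.1f}, {high:.1f}]")
```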
✅ Hypothesis Testing
Definition:
It lets us check whether a difference between two data sets is real or has arisen only through
random chance.
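One way to illustrate this idea is a permutation test, which is my choice here and not a test named in these notes. It shuffles the group labels many times and counts how often a difference at least as large as the observed one appears by chance. The two groups of marks are made up.

```python
import random

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Approximate p-value for the difference in means between two groups."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    size_a = len(group_a)

    def mean_gap(values):
        first, second = values[:size_a], values[size_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)     # pretend the group labels are meaningless
        if mean_gap(pooled) >= observed:
            as_extreme += 1
    return as_extreme / n_permutations

# Hypothetical marks for two classes
class_a = [78, 82, 85, 88, 90, 84]
class_b = [65, 70, 68, 72, 66, 71]

p_value = permutation_test(class_a, class_b)
print(f"p-value = {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be due to random chance alone.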
✅ Common Tests:
Chi-Square Test:
✅ R-squared (R²):
Definition:
Example:
R² = 0.9 means the model explains 90% of the variation.
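A short sketch of how R² can be computed as 1 − (residual sum of squares / total sum of squares). The actual and predicted marks are made-up values, and `r_squared` is my own helper name.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Hypothetical actual marks and a model's predictions
actual = [50, 60, 70, 80, 90]
predicted = [52, 58, 71, 79, 92]

r2 = r_squared(actual, predicted)
print(f"R² = {r2:.3f}")
```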
Definition:
Problem Definition
Data Collection
Data Cleaning
Model Building
Model Evaluation