ML Important Topics
=In machine learning, a "well-posed learning problem" is one in which we have to train a machine learning model so that it can predict some specific output from input data. Such a problem is called well-posed because it contains a few important elements:

Input Data: In a well-posed problem we are given a set of input data. This data serves as examples for the machine learning model, from which it can learn patterns and trends.

Desired Output: For every input data point we are given a desired output, or target output. We train the machine learning model so that it can predict the desired output from the input data.

Training Data: In a well-posed problem we are provided with training data, which consists of input data points and their corresponding desired outputs. This data is used to train the model.

In a well-posed learning problem we define the input data, desired output, training data, algorithms, evaluation metrics, and generalization, which lets us train a machine learning model and make accurate predictions.
From this we should understand that for every optimization problem we need to understand the problem's specific details, structure, and constraints, and develop tailor-made strategies and algorithms based on them. We analyze each problem and select optimization techniques according to its characteristics.
Imagine you are trying to classify data points using some algorithm or hypothesis class. The VC dimension tells you the largest number of data points the hypothesis class can "shatter", that is, classify perfectly under every possible labeling of those points; beyond that number, there is always some labeling on which the algorithm fails.

Suppose your hypothesis class is a straight line and you have to classify red and blue points. A line in the plane can shatter any 3 points that do not lie on one line: however you color them red and blue, some line separates the colors perfectly. But with 4 points there is always a coloring, such as an XOR-like arrangement where opposite corners share a color, that no single line can separate; some red points end up classified as blue, or blue points as red. So the VC dimension of straight lines in the plane is 3.
Let's work through a simple example (the Natarajan dimension generalizes the VC dimension to multiclass problems). Suppose we consider a dataset with 2 features, X1 and X2. If each feature is binary, that is, it can take only the values 0 and 1, then the Natarajan dimension is 2. In this case we can create decision boundaries based on the combinations of the two features. But if each feature has 3 possible values (0, 1, and 2), then the Natarajan dimension is 9, because each feature has 3 possible values and we can create decision boundaries over their combinations.
To use this technique, we train multiple classifiers on different datasets. Then, when we need to make a prediction, each classifier gives its own prediction. We take the average of those predictions, or give them weights and combine them, to generate the final prediction. In this way we can bring multiple classifiers into one unified decision-making process and achieve improved accuracy.
How does this classifier work? First, we train multiple models, such as decision trees, support vector machines, logistic regression, and other algorithms. Then, when we want to classify a new input sample, we use all the models to generate predictions for that sample.

Once we have the predictions, we apply a majority voting scheme. Each model gets one vote, which it casts as "yes" or "no" according to its own prediction. All the votes are then counted, and the class that receives the majority of votes is taken as the final decision.
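A minimal sketch of this majority-voting scheme, assuming scikit-learn; the synthetic dataset and the choice of base models are illustrative, not part of the notes above:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each base model casts one vote; the class with the most votes wins.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X, y)
print(voter.predict(X[:5]))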
In other words, machine learning algorithms are not inherently superior on their own. To produce good results they are first given training data, which contains desired outcomes and examples. The algorithms analyze that training data and try to find patterns and correlations.

They are then given testing data containing unknown examples, and the algorithms have to work out how to apply the patterns learned from the training data. If the training data is properly representative and the algorithm has been trained properly, it will try to give good results.

But this process demands considerable time and effort. Training the algorithm correctly, collecting the right training data, and maintaining its quality are all challenges. That is why, in machine learning, algorithms have no inherent superiority to begin with; achieving it takes consistent effort and optimization.
Q8]Bagging and Boosting
Boosting: Boosting is also an ensemble learning technique, in which weak learners (models) are trained in sequence. It is called boosting because in every iteration the technique "boosts" the next weak learner to focus on the mistakes of the previous ones. By analyzing each weak learner's predictions, the instances on which the weak learners performed poorly are given extra weight so that the next weak learner focuses on them. In this way the model's performance improves with every iteration. When producing the final prediction, the weak learners' predictions are weighted accordingly. Boosting guards against overfitting and helps achieve high accuracy. Famous boosting algorithms are AdaBoost, Gradient Boosting, and XGBoost.
In general, bagging trains multiple models in parallel and combines their predictions, whereas boosting trains weak learners in sequence and combines their predictions.
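A minimal sketch contrasting the two with scikit-learn; the dataset and hyperparameters are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: many trees trained in parallel on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting (AdaBoost): learners trained in sequence, each one focusing
# on the examples the previous ones got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging :", cross_val_score(bagging, X, y).mean())
print("boosting:", cross_val_score(boosting, X, y).mean())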
Q9]Random Forest
=A random forest is an ensemble of decision trees in which each tree is trained on a random subset of the data and, at each split, on a random subset of the features. Sometimes, however, this randomization process can leave the choice of features too limited. That means each decision tree is trained on only a limited set of features, so the trees become overly focused on those specific features. This is a problem because if an important feature is missed during the random selection process, the model cannot learn that feature's importance, and its accuracy suffers.

One way to solve this problem is to include more features so that the randomization process becomes more robust. Another way is to measure feature importance and select features accordingly, so that the model knows about the important features and they get proper consideration.
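A minimal sketch of the second idea, inspecting feature importances in a random forest with scikit-learn; the synthetic data and the max_features setting are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)

# max_features controls how many features each split may choose from.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

# Importances reveal whether the trees actually use the informative features.
for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")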
Q10]Support Vector Classifier and Regressor
=Support Vector Classifier (SVC) and Support Vector Regressor (SVR) are algorithms used in machine learning. Both belong to supervised learning, where we are given labeled training data and we train a model so that it can classify (SVC) or predict (SVR) new data.
Support Vector Classifier (SVC) is a classification algorithm with which we classify data points into predefined categories. In this algorithm, data points are represented in an n-dimensional space and a decision boundary (hyperplane) is constructed that separates the different categories. To construct the decision boundary, SVC uses support vectors, which are certain selected data points from the training data. These support vectors lie close to the decision boundary and define it. SVC's objective is to maximize the margin around the decision boundary, which improves classification accuracy.
Support Vector Regressor (SVR) is a regression algorithm with which we predict continuous numerical values. This algorithm also represents data points in an n-dimensional space, but here we construct a line (hyperplane) that passes close to the data points. SVR's objective is to fit this line as closely as possible to the data points so that accurate predictions can be made. This algorithm also uses support vectors, which lie close to the line and define it.
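A minimal sketch of both, assuming scikit-learn; the synthetic datasets are illustrative:

from sklearn.datasets import make_classification, make_regression
from sklearn.svm import SVC, SVR

# SVC: classify points into discrete categories.
Xc, yc = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel="rbf").fit(Xc, yc)
print("classes:", clf.predict(Xc[:3]))

# SVR: predict continuous numerical values.
Xr, yr = make_regression(n_samples=200, noise=5.0, random_state=0)
reg = SVR(kernel="rbf").fit(Xr, yr)
print("values :", reg.predict(Xr[:3]))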
Q11]R2 Score
=R2 Score, or R-Squared Score, is a key machine learning evaluation metric that quantifies the performance of regression models. The R2 Score represents the accuracy of the predictions made by a regression model.

You can understand it in a very simple way. The R2 Score usually lies between 0 and 1 (it can also go negative). If the R2 Score is 1, it means the model's predictions match the actual data perfectly. In other words, the variation explained by the model, that is, its ability to account for changes in the outcome, is very good.

If the R2 Score is 0, it means the model's predictions have no correlation at all with the actual data; the model completely fails to explain the outcome.

If the R2 Score is negative, it means the model's predictions are even worse than simply predicting the mean of the actual data.
Q12]Mean Absolute Error (MAE)
=To calculate MAE, we take the absolute difference between the predicted value and the actual value for every data point, and then compute the average of all those differences. This tells us how accurate or inaccurate the model's predictions are.

Mathematically, MAE is represented by the formula below:

MAE = (1/n) * Σ |y_true_i - y_pred_i|

Here, n is the number of data points, y_true_i is the actual value of the i-th data point, and y_pred_i is its predicted value.
Q13]Mean Squared Error (MSE)
=To calculate MSE, we first take the difference between the model's predictions and the actual values. We then square these differences and take their average (mean). That gives us the MSE score:

MSE = (1/n) * Σ (y_true_i - y_pred_i)^2

Here, n is the number of data points, y_true_i is the actual value, and y_pred_i is the predicted value.

The value of MSE is always non-negative. If the MSE is zero, it indicates that the model's predictions are perfect and match the actual values exactly. In practice, though, this happens in very few cases.
Q14]Mean Squared Logarithmic Error (MSLE)
=The use of MSLE is slightly different from MSE (Mean Squared Error). In MSE we take the average of the squared differences between the predicted and actual values of the target variable. In MSLE, we first take a log transformation of the target variable and the predicted value, and then take the average of their squared differences:

MSLE = (1/n) * Σ (log(1 + y_true_i) - log(1 + y_pred_i))^2

Here, y_pred is the value predicted by the regression model, and y_true is the actual target value. log() is a logarithmic function that log-transforms the predicted and actual values (1 is commonly added first so that zero values are handled). n is the number of data points.
Q15]Mean Absolute Percentage Error (MAPE)
=To calculate MAPE, we take the difference between the predicted values and the actual values as a percentage, and then compute the average of those differences. This formula is used:

MAPE = (100/n) * Σ |y_i - y_pred_i| / |y_i|

Here, 'n' is the total count of actual values, 'y' is the actual value, and 'y_pred' is the predicted value.
Q16]Explained Variance Score (EVS)
=With regression models we relate a dependent variable (the one we want to predict) to independent variables (features). When we evaluate our model on the test dataset, EVS tells us how much of the variance the model has captured, or been able to explain:

EVS = 1 - Var(y_true - y_pred) / Var(y_true)

The range of EVS is 0 to 1, where 1 indicates perfect prediction and 0 indicates that no variance is explained at all. Higher EVS values indicate better performance.
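A minimal sketch computing all six metrics above with scikit-learn; the y_true and y_pred arrays are made up for illustration:

import numpy as np
from sklearn.metrics import (
    explained_variance_score, mean_absolute_error,
    mean_absolute_percentage_error, mean_squared_error,
    mean_squared_log_error, r2_score,
)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

print("R2  :", r2_score(y_true, y_pred))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("MSE :", mean_squared_error(y_true, y_pred))
print("MSLE:", mean_squared_log_error(y_true, y_pred))  # values must be non-negative
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
print("EVS :", explained_variance_score(y_true, y_pred))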
Q17]Linear Regression
=Linear regression is a type of supervised machine learning algorithm that models the linear relationship between a dependent variable (y) and one or more independent variables (x1, x2, ..., xn). This means a straight-line relationship is found between y and x, from which we can predict y when we know the values of x.

To explain it in simple words, let's take an example. Suppose you are a real estate agent and you have to predict house prices. You have data points for the sizes (square footage) of some houses (x1, x2, ..., xn), along with their corresponding prices (y). Using linear regression you can find a straight line that lets you predict price based on size.

For this, you train the linear regression algorithm: you use the data points and teach the algorithm how to optimize a straight line into the best fit. In this optimization process, the algorithm adjusts its parameters (intercept and slope) so that the line passes as close as possible to all the data points.

Once the model is trained, you can feed in new size values (x) and the model will give you the corresponding predicted price (y). In this way you can use linear regression to predict house prices.
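A minimal sketch of this house-price example, assuming scikit-learn; the sizes and prices are made-up numbers:

import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[800], [1000], [1200], [1500], [1800]])    # x: square footage
prices = np.array([150000, 180000, 210000, 260000, 300000])  # y: price

model = LinearRegression().fit(sizes, prices)
print("slope:", model.coef_[0], "intercept:", model.intercept_)

# Predict the price of a new house from its size.
print("predicted price for 1300 sq ft:", model.predict([[1300]])[0])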
Q18]RANSAC
=RANSAC (Random Sample Consensus) is a machine learning technique used to identify outliers and ignore them. It is used in domains such as computer vision, image processing, and computer graphics.

The basic idea of RANSAC is that if we have a dataset containing some outliers (anomalous data points), we can use RANSAC to detect them. This technique is very useful for robust parameter estimation.

The RANSAC algorithm is made up of two important steps: "random sample selection" and "model fitting".

Random Sample Selection: In this step, a small subset of data points is chosen at random from the dataset, and a candidate model is estimated from that subset.

Model Fitting: In this step, a model is built using the sample's data points. The model-fitting method can vary, such as linear regression, polynomial fitting, homography fitting, etc., depending on the dataset and the problem domain.

Then, based on the model, RANSAC identifies the outliers. The outliers in the dataset are the data points that do not fit the model well. RANSAC identifies those points as outliers and ignores them.
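A minimal sketch with scikit-learn's RANSACRegressor; the line y = 2x + 1 and the injected outliers are illustrative:

import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.5, 50)
y[::10] += 40          # corrupt every 10th point with a large outlier

# Repeated random sampling + model fitting; points far from the
# fitted model are flagged as outliers and ignored.
ransac = RANSACRegressor(random_state=0).fit(X, y)
print("inliers kept:", ransac.inlier_mask_.sum(), "of", len(y))
print("estimated slope:", ransac.estimator_.coef_[0])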
Q19]Correlation Matrix
=A correlation matrix is a machine learning concept in which we use a matrix to understand the correlation (relationship) between the features of a dataset. This matrix tells us how each feature is related to the other features.

The correlation matrix helps us find out which features are strongly or weakly correlated with one another. It provides valuable information that we can use to train the model, select features, or remove duplicate features to improve performance.
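A minimal sketch with pandas; the columns and values are made up for illustration:

import pandas as pd

df = pd.DataFrame({
    "size_sqft": [800, 1000, 1200, 1500, 1800],
    "bedrooms":  [2, 2, 3, 3, 4],
    "price":     [150, 180, 210, 260, 300],
})

# Each cell is the correlation between two features, from -1 to +1.
print(df.corr())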
Q20]Polynomial Regression
=Polynomial regression is a regression analysis technique in machine learning where the relationship between the independent variable(s) and the dependent variable is modeled as an nth-degree polynomial. In plain words, it is a way of using polynomial functions to fit a curve to a set of data points.
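A minimal sketch, assuming scikit-learn: polynomial features are generated first, then an ordinary linear model is fitted on them; the quadratic data is made up:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2     # a quadratic relationship

# degree=2 adds x^2 as a feature, so the "linear" model can fit a curve.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))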
Q21]Bayes Optimal Classifier
=The Bayes optimal classifier is based on the principle of Bayes' theorem. The algorithm uses probabilities, assigning a probability to every possible class. The classifier analyzes the given input features and calculates the probability of each class. It then assigns the input to the class with the highest probability.

In simple words, the Bayes optimal classifier uses probability to classify an input. The classifier learns from the training data and calculates probabilities on that basis. These probabilities help it estimate the correct class from the input's features.
Q22]Naive Bayes
=In this algorithm we use training data in which every data point has an associated label (class). The Naive Bayes classifier is based on feature probabilities, from which it makes predictions.

In this algorithm, we first do a statistical analysis of the training data. In this analysis the probability of each feature value is calculated for every possible label. These per-class conditional probabilities (the likelihoods), together with the class frequencies (the model's "prior probabilities"), are the parameters of the model.
Q23]Classification Algorithm
=In machine learning, a classification algorithm is a kind of statistical model that classifies data into categories or labels. The main objective of this algorithm is to assign the correct category or label, which we call the target variable, from the given input features.

For this algorithm we have labeled training data, in which the correct category or label is available for every data point. The classification algorithm analyzes this training data and determines a decision boundary or classification rule with which new, unlabeled data points can be classified.

These algorithms use different kinds of input features, such as numerical values, categorical variables, text, images, audio, etc. After analyzing these features, the algorithm determines a decision rule, which we call the model. The model can be applied to new data points to classify them into the correct category.
Logistic Regression: This algorithm is used for binary classification, where the output category has only two options.

Naive Bayes: This algorithm uses a probabilistic approach and is very useful in text classification.

Support Vector Machines (SVM): This algorithm is used for linear and non-linear classification. In it, data points are classified with a hyperplane.

Decision Trees: This algorithm uses a tree-like structure, where each node represents a feature and each edge represents a decision rule.

Random Forest: This algorithm is an ensemble of decision trees, in which many decision trees together produce one result.
Q24]scikit-learn
=Scikit-learn, a machine learning library built for Python, is used to make machine learning tasks easier. The library contains a great many algorithms and tools that are used for analyzing data, preprocessing it, and training models.

To use scikit-learn, you load your dataset. After that you can store the features and target variables in separate variables. You can then preprocess the dataset, with steps such as feature scaling, missing-value handling, and categorical variable encoding.
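A minimal sketch of that workflow, using the iris dataset bundled with scikit-learn; the model choice is illustrative:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)    # features and target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)    # preprocessing: feature scaling
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
print("test accuracy:", model.score(scaler.transform(X_test), y_test))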
Q25]Perceptron
=A perceptron is a type of binary classifier used in machine learning. It is a simple linear binary classification algorithm, a single-layer neural network. Based on the given input features, the algorithm tries to classify the input into one of two classes.

The perceptron's input features are multiplied by weights and then summed to form a linear combination. A bias term is added to this linear combination, and from this the final output is derived. If the output is above the threshold, the perceptron outputs one class, and if it is below the threshold, it outputs the other class.
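A minimal sketch of that computation in plain Python/numpy; the weights, bias, and input are made-up numbers:

import numpy as np

def perceptron_predict(x, w, b):
    # Weighted sum of the inputs plus the bias, then a threshold at 0.
    return 1 if np.dot(w, x) + b > 0 else 0

x = np.array([1.0, 0.5])     # input features
w = np.array([0.8, -0.4])    # weights
b = -0.3                     # bias term
print(perceptron_predict(x, w, b))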
Q26]Support Vector Machine (SVM)
=When SVM works, it represents the input data in an n-dimensional space, where each feature is one dimension. SVM tries to construct a hyperplane (a flat decision boundary) that divides the input data points correctly into their classes. The hyperplane is selected so that it has the maximum margin, that is, it lies as far as possible from the nearest data points of each class.

If the data is linearly separable, meaning the classes can be separated by a straight line, SVM uses a linear kernel. But if the data is not linearly separable, SVM uses the kernel trick, in which the data is mapped into a higher-dimensional space where it can become linearly separable.
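A minimal sketch of the kernel trick, assuming scikit-learn: concentric circles are not linearly separable, but an RBF kernel handles them; the dataset parameters are illustrative:

from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

# The linear kernel struggles; the RBF kernel implicitly maps the data
# to a higher-dimensional space where it becomes separable.
print("linear kernel:", cross_val_score(SVC(kernel="linear"), X, y).mean())
print("rbf kernel   :", cross_val_score(SVC(kernel="rbf"), X, y).mean())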
Q27]Decision Tree Learning
=Explained in simpler words, decision tree learning is a way of making decisions by asking a series of questions. Imagine you have a dataset with various features and their corresponding labels. The goal is to predict labels for new, unseen data points based on the patterns observed in the training dataset.
Q28]CART Algorithm
=CART, or Classification and Regression Trees, is a type of machine learning algorithm used in supervised learning. The algorithm uses a decision tree and is useful for solving both classification and regression problems.

The CART algorithm builds a decision tree over a dataset in which a feature is selected at each node and the dataset is split based on that feature's values. This process repeats recursively down the tree until all features have been explored or some termination criterion is met.
Q29]ID3
=ID3 (Iterative Dichotomiser 3) is a decision tree algorithm used in machine learning. The algorithm belongs to supervised learning, where we have labeled data, that is, input features together with their corresponding output labels.

The main objective of the ID3 algorithm is to build a decision tree in which every internal node represents a feature that divides the input data. Every leaf node, or terminal node, represents a class label, from which we predict the final output.

The ID3 algorithm works iteratively. In each iteration it selects the feature of the input data that yields the most information gain. Information gain is the feature selection criterion; it tells us how well a feature can distinguish between the classes. The feature that gives the most information gain is selected as the root node.
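A minimal sketch of the entropy and information-gain calculation behind that feature selection; the labels and feature values are made up:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    # Entropy before the split minus the weighted entropy after it.
    gain = entropy(labels)
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

y = np.array([1, 1, 0, 0, 1, 0])
feature = np.array([0, 0, 1, 1, 0, 1])
print("information gain:", information_gain(y, feature))  # 1.0: a perfect split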
Q30]C4.5
=The C4.5 machine learning algorithm is a classification algorithm developed by Ross Quinlan. It is an extension of the ID3 algorithm and is widely used for building decision trees in supervised learning tasks.

Imagine you have a dataset with information about various objects, and you have to build a model that can predict the class or category of new objects based on their attributes. For example, you have data about fruit with attributes such as color, size, and shape, and you have to predict whether a new fruit is an apple or an orange.

The C4.5 algorithm helps you build a decision tree in which, at each node, it selects the most informative attribute for splitting the data. The algorithm uses a measure called "gain ratio" (information gain normalized by split information) to decide which attribute is best for splitting the data. Information gain itself measures how much information an attribute provides about the class labels and how much it reduces uncertainty.
Q31]K-Nearest Neighbors (KNN)
=This algorithm is an example of supervised learning, in which we train the model using a training data set. The basic concept of KNN is that data points with similar features or properties tend to be more similar to one another.

Data Preparation: First, we have to prepare our data set. This includes normalizing numerical values, encoding categorical values, and dividing the data into train and test sets.

Distance Calculation: In KNN, we have to calculate a distance metric such as Euclidean distance. Euclidean distance measures the geometric distance between two data points.

KNN Model Training: For model training we use the training set. We represent our data points in a feature space, in which each feature is one dimension. After that, using the distance metric, we find the k-nearest neighbors of each test point.

KNN Classification: KNN is used in classification problems. When a new data point arrives, we look at its k-nearest neighbors. Using majority voting over those neighbors' class labels, we predict the new data point's class label.

KNN Regression: KNN is also used in regression problems. In regression, we predict the average or weighted average of the test point's k-nearest neighbors, which lets us estimate a continuous value for that point.

One disadvantage of the KNN algorithm is that its inference time can be high, because for every prediction on a test point we have to measure distances. So using KNN on large datasets can be somewhat challenging.
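A minimal sketch of KNN classification with scikit-learn; k=3 and the synthetic data are illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test point is labeled by majority vote among its 3 nearest
# neighbors (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))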
Q32]PAC Learning
=PAC (Probably Approximately Correct) learning is a concept in machine learning
that aims to analyze the learning process and provide guarantees on the accuracy of
the learned model. In simple terms, PAC learning is about learning from a set of
labeled examples and making predictions that are "probably approximately correct."
Probably: PAC learning acknowledges that the model is learned from a random sample of examples. Its guarantees therefore hold only with high probability: with some small probability the sample may be unrepresentative, and the learned model may fall short of the target accuracy.

Approximately: PAC learning also recognizes that the learned model might not perfectly represent the true underlying pattern or distribution of the data. It allows for some degree of approximation, understanding that the learned model might not be 100% accurate.
Correct: The goal of PAC learning is to ensure that the learned model makes
predictions with a certain level of correctness. It aims to minimize the number of
incorrect predictions by providing statistical guarantees.
In a nutshell, PAC learning is a framework that deals with the trade-off between
the number of training examples, the complexity of the hypothesis space (possible
models), and the acceptable level of error in the learned model. It provides
theoretical guarantees on the performance of the learned model and helps ensure
that the predictions made by the model are probably approximately correct.
Q33]Training and Test Set
=Training set: This data set is used to train the model. It contains labeled examples, that is, input data together with the correct output values. During model training, we feed the input data into the model, and the model learns patterns and relationships from that data. The main objective of the training set is to teach the model properly, so that in the future it can correctly classify, predict, or analyze unseen input data as well.

Test set: The test set is used to evaluate the model's performance. This data set is separate from the training set and also contains labeled examples. We use the test set to see how well the model has learned. We feed the test set's input data into the model, and the model processes it and produces predictions. We then compare these predictions with the correct output values and see how accurate the predictions were. From the test set, the model's performance metrics such as accuracy, precision, recall, and F1-score are calculated.

Keeping the training and test sets separate is essential so that we can evaluate the model properly. If we use the training set for testing, the model's performance estimate can be biased. That is why the training and test sets are kept separate, so that the model's ability to generalize can be tested.
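A minimal sketch of the split, assuming scikit-learn; the 80/20 ratio and the model are illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 80% of the labeled examples train the model; the held-out 20%
# measure how well it generalizes to unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy :", model.score(X_test, y_test))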
Q34]Normalization
=In machine learning, normalization is a data preprocessing technique we use to bring data into a consistent range. We use it to put the numerical values of different features onto a common scale so that they are easy to compare.

Normalization makes the distribution and variation of the data uniform. This improves the model training process and can lead to accurate predictions.

A common normalization technique is "min-max normalization". In it, we rescale each feature's numerical values using its minimum and maximum values, typically into the range between 0 and 1: the feature's minimum value is subtracted from each value, and the result is then divided by the feature's range (maximum minus minimum):

x_scaled = (x - x_min) / (x_max - x_min)

This brings the range of the numerical values between 0 and 1.
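A minimal sketch of min-max normalization, done by hand and with scikit-learn's MinMaxScaler; the values are made up:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [50.0]])

# By hand: (x - min) / (max - min) maps the column into [0, 1].
manual = (X - X.min()) / (X.max() - X.min())
scaled = MinMaxScaler().fit_transform(X)
print(manual.ravel())
print(scaled.ravel())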