Classification Models

The document discusses several classification models: Logistic regression is used for binary classification problems and models the probability of class membership. Discriminant analysis finds linear combinations of features that best separate classes, assuming normal distributions. Naive Bayes assumes independence between predictors. Support vector machines find the optimal separating hyperplane between classes. Plots like ROC curves and decision boundaries are used to evaluate some models.

Uploaded by Meis Educational

Copyright © All Rights Reserved

Logistic Regression:

 Explanation:

 Used for binary classification problems, i.e., the response variable is binary (0 or 1).

 It models the probability that an instance belongs to a particular category.

 In the example below, we predict whether a car's miles per gallon (mpg) is above or below the mean value.

 The logistic (sigmoid) function maps the linear predictor, a weighted sum of the predictor variables, to a probability between 0 and 1.
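A minimal sketch of the sigmoid mapping in base R, assuming made-up coefficients `b0` and `b1` and the hypothetical helper names `sigmoid()` and `logit()`. It also shows that the logit (log-odds) of the resulting probability is linear in the predictor, which is the linearity logistic regression relies on.

```r
# Hypothetical helpers: the logistic (sigmoid) function and its inverse, the logit
sigmoid <- function(z) 1 / (1 + exp(-z))
logit <- function(p) log(p / (1 - p))

# Made-up coefficients for illustration only (not fitted to any data)
b0 <- -1.5
b1 <- 0.8
x <- c(-2, 0, 2, 4)

z <- b0 + b1 * x   # linear predictor
p <- sigmoid(z)    # probabilities strictly between 0 and 1
print(round(p, 3))

# Taking the logit of p recovers the linear predictor (up to rounding error)
print(all.equal(logit(p), z))

# Base R provides the same pair of functions as plogis() and qlogis()
print(all.equal(p, plogis(z)))
print(all.equal(qlogis(p), z))
```

In a fitted model, `glm(..., family = binomial)` estimates the coefficients, and `predict(..., type = "response")` applies this sigmoid to return probabilities.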

 When to Use:

 When the relationship between the predictor variables and the log-odds of the response is approximately linear.

 Logistic Regression is chosen when the response variable is categorical with two classes; in this example, the response is whether the mpg is above or below the mean.

 Suitable for problems where the outcome is binary, like whether an email is spam or not.

 Predictors:

 Predictor variables can be numeric or categorical; categorical predictors are encoded as dummy (indicator) variables.
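As a sketch of how a categorical predictor enters the model, the base R call below builds the design matrix for `mtcars` with `cyl` treated as a factor (converting `cyl` to a factor is an assumption made for illustration). `glm()` uses the same default treatment coding: one 0/1 column per non-reference level, with the first level (4 cylinders) as the baseline.

```r
car_data <- mtcars
car_data$cyl <- factor(car_data$cyl)   # treat cylinder count as categorical (levels "4", "6", "8")

# Design matrix: intercept, numeric wt, and dummy columns for cyl = 6 and cyl = 8
X <- model.matrix(~ wt + cyl, data = car_data)
print(head(X))
print(colnames(X))
```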

# =======================================

# R code: Logistic Regression

# =======================================

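# Step 1: Load Libraries
library(caret)  # preProcess(), createDataPartition(), confusionMatrix()
library(dplyr)  # data manipulation (not strictly needed below)
library(zoo)    # na.aggregate() replaces NA values with the column mean

# Step 2: Load Dataset and Create the Binary Response
data <- mtcars
# 1 if a car's mpg is above the mean, 0 otherwise
data$high_mpg <- as.integer(data$mpg > mean(data$mpg, na.rm = TRUE))

# Step 3: Handle Missing Values, Scaling, and Normalization
# Check for missing values
summary(data)
colSums(is.na(data))

# If there are missing values, either
# 1) drop incomplete rows with na.omit() (simple, but discards data), or
# 2) replace them with the column mean or median (usually preferable)

# Option A: median imputation with caret
preprocess_params <- preProcess(data, method = "medianImpute")
data <- predict(preprocess_params, newdata = data)

# Option B: mean imputation with zoo (caret's preProcess() has no mean-imputation method)
# data[] <- na.aggregate(data)

# If scaling or normalization is needed, apply it to the predictors only (the response must stay 0/1):
# predictors <- setdiff(names(data), c("mpg", "high_mpg"))
# pp <- preProcess(data[, predictors], method = c("center", "scale"))  # standardization: mean 0, sd 1
# pp <- preProcess(data[, predictors], method = "range")               # min-max normalization to [0, 1]
# data[, predictors] <- predict(pp, data[, predictors])

# Step 4: Data Splitting
# Set seed for reproducibility
set.seed(123)

# Split the data into training (80%) and testing (20%) sets, stratified by class
train_index <- createDataPartition(factor(data$high_mpg), p = 0.8, list = FALSE)
train_data <- data[train_index, ]
test_data <- data[-train_index, ]

# Step 5: Build Logistic Regression Model
# Exclude mpg itself: it defines the response and would leak the answer.
# With ~26 training rows, glm() may warn that fitted probabilities are 0 or 1
# (perfect separation); a smaller model such as high_mpg ~ wt is more stable.
log_model <- glm(high_mpg ~ . - mpg, data = train_data, family = binomial)

# Step 6: Model Summary or Plots
# Summary statistics
summary(log_model)
# Or you can create plots if applicable

# Step 7: Make Predictions
# type = "response" returns predicted probabilities P(high_mpg = 1)
predictions <- predict(log_model, newdata = test_data, type = "response")

# Step 8: Model Evaluation Metrics
# confusionMatrix() needs factors with the same levels, so convert the
# probabilities to class labels with a 0.5 threshold first
predicted_class <- factor(ifelse(predictions > 0.5, 1, 0), levels = c(0, 1))
actual_class <- factor(test_data$high_mpg, levels = c(0, 1))
conf_matrix <- confusionMatrix(predicted_class, actual_class, positive = "1")

# Display the confusion matrix and other metrics
conf_matrix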

=======================

Discriminant Analysis:

 Explanation:

 Discriminant Analysis is used when there are two or more classes and the goal is to find the linear
combination of features that best separates them.

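 Assumes the predictor variables are normally distributed within each class; LDA additionally assumes that all classes share the same covariance matrix (quadratic discriminant analysis, QDA, relaxes this).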


 When to Use:

 Especially useful when you have more than two classes (which binary logistic regression cannot handle directly) and you want to classify new observations into one of them.

 Predictors:

 Assumes continuous predictors that are normally distributed.
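A minimal LDA sketch on the built-in `iris` data (three classes), using `lda()` from the MASS package, which is distributed with standard R installations. The 120/30 random split and the seed are arbitrary choices for illustration. With three classes and four predictors, LDA finds at most two discriminant functions (LD1 and LD2).

```r
library(MASS)  # lda()

set.seed(42)
train_idx <- sample(nrow(iris), 120)   # arbitrary 120/30 split for illustration
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Fit LDA: class-specific means with one covariance matrix shared by all classes
lda_model <- lda(Species ~ ., data = train)

# predict() returns $class (labels), $posterior (class probabilities), $x (discriminant scores)
lda_pred <- predict(lda_model, newdata = test)

print(table(predicted = lda_pred$class, actual = test$Species))
lda_accuracy <- mean(lda_pred$class == test$Species)
print(lda_accuracy)
```

Plotting `lda_pred$x` (LD1 against LD2) shows how well the classes separate, and swapping `lda()` for `MASS::qda()` drops the shared-covariance assumption.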

Naive Bayes Classifier:

 Explanation:

 Naive Bayes is a probabilistic algorithm based on Bayes' theorem that assumes the predictors are independent of each other within each class (conditional independence).

 Despite its "naive" assumption, it performs surprisingly well in many real-world situations.

 When to Use:

 Particularly effective for text classification (spam detection, sentiment analysis).

 Predictors:

 Works with both categorical and continuous predictors; continuous predictors are commonly modeled with a normal (Gaussian) distribution within each class.
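To make the independence assumption concrete, here is a from-scratch Gaussian Naive Bayes sketch in base R on `iris`. The helper names `fit_gaussian_nb()` and `predict_gaussian_nb()` are hypothetical, and modeling every predictor with a per-class normal distribution is an assumption of this sketch; packaged implementations such as `naiveBayes()` in the e1071 package are the usual choice in practice.

```r
# Hypothetical helper: estimate class priors and per-class means/sds for each predictor
fit_gaussian_nb <- function(X, y) {
  classes <- levels(y)
  list(
    classes = classes,
    prior = as.numeric(table(y)[classes]) / length(y),
    means = t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE]))),
    sds = t(sapply(classes, function(k) apply(X[y == k, , drop = FALSE], 2, sd)))
  )
}

# Hypothetical helper: pick the class with the largest log posterior.
# The "naive" step: the joint likelihood is a product of one normal density per
# predictor, i.e. a sum of log densities.
predict_gaussian_nb <- function(model, X) {
  X <- as.matrix(X)
  n <- nrow(X)
  log_post <- matrix(sapply(seq_along(model$classes), function(k) {
    dens <- matrix(dnorm(X, mean = rep(model$means[k, ], each = n),
                         sd = rep(model$sds[k, ], each = n), log = TRUE), nrow = n)
    log(model$prior[k]) + rowSums(dens)
  }), nrow = n)
  factor(model$classes[max.col(log_post, ties.method = "first")], levels = model$classes)
}

set.seed(42)
train_idx <- sample(nrow(iris), 120)   # arbitrary 120/30 split for illustration
nb_model <- fit_gaussian_nb(iris[train_idx, 1:4], iris$Species[train_idx])
nb_pred <- predict_gaussian_nb(nb_model, iris[-train_idx, 1:4])
nb_accuracy <- mean(nb_pred == iris$Species[-train_idx])
print(nb_accuracy)
```

Working with log densities rather than multiplying raw densities avoids numerical underflow when there are many predictors.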

Support Vector Machines (SVM):

 Explanation:

 SVM is a powerful classification algorithm that finds the hyperplane separating the classes with the widest margin, i.e., the largest distance to the nearest training points (the support vectors).

 It works well in high-dimensional spaces and is effective in cases where the number of dimensions is
greater than the number of samples.

 When to Use:

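 Useful for both linearly and non-linearly separable data; non-linear boundaries are obtained with kernel functions such as the polynomial or radial basis function (RBF) kernel.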

 Effective when there is a clear margin of separation between classes.

 Predictors:

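 Works with numeric predictors (encode categorical predictors as dummy variables); scaling the data is essential for SVM, because predictors with large ranges would otherwise dominate the margin and kernel computations.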
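A sketch that follows the scaling advice above: predictors are standardized with the training set's means and standard deviations, and those same values are reused on the test set. To stay within base R, the classifier is a hypothetical linear soft-margin SVM trained by subgradient descent (`train_linear_svm()`, with arbitrarily chosen `lambda`, `lr`, and `epochs`); it is not a kernel SVM. In practice a package such as e1071, whose `svm()` function supports `kernel = "linear"` or `kernel = "radial"` and scales predictors by default, would typically be used instead.

```r
# Two-class subset of iris: setosa (-1) vs versicolor (+1)
two_class <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
X <- as.matrix(two_class[, 1:4])
y <- ifelse(two_class$Species == "versicolor", 1, -1)

set.seed(1)
train_idx <- sample(nrow(X), 80)   # arbitrary 80/20 split for illustration

# Standardize with training-set means and sds only, then reuse them on the test set
X_train <- scale(X[train_idx, ])
X_test <- scale(X[-train_idx, ],
                center = attr(X_train, "scaled:center"),
                scale = attr(X_train, "scaled:scale"))
y_train <- y[train_idx]
y_test <- y[-train_idx]

# Hypothetical helper: linear soft-margin SVM fitted by full-batch subgradient descent on
#   lambda / 2 * ||w||^2 + mean(pmax(0, 1 - y * (X %*% w + b)))
train_linear_svm <- function(X, y, lambda = 0.01, lr = 0.1, epochs = 500) {
  w <- rep(0, ncol(X))
  b <- 0
  for (i in seq_len(epochs)) {
    margins <- as.vector(y * (X %*% w + b))
    active <- margins < 1   # points inside the margin or misclassified
    grad_w <- lambda * w - colSums(X[active, , drop = FALSE] * y[active]) / nrow(X)
    grad_b <- -sum(y[active]) / nrow(X)
    w <- w - lr * grad_w
    b <- b - lr * grad_b
  }
  list(w = w, b = b)
}

svm_fit <- train_linear_svm(X_train, y_train)
svm_pred <- ifelse(as.vector(X_test %*% svm_fit$w + svm_fit$b) >= 0, 1, -1)
svm_accuracy <- mean(svm_pred == y_test)
print(svm_accuracy)
```

Reusing the training statistics on the test set matters: scaling the test set with its own mean and standard deviation would shift it relative to the learned hyperplane.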

Plots:

 Logistic Regression and Discriminant Analysis:

 Commonly used plots include ROC curves, confusion matrices, and decision boundaries.

 SVM:

 SVM often involves visualizing decision boundaries in feature space.
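A base R sketch of an ROC curve and its area (AUC) for a logistic regression on `mtcars`, reusing the above-/below-mean mpg response. The helper names `roc_points()` and `auc_trapezoid()` are hypothetical, and the single predictor `wt` is an arbitrary choice; packages such as pROC provide tested implementations. For brevity the curve is computed on the data the model was fitted on, so the AUC is optimistic.

```r
# Hypothetical helper: true/false positive rates at every distinct score threshold
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Hypothetical helper: area under the curve by the trapezoidal rule
auc_trapezoid <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

car_data <- mtcars
car_data$high_mpg <- as.integer(car_data$mpg > mean(car_data$mpg))
fit <- glm(high_mpg ~ wt, data = car_data, family = binomial)
probs <- predict(fit, type = "response")

roc <- roc_points(probs, car_data$high_mpg)
auc <- auc_trapezoid(roc$fpr, roc$tpr)
print(auc)

plot(roc$fpr, roc$tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # diagonal = random guessing (AUC 0.5)
```

Each point on the curve corresponds to one classification threshold, so the ROC curve summarizes performance across all thresholds rather than only the 0.5 cutoff used for a confusion matrix.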
