Unit 2 - R Programming
summary(data)
• For a numeric vector, the summary() function automatically calculates the following summary statistics: minimum, first quartile, median, mean, third quartile, and maximum.
• The following code shows how to use the summary() function to summarize the values in a vector.
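A minimal sketch, assuming a small hypothetical numeric vector `x`:

```r
# Hypothetical numeric vector to summarize
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# summary() returns Min., 1st Qu., Median, Mean, 3rd Qu. and Max.
s <- summary(x)
print(s)

# Individual statistics can be extracted by name
s[["Median"]]  # 6
s[["Mean"]]    # 6.625
```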
• Logistic regression in R can be performed using the glm() function, which fits generalized
linear models.
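As a sketch of a logistic regression fit with glm(), the example below uses the built-in `mtcars` data set, modelling the transmission type `am` (0 = automatic, 1 = manual) from the car weight `wt`. The choice of data set and the 0.5 classification cutoff are assumptions for illustration; the `family = binomial` argument is what selects logistic regression.

```r
# Built-in mtcars data: am is 0 (automatic) or 1 (manual), wt is weight in 1000 lbs
data(mtcars)

# family = binomial selects logistic regression (logit link by default)
model <- glm(am ~ wt, data = mtcars, family = binomial)
summary(model)

# Predicted probabilities that am = 1 (manual transmission)
probs <- predict(model, type = "response")

# Convert probabilities to class labels with an assumed 0.5 threshold
pred_class <- ifelse(probs >= 0.5, 1, 0)
table(Predicted = pred_class, Actual = mtcars$am)
```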
Logistic regression is commonly used in various fields, including:
• Social sciences: Predicting voter behavior, crime rates, or social network outcomes.
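p = 1 / (1 + e^(-z)), where p is the predicted probability of the positive class and z = b0 + b1*x1 + ... + bk*xk is a linear combination of the predictors.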
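The logistic function can be written directly in R. This sketch assumes hypothetical values for the linear predictor `z`; base R's `plogis()` computes the same function.

```r
# Logistic (sigmoid) function: maps the linear predictor z to a probability p
sigmoid <- function(z) 1 / (1 + exp(-z))

z <- c(-2, 0, 2)
p <- sigmoid(z)
print(p)   # p is exactly 0.5 at z = 0 and increases with z

# Base R's plogis() computes the same logistic function
all.equal(p, plogis(z))  # TRUE
```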
• family - An R object that specifies the error distribution and link function of the model; for logistic regression its value is binomial (family = binomial), which uses the logit link by default.
• In the confusion matrix, the rows represent predicted values and the columns represent actual values.
• For a binary classifier, the confusion matrix contains four cells, as shown in the image below.
• True Positive (TP) – The number of actual positive values that the model correctly predicts as positive.
• False Positive (FP) – The number of actual negative values that the model incorrectly predicts as positive.
• False Negative (FN) – The number of actual positive values that the model incorrectly predicts as negative.
• True Negative (TN) – The number of actual negative values that the model correctly predicts as negative.
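Base R's `table()` can build the same layout, with rows for predicted values and columns for actual values. The labels below are hypothetical, and the positive class is assumed to be coded as 1.

```r
# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0), levels = c(1, 0))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0), levels = c(1, 0))

# Rows = predicted values, columns = actual values
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Read the four cells by level name
TP <- cm["1", "1"]  # actual positive, predicted positive
FP <- cm["1", "0"]  # actual negative, predicted positive
FN <- cm["0", "1"]  # actual positive, predicted negative
TN <- cm["0", "0"]  # actual negative, predicted negative
c(TP = TP, FP = FP, FN = FN, TN = TN)
```

With the caret package installed, `confusionMatrix(predicted, actual, positive = "1")` reports the same table together with derived statistics.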
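confusionMatrix() function (caret package)
• Usage: confusionMatrix(data, reference), where data is a factor of predicted classes and reference is a factor of actual classes with the same levels.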
factor()
• Factors are data structures used to categorize data and represent categorical variables; each distinct category is stored as a level.
Example 1
Example 2
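A minimal sketch of creating and inspecting a factor, using a hypothetical character vector of size categories:

```r
# Hypothetical categorical data
sizes <- c("small", "large", "medium", "small", "large")

# factor() stores each distinct category once as a level
f <- factor(sizes, levels = c("small", "medium", "large"))
levels(f)      # "small" "medium" "large"
nlevels(f)     # 3
table(f)       # counts per level: small 2, medium 1, large 2
as.integer(f)  # underlying integer codes: 1 3 2 1 3

# ordered = TRUE makes the levels comparable with < and >
f_ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
f_ord[1] < f_ord[2]  # TRUE, because "small" < "large"
```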
• Sensitivity and specificity are performance metrics used to evaluate the accuracy of a classification model; in R they are typically computed from a confusion matrix.
• The goal of a classification model is to learn the relationship between the input features and
the target class label, so that it can make accurate predictions on new, unseen data.
• Sensitivity (true positive rate) – Proportion of true positives (correctly predicted positive instances) among all actual positive instances: Sensitivity = TP / (TP + FN).
• Specificity (true negative rate) – Proportion of true negatives (correctly predicted negative instances) among all actual negative instances: Specificity = TN / (TN + FP).
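Both metrics can be computed from actual and predicted labels in a few lines. The helper name `sens_spec` and the 0/1 label vectors below are hypothetical, with 1 assumed to mark the positive class.

```r
# Sensitivity and specificity from actual and predicted 0/1 labels (1 = positive)
sens_spec <- function(actual, predicted) {
  TP <- sum(predicted == 1 & actual == 1)
  FN <- sum(predicted == 0 & actual == 1)
  TN <- sum(predicted == 0 & actual == 0)
  FP <- sum(predicted == 1 & actual == 0)
  c(sensitivity = TP / (TP + FN), specificity = TN / (TN + FP))
}

# Hypothetical labels: TP = 3, FN = 1, TN = 4, FP = 1
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1)
res <- sens_spec(actual, predicted)
print(res)  # sensitivity 0.75, specificity 0.80
```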
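An ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate as the classification threshold varies, where: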
• True Positive Rate (TPR): The proportion of actual positive instances correctly identified by the model, TPR = TP / (TP + FN).
• False Positive Rate (FPR): The proportion of actual negative instances incorrectly identified as positive by the model, FPR = FP / (FP + TN).
• Threshold: The cutoff value used to determine whether a prediction is positive or negative.
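The sketch below traces the ROC points by hand in base R: for each threshold it classifies a score as positive when it is at or above the threshold, then records TPR and FPR. The scores, labels, and the helper name `roc_points` are hypothetical; dedicated packages such as pROC also provide ROC and AUC functions. The area under the curve is approximated with the trapezoidal rule.

```r
# Hypothetical predicted probabilities and actual labels (1 = positive)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# TPR and FPR at each threshold: predict positive when score >= threshold
roc_points <- function(scores, labels, thresholds) {
  t(sapply(thresholds, function(th) {
    pred <- scores >= th
    tp <- sum(pred & labels == 1)
    fp <- sum(pred & labels == 0)
    fn <- sum(!pred & labels == 1)
    tn <- sum(!pred & labels == 0)
    c(threshold = th, TPR = tp / (tp + fn), FPR = fp / (fp + tn))
  }))
}

# Thresholds from strictest (nothing positive) to most lenient (everything positive)
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
roc <- roc_points(scores, labels, thresholds)
print(roc)

# Area under the ROC curve by the trapezoidal rule
fpr <- roc[, "FPR"]
tpr <- roc[, "TPR"]
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc  # 0.8125
```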
• Precision
• Recall (Sensitivity)
• Specificity
• F1-score
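These four metrics follow directly from the confusion-matrix counts. The counts below are hypothetical; F1 is the harmonic mean of precision and recall.

```r
# Hypothetical confusion-matrix counts
TP <- 40; FP <- 5; FN <- 10; TN <- 45

precision   <- TP / (TP + FP)   # 40 / 45
recall      <- TP / (TP + FN)   # sensitivity: 40 / 50
specificity <- TN / (TN + FP)   # 45 / 50
f1 <- 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

round(c(precision = precision, recall = recall,
        specificity = specificity, F1 = f1), 3)
```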
2.5. Recitation
• Iteration: Repeating a process or operation, usually with loops (for, while, repeat).
• Data Structures:
- Vector: A sequence of elements of the same data type, created using c()
- Data Frame: A table whose columns can have different data types, created using data.frame()
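A short sketch of both structures, using hypothetical student data. Note that c() coerces mixed inputs to a single common type, while a data frame keeps a separate type per column.

```r
# Vector: elements of one type, created with c()
scores <- c(85, 92, 78)
class(scores)   # "numeric"

# Mixing types in c() coerces everything to a common type
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"

# Data frame: columns may have different types, created with data.frame()
students <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  score  = scores,
  passed = scores >= 80,
  stringsAsFactors = FALSE  # keep text columns as character
)
str(students)
students$score[2]  # 92
```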
2. Basic Operators
• Assignment Operators: <- and = assign values to variables; <<- is used inside functions to assign to a variable in an enclosing environment, creating it in the global environment if it is not found there
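A small sketch of the three operators. The `counter` variable and `increment()` function are hypothetical; the call to mean() shows that `=` inside a function call names an argument rather than assigning a variable.

```r
x <- 5                  # <- is the conventional assignment operator
y = 10                  # = also assigns at the top level
mean(x = c(1, 2, 3))    # inside a call, = names an argument; the variable x is not changed
x                       # still 5

# <<- assigns in an enclosing environment; here that is the global environment
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
increment()
counter  # 2
```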
3. Control Structures
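• Conditional Statements:
- if (condition) { ... } else { ... }: Runs the first block when condition is TRUE, otherwise the second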
• Loops:
- for (variable in sequence) { ... }: Repeats code for each element in the sequence
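The sketch below shows a conditional inside a for loop, its vectorised equivalent ifelse(), and the while and repeat loops. The input vector `x` is hypothetical.

```r
x <- c(3, 8, 1, 6)

# for: repeat code for each element of the sequence; if/else chooses a branch
category <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 5) {
    category[i] <- "high"
  } else {
    category[i] <- "low"
  }
}
category  # "low" "high" "low" "high"

# ifelse(): vectorised conditional, same result without an explicit loop
ifelse(x > 5, "high", "low")

# while: repeat while the condition is TRUE
total <- 0
i <- 1
while (i <= length(x)) {
  total <- total + x[i]
  i <- i + 1
}
total  # 18

# repeat: loop until break is called explicitly
n <- 1
repeat {
  n <- n * 2
  if (n >= 100) break
}
n  # 128
```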
4. Functions
• Defining Functions: Functions are defined using function(), allowing code to be reused:
my_function <- function(arg1, arg2) {
  result <- arg1 + arg2  # Code block: compute the result
  return(result)
}
• Scope of Variables: Variables defined within functions are local by default, unless assigned
globally using <<-
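A minimal sketch of a function with a default argument, and of a local variable shadowing a global one without changing it. The function names `area` and `add_local` are hypothetical.

```r
# A function with a default argument value
area <- function(width, height = 1) {
  result <- width * height
  return(result)
}
area(3, 4)  # 12
area(3)     # 3, since height defaults to 1

# A local assignment shadows a global variable without changing it
total <- 0
add_local <- function(v) {
  total <- total + v   # creates a local 'total' inside the function
  total
}
add_local(5)  # 5
total         # still 0
```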
5. Data Manipulation
• Data Frames: