Aiml QB With Ans - 075736
Instance No.   X1   X2   Class label
1              T    T    1
2              T    T    1
3              T    F    0
4              F    F    1
5              F    T    0
6              F    T    0
Q4 How does the Random Forest algorithm work for classification? CO3 BL2 5
1. Data Sampling: Random Forest uses a technique
called "Bootstrap Aggregating" or "Bagging." It
creates multiple subsets of the original training
data by randomly sampling with replacement.
Each subset is of the same size as the original
data but may contain some duplicate instances
and exclude some others.
2. Building Decision Trees: For each subset of the
data, a decision tree is constructed. However,
Random Forest introduces an additional
randomization element during the construction
of each tree.
3. Feature Randomness: At each node of the
decision tree, instead of considering all the
available features, Random Forest randomly
selects a subset of features to be considered for
splitting. This feature randomness helps to
introduce diversity among the trees.
4. Decision Tree Construction: Using the subset of
features, the decision tree is constructed by
recursively selecting the best attribute to split the
data based on a criterion like information gain or
Gini index.
5. Voting: Once all the decision trees are
constructed, when a new instance needs to be
classified, it is passed through each decision tree.
Each tree independently provides a prediction or
class label. The class label that receives the
majority of votes from the trees is considered the
final prediction of the Random Forest.
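As a minimal sketch of the whole pipeline above (assuming scikit-learn, with the iris dataset as an illustrative stand-in), RandomForestClassifier performs the bagging, feature randomness, and majority voting internally:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators: number of bagged trees; max_features: size of the random
# feature subset considered at each split (the "feature randomness" step)
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
clf.fit(X_train, y_train)           # builds each tree on a bootstrap sample
print(clf.predict(X_test[:5]))      # majority vote across all trees
print(clf.score(X_test, y_test))    # accuracy on held-out data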
Bagging: Bagging (Bootstrap Aggregating) trains many models independently and in parallel, each on a bootstrap sample of the training data, and combines their predictions by majority voting (classification) or averaging (regression). It mainly reduces variance; Random Forest is a bagging method with added feature randomness.
Boosting: Boosting trains models sequentially, with each new model concentrating on the instances the previous models handled poorly (e.g., by reweighting misclassified examples, as in AdaBoost, or by fitting residual errors, as in Gradient Boosting). The final prediction is a weighted combination of all the models, and the technique mainly reduces bias.
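A minimal side-by-side sketch of the two (assuming scikit-learn and the X_train/y_train split from the example above):

from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: independent trees on bootstrap samples, combined by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
# Boosting: weak learners trained sequentially, each correcting the last
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)

for model in (bagging, boosting):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))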
Q7 Use Naïve Bayes algorithm to determine whether a Red SUV Domestic car is a stolen car or not using the following data: CO3 BL3 10
Q10 Differentiate between logistic regression and linear regression. CO3 BL2 5
Logistic regression and linear regression are both statistical modeling techniques used for different types of problems. Here are the key differences between them:
1. Output: Linear regression predicts a continuous numeric value; logistic regression predicts a probability between 0 and 1, which is thresholded to give a class label.
2. Task: Linear regression is used for regression problems; logistic regression is used for (typically binary) classification.
3. Function: Linear regression fits a straight line y = w·x + b; logistic regression passes the same linear combination through the sigmoid function σ(z) = 1 / (1 + exp(-z)).
4. Loss: Linear regression is usually fit by minimizing mean squared error; logistic regression by maximizing the likelihood (equivalently, minimizing log loss).
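A minimal sketch of the contrast (assuming scikit-learn and tiny illustrative data):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Linear regression: continuous target, continuous prediction
reg = LinearRegression().fit(X, np.array([1.9, 4.1, 6.0, 8.2]))
print(reg.predict([[5.0]]))         # a real number, roughly 10

# Logistic regression: binary target, probabilistic prediction
clf = LogisticRegression().fit(X, np.array([0, 0, 1, 1]))
print(clf.predict_proba([[5.0]]))   # class probabilities in [0, 1]
print(clf.predict([[5.0]]))         # thresholded class label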
Q11 How does the K-Nearest Neighbour (KNN) algorithm work? CO3 BL2 6
The K-Nearest Neighbors (KNN) algorithm is a supervised learning algorithm used for classification and regression tasks. It makes predictions based on the similarity or proximity of instances in the feature space. Here's how the KNN algorithm works:
1. Choose K, the number of neighbors to consider.
2. Compute the distance (commonly Euclidean) between the new instance and every instance in the training set.
3. Select the K training instances closest to the new instance.
4. For classification, assign the majority class among the K neighbors; for regression, output the average of their target values.
KNN is a "lazy" learner: there is no explicit training phase, and the algorithm simply stores the training data until a prediction is needed.
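A minimal sketch (assuming scikit-learn, with the iris dataset as an illustrative stand-in):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5, Euclidean by default
knn.fit(X, y)                               # "training" just stores the data
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))  # majority vote of the 5 nearest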
K-Means: In contrast, K-Means is an unsupervised clustering algorithm. It partitions unlabeled data into K clusters by repeatedly assigning each point to the nearest cluster centroid and recomputing the centroids. Despite the similar name, it solves a different problem from KNN, which is a supervised method.
Q2 Compare Training data vs. Validation data vs. Test data. CO4 BL2 6
Training data, validation data, and test data are all
subsets of the overall dataset that serve different
purposes in the machine learning workflow. Here's a
comparison of these three types of data:
1. Training Data:
• Purpose: Training data is used to build the
machine learning model. It is the primary
dataset used to train the model's
parameters and learn the underlying
patterns or relationships between the
features and the target variable.
• Size: The training data is typically the
largest subset of the dataset, often
comprising 60% to 80% of the total data.
• Label Availability: The training data
includes both the input features and the
corresponding known output (target or
label). It is labeled data in supervised
learning tasks.
• Model Usage: The model learns from the
training data and adjusts its internal
parameters to minimize the training error.
The model aims to capture the patterns
present in the training data to make
accurate predictions on new, unseen data.
2. Validation Data:
• Purpose: Validation data is used to tune
the hyperparameters and evaluate the
performance of the model during the
training phase. It helps in selecting the
best configuration of the model and
avoiding overfitting.
• Size: The validation data is a smaller
subset of the dataset, typically around
10% to 20% of the total data.
• Label Availability: Like the training data,
the validation data is labeled data in
supervised learning. It includes both the
input features and the corresponding
known output.
• Model Usage: During training, the model's
hyperparameters are adjusted based on its
performance on the validation data. The
model's generalization ability is assessed
on the validation data, and adjustments
are made to improve its performance on
unseen data.
3. Test Data:
• Purpose: Test data is used to evaluate the
final performance and generalization
capability of the trained model. It provides
an unbiased assessment of the model's
predictive ability on unseen data.
• Size: The test data is also a separate
subset of the dataset, usually around 10%
to 20% of the total data.
• Label Availability: The test data is also labeled: the true output labels are needed to score the model's predictions, but they are withheld from the model and used only for evaluation.
• Model Usage: The trained model is used
to make predictions on the test data, and
its performance metrics (such as accuracy,
precision, recall, or mean squared error)
are calculated. The test data provides an
estimate of the model's performance in
real-world scenarios.
It is important to maintain the independence and
integrity of each subset. The validation and test data
should not be used during the model training process
to avoid biased performance estimation and overfitting.
Proper division of data into these subsets helps in
assessing the model's performance, selecting the best
model configuration, and ensuring its ability to
generalize to new, unseen data.
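A minimal sketch of a 60/20/20 three-way split (assuming scikit-learn and that a feature matrix X and labels y are already loaded; the ratios are illustrative):

from sklearn.model_selection import train_test_split

# First carve out 20% of the data as the test set...
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# ...then split the remaining 80% in a 75/25 ratio, giving
# 60% training and 20% validation data overall
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))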
Q4 Explain 1. Overfitting 2. Underfitting 3. Exact Fit Model with suitable sketch CO4 BL2 6
1. Overfitting: Overfitting occurs when a machine
learning model learns the training data too well,
to the point that it captures the noise or random
fluctuations in the data, instead of the underlying
patterns. In other words, the model becomes too
complex and specific to the training data, and as
a result, it fails to generalize well to new, unseen
data.
Characteristics:
• The model performs very well on the training data but poorly on the testing or validation data.
• It shows low bias but high variance.
• The model may exhibit excessive complexity, capturing noise or outliers as important patterns.
• Overfitting can lead to poor generalization, making the model ineffective in making accurate predictions on new data.
2. Underfitting: Underfitting occurs when a
machine learning model is too simplistic to
capture the underlying patterns in the training
data. It fails to learn the relationships between
the features and the target variable, resulting in
poor performance both on the training and
testing data.
Characteristics:
• The model shows high bias but low variance.
• It fails to capture the complexity or nuances of the data, resulting in an overly generalized model.
• Underfitting can occur when the model is too simple or lacks the necessary features to capture the underlying relationships.
• The model may have high error rates on both the training and testing data.
3. Exact Fit Model: An exact fit model refers to a
situation where the machine learning model
perfectly matches the training data without any
errors. In this case, the model has learned the
patterns and relationships present in the training
data with 100% accuracy. However, an exact fit
model is not desirable as it is highly unlikely to
generalize well to new, unseen data.
Characteristics:
• The model accurately predicts the target variable
for every instance in the training data.
• There is no error or discrepancy between the
model's predictions and the true values in the
training data.
• An exact fit model is often an indication of
overfitting since it fails to account for the noise
or randomness in the data.
[Sketch placeholder: three panels plotting error against Model Complexity, titled "Overfitting", "Underfitting", and "Exact Fit".]
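A minimal sketch that makes the comparison concrete (assuming scikit-learn; polynomial degree stands in for model complexity):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)  # noisy sine wave
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),  # training error
          mean_squared_error(y_te, model.predict(X_te)))  # test error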
Q5 What is hyperparameter tuning? Enlist different hyperparameter tuning algorithms. Explain any two hyperparameters in Random Forest algorithm. CO4 BL3 6
Hyperparameter tuning refers to the process of selecting the optimal values for the hyperparameters of a machine learning algorithm. Hyperparameters are settings or configurations of the model that are not learned from the data but are set by the user before the training process. Tuning these hyperparameters is crucial to finding the best performing model and improving its generalization ability.
Common hyperparameter tuning algorithms include:
• Grid Search: exhaustively evaluates every combination of values in a user-defined grid.
• Random Search: samples combinations at random from given ranges, often finding good settings with far fewer trials than grid search.
• Bayesian Optimization: builds a probabilistic model of the validation score as a function of the hyperparameters and uses it to choose the next candidate settings.
• Other approaches: genetic/evolutionary algorithms and successive-halving methods such as Hyperband.
Two important hyperparameters in the Random Forest algorithm:
• n_estimators: the number of trees in the forest. More trees generally give more stable predictions, at a higher computational cost.
• max_depth: the maximum depth of each tree. Deeper trees can capture more complex patterns but are more prone to overfitting.
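A minimal grid-search sketch over those two hyperparameters (assuming scikit-learn and that X_train and y_train are available):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],  # number of trees
    "max_depth": [None, 5, 10],      # maximum depth of each tree
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5)  # 5-fold cross-validation per combination
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)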
Q6 What do you understand from on-policy and off-policy algorithms in reinforcement learning? Explain Q-learning algorithm with flow diagram. CO5 BL2 6
In reinforcement learning, on-policy and off-policy algorithms are two different approaches for learning and updating the agent's policy.
• On-policy: the agent learns the value of the same policy it follows while acting, exploration included. SARSA is the classic example: its update uses the action actually taken in the next state.
• Off-policy: the agent learns about a target policy (usually the greedy one) while behaving according to a different, more exploratory policy. Q-learning is the classic example: its update uses the best action in the next state, regardless of which action the agent actually takes next.
Q-learning maintains a table of action values and updates it with the rule
Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
where α is the learning rate and γ is the discount factor.
Flow diagram of the Q-learning algorithm:
Start → Initialize Q-function arbitrarily → Repeat until convergence: Select an action based on exploration (e.g., ε-greedy) or the learned Q-function → Execute the action in the environment → Observe the next state and immediate reward → Update the Q-value using the Q-learning update rule → End
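A minimal tabular Q-learning sketch (assuming a hypothetical toy environment object env whose reset() returns a state index and whose step(action) returns (next_state, reward, done); the names and sizes are illustrative):

import numpy as np

n_states, n_actions = 16, 4              # illustrative sizes
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))      # initialize Q arbitrarily (here: zeros)

for episode in range(1000):
    state = env.reset()                  # hypothetical environment (see above)
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = np.argmax(Q[state])
        next_state, reward, done = env.step(action)
        # off-policy update: uses the max over next actions,
        # not the action the agent will actually take next
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state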
Sigmoid activation function: σ(x) = 1 / (1 + exp(-x))
[Plot placeholder: the sigmoid curve rising smoothly from 0 toward 1 over roughly x = -5 to 5, passing through 0.5 at x = 0.]
Q11 A neuron with 4 inputs has the weights 1, 2, 3, 4 and bias 0. The activation function is linear, say the function f(x) = 2x. If the inputs are 4, 8, 5, 6, compute the output. Draw a diagram representing the neuron. CO5 BL3 6
To compute the output of the neuron, we use the given weights, bias, and activation function.
Inputs: 4, 8, 5, 6; Weights: 1, 2, 3, 4; Bias: 0
Weighted sum: z = (4 × 1) + (8 × 2) + (5 × 3) + (6 × 4) + 0 = 4 + 16 + 15 + 24 = 59
Output: f(z) = 2 × 59 = 118
Input x1 = 4 --(w1 = 1)--\
Input x2 = 8 --(w2 = 2)---\
                           [ Σ + bias (0) ] --> f(x) = 2x --> Output = 118
Input x3 = 5 --(w3 = 3)---/
Input x4 = 6 --(w4 = 4)--/
In the diagram, each input is multiplied by its corresponding
weight, and the results are summed together at the neuron.
The activation function is applied to the weighted sum to
compute the output.
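A one-line check of the arithmetic (a sketch using NumPy):

import numpy as np

weights = np.array([1, 2, 3, 4])
inputs = np.array([4, 8, 5, 6])
bias = 0
z = np.dot(weights, inputs) + bias  # weighted sum = 59
print(2 * z)                        # linear activation f(x) = 2x gives 118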
Q12 Explain working of Convolutional Neural Network (CNN) with flow diagram. Define 1. Padding 2. Striding in CNN CO5 BL2 6
Convolutional Neural Networks (CNNs) are a specialized
type of neural network commonly used for image
recognition and computer vision tasks. They are
designed to automatically learn and extract hierarchical
features from input data. Here's a simplified explanation
of how CNNs work, along with flow diagrams:
1. Convolutional Layer:
• The input to a CNN is an image or a
feature map from a previous layer.
• The convolutional layer applies multiple
filters (also known as kernels) to the input.
• Each filter convolves across the input,
performing element-wise multiplication
and summing the results to produce a
feature map.
• The filters learn to detect specific features
or patterns, such as edges, corners, or
textures.
• The resulting feature maps capture local
spatial information.
Flow diagram for a convolutional layer:
Input --> Convolutional Layer --> Feature Maps
2. Activation Function:
• Each element in the feature map passes through
an activation function, typically a rectified linear
unit (ReLU).
• The activation function introduces non-linearity,
allowing the network to learn complex
relationships between features.
Flow diagram for an activation function:
Feature Maps --> Activation Function
3. Pooling Layer:
• The pooling layer reduces the spatial dimensions
of the feature maps, reducing the computational
complexity.
• Common pooling techniques include max
pooling or average pooling.
• Pooling helps to make the network invariant to
small translations and variations in the input.
• It also aids in extracting higher-level features by
summarizing the information in local
neighborhoods.
Flow diagram for a pooling layer:
Feature Maps --> Pooling Layer --> Pooled Feature Maps
4. Fully Connected Layer:
• After several convolutional and pooling layers,
the network typically ends with one or more fully
connected layers.
• Each neuron in the fully connected layer is
connected to every neuron in the previous layer.
• These layers capture global information from the
extracted features and make predictions based
on the learned representations.
• Activation functions, such as ReLU or softmax,
are applied to the outputs of the fully connected
layers.
Flow diagram for a fully connected layer:
Pooled Feature Maps --> Fully Connected Layer --> Output
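A minimal CNN sketch mirroring the flow above (assuming PyTorch; the layer sizes are illustrative, for a 28×28 grayscale input):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling layer (28 -> 14)
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # fully connected layer
)
x = torch.randn(1, 1, 28, 28)  # one dummy 28x28 grayscale image
print(model(x).shape)          # torch.Size([1, 10]) class scores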
Padding: Padding is a technique used in CNNs to preserve spatial dimensions during convolutional operations. It involves adding additional border pixels around the input image or feature map, usually with zero values. Padding helps to retain information at the borders and reduces the loss of spatial resolution. It is commonly used to ensure that the output size matches the input size and to mitigate the shrinking of feature maps.
Striding: The stride is the step size with which the filter moves across the input during convolution (or pooling). A stride of 1 moves the filter one pixel at a time, while a stride of 2 skips every other position, roughly halving the spatial dimensions of the output. Larger strides downsample the feature map and reduce computation.
For a square input of size n, filter size f, padding p, and stride s, the output size is floor((n + 2p − f) / s) + 1; for example, a 32×32 input with a 3×3 filter, padding 1, and stride 1 gives (32 + 2 − 3)/1 + 1 = 32, so the size is preserved.
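A short sketch of how padding and stride change the output size (assuming PyTorch, continuing the example above):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)                       # dummy 32x32 input
same = nn.Conv2d(1, 8, kernel_size=3, padding=1)    # padding preserves size
strided = nn.Conv2d(1, 8, kernel_size=3, stride=2)  # stride 2 downsamples
print(same(x).shape)     # torch.Size([1, 8, 32, 32])
print(strided(x).shape)  # torch.Size([1, 8, 15, 15]): floor((32 - 3)/2) + 1 = 15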
Q4 Explain the steps involved in material inspection. How can machine learning be implemented in material inspection? CO6 BL2 6
Material inspection involves assessing the quality and properties of materials to ensure they meet specific standards and requirements. Machine learning can be implemented in material inspection to automate and enhance the inspection process. The typical steps are:
1. Data acquisition: capture images or sensor readings of the material (e.g., cameras, X-ray, or ultrasonic sensors).
2. Preprocessing: clean and normalize the data, remove noise, and segment the regions of interest.
3. Feature extraction: measure characteristics such as dimensions, texture, or surface defects.
4. Classification and decision: compare the measurements against quality standards and accept or reject the material.
Machine learning fits naturally into this pipeline: supervised models (for example, CNNs trained on labeled images of good and defective samples) can detect defects automatically, and anomaly-detection methods can flag unusual samples without requiring labeled defects. This makes inspection faster, more consistent, and less dependent on manual visual checks.
Q5 Explain different applications in health care where AIML can be CO6 BL2 5
used.
Artificial Intelligence (AI) and Machine Learning (ML)
have numerous applications in healthcare that can
revolutionize the industry. Here are some areas where
AI and ML can be used in healthcare: