Avcce QB Aml JSD - Ice
Objectives of machine learning – Human learning / Machine learning – Types of Machine learning:-
Supervised Learning – Unsupervised learning – Regression – Classification – The Machine Learning
Process:- Data Collection and Preparation – Feature Selection – Algorithm Choice – Parameter and
Model Selection – Training – Evaluation – Bias-Variance Tradeoff – Underfitting and Overfitting
Problems.
2 – Marks
30. What does the term "evaluation metric" mean in machine learning?
31. List two common evaluation metrics for classification problems.
32. How is the performance of a regression model evaluated?
33. Define bias in machine learning models.
34. Define variance in machine learning models.
35. What is the bias-variance tradeoff?
36. What happens when a model has high bias?
37. What is overfitting?
38. Define underfitting.
39. How can overfitting be prevented?
40. Mention one method to deal with underfitting.
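For Questions 30 to 32, a few common evaluation metrics can be sketched in plain Python: accuracy, precision and recall for classification, and mean squared error (MSE) and mean absolute error (MAE) for regression. The function names and toy labels below are hypothetical and serve only as an illustration; libraries such as scikit-learn offer ready-made equivalents.

```python
# Illustrative evaluation metrics in plain Python (no external libraries).

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions
labels = [1, 0, 1, 1, 0]
preds = [1, 0, 0, 1, 1]
print(accuracy(labels, preds))                 # 0.6
print(precision_recall(labels, preds))         # both values equal 2/3
print(mse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))   # (0.25 + 0 + 1) / 3
```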
13-Mark Questions
1. Compare human learning and machine learning. Explain the objectives of machine
learning in detail with real-world examples.
o Discuss the differences between how humans learn and how machines learn.
o Mention various real-world applications of machine learning and how it achieves its
objectives.
2. Explain the types of machine learning with suitable examples.
o Discuss supervised learning, unsupervised learning, and reinforcement learning.
o Provide examples of problems solved by each type.
3. What is supervised learning? Differentiate between regression and classification. Give
examples of both.
o Define supervised learning.
o Explain regression and classification, providing detailed comparisons with examples.
4. Describe the machine learning process from data collection to evaluation.
o Explain each step in the machine learning process: data collection, preparation, feature
selection, algorithm choice, model selection, training, and evaluation.
5. What are the challenges of overfitting and underfitting in machine learning? How can
these be addressed?
o Define overfitting and underfitting.
o Discuss how each problem affects the model and how they can be prevented using
methods like cross-validation, regularization, and pruning.
6. Explain the importance of feature selection in machine learning. Discuss various
techniques for feature selection.
o Define feature selection and why it is essential.
o Explain methods like filter methods, wrapper methods, and embedded methods for
feature selection.
7. What is the bias-variance tradeoff? How does it affect model performance? Discuss
strategies to manage bias and variance.
o Define bias and variance.
o Explain how the bias-variance tradeoff impacts model performance, and mention
strategies like model complexity control, regularization, and cross-validation to balance
them.
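Cross-validation, mentioned in Questions 5 and 7, can be illustrated with a minimal k-fold sketch: the data are split into k folds, the model is trained on k − 1 of them and scored on the held-out fold, and the k validation errors are averaged. The straight-line model, the helper names and the unshuffled, contiguous folds are simplifying assumptions.

```python
# Minimal k-fold cross-validation sketch (plain Python).
# Assumption: folds are contiguous; in practice the data are usually shuffled first.

def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # the first `extra` folds get one more sample
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def cross_val_mse(xs, ys, k):
    """Average validation MSE of the line fit over k folds."""
    fold_errors = []
    for train, val in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        err = sum((ys[i] - (a * xs[i] + b)) ** 2 for i in val) / len(val)
        fold_errors.append(err)
    return sum(fold_errors) / len(fold_errors)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]          # hypothetical noise-free data
print(cross_val_mse(xs, ys, k=5))     # close to 0 for a perfectly linear target
```

Comparing this averaged validation error across models of increasing complexity is one common way to choose a model that neither underfits (high bias) nor overfits (high variance).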
15-Mark Questions
UNIT 2
Data quality – Data preprocessing:- Data Cleaning:- Handling missing data and noisy data – Data
integration:- Redundancy and correlation analysis – Continuous and Categorical Variables – Data
Reduction:- Dimensionality reduction (Linear Discriminant Analysis – Principal Component Analysis)
2 – Marks
Prepared By: Dr. J. Sharmila Devi, AP/ICE EI3752 APPLIED MACHINE LEARNING
26. Mention one key advantage of using PCA for dimensionality reduction.
27. Define feature selection.
28. What is the role of data preprocessing in machine learning?
29. Mention two challenges faced in data preprocessing.
30. What is the difference between structured and unstructured data?
31. What is the purpose of automated data preprocessing pipelines?
32. Why is handling missing data important in data preprocessing?
33. Mention two techniques for handling noisy data.
34. Why is correlation analysis important in data integration?
35. What is the role of PCA in reducing dimensionality in high-dimensional datasets?
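Two of the simplest techniques behind Questions 32 and 33 are mean imputation for missing values and smoothing of noisy values by bin means (equal-frequency binning). The Python sketch below is illustrative only; it assumes missing entries are marked with None, and the price list is a toy example.

```python
# Illustrative sketch: mean imputation for missing values and
# smoothing noisy values by bin means (equal-frequency bins).

def impute_mean(values):
    """Replace None (missing) entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size`, and replace each
    value with the mean of its bin. The result is returned in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


print(impute_mean([4.0, None, 8.0, 6.0]))        # [4.0, 6.0, 8.0, 6.0]
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]      # toy data
print(smooth_by_bin_means(prices, bin_size=3))   # bins smoothed to 9.0, 22.0 and 29.0
```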
13-Mark Questions
1. Discuss the various techniques for handling missing data and noisy data in the data
preprocessing stage.
o Provide a detailed explanation of the methods used to handle missing and noisy data, and
discuss how they affect the quality of a dataset.
2. What is data integration? Explain the challenges and techniques used in data integration,
with special emphasis on redundancy and correlation analysis.
o Discuss the significance of data integration in data preprocessing, the challenges that arise
when integrating data from multiple sources, and how correlation analysis helps detect
redundancy.
3. Explain the steps involved in Principal Component Analysis (PCA) and its importance in
dimensionality reduction.
o Discuss the mathematical concepts of PCA, including eigenvalues, eigenvectors, and how PCA
reduces dimensionality without losing significant information.
4. Differentiate between continuous and categorical variables. Discuss how these variables
are preprocessed for machine learning models.
o Compare continuous and categorical variables and describe techniques such as binning,
encoding (label, one-hot), and scaling, which are used to prepare them for machine learning
models.
5. What is data reduction? Explain the various methods used for data reduction, with a
focus on dimensionality reduction techniques such as PCA and LDA.
o Discuss the concept of data reduction, including feature selection and extraction techniques.
Explain in detail the workings of PCA and LDA for reducing the dimensionality of large
datasets.
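The PCA steps asked for in Questions 3 and 5 (centre the data, form the covariance matrix, find its eigenvalues and eigenvectors, project onto the leading eigenvector) can be sketched for two features in plain Python. The closed-form eigen-solution of a symmetric 2 × 2 matrix is used only to keep the example dependency-free and is an assumption of this sketch; for real data a linear-algebra routine such as NumPy's numpy.linalg.eigh would normally be used.

```python
import math


def pca_2d(points):
    """Return (first principal direction, 1-D projections, explained-variance ratio)
    for a list of (x, y) points. Assumes the points are not all identical."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]   # step 1: centre the data

    # Step 2: sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues of a symmetric 2x2 matrix (closed form), lam1 >= lam2
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + root, half_trace - root

    # Unit eigenvector for the largest eigenvalue lam1
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Step 4: project each centred point onto the first principal component
    scores = [x * vx + y * vy for x, y in centered]
    explained_ratio = lam1 / (lam1 + lam2)   # share of total variance kept in 1-D
    return (vx, vy), scores, explained_ratio


direction, scores, ratio = pca_2d([(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)])
print(direction, ratio)
```

Keeping only the first component reduces two features to one while retaining the fraction `ratio` of the total variance; the sign of the eigenvector is arbitrary.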
15-Mark Questions
1. Discuss the importance of data quality in machine learning. Explain in detail how data
cleaning methods such as handling missing data, dealing with noisy data, and integration
techniques can improve data quality.
o Discuss how poor data quality affects model performance. Explain techniques such as missing
value imputation, handling outliers, and integrating data from multiple sources to enhance
data quality.
2. Describe the process of data integration in detail. How do redundancy and correlation
analysis play a role in improving the quality of integrated data?
o Explain how data from multiple sources is integrated into a unified dataset, the issues that
arise during integration, and the role of redundancy and correlation analysis in identifying and
resolving duplicate or irrelevant data.
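Correlation analysis for redundancy can be illustrated with the Pearson correlation coefficient r: two numeric attributes with |r| close to 1 carry largely the same information, so one of them is a candidate for removal after integration. The sketch below assumes a threshold of 0.9 and uses hypothetical column names and values; for nominal attributes a chi-square test is the usual counterpart.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every attribute pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical merged table: the same height recorded in two units by two sources
merged = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [30, 22, 41, 35],
}
print(redundant_pairs(merged))   # only the height_cm / height_in pair is flagged
```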
UNIT 3
Linearly separable and nonlinearly separable populations – Logistic Regression – Radial Basis Function
Network – Support Vector Machines:- Kernels – Risk and Loss Functions – Support Vector Machine
Algorithm – Multi-Class Classification – Support Vector Regression.
2 – Marks
13-Mark Questions
1. Explain the difference between linearly separable and nonlinearly separable populations.
Discuss how this affects the choice of a classification algorithm.
o Define linearly and nonlinearly separable data with examples.
o Discuss algorithms suited for each type, such as the perceptron for linearly separable data and
kernel SVM for nonlinearly separable data.
2. Describe the logistic regression algorithm in detail. How does it differ from linear
regression, and how is the sigmoid function used to model probabilities?
o Explain the logistic regression model, the sigmoid function, and its use in binary classification.
o Contrast logistic regression with linear regression, especially in terms of output and decision
boundary.
3. Explain the Radial Basis Function Network (RBFN) architecture in detail. How does it
differ from other neural networks, and how is it used for classification tasks?
o Detail the structure of an RBFN, including the role of radial basis functions.
o Compare RBFN to traditional feedforward neural networks like MLP.
o Discuss the application of RBFN in classification problems.
4. Explain the concept of Support Vector Machines (SVM) for binary classification. Discuss
the importance of support vectors, margin maximization, and the role of the kernel trick
in handling nonlinearly separable data.
o Provide a detailed explanation of the SVM algorithm.
o Discuss the importance of support vectors and maximizing the margin.
o Explain the kernel trick and different types of kernels (linear, polynomial, RBF).
5. Discuss the concept of loss and risk functions in machine learning, particularly in the
context of Support Vector Machines. Explain the hinge loss function and its role in SVM
optimization.
o Define loss and risk in supervised learning.
o Explain the hinge loss function and how SVM minimizes risk through structural risk
minimization.
6. Explain the One-vs-All (OvA) and One-vs-One (OvO) approaches for multi-class
classification using Support Vector Machines. Compare their computational complexity
and performance.
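For Question 2, the logistic model P(y = 1 | x) = σ(wx + b) can be trained by batch gradient descent on the average cross-entropy (negative log-likelihood) loss, whose gradient with respect to z = wx + b is simply σ(z) − y. The one-feature setup, the hours-studied toy data and the learning-rate and epoch values are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by batch gradient descent on mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # d(cross-entropy)/dz for one sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: hours studied -> pass (1) / fail (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)
for h in (1.0, 4.0):
    p = sigmoid(w * h + b)
    print(h, round(p, 3), "pass" if p >= 0.5 else "fail")   # 0.5 threshold gives the class
```

Unlike linear regression, the output is a probability in (0, 1), and the decision boundary is the point where wx + b = 0, that is x = −b/w.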
15-Mark Questions
1. Explain in detail the Support Vector Machine (SVM) algorithm for binary classification.
How does SVM find the optimal hyperplane, and what role do support vectors play in the
algorithm? Discuss the concept of the soft margin and its importance in handling
data that are not perfectly linearly separable.
o Provide a complete explanation of the SVM algorithm, including mathematical formulations.
o Explain the significance of support vectors and how SVM maximizes the margin.
o Discuss the soft margin and its importance for data that are not linearly separable (overlapping classes, noise, or outliers).
2. Discuss logistic regression in depth, focusing on its application to binary and multi-class
classification problems. Explain how logistic regression works, the concept of maximum
likelihood estimation, and how it is extended for multi-class classification.
o Provide a thorough explanation of logistic regression, including its mathematical model.
o Explain maximum likelihood estimation (MLE) for parameter estimation.
o Discuss techniques such as One-vs-All (OvA) and softmax regression for multi-class
classification.
3. Compare and contrast Support Vector Machines (SVM) with Radial Basis Function
Networks (RBFN). Discuss their differences in terms of architecture, training,
performance, and suitability for different types of data.
o Provide a detailed comparison of SVM and RBFN.
o Discuss how each algorithm handles nonlinearly separable data.
o Compare their architecture, learning processes, and applications.
4. Explain the concept of kernels in SVM in detail. Discuss different types of kernels, such
as linear, polynomial, and Radial Basis Function (RBF) kernels, and their role in
transforming data into higher-dimensional spaces. Provide examples of when each kernel
would be appropriate.
o Explain the kernel trick in SVM and how it helps handle nonlinear data.
o Discuss different types of kernels, including their mathematical formulations.
o Provide examples where each kernel type is effective, such as text data, image classification,
etc.
5. Discuss the application of Support Vector Regression (SVR) in real-world regression
problems. Provide a step-by-step explanation of the SVR algorithm, and describe
scenarios where SVR is preferred over traditional regression methods.
o Provide a step-by-step explanation of the SVR algorithm.
o Discuss the practical advantages of SVR, such as its ability to model nonlinear relationships
through kernels and its tolerance of outliers.
o Provide real-world examples where SVR outperforms traditional regression methods (e.g.,
financial time-series prediction, stock price prediction).
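The soft-margin linear SVM of Question 1 minimizes (λ/2)‖w‖² + (1/n) Σ max(0, 1 − yᵢ(w·xᵢ + b)), the hinge loss plus a term that controls the margin width; a smaller λ plays the role of a larger C (a harder margin). Instead of the quadratic-programming (dual) solver usually derived in textbooks, this sketch substitutes plain subgradient descent on the primal objective, which is simpler to show. The linear kernel, the λ, learning-rate and epoch values and the toy points are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, b, x, y):
    """max(0, 1 - y * f(x)) with f(x) = w.x + b and labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss (soft margin, linear kernel)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]    # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:   # point inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data
X = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Only points with yᵢ(w·xᵢ + b) < 1 contribute to the update; after training these are the points on or inside the margin, which corresponds to the role of support vectors.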
2 Marks Questions:
13 Marks Questions:
1. Explain the K-means algorithm with an example. Discuss its advantages and limitations.
2. Describe the process of Mean Shift Clustering and explain how it differs from K-means.
3. Compare and contrast partitioning methods with hierarchical clustering.
4. Explain Gaussian Mixture Models (GMM) for clustering. How do GMMs handle overlapping
clusters?
5. Discuss the problems and challenges faced while clustering high-dimensional data.
6. Describe hierarchical clustering in detail and explain how it can be visualized using a
dendrogram.
7. Explain how clustering can be used in real-world applications such as market segmentation or
image recognition.
8. Discuss how different distance metrics (e.g., Euclidean, Manhattan) affect clustering results.
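Question 1 asks how K-means works; a minimal Lloyd-style sketch alternates the assignment step (each point goes to its nearest centroid by Euclidean distance) and the update step (each centroid moves to the mean of its members) until no assignment changes. The initial centroids are passed in explicitly here; choosing them well (for example with k-means++ or several random restarts) is the initial-centroids problem. The toy points are hypothetical.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, assignment list)."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:    # converged: no point changed cluster
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                     # keep the old centroid if a cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignment


pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(pts, init_centroids=[pts[0], pts[3]])
print(labels)      # [0, 0, 0, 1, 1, 1]
print(centroids)   # roughly [[1.17, 1.17], [8.5, 8.5]]
```

Replacing math.dist with a Manhattan distance, sum(abs(a - b) for a, b in zip(p, q)), changes which centroid counts as nearest, which is one way to explore Question 8.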
15 Marks Questions:
1. Discuss in detail the K-means clustering algorithm. Highlight its working mechanism,
convergence criteria, and how to handle the initial centroids problem.
2. Describe the hierarchical clustering approach in depth. Explain the differences between
agglomerative and divisive hierarchical clustering with suitable diagrams.
3. Explain in detail the concept of Gaussian Mixture Models (GMM). How does GMM improve
over K-means in handling non-spherical clusters?
4. Provide a detailed explanation of clustering high-dimensional data. Discuss the curse of
dimensionality and how algorithms are adapted to deal with these challenges.
5. Compare K-means, Mean Shift, and Hierarchical clustering in terms of their approach,
efficiency, and use cases.
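For Question 2, agglomerative (bottom-up) clustering can be sketched naively: start with every point as its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the closest members) is assumed here; complete or average linkage would change only the inner min. The recorded merge distances are the heights at which branches join in a dendrogram. The cubic-time loop is for illustration only; SciPy's scipy.cluster.hierarchy.linkage and dendrogram are the usual practical tools.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples. Original
    points have ids 0..n-1, and each merge creates a new id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance between the closest pair of members
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)   # merge the two closest clusters
        merges.append((a, b, d))
        next_id += 1
    return merges


pts = [(0,), (1,), (5,), (6,), (20,)]
for step in single_linkage(pts):
    print(step)
# (0, 1, 1.0) then (2, 3, 1.0) then (5, 6, 4.0) then (4, 7, 14.0)
```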
2 Marks Questions:
13 Marks Questions:
1. Explain the architecture of a Multi-Layer Perceptron (MLP) and how it differs from a
single-layer perceptron. Discuss how each layer contributes to the overall function of the
network.
2. Describe the backpropagation algorithm in detail, including its working process for
updating weights. Illustrate with an example of a simple neural network.
3. Compare different activation functions (such as sigmoid, ReLU, and tanh) used in neural
networks. Discuss their advantages, disadvantages, and typical use cases.
4. Explain the types of loss functions used in neural networks. Discuss the difference between
mean squared error (MSE) and cross-entropy loss, and when to use each.
5. What is the gradient descent algorithm? Explain how it is used to minimize the loss function
in neural networks. Discuss the concept of convergence and the impact of learning rate.
6. Describe stochastic gradient descent (SGD). Explain how it differs from the traditional
gradient descent method and when it is beneficial to use.
7. Discuss the role of optimization in neural networks. Explain how optimization techniques
like gradient descent are used to improve the model's performance and accuracy.
8. Case Study: Discuss a real-world case study involving neural networks (such as handwriting
recognition or image classification). Explain how the neural network was trained, the
architecture used, and the results achieved.
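Questions 5 and 6 contrast batch gradient descent with stochastic gradient descent. The sketch below fits a one-feature linear model y ≈ wx + b by minimizing mean squared error both ways: batch GD uses the gradient averaged over all samples for each update, while SGD updates after every (shuffled) sample. The noise-free toy data, learning rates and epoch counts are assumptions for illustration.

```python
import random


def batch_gd(xs, ys, lr=0.05, epochs=2000):
    """Batch gradient descent: one update per pass, using the full-data MSE gradient."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def sgd(xs, ys, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent: one update per sample, in shuffled order."""
    rng = random.Random(seed)
    w = b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = w * xs[i] + b - ys[i]   # residual for a single sample
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]            # hypothetical noise-free data from y = 2x + 1
print("batch GD:", batch_gd(xs, ys))
print("SGD:     ", sgd(xs, ys))
```

On this toy data both versions approach w = 2 and b = 1. For the batch version, a learning rate above roughly 0.15 makes the updates diverge, because the step then exceeds 2 divided by the largest eigenvalue of the loss's Hessian; this is one concrete view of the learning-rate effect in Question 5.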
15 Marks Questions:
1. Develop and explain a case study involving a neural network applied to a real-world problem
(e.g., fraud detection, speech recognition, etc.). Include the problem statement, dataset
description, network architecture, training process, and evaluation metrics used.
2. Explain the gradient descent and stochastic gradient descent algorithms in detail.
Compare their convergence behavior, computational efficiency, and how they handle large
datasets.
3. Explain the Backpropagation Learning Algorithm in depth. Illustrate with a flowchart and
discuss how error signals are propagated through the layers. Provide an example of how this
algorithm works step by step.
4. Discuss the types of loss functions and activation functions commonly used in deep learning
models. Explain how the choice of these functions affects the model’s performance and output
with examples.
5. Describe the complete training process of a Multi-Layer Perceptron (MLP), including
initialization, forward propagation, loss calculation, backpropagation, and weight updates using
gradient descent. Include examples and illustrations to support your answer.
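Questions 3 and 5 ask for the complete MLP training loop. A minimal sketch with one hidden layer of sigmoid units, a squared-error loss L = ½(y − t)² and full-batch gradient descent shows initialization, forward propagation, loss calculation, backpropagation of the error signals δ, and the weight update. The XOR data, four hidden units, seed, learning rate and epoch count are illustrative assumptions; cross-entropy with a sigmoid or softmax output is the more common choice for classification.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer perceptron with sigmoid units and a single output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)   # initialization: small random weights, zero biases
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Forward propagation: returns hidden activations h and output y."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
             for row, bj in zip(self.W1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, y

    def loss(self, X, T):
        """Mean of 0.5 * (y - t)^2 over the batch."""
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in zip(X, T)) / len(X)

    def train_step(self, X, T, lr):
        """One full-batch backpropagation pass followed by a gradient-descent update."""
        n = len(X)
        gW1 = [[0.0] * len(row) for row in self.W1]
        gb1 = [0.0] * len(self.b1)
        gW2 = [0.0] * len(self.W2)
        gb2 = 0.0
        for x, t in zip(X, T):
            h, y = self.forward(x)
            delta_out = (y - t) * y * (1 - y)   # dL/dz at the output unit
            gb2 += delta_out
            for j, hj in enumerate(h):
                gW2[j] += delta_out * hj
                # error signal propagated back to hidden unit j
                delta_h = delta_out * self.W2[j] * hj * (1 - hj)
                gb1[j] += delta_h
                for i, xi in enumerate(x):
                    gW1[j][i] += delta_h * xi
        # Weight update with batch-averaged gradients
        for j in range(len(self.W2)):
            self.W2[j] -= lr * gW2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for i in range(len(self.W1[j])):
                self.W1[j][i] -= lr * gW1[j][i] / n
        self.b2 -= lr * gb2 / n


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]   # XOR targets
net = TinyMLP(n_in=2, n_hidden=4, seed=1)
print("initial loss:", net.loss(X, T))
for epoch in range(5000):
    net.train_step(X, T, lr=0.5)
print("final loss:", net.loss(X, T))
print([round(net.forward(x)[1], 2) for x in X])
```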
*****************