Avcce QB Aml JSD - Ice

A.V.C. COLLEGE OF ENGINEERING


MAYILADUTHURAI
DEPARTMENT OF INSTRUMENTATION AND CONTROL ENGINEERING

QUESTION BANK (REGULATION 2021)

EI3752 APPLIED MACHINE LEARNING

Prepared By: Dr. J. Sharmila Devi, AP/ICE

UNIT I INTRODUCTION TO MACHINE LEARNING

Objectives of machine learning – Human learning/ Machine learning – Types of Machine learning:-
Supervised Learning – Unsupervised learning – Regression – Classification – The Machine Learning
Process:- Data Collection and Preparation – Feature Selection – Algorithm Choice – Parameter and
Model Selection – Training – Evaluation – Bias-Variance Tradeoff – Underfitting and Overfitting
Problems.

2 – Marks

1. Define machine learning.
2. What are the main objectives of machine learning?
3. How does human learning differ from machine learning?
4. Mention two real-world applications of machine learning.
5. What is supervised learning?
6. Define unsupervised learning.
7. List two differences between supervised and unsupervised learning.
8. What is reinforcement learning?
9. What is regression in machine learning?
10. Define classification in the context of supervised learning.
11. Give an example of a regression problem.
12. Give an example of a classification problem.
13. What is clustering in unsupervised learning?
14. Name two clustering algorithms.
15. What is dimensionality reduction?
16. List one application of unsupervised learning.
17. What are the key stages of the machine learning process?
18. Why is data collection important in machine learning?
19. Define feature selection.
20. Mention one reason why data preparation is important.
21. What is the significance of choosing the right algorithm in machine learning?
22. List two commonly used algorithms in machine learning.
23. What is the purpose of the k-Nearest Neighbors (k-NN) algorithm?
24. What kind of problems can Decision Trees be applied to?
25. Differentiate between hyperparameters and model parameters.
26. What is cross-validation?
27. Why is model selection important in machine learning?
28. Define grid search in hyperparameter tuning.
29. What is training in machine learning?
30. What does the term "evaluation metric" mean in machine learning?
31. List two common evaluation metrics for classification problems.
32. How is the performance of a regression model evaluated?
33. Define bias in machine learning models.
34. Define variance in machine learning models.
35. What is the bias-variance tradeoff?
36. What happens when a model has high bias?
37. What is overfitting?
38. Define underfitting.
39. How can overfitting be prevented?
40. Mention one method to deal with underfitting.

13-Mark Questions

1. Compare human learning and machine learning. Explain the objectives of machine
learning in detail with real-world examples.
o Discuss the differences between how humans learn and how machines learn.
o Mention various real-world applications of machine learning and how it achieves its
objectives.
2. Explain the types of machine learning with suitable examples.
o Discuss supervised learning, unsupervised learning, and reinforcement learning.
o Provide examples of problems solved by each type.
3. What is supervised learning? Differentiate between regression and classification. Give
examples of both.
o Define supervised learning.
o Explain regression and classification, providing detailed comparisons with examples.
4. Describe the machine learning process from data collection to evaluation.
o Explain each step in the machine learning process: data collection, preparation, feature
selection, algorithm choice, model selection, training, and evaluation.
5. What are the challenges of overfitting and underfitting in machine learning? How can
these be addressed?
o Define overfitting and underfitting.
o Discuss how each problem affects the model and how they can be prevented using
methods like cross-validation, regularization, and pruning.
6. Explain the importance of feature selection in machine learning. Discuss various
techniques for feature selection.
o Define feature selection and why it is essential.
o Explain methods like filter methods, wrapper methods, and embedded methods for
feature selection.
7. What is the bias-variance tradeoff? How does it affect model performance? Discuss
strategies to manage bias and variance.
o Define bias and variance.
o Explain how the bias-variance tradeoff impacts model performance, and mention
strategies like model complexity control, regularization, and cross-validation to balance
them.

15-Mark Questions

1. Discuss in detail the types of machine learning: supervised, unsupervised, and
reinforcement learning. Include algorithms, applications, and challenges.
o Provide a detailed explanation of the three types of learning.
o Include examples of algorithms (e.g., linear regression, K-means, Q-learning),
applications, and the challenges each type faces.
2. What is the machine learning process? Discuss each phase of the process with examples
and explain how these phases interact with each other.
o Go through the stages of the machine learning process (data collection, preparation,
feature selection, algorithm choice, model training, evaluation).
o Explain with examples how these phases are interrelated and affect one another.
3. Compare different supervised learning algorithms (e.g., Decision Trees, k-Nearest
Neighbors, Support Vector Machines) with respect to their strengths, weaknesses, and
use cases.
o Explain the algorithms in detail.
o Compare their strengths, weaknesses, and typical scenarios where each is most
appropriate.
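
As a study aid for the cross-validation questions in this unit, the following is a minimal sketch of how k-fold cross-validation partitions a dataset into training and validation indices. The function name `k_fold_indices` is hypothetical, and the sketch assumes no shuffling and no stratification, which a real workflow would usually add.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs for n_samples items."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        # Every index outside the validation slice is used for training.
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample appears in exactly one validation fold, so every data point is used once for evaluation and k-1 times for training.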

UNIT II DATA PREPROCESSING

Data quality – Data preprocessing:- Data Cleaning:- Handling missing data and noisy data – Data
integration:- Redundancy and correlation analysis – Continuous and Categorical Variables – Data
Reduction:- Dimensionality reduction (Linear Discriminant Analysis – Principal Components Analysis)

2 – Marks

1. What is data quality?
2. List two common problems that affect data quality.
3. Define data cleaning.
4. What are missing values in a dataset?
5. Name two methods to handle missing data.
6. What is noisy data?
7. What is binning in handling noisy data?
8. How does clustering help in handling noisy data?
9. What is data integration?
10. Define redundancy in the context of data integration.
11. What is correlation analysis?
12. Differentiate between positive and negative correlation.
13. How can correlation help in detecting redundant features?
14. What is the difference between continuous and categorical variables?
15. Define continuous variables with an example.
16. Define categorical variables with an example.
17. What is the process of binning?
18. What is label encoding in categorical variable transformation?
19. What is one-hot encoding in categorical variable transformation?
20. What is data reduction?
21. Define dimensionality reduction.
22. What is Principal Component Analysis (PCA)?
23. What is Linear Discriminant Analysis (LDA)?
24. Differentiate between PCA and LDA.
25. What are eigenvalues and eigenvectors in PCA?
26. Mention one key advantage of using PCA for dimensionality reduction.
27. Define feature selection.
28. What is the role of data preprocessing in machine learning?
29. Mention two challenges faced in data preprocessing.
30. What is the difference between structured and unstructured data?
31. What is the purpose of automated data preprocessing pipelines?
32. Why is handling missing data important in data preprocessing?
33. Mention two techniques for handling noisy data.
34. Why is correlation analysis important in data integration?
35. What is the role of PCA in reducing dimensionality in high-dimensional datasets?

13-Mark Questions

1. Discuss the various techniques for handling missing data and noisy data in the data
preprocessing stage.
o Provide a detailed explanation of the methods used to handle missing and noisy data, and
discuss how they affect the quality of a dataset.
2. What is data integration? Explain the challenges and techniques used in data integration,
with special emphasis on redundancy and correlation analysis.
o Discuss the significance of data integration in data preprocessing, the challenges that arise
when integrating data from multiple sources, and how correlation analysis helps detect
redundancy.
3. Explain the steps involved in Principal Component Analysis (PCA) and its importance in
dimensionality reduction.
o Discuss the mathematical concepts of PCA, including eigenvalues, eigenvectors, and how PCA
reduces dimensionality without losing significant information.
4. Differentiate between continuous and categorical variables. Discuss how these variables
are preprocessed for machine learning models.
o Compare continuous and categorical variables and describe techniques such as binning,
encoding (label, one-hot), and scaling, which are used to prepare them for machine learning
models.
5. What is data reduction? Explain the various methods used for data reduction, with a
focus on dimensionality reduction techniques such as PCA and LDA.
o Discuss the concept of data reduction, including feature selection and extraction techniques.
Explain in detail the workings of PCA and LDA for reducing the dimensionality of large
datasets.

15-Mark Questions

1. Discuss the importance of data quality in machine learning. Explain in detail how data
cleaning methods such as handling missing data, dealing with noisy data, and integration
techniques can improve data quality.
o Discuss how poor data quality affects model performance. Explain techniques such as missing
value imputation, handling outliers, and integrating data from multiple sources to enhance
data quality.
2. Describe the process of data integration in detail. How do redundancy and correlation
analysis play a role in improving the quality of integrated data?
o Explain how data from multiple sources is integrated into a unified dataset, the issues that
arise during integration, and the role of redundancy and correlation analysis in identifying and
resolving duplicate or irrelevant data.
3. Explain dimensionality reduction in detail with a focus on Principal Component Analysis
(PCA) and Linear Discriminant Analysis (LDA). Compare and contrast these two
techniques in terms of their objectives, methodology, and applications.
o Provide a detailed explanation of PCA and LDA, including their mathematical basis and steps.
Highlight how they differ in terms of their goals (variance maximization vs class separation)
and when to apply each method in real-world scenarios.
4. What are the key challenges involved in preprocessing continuous and categorical
variables? Describe the different techniques used to transform and preprocess both types
of variables for machine learning models.
o Discuss the differences between continuous and categorical variables, and the challenges in
transforming them for analysis. Explain techniques such as normalization, standardization,
binning, and encoding methods (label encoding, one-hot encoding) in detail.
5. Explain the concept of data reduction and why it is important in machine learning.
Discuss the various methods of data reduction, with a focus on dimensionality reduction
techniques and their impact on model performance.
o Discuss the need for reducing large datasets to improve computational efficiency and model
performance. Explain both feature selection and extraction methods, highlighting
dimensionality reduction techniques like PCA, LDA, and their practical applications.
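
As a study aid for the questions on preprocessing continuous and categorical variables, the following is a minimal sketch of label encoding, one-hot encoding, min-max normalization, and standardization. The function names and the sample data are hypothetical, and the sketch assumes categories are ordered alphabetically and that the population standard deviation is used for standardization.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each category to an integer, using alphabetical order (an assumption)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per category, in alphabetical order."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values], categories


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


colors = ["red", "green", "blue", "green"]   # hypothetical categorical feature
temps = [10.0, 20.0, 30.0]                   # hypothetical continuous feature

labels, mapping = label_encode(colors)
one_hot, categories = one_hot_encode(colors)
scaled = min_max_normalize(temps)
z_scores = standardize(temps)
print(labels, mapping)
print(one_hot, categories)
print(scaled, z_scores)
```

Label encoding imposes an artificial order on the categories, which is one reason one-hot encoding is often preferred for nominal variables.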

UNIT III

Linearly separable and nonlinearly separable populations – Logistic Regression – Radial Basis Function
Network – Support Vector Machines:- Kernels – Risk and Loss Functions – Support Vector Machine
Algorithm – Multi Class Classification – Support Vector Regression.

2 – Marks

1. What is meant by linearly separable data?
2. Define nonlinearly separable data.
3. What is the importance of linear separability in supervised learning?
4. Give one example of a linearly separable dataset.
5. Mention one technique to classify nonlinearly separable data.
6. What is logistic regression?
7. Write the equation of the logistic regression model.
8. What is the sigmoid function in logistic regression?
9. State one advantage of logistic regression over linear regression.
10. How does logistic regression handle binary classification?
11. What is the purpose of the likelihood function in logistic regression?
12. What is the decision boundary in logistic regression?
13. Name two techniques for extending logistic regression to multi-class classification.
14. What is a Radial Basis Function Network (RBFN)?
15. What is the role of the radial basis function in RBFN?
16. Mention one difference between RBFN and Multi-Layer Perceptron (MLP).
17. What is the Gaussian function used for in RBFN?
18. What is the hidden layer in an RBFN responsible for?
19. What is the kernel trick in SVM?
20. Name two types of kernel functions used in SVM.
21. What is the role of a kernel in SVM?
22. Define a linear kernel in SVM.
23. What does the RBF kernel do in SVM?
24. What is the hinge loss function in SVM?
25. What is empirical risk in supervised learning?
26. What is structural risk minimization in SVM?
27. What are support vectors in SVM?
28. What is the objective of the SVM algorithm?
29. Define a soft margin in SVM.
30. What is the role of the parameter C in SVM?
31. What is a hyperplane in the context of SVM?
32. What is One-vs-All (OvA) in multi-class classification?
33. What is One-vs-One (OvO) in multi-class classification?
34. How does SVM handle multi-class classification?
35. What is the main challenge in applying SVM to multi-class classification?
36. What is Support Vector Regression (SVR)?
37. What is the epsilon-insensitive loss function in SVR?
38. Mention one key difference between SVR and traditional linear regression.
39. What is the role of support vectors in SVR?
40. How does SVR handle outliers differently from traditional regression methods?

13-Mark Questions

1. Explain the difference between linearly separable and nonlinearly separable populations.
Discuss how this affects the choice of a classification algorithm.
o Define linearly and nonlinearly separable data with examples.
o Discuss algorithms suited for each type, such as Perceptron for linear and SVM with kernel for
nonlinear cases.
2. Describe the logistic regression algorithm in detail. How does it differ from linear
regression, and how is the sigmoid function used to model probabilities?
o Explain the logistic regression model, the sigmoid function, and its use in binary classification.
o Contrast logistic regression with linear regression, especially in terms of output and decision
boundary.
3. Explain the Radial Basis Function Network (RBFN) architecture in detail. How does it
differ from other neural networks, and how is it used for classification tasks?
o Detail the structure of an RBFN, including the role of radial basis functions.
o Compare RBFN to traditional feedforward neural networks like MLP.
o Discuss the application of RBFN in classification problems.
4. Explain the concept of Support Vector Machines (SVM) for binary classification. Discuss
the importance of support vectors, margin maximization, and the role of the kernel trick
in handling nonlinearly separable data.
o Provide a detailed explanation of the SVM algorithm.
o Discuss the importance of support vectors and maximizing the margin.
o Explain the kernel trick and different types of kernels (linear, polynomial, RBF).
5. Discuss the concept of loss and risk functions in machine learning, particularly in the
context of Support Vector Machines. Explain the hinge loss function and its role in SVM
optimization.
o Define loss and risk in supervised learning.
o Explain the hinge loss function and how SVM minimizes risk through structural risk
minimization.
6. Explain the One-vs-All (OvA) and One-vs-One (OvO) approaches for multi-class
classification using Support Vector Machines. Compare their computational complexity
and performance.
o Describe the OvA and OvO methods in detail.
o Compare the advantages and disadvantages of each approach in terms of efficiency and
complexity.
7. Describe the Support Vector Regression (SVR) algorithm and how it extends SVM to
regression tasks. Explain the epsilon-insensitive loss function and its significance in
handling outliers.
o Detail the SVR algorithm and how it differs from classification-focused SVM.
o Explain the role of the epsilon-insensitive loss function and how it handles noise and outliers
in regression tasks.

15-Mark Questions

1. Explain in detail the Support Vector Machine (SVM) algorithm for binary classification.
How does SVM find the optimal hyperplane, and what role do support vectors play in the
algorithm? Discuss the concept of the soft margin and its importance in handling
nonlinearly separable data.
o Provide a complete explanation of the SVM algorithm, including mathematical formulations.
o Explain the significance of support vectors and how SVM maximizes the margin.
o Discuss the soft margin and its importance for non-linearly separable data.
2. Discuss logistic regression in depth, focusing on its application to binary and multi-class
classification problems. Explain how logistic regression works, the concept of maximum
likelihood estimation, and how it is extended for multi-class classification.
o Provide a thorough explanation of logistic regression, including its mathematical model.
o Explain maximum likelihood estimation (MLE) for parameter estimation.
o Discuss techniques such as One-vs-All (OvA) and softmax regression for multi-class
classification.
3. Compare and contrast Support Vector Machines (SVM) with Radial Basis Function
Networks (RBFN). Discuss their differences in terms of architecture, training,
performance, and suitability for different types of data.
o Provide a detailed comparison of SVM and RBFN.
o Discuss how each algorithm handles nonlinearly separable data.
o Compare their architecture, learning processes, and applications.
4. Explain the concept of kernels in SVM in detail. Discuss different types of kernels, such
as linear, polynomial, and Radial Basis Function (RBF) kernels, and their role in
transforming data into higher-dimensional spaces. Provide examples of when each kernel
would be appropriate.
o Explain the kernel trick in SVM and how it helps handle nonlinear data.
o Discuss different types of kernels, including their mathematical formulations.
o Provide examples where each kernel type is effective, such as text data, image classification,
etc.
5. Discuss the application of Support Vector Regression (SVR) in real-world regression
problems. Provide a step-by-step explanation of the SVR algorithm, and describe
scenarios where SVR is preferred over traditional regression methods.
o Provide a step-by-step explanation of the SVR algorithm.
o Discuss the practical advantages of SVR, such as its ability to handle nonlinearly distributed
data and outliers.
o Provide real-world examples where SVR outperforms traditional regression methods (e.g.,
financial time-series prediction, stock price prediction).
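
As a study aid for the questions on the sigmoid function, hinge loss, epsilon-insensitive loss, and the RBF kernel, the following is a minimal sketch of each formula in its standard textbook form. The function names and default parameter values (`epsilon=0.1`, `gamma=1.0`) are hypothetical, and the hinge loss assumes class labels of -1 and +1.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hinge_loss(y, score):
    """SVM hinge loss max(0, 1 - y * score), with label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def epsilon_insensitive_loss(y, prediction, epsilon=0.1):
    """SVR loss: errors smaller than epsilon cost nothing, larger errors grow linearly."""
    return max(0.0, abs(y - prediction) - epsilon)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


print(sigmoid(0.0))                              # probability at the decision boundary
print(hinge_loss(1, 2.0), hinge_loss(1, 0.5))    # outside margin vs inside margin
print(epsilon_insensitive_loss(1.0, 1.05))       # error inside the epsilon tube
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))        # similarity decays with distance
```

A point classified correctly and outside the margin contributes zero hinge loss, which is why only the support vectors influence the SVM solution.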

UNIT IV CLUSTERING AND UNSUPERVISED LEARNING

Introduction – Clustering:- Partitioning Methods:- K-means algorithm – Mean Shift Clustering –
Hierarchical clustering – Clustering using Gaussian Mixture Models – Clustering High-Dimensional
Data:- Problems – Challenges

2 Marks Questions:

1. What is unsupervised learning?
2. Define clustering.
3. What is the main objective of clustering?
4. List two partitioning methods for clustering.
5. What are the key steps in the K-means algorithm?
6. Define Mean Shift Clustering.
7. What is hierarchical clustering?
8. What is the role of Gaussian Mixture Models (GMM) in clustering?
9. Mention one challenge in clustering high-dimensional data.
10. How is distance measured in clustering algorithms?
11. What is the significance of the initialization step in K-means?
12. Give an example of a real-world application of clustering.
13. What is a dendrogram in hierarchical clustering?

13 Marks Questions:

1. Explain the K-means algorithm with an example. Discuss its advantages and limitations.
2. Describe the process of Mean Shift Clustering and explain how it differs from K-means.
3. Compare and contrast partitioning methods with hierarchical clustering.
4. Explain Gaussian Mixture Models (GMM) for clustering. How do GMMs handle overlapping
clusters?
5. Discuss the problems and challenges faced while clustering high-dimensional data.
6. Describe hierarchical clustering in detail and explain how it can be visualized using a
dendrogram.
7. Explain how clustering can be used in real-world applications such as market segmentation or
image recognition.
8. Discuss how different distance metrics (e.g., Euclidean, Manhattan) affect clustering results.

15 Marks Questions:

1. Discuss in detail the K-means clustering algorithm. Highlight its working mechanism,
convergence criteria, and how to handle the initial centroids problem.
2. Describe the hierarchical clustering approach in depth. Explain the differences between
agglomerative and divisive hierarchical clustering with suitable diagrams.
3. Explain in detail the concept of Gaussian Mixture Models (GMM). How does GMM improve
over K-means in handling non-spherical clusters?
4. Provide a detailed explanation of clustering high-dimensional data. Discuss the curse of
dimensionality and how algorithms are adapted to deal with these challenges.
5. Compare K-means, Mean Shift, and Hierarchical clustering in terms of their approach,
efficiency, and use cases.
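
As a study aid for the K-means questions in this unit, the following is a minimal sketch of the assignment and update steps with a convergence check. The function names and the sample points are hypothetical. The sketch assumes a naive initialization that takes the first k points as centroids, whereas practical implementations usually use random or k-means++ initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption: naive initialization using the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Convergence: stop when no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print("centroids:", centroids)
print("assignments:", assignments)
```

Because both initial centroids here start in the same group of points, the first iteration produces a poor split that later iterations correct, which illustrates why the initial centroids problem matters.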

UNIT V NEURAL NETWORKS

Multi-Layer Perceptron – Backpropagation Learning Algorithm – Neural Network fundamentals –
Activation functions – Types of Loss Function – Optimization: Gradient Descent Algorithm – Stochastic
Gradient Descent – one case study.

2 Marks Questions:

1. Define a Multi-Layer Perceptron (MLP).
2. What is the purpose of an activation function in a neural network?
3. What is the backpropagation learning algorithm?
4. Differentiate between supervised and unsupervised learning.
5. What is a loss function in neural networks?
6. Define gradient descent.
7. What is the significance of the learning rate in gradient descent?
8. Differentiate between Gradient Descent and Stochastic Gradient Descent.
9. Explain overfitting in the context of neural networks.
10. What are the key types of activation functions used in neural networks?
11. Define the term "epoch" in neural network training.
12. What is the role of bias in a neural network?
13. Give an example of a real-world application of neural networks.

13 Marks Questions:

1. Explain the architecture of a Multi-Layer Perceptron (MLP) and how it differs from a
single-layer perceptron. Discuss how each layer contributes to the overall function of the
network.
2. Describe the backpropagation algorithm in detail, including its working process for
updating weights. Illustrate with an example of a simple neural network.
3. Compare different activation functions (such as sigmoid, ReLU, and tanh) used in neural
networks. Discuss their advantages, disadvantages, and typical use cases.
4. Explain the types of loss functions used in neural networks. Discuss the difference between
mean squared error (MSE) and cross-entropy loss, and when to use each.
5. What is the gradient descent algorithm? Explain how it is used to minimize the loss function
in neural networks. Discuss the concept of convergence and the impact of learning rate.
6. Describe stochastic gradient descent (SGD). Explain how it differs from the traditional
gradient descent method and when it is beneficial to use.
7. Discuss the role of optimization in neural networks. Explain how optimization techniques
like gradient descent are used to improve the model's performance and accuracy.
8. Case Study: Discuss a real-world case study involving neural networks (such as handwriting
recognition or image classification). Explain how the neural network was trained, the
architecture used, and the results achieved.

15 Marks Questions:

1. Develop and explain a case study involving a neural network applied to a real-world problem
(e.g., fraud detection, speech recognition, etc.). Include the problem statement, dataset
description, network architecture, training process, and evaluation metrics used.
2. Explain the gradient descent and stochastic gradient descent algorithms in detail.
Compare their convergence behavior, computational efficiency, and how they handle large
datasets.
3. Explain the Backpropagation Learning Algorithm in-depth. Illustrate with a flowchart, and
discuss how error signals are propagated through the layers. Provide an example of how this
algorithm works step by step.
4. Discuss the types of loss functions and activation functions commonly used in deep learning
models. Explain how the choice of these functions affects the model’s performance and output
with examples.
5. Describe the complete training process of a Multi-Layer Perceptron (MLP), including
initialization, forward propagation, loss calculation, backpropagation, and weight updates using
gradient descent. Include examples and illustrations to support your answer.
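
As a study aid for the questions comparing gradient descent and stochastic gradient descent, the following is a minimal sketch that fits a single weight w in the model y = w * x by minimizing mean squared error. The function names, learning rate, epoch count, and data are hypothetical, and the data is assumed to be noise-free so that both methods approach the same weight.

```python
import random


def batch_gradient_descent(xs, ys, lr=0.01, epochs=200):
    """Update w once per epoch using the gradient averaged over all samples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def stochastic_gradient_descent(xs, ys, lr=0.01, epochs=200, seed=0):
    """Update w after every single sample, visiting samples in shuffled order."""
    rng = random.Random(seed)
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient from one sample only
            w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the ideal weight is 2

w_batch = batch_gradient_descent(xs, ys)
w_sgd = stochastic_gradient_descent(xs, ys)
print("batch GD weight:", w_batch)
print("SGD weight:", w_sgd)
```

Batch gradient descent makes one update per pass over the data, while SGD makes one update per sample, which is why SGD is often preferred when the dataset is too large to process in a single gradient computation.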

*****************
