DS Long Answers
1. Define Data science. What are the traits of Data science? Discuss the applications of Data science with suitable examples -Mid 1 Question
2. Write a brief note on various measures of data similarity and dissimilarity
3. What is Data matrix? Explain using an example how to find a Dissimilarity matrix
4. Using an example discuss similarity of Binary variables
5. Using an example Table below discuss similarity of any two types of variables what you have identified
6. Define Proximity matrix. Find the similarity matrix for given DS Lab continuous evaluation grades (Ordinal attribute) data set
7. What is data pre-processing and why do we need it? Explain cleaning of data in brief.
8. Write a python code for reading a dataset and removing the NaN values or filling the NaN values.
9. Explain data cleaning and munging with a suitable example
10. Explain various ways of data transformation and data reduction techniques.
11. Explain the process of data discretization with a suitable example
12. Explain the various ways of preparing the data for analysis.
13. What is the need for data visualization? Write on the libraries supported by python for data visualizations
14. Write a python code for plotting 5 different graphs with an example
15. Write a brief note on scraping the web using twitter data API
16. a) List the visualization tools in python. b) Discuss the steps needed to perform web scraping to retrieve the III-B.Tech-I sem students' results from the CVR website.
DS Unit 2 Essay Questions
1. Explain the least square methods for estimation of coefficients of linear multiple regression.
2. Estimate R² and adjusted R² values for the following data set
3. Define Regression. Design a Multiple Linear regression model to find the final score for the given dataset of student placement marks
and compute R² and adjusted R² to test the performance of the model
4. Define Regression. Design a Multiple Linear regression model to find final score for given dataset of student placement marks
and Find the Mean Square Error (MSE) of the model. -> Mid Question
6. What is the need of Dimensionality reduction. How do you identify the Principal components from given data set?
7. What are the key differences between PCA and MDS dimensionality reduction approaches? Give the steps in the classical Multi-Dimensional Scaling algorithm
8. Explain in brief the steps involved to implement Multidimensional scaling
9. Define & Explain Stochastic process. Explain different types of stochastic process with suitable examples.
10. Write short notes on i) Stochastic process, ii) Markov chain process, iii) Transition probability matrix
11. Explain Markov Chain and their classification with example and How to implement in Python?
DS Unit 4 Essay Questions
1. What do you mean by well posed learning problem? Explain with example
2. What is a Well-posed problem? Discuss in detail the steps involved in designing a learning task
3. What is Machine learning? Discuss different approaches in Machine Learning with suitable examples
4. Discuss the Issues in machine learning
5. Prove that the LMS weight update rule performs a gradient descent to minimize the squared error
6. Illustrate general-to-specific ordering of hypothesis in concept learning
7. Implement an algorithm demonstrating tic-tac-toe learning approach
8. Write FIND-S algorithm. Apply the FIND-S algorithm on given training samples to find the maximally specific hypothesis.
9. Define Version space? Find the hypothesis for given training examples using Candidate elimination algorithm
10. Consider the given below training example which finds malignant tumours from MRI Scans
DS Unit 5 Essay Questions
1. Discuss representational power of two-layer perceptron model versus multilayer perceptron model.
2. How is ANN useful in making a machine intelligent?
3. Explain Supervised and Unsupervised learning.
4. What is a deep neural network?
5. Explain the advantages of Artificial Neural Networks.
6. Explain the feed-forward neural network.
7. What is a convolutional neural network?
8. What are Neural Networks? What are the types of Neural networks?
9. Explain appropriate problem for Neural Network Learning with its characteristics.
10. Explain the concept of a Perceptron with a neat diagram.
11. Explain the single perceptron with its learning algorithm.
12. How can a single perceptron be used to represent Boolean functions such as AND,
13. Write a note on (i) Perceptron Training Rule (ii) Gradient Descent and Delta Rule
14. Write Gradient Descent algorithm for training a linear unit.
15. Derive the Gradient Descent Rule
16. Write Stochastic Gradient Descent algorithm for training a linear unit.
17. Differentiate between Gradient Descent and Stochastic Gradient Descent
18. Write Stochastic Gradient Descent version of the Back Propagation algorithm for feedforward networks containing two layers
of sigmoid units.
19. Derive the Back Propagation Rule
20. Explain different types of activation functions.
21. Explain the following w.r.t. the Back Propagation algorithm: a. Convergence and Local Minima b. Representational Power of Feedforward Networks c. Hypothesis Space Search and Inductive Bias d. Hidden Layer Representations e. Generalization, Overfitting, and Stopping Criterion
22. What is an Artificial Neural Network? Describe the architecture of ANN and explain the topological units of ANN. Mid 2
23. Write short notes on i) Weights, ii) Bias, iii) Activation function, iv) Inductive Bias, Mid 2
24. What is neuron? How is the ANN motivated with biological concept?
25. Define activation function. Explain different types of activation functions. Justify, ‘ANN without activation function is a linear
regression model’.
26. Explain the backpropagation algorithm with an example network
1. What is an Artificial Neural Network? Describe the architecture of ANN and explain the topological units of ANN.
2. Types of ANN
3. Activation functions
4. Backpropagation
DS Unit 1 Essay Answers.
1. Define Data science. What are the traits of Data science? Discuss the applications of Data science with suitable examples -Mid 1 Question
Answer :
Data science
is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights
from structured and unstructured data
Data science draws on computer science, statistics, machine learning, visualization and human-computer interaction to collect,
clean, integrate, analyze, visualize and interact with data in order to create data products
Traits of Data science / Big Data
1. Volume
How much data is there?
refers to the amount of data generated from many sources
2. Variety
How diverse are different types of data?
data can be structured, unstructured or semi-structured
3. Velocity
At what speed is new data generated? -> i.e. the speed at which data is generated
4. Veracity
How accurate is the data? -> i.e. how reliable the data is
5. Value
the ability to turn big data into valuable information
Applications of Data science
1. Advanced Image Recognition
Eg : Face Mask detection
2. Recommendation System
Eg : YouTube and Google News recommendations
3. Banking
Eg : Fraud detection, NPA risk modeling
4. Transport
Eg : Self driving cars
2. Write a brief note on various measures of data similarity and dissimilarity
Answer :
Data Similarity
is a numerical measure of how alike two data objects are.
It is higher when objects are more alike.
It often falls in the range [0, 1].
Data Dissimilarity
is a numerical measure of how different two data objects are.
It is lower when objects are more alike.
Minimum dissimilarity is often 0.
The upper limit varies.
Measures of Similarity/Dissimilarity for Simple Attributes
Distances
Minkowski distance
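The Minkowski distance of order h between objects i and j described by p numeric attributes is d(i, j) = (|xi1 − xj1|^h + |xi2 − xj2|^h + … + |xip − xjp|^h)^(1/h); h = 1 gives the Manhattan distance and h = 2 the Euclidean distance. A minimal Python sketch (the function name and the two sample points are illustrative):

```python
import numpy as np

def minkowski(x, y, h):
    """Minkowski distance of order h between two numeric vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** h) ** (1.0 / h))

# Two illustrative points
a, b = (1, 2), (3, 5)
print(minkowski(a, b, 1))  # Manhattan: |1-3| + |2-5| = 5.0
print(minkowski(a, b, 2))  # Euclidean: sqrt(4 + 9), about 3.606
```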
3. What is Data matrix? Explain using an example how to find a Dissimilarity matrix
Answer :
Data Matrix
an n × p matrix representing n data points (rows) with p dimensions (attributes)
Dissimilarity matrix
is an n × n triangular matrix for n data points that registers only the distance d(i, j) between each pair of points
Example using Euclidean distance
Consider the data matrix below
Solution
1. Calculating Euclidean distances
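Since the original data matrix is not reproduced here, the sketch below assumes an illustrative 4 × 2 data matrix and fills only the lower triangle of the dissimilarity matrix with Euclidean distances d(i, j) = sqrt((xi1 − xj1)² + (xi2 − xj2)²):

```python
import numpy as np

# Illustrative data matrix: 4 objects (rows) x 2 attributes (columns)
X = np.array([[1, 2],
              [3, 5],
              [2, 0],
              [4, 5]], dtype=float)

n = X.shape[0]
D = np.zeros((n, n))
for i in range(n):
    for j in range(i):  # lower triangle only: d(i, j) with j < i
        D[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))

print(np.round(D, 2))
```

For these points, for example, the distance between object 2 and object 1 is sqrt((3 − 1)² + (5 − 2)²) = sqrt(13) ≈ 3.61.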
Answer :
Answer :
Answer :
Take two variables:
1. Half-Yearly: this is an ordinal attribute, so solve it in the same way as the 6th question
2. Final: this is a numeric attribute, so use the distance formula and solve it in the same way as the 3rd question
6. Define Proximity matrix. Find the similarity matrix for given DS Lab continuous evaluation grades (Ordinal attribute)
data set
Answer :
Proximity matrix
is a square matrix in which the entry in cell (j, k) is some measure of the similarity (or distance) between the items to
which row j and column k correspond.
Proximity matrices form the data for multidimensional scaling
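For an ordinal attribute with M ordered states, one common approach replaces each value by its rank r in 1..M, maps it onto [0, 1] with z = (r − 1)/(M − 1), and uses d(i, j) = |zi − zj| as dissimilarity and s(i, j) = 1 − d(i, j) as similarity. The grades and their ordering below are hypothetical, since the lab data set is not reproduced:

```python
import numpy as np

# Hypothetical lab grades (the original data set is not reproduced here)
grades = ["A", "B", "C", "A"]
order = ["C", "B", "A"]          # assumed ordering: C < B < A

M = len(order)
rank = {g: r + 1 for r, g in enumerate(order)}           # rank r in 1..M
z = np.array([(rank[g] - 1) / (M - 1) for g in grades])  # map ranks onto [0, 1]

# Dissimilarity d(i, j) = |z_i - z_j|; similarity s(i, j) = 1 - d(i, j)
D = np.abs(z[:, None] - z[None, :])
S = 1 - D
print(S)
```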
7. What is data pre-processing and why do we need it? Explain cleaning of data in brief.
Answer :
Data preprocessing
is a technique that transforms raw data into a useful and efficient format so that data mining and analytics can be
applied
Major Tasks in Data Preprocessing
1. Data cleaning
2. Data integration
3. Data reduction
4. Data transformation and data discretization
Why Preprocess the Data?
1. Accuracy
correct or wrong, accurate or not
2. Completeness
not recorded, unavailable, …
3. Consistency
some modified but some not, dangling, …
4. Timeliness
timely update?
5. Believability
how far can the data be trusted to be correct?
6. Interpretability
how easily the data can be understood?
Explain cleaning of data in brief
Data Cleaning
data can have many irrelevant and missing parts, so data cleaning is done; it involves handling missing data and noisy data
and resolving inconsistencies in the data
1. Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways.
Some of them are:
Ignore the tuples:
This approach is suitable only when the dataset is quite large and multiple values are missing within a tuple.
Fill the Missing values:
There are various ways to do this. You can fill the missing values manually, with the attribute mean, or with the most
probable value.
2. Noisy Data:
Noisy data is meaningless data that can’t be interpreted by machines. It can be generated by faulty data collection,
data entry errors, etc.
It can be handled in following ways :
Binning Method:
This method works on sorted data in order to smooth it. The whole data set is divided into segments (bins) of equal size,
and each segment is handled separately: all values in a segment can be replaced by the segment's mean or median, or by
the closest bin boundary.
1. smoothing by bin means
2. smoothing by bin medians
3. smoothing by bin boundaries
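A minimal sketch of the three smoothing options, assuming equal-frequency bins (each bin holds the same number of sorted values) and an illustrative list of prices:

```python
import numpy as np

def smooth_by_bins(values, n_bins, method="mean"):
    """Sort the data, split it into equal-frequency bins and smooth each bin."""
    data = np.sort(np.asarray(values, dtype=float))
    smoothed = []
    for b in np.array_split(data, n_bins):    # roughly equal-size segments
        if method == "mean":
            smoothed += [float(b.mean())] * len(b)
        elif method == "median":
            smoothed += [float(np.median(b))] * len(b)
        elif method == "boundary":
            lo, hi = b[0], b[-1]               # bin boundaries (min and max)
            # replace each value by the closer boundary
            smoothed += [float(lo if v - lo <= hi - v else hi) for v in b]
        else:
            raise ValueError("method must be 'mean', 'median' or 'boundary'")
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # illustrative values
print(smooth_by_bins(prices, 3, "mean"))      # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
print(smooth_by_bins(prices, 3, "boundary"))  # [4.0, 4.0, 15.0, 21.0, 21.0, 24.0, 25.0, 25.0, 34.0]
```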
Regression:
Here data can be smoothed by fitting it to a regression function. The regression used may be
1. linear (having one independent variable) or
2. multiple (having multiple independent variables).
Clustering:
This approach groups similar data into clusters. Values that fall outside the clusters can be detected as
outliers.
8. Write a python code for reading a dataset and removing the NaN values or filling the NaN values.
Answer :
import pandas as pd
# Importing dataset
df = pd.read_csv('user_data.csv')
# Option 1: remove every row that has a NaN value
df_dropped = df.dropna()
# Option 2: fill the NaN values of attribute X (X is the attribute that has NaN values) with its mean
df['X'] = df['X'].fillna(df['X'].mean())
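A self-contained variant of the same idea that runs without a file: the small CSV below is made up and stands in for 'user_data.csv'; numeric columns are filled with the mean or median and the 0/1 column with its mode:

```python
import io
import pandas as pd

# Small in-memory CSV standing in for 'user_data.csv' (illustrative values)
csv_text = """Age,Salary,Purchased
25,50000,0
,62000,1
40,,1
35,58000,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Option 1: drop every row that contains at least one NaN
dropped = df.dropna()

# Option 2: fill NaN values column by column
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())             # numeric: mean
filled["Salary"] = filled["Salary"].fillna(filled["Salary"].median())  # numeric: median
filled["Purchased"] = filled["Purchased"].fillna(filled["Purchased"].mode()[0])  # categorical: mode

print(dropped)
print(filled)
```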
9. Explain data cleaning and munging with a suitable example
Answer :
Data munging, also known as data wrangling, is the data preparation process of manually transforming and
cleaning the data for better decision making.
Data Munging includes the following steps:
1. Data exploration: In this process, the data is studied, analyzed and understood by visualizing representations of data.
2. Dealing with missing values: Most large datasets contain missing (NaN) values, which need to be handled by replacing
them with the mean, mode or most frequent value of the column, or simply by dropping the rows that contain a NaN
value.
3. Reshaping data: In this process, data is manipulated according to the requirements, where new data can be added or
pre-existing data can be modified.
4. Filtering data: Sometimes datasets contain unwanted rows or columns, which need to be removed or
filtered out.
5. Other: After dealing with the raw dataset with the above functionalities we get an efficient dataset as per our
requirements and then it can be used for a required purpose like data analyzing, machine learning, data visualization,
model training etc.
Eg : Refer this link -> https://www.geeksforgeeks.org/data-wrangling-in-python/
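A small illustration of the munging steps above with pandas; the student table and the pass mark of 120 are hypothetical:

```python
import pandas as pd

# Hypothetical raw student data with missing marks
raw = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "maths": [78, None, 91, 45],
    "science": [82, 67, None, 50],
})

# 1. Exploration: summary statistics and missing-value counts
print(raw.describe())
print(raw.isna().sum())

# 2. Missing values: replace NaN with the column mean
clean = raw.fillna({"maths": raw["maths"].mean(), "science": raw["science"].mean()})

# 3. Reshaping: add a derived column
clean["total"] = clean["maths"] + clean["science"]

# 4. Filtering: keep only the rows that satisfy a condition
passed = clean[clean["total"] >= 120]
print(passed)
```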
10. Explain various ways of data transformation and data reduction techniques
Answer :
Data Transformation
data are transformed into forms appropriate for data analytic processing
Data transformation tasks:
1. Smoothing
Remove the noise from the data.
Techniques includes Binning, Regression, Clustering.
2. Normalization
the attribute data are scaled so as to fall within a small specified range, such as -1.0 to 1.0, 0.0 to 1.0
Types
1. Min-max normalization to [new_minA , new_maxA ]
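Min-max normalization maps a value v of attribute A to v′ = (v − minA) / (maxA − minA) × (new_maxA − new_minA) + new_minA. A short sketch with illustrative income values ranging from 12,000 to 98,000:

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Linearly rescale v from [min_a, max_a] to [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# Income ranging from 12,000 to 98,000 mapped onto [0.0, 1.0]
print(round(min_max(73600, 12000, 98000), 3))    # 0.716
# The same range mapped onto [-1.0, 1.0]
print(min_max(55000, 12000, 98000, -1.0, 1.0))   # 0.0
```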
Data reduction
is a technique used to obtain a reduced representation of the data set that is much smaller in volume but yet produces
the same (or almost the same) analytical results
Data reduction strategies
1. Data compression
apply transformations to obtain reduced or compressed representation of original data
they are of 2 types
1. Lossless
If the original data can be reconstructed from the compressed data without any loss of
information, the data reduction is called lossless
2. Lossy
If only an approximation of the original data can be reconstructed from the compressed data,
the data reduction is called lossy
Eg :
Wavelet transforms
Principal components analysis.
2. Dimensionality reduction: remove unimportant attributes/variables, i.e. eliminate redundant attributes that are only weakly
important across the data.
Wavelet transforms
is a linear signal processing technique that, when applied to a data vector, transforms it to a numerically
different vector of wavelet coefficients. The two vectors are of the same length. When applying this
technique to data reduction, we consider each tuple as an n-dimensional data vector, that is, X = (x1, x2,
…, xn), depicting n measurements made on the tuple from n database attributes
Eg : the closely related discrete Fourier transform (DFT) can also be used to reduce the data
11. Explain the process of data discretization with a suitable example
Answer :
2. Histogram analysis
Top-down split
is an unsupervised discretization technique because it does not use class information
A histogram partitions the values of an attribute, A, into disjoint ranges called buckets or bins.
Eg :
for the dataset
we do Histogram Analysis in below way
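As the original dataset is not reproduced here, the sketch below uses hypothetical values to show equal-width histogram analysis: the range [0, 30] is split into three buckets of width 10 and each value is replaced by its bucket index:

```python
import numpy as np

# Hypothetical attribute values (the original data set is not reproduced here)
prices = np.array([0, 2, 5, 8, 10, 12, 15, 18, 20, 22, 25, 30])

# Equal-width partitioning: 3 buckets over the range [0, 30]
counts, edges = np.histogram(prices, bins=3)
print(edges)   # bucket boundaries: [0, 10, 20, 30]
print(counts)  # number of values in each bucket

# Discretize: replace each value by the index of its bucket (0, 1 or 2)
labels = np.digitize(prices, edges[1:-1], right=False)
print(labels)
```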
3. Clustering analysis
Either top-down split or bottom-up merge, unsupervised
4. Entropy-based discretization
supervised, top-down split
Eg : if want example then refer this https://natmeurer.com/a-simple-guide-to-entropy-based-discretization/
12. Explain the various ways of preparing the data for analysis.
Answer :
1. Questionnaire checking:
Questionnaire checking involves eliminating unacceptable questionnaires. These questionnaires may be incomplete,
instructions not followed, little variance, missing pages, past cutoff date or respondent not qualified.
2. Editing
Editing looks to correct illegible, incomplete, inconsistent and ambiguous answers.
3. Coding
Coding typically assigns alpha or numeric codes to answers that do not already have them so that statistical techniques
can be applied.
4. Transcribing
Transcribing data involves transferring data so as to make it accessible to people or applications for further processing.
5. Cleaning
Cleaning reviews data for consistencies. Inconsistencies may arise from faulty logic, out of range or extreme values.
6. Statistical adjustments
Statistical adjustments apply to data that requires weighting and scale transformations.
7. Analysis strategy selection
Finally, selection of a data analysis strategy is based on earlier work in designing the research project but is finalized
after consideration of the characteristics of the data that has been gathered.
13. What is the need for data visualization? Write on the libraries supported by python for data visualizations
Answer :
Data Visualization
is the graphical representation of information and data by using visual elements like charts, graphs, and maps
data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data
Need for Data Visualisation
1. To make data easier to understand and remember
2. To discover unknown facts, outliers, and trends
3. To visualize relationships and patterns quickly
4. To ask better questions and make better decisions
5. To support competitive analysis
6. To improve insights
7. Data visualization can identify areas that need improvement or modifications
8. Data visualization can clarify which factors influence customer behavior
9. Data visualization helps you to understand which products to place where
10. Data visualization can help predict sales volumes
Libraries supported by python for data visualizations
1. Matplotlib
used for exploration & data visualisation
can do charts, plots & also can be customised
2. Seaborn
statistical visualisation built on top of Matplotlib
has advanced plots
3. Plotly
provides high-quality, interactive plots
provides more advanced plots & features than Matplotlib
4. Bokeh
5. Altair
6. ggplot
14. Write a python code for plotting 5 different graphs with an example
Answer :
1. Line Plot
Code
Output
2. Bar Chart
Code
Output
3. Histogram
Code
Output
4. Scatter plot
Code
Output
5. Pie Plot
Code
Output
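The original code and outputs are not reproduced here; the following is a minimal Matplotlib sketch that draws the five plot types above side by side on made-up data. It uses the non-interactive Agg backend and saves to a file named five_plots.png (an arbitrary name); in a notebook, plt.show() can be used instead:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data
x = np.arange(1, 11)
y = x ** 2
subjects = ["Maths", "Physics", "Chemistry"]
marks = [85, 72, 90]
rng = np.random.default_rng(0)
samples = rng.normal(loc=60, scale=10, size=200)

fig, axes = plt.subplots(1, 5, figsize=(20, 4))

axes[0].plot(x, y, marker="o")                          # 1. line plot
axes[0].set_title("Line Plot")

axes[1].bar(subjects, marks)                            # 2. bar chart
axes[1].set_title("Bar Chart")

axes[2].hist(samples, bins=15)                          # 3. histogram
axes[2].set_title("Histogram")

axes[3].scatter(x, y + rng.normal(0, 5, size=x.size))  # 4. scatter plot
axes[3].set_title("Scatter Plot")

axes[4].pie(marks, labels=subjects, autopct="%1.1f%%")  # 5. pie chart
axes[4].set_title("Pie Plot")

fig.tight_layout()
fig.savefig("five_plots.png")
```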
15. Write a brief note on scraping the web using twitter data API
Answer :
16. a) List the visualization tools in python. b) Discuss the steps needed to perform web scraping to retrieve the III-B.Tech-I sem students' results from the CVR website.
Answer :
Visualisation tools - provide an accessible way to see and understand trends, outliers, and patterns in data
1. Matplotlib
used for exploration & data visualisation
can do charts, plots & also can be customised
2. Seaborn
statistical visualisation built on top of Matplotlib
has advanced plots
3. Plotly
provides high-quality, interactive plots
provides more advanced plots & features than Matplotlib
Discuss the steps needed to perform web scraping to retrieve the III-B.Tech-I sem students' results from the CVR website
Follow this tutorial (it scrapes Amazon), but use the CVR website instead -> https://www.youtube.com/watch?v=ecAJfHHppVs
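The usual steps are: 1) inspect the results page in the browser to find the table that holds the data, 2) download the page with an HTTP request, 3) parse the HTML, 4) extract the rows and columns of interest, 5) store them (e.g. as CSV). The sketch below uses only the Python standard library. The CVR results URL, any form or login it needs, and its table layout are not known, so fetch_html is a placeholder and the demo runs on a hypothetical HTML snippet:

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

def fetch_html(url):
    # Step 2: download the page (placeholder; the real results page may need a form POST or login)
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def parse_results(html):
    # Steps 3-4: parse the HTML and pull out the table rows
    parser = TableParser()
    parser.feed(html)
    return parser.rows

def to_csv(rows):
    # Step 5: store the extracted rows as CSV text
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Offline demo on a hypothetical snippet of a results page
sample = """<table>
<tr><th>Roll No</th><th>Subject</th><th>Grade</th></tr>
<tr><td>R001</td><td>Data Science</td><td>A</td></tr>
</table>"""
rows = parse_results(sample)
print(rows)
```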
DS Unit 2 Essay Answers
1. Explain the least square methods for estimation of coefficients of linear multiple regression.
Answer :
Solution
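For the model y = β0 + β1x1 + … + βpxp + ε, least squares chooses the coefficients that minimize the sum of squared residuals Σ(yi − ŷi)². Setting the derivative with respect to each β to zero gives the normal equations (XᵀX)β = Xᵀy, so β̂ = (XᵀX)⁻¹Xᵀy whenever XᵀX is invertible, where X carries a leading column of ones for the intercept. A minimal NumPy sketch on made-up, noise-free data:

```python
import numpy as np

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)   # approximately [1. 2. 3.]
```

np.linalg.lstsq(X, y, rcond=None) solves the same problem and is numerically safer when the predictors are nearly collinear.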
2. Estimate R² and adjusted R² values for the following data set
Answer :
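The data set for this question is not reproduced here. As a sketch: R² = 1 − SSres / SStot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The values below are illustrative:

```python
import numpy as np

def r2_scores(y, y_pred, p):
    """Return (R^2, adjusted R^2) for n observations and p predictors."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    n = y.size
    ss_res = np.sum((y - y_pred) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Illustrative actual vs predicted values from a model with p = 1 predictor
y_true = [3, 5, 7, 9, 11]
y_hat = [2.8, 5.3, 6.9, 9.2, 10.8]
print(r2_scores(y_true, y_hat, p=1))
```

Here SSres = 0.22 and SStot = 40, so R² = 1 − 0.22/40 = 0.9945 and adjusted R² = 1 − 0.0055 × 4/3 ≈ 0.9927.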
3. Define Regression. Design a Multiple Linear regression model to find the final score for the given dataset of student
placement marks and compute R² and adjusted R² to test the performance of the model
Answer :
Regression is a statistical technique that is used to model the relationship of a dependent variable with respect to one or
more independent variables
4. Define Regression. Design a Multiple Linear regression model to find final score for given dataset of student
placement marks and Find the Mean Square Error (MSE) of the model. -> Mid Question
Answer :
Regression is a statistical technique that is used to model the relationship of a dependent variable with respect to one or
more independent variables
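The mean squared error is MSE = (1/n) Σ(yi − ŷi)². A short sketch with illustrative actual and predicted scores (scikit-learn's mean_squared_error computes the same quantity):

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean((y - y_pred) ** 2))

y_true = [3, 5, 7, 9, 11]            # illustrative actual final scores
y_hat = [2.8, 5.3, 6.9, 9.2, 10.8]   # illustrative model predictions
print(mse(y_true, y_hat))            # about 0.044
```

The squared residuals sum to 0.22 over n = 5 observations, so MSE = 0.044.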
5. Explain Logistic regression with example and How to implement in Python?
Answer :
Logistic Regression
Logistic regression is a machine learning algorithm that comes under the supervised learning technique. It is used
for predicting a categorical dependent variable using a given set of independent variables
Logistic regression is used for solving classification problems
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic (sigmoid) function whose output
lies between the two limiting values 0 and 1 and is interpreted as the probability of the positive class
Code:
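from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report, confusion_matrix
# Example data (assumed: scikit-learn's built-in breast cancer data set, a binary classification problem)
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# max_iter is raised so the default lbfgs solver converges on the unscaled features
LR=LogisticRegression(max_iter=5000)
LR.fit(x_train,y_train)
pred=LR.predict(x_test)
print("Logistic Regression accuracy::","\n",accuracy_score(y_test,pred))
print("\n")
print(confusion_matrix(y_test,pred))
print("\n")
print(classification_report(y_test,pred))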
6. Define Classification. Discuss the procedure of the KNN-Classifier to classify a Person X ( sugar level 190, Age 45) from
given case study of diabetic patients. -> Mid Question
Answer :
Classification
The method of arranging data into homogeneous classes according to common features present in the data is known as
classification
Types of Classification
KNN Classifier
K nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a
similarity measure (distance function)
KNN Algorithm
Problem
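The KNN procedure is: choose K, compute the distance from the new case to every stored case, take the K nearest cases and assign the majority class among them. The case-study table is not reproduced here, so the training cases below are hypothetical, with K = 3 and Euclidean distance on (sugar level, age):

```python
import math
from collections import Counter

# Hypothetical training cases: (sugar level, age) -> class
train = [
    ((120, 30), "Non-diabetic"),
    ((130, 35), "Non-diabetic"),
    ((110, 50), "Non-diabetic"),
    ((200, 50), "Diabetic"),
    ((180, 40), "Diabetic"),
    ((210, 55), "Diabetic"),
]

def knn_classify(x, data, k=3):
    # 1. Euclidean distance from x to every training case
    dists = [(math.dist(x, features), label) for features, label in data]
    # 2. Keep the k nearest neighbours
    nearest = sorted(dists)[:k]
    # 3. Majority vote among their class labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest

label, nearest = knn_classify((190, 45), train, k=3)
print(label)                 # class predicted for person X
for d, lab in nearest:
    print(round(d, 2), lab)
```

With these made-up cases, the three nearest neighbours of X = (190, 45) are all diabetic, so X is classified as diabetic. In practice the attributes are usually normalized first, since sugar level and age are on different scales.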
Answer :
Random forests
is a supervised learning algorithm. It can be used both for classification and regression.
Random forests creates decision trees on randomly selected data samples, gets prediction from each tree and selects
the best solution by means of voting
Random forests are ensemble methods used to improve the performance of decision trees (DT)
Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction
Example - did not get
Code
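from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt
def train(model, X, y):
    # split: 70% training data, 30% test data
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # train the model
    model.fit(x_train, y_train)
    # predict on the held-out test data
    pred = model.predict(x_test)
    print("Model Report")
    print("RMSE:", sqrt(mean_squared_error(y_test, pred)))
# Example usage on synthetic data (assumed; the original data set is not given)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
train(RandomForestRegressor(n_estimators=100, random_state=0), X, y)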
Answer :
Ridge regression
is a model tuning method that is used to analyse any data that suffers from multicollinearity. This method performs L2
regularization
When multicollinearity occurs, least-squares estimates are unbiased but their variances are large, so the predicted
values can be far away from the actual values. By adding a degree of bias to the regression estimates, ridge regression
reduces the standard errors.
Example - did not get
Code
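from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# Example data (assumed: scikit-learn's built-in diabetes data set; replace with your own indp_data and target)
indp_data, target = load_diabetes(return_X_y=True)
X_train,X_test,Y_train,Y_test = train_test_split(indp_data,target,test_size=0.25,random_state=3405)
# Model initialization: Ridge(normalize=True) was removed in scikit-learn 1.2,
# so the features are standardized inside a pipeline instead
regression_model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
#Fit the data(train the model)
model= regression_model.fit(X_train, Y_train)
# Predict on the held-out test data
y_predicted = model.predict(X_test)
# model evaluation: RMSE is the square root of the MSE
rmse = np.sqrt(mean_squared_error(Y_test, y_predicted))
r2 = model.score(X_test, Y_test)
# printing values (coefficients refer to the standardized features)
ridge = model.named_steps["ridge"]
print('Slope:' ,ridge.coef_)
print('Intercept:', ridge.intercept_)
print('Root mean squared error: ', rmse)
print('R2 score: ', r2)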
9. What is Multi collinearity issue? How can you address this issue?
Answer :
Multicollinearity, or collinearity, is the existence of near-linear relationships among the independent variables
It occurs when independent variables in a regression model are correlated
Issue
It creates inaccurate estimates of the regression coefficients, inflates the standard errors of the regression coefficients,
deflates the partial t-tests for the regression coefficients, gives false, non-significant p-values, and degrades the
predictability of the model
How can it be measured ?
1. Tolerance
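is the percentage of variance in an independent variable that is not accounted for by the other independent variables,
i.e. Tolerance = 1 − R²j, where R²j comes from regressing variable j on the others; its reciprocal is the variance
inflation factor, VIF = 1 / Tolerance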
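A sketch of how tolerance and VIF can be computed with NumPy: each predictor is regressed on the others to get R²j, then VIF = 1/(1 − R²j). The data are synthetic, with x2 deliberately built as a noisy copy of x1; a VIF above about 10 is a commonly quoted rule of thumb for serious multicollinearity:

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n samples x p predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]                                            # predictor j as the response
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)      # regress it on the rest
        resid = y - others @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)   # R^2_j
        out.append(1.0 / (1.0 - r2))                           # tolerance would be 1 - r2
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1, so collinear
x3 = rng.normal(size=100)                   # independent predictor
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```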
How to address it
1. Remove highly correlated predictors from the model. If you have two or more factors with a high VIF, remove one
from the model. Because they supply redundant information, removing one of the correlated factors usually doesn't
drastically reduce the R-squared. Consider using stepwise regression, best subsets regression, or specialized
knowledge of the data set to remove these variables. Select the model that has the highest R-squared value.
2. Use Partial Least Squares Regression (PLS) or Principal Components Analysis, regression methods that cut the number
of predictors to a smaller set of uncorrelated components.
3. Obtaining more data over an expanded range can cure multicollinearity caused by data collection (i.e.
data collected from a narrow subspace of the independent variables)
4. An over-defined model (more variables than observations) should be avoided
5. Outlier-induced multicollinearity can be corrected by removing the outliers before ridge regression is applied
6. Structural multicollinearity
centering the variables is an efficient solution
7. Data multicollinearity
remove some highly correlated independent variables
linearly combine correlated independent variables, e.g. add them together
use LASSO or ridge regression
8. First check to see if one of predictor variable is a duplicate
9. Remove a redundant variable
10. Aggregate similar variables
11. Increase sample size
Answer :
Answer :
12. What is ANOVA? Find the significance of the noise on solving questions from the dataset given below
Answer :
Answer :
Mean vector
consists of the means of each variable
mean vector is often referred to as the centroid
Example
Correlation
Correlation is a statistical measure that expresses the extent to which two variables are related
Example
1. sales might increase when the marketing department spends more on TV advertisements
2. customer's average purchase amount on an ecommerce website might depend on a number of factors related
to that customer
Formula (relation between correlation & covariance)
Use of Correlation
understanding the relationships and subsequently building better business and statistical models
Variance-Covariance matrix consists of the variances of the variables along the main diagonal and the covariances between
each pair of variables in the other matrix positions
It is also called the covariance matrix or dispersion matrix
The set of 5 observations, measuring 3 variables, can be described by its mean vector and variance-covariance matrix.
The three variables, from left to right are length, width, and height of a certain object, for example. Each row vector Xi is
another observation of the three variables (or components)
The formula for computing the covariance of the variables X and Y is
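The covariance formula is cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1), and the correlation follows as r = cov(X, Y) / (sX sY). The sketch below computes the mean vector, the variance-covariance matrix and the correlation matrix for an illustrative set of 5 observations of length, width and height:

```python
import numpy as np

# Illustrative 5 observations x 3 variables (length, width, height)
X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

mean_vector = X.mean(axis=0)                   # centroid: one mean per variable
centered = X - mean_vector
S = centered.T @ centered / (X.shape[0] - 1)   # variance-covariance matrix (n - 1 divisor)

# Correlation from covariance: r_jk = s_jk / (s_j * s_k)
sd = np.sqrt(np.diag(S))
R = S / np.outer(sd, sd)

print(mean_vector)   # [4.1, 2.08, 0.604]
print(S)
print(R)
```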
Answer :
Answer :
Mahalanobis distance
is the distance between two points in multivariate space
Why Mahalanobis?
In a regular Euclidean space, variables (e.g. x, y, z) are represented by axes drawn at right angles to each other; The
distance between any two points can be measured with a ruler.
For uncorrelated variables with unit variance, the Euclidean distance equals the MD. However, if two or more variables are
correlated, the axes are no longer at right angles, and the measurements become impossible with a ruler. In addition, if you
have more than three variables, you can’t plot them in regular 3D space at all. The MD solves this measurement problem, as
it measures distances between points, even correlated points, for multiple variables
Formula
Refer this for example
https://www.youtube.com/watch?v=4buOoXp7AyI
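The Mahalanobis distance of a point x from a distribution with mean vector μ and covariance matrix S is DM(x) = sqrt((x − μ)ᵀ S⁻¹ (x − μ)), where S⁻¹ is the precision matrix. A minimal NumPy sketch with illustrative values:

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of point x from mean mu under covariance cov."""
    diff = np.asarray(x, float) - np.asarray(mu, float)
    # solve cov @ z = diff instead of forming the precision matrix explicitly
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

mu = np.array([0.0, 0.0])
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])                  # strongly correlated variables

print(mahalanobis([3, 4], mu, np.eye(2)))   # identity covariance: equals Euclidean, 5.0
print(mahalanobis([1, 1], mu, C))           # along the correlation: short distance
print(mahalanobis([1, -1], mu, C))          # against the correlation: long distance
```

With correlation 0.9, the point (1, 1), which lies along the correlation, is much closer (about 1.03) than (1, −1) (about 4.47), even though both have the same Euclidean distance from the mean.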
3. a). Define Data matrix, Euclidean distance, Mahalanobis distance, Precision matrix in Multi variate data analysis. b).
Find the mean vector for the given vectors u1 = (23, 56, 76) u2 = (123, 89, 64) and u3 = (98, 54, 78)
Answer :
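For part b), the mean vector is the component-wise average: ((23 + 123 + 98)/3, (56 + 89 + 54)/3, (76 + 64 + 78)/3) = (244/3, 199/3, 218/3) ≈ (81.33, 66.33, 72.67). The same computation in NumPy:

```python
import numpy as np

u1 = np.array([23, 56, 76])
u2 = np.array([123, 89, 64])
u3 = np.array([98, 54, 78])

# Mean vector = component-wise average of the vectors
mean_vector = (u1 + u2 + u3) / 3
print(np.round(mean_vector, 2))   # [81.33 66.33 72.67]
```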
4. Explain PCA with example and How to implement in Python?
Answer :
5. What is the need for Dimensionality reduction. Reduce 2D to 1D for given dataset with PCA?
Answer :
6. What is the need of Dimensionality reduction. How do you identify the Principal components from given data set?
Answer :
Dimensionality reduction
Dimensionality reduction is reducing dimensionality of high dimensional data.
This approach reduces the high dimensionality by projecting the data to a lower-dimensional subspace, without losing
the essence of the original data.
When a dimensionality reduction approach is applied, the original features no longer exist and new features are
constructed from the original data using projections.
PCA is a dimensionality reduction method (algorithm)
How to identify the Principal components from given data set -> for example refer question 5
1. Take the whole dataset consisting of d+1 dimensions and ignore the labels such that our new dataset becomes d
dimensional.
2. Compute the mean for every dimension of the whole dataset.
3. Compute the covariance matrix of the whole dataset.
4. Compute eigenvectors and the corresponding eigenvalues.
5. Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with the largest eigenvalues to form a d ×
k dimensional matrix W.
6. Use this d × k eigenvector matrix to transform the samples onto the new subspace.
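A minimal NumPy sketch of steps 2 to 6, using illustrative 2-D data reduced to one principal component (the 2D-to-1D case):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                   # step 2: mean of every dimension
    C = np.cov(X - mean, rowvar=False)      # step 3: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # step 4: eigen-decomposition (C is symmetric)
    order = np.argsort(eigvals)[::-1]       # step 5: sort by decreasing eigenvalue
    W = eigvecs[:, order[:k]]               #         d x k matrix of the top-k eigenvectors
    return (X - mean) @ W, eigvals[order]   # step 6: transform samples onto the new subspace

# Illustrative 2-D data reduced to 1-D
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
Z, eigvals = pca(X, k=1)
print(Z.shape)    # (10, 1)
print(eigvals)    # variance captured by each component, largest first
```

The variance of the projected data equals the largest eigenvalue, which is why the eigenvalues indicate how much variance each component keeps.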
7. What are the key differences between PCA and MDS dimensionality reduction approaches. Give the steps in classical
Multi-Dimensional Scaling algorithm
Answer :
Steps
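One common formulation of classical (Torgerson) MDS: square the distance matrix, double-centre it with J = I − (1/n)11ᵀ to get B = −½ J D² J, take the k largest eigenvalues and eigenvectors of B, and use the coordinates X = Ek Λk^(1/2). A minimal NumPy sketch on illustrative points:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]        # k largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[order], 0))   # guard against tiny negative values
    return eigvecs[:, order] * L                 # n x k coordinate matrix

# Illustrative: distances between the 4 corners of a 3 x 4 rectangle
pts = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))   # the recovered configuration reproduces the distances
```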
9. Define & Explain Stochastic process. Explain different types of stochastic process with suitable examples.
Answer :
10. Write short notes on i) Stochastic process, ii) Markov chain process, iii) Transition probability matrix
Answer :
Answer :
A Markov chain is a mathematical system that experiences transitions from one state to another according to
certain probabilistic rules. The defining characteristic of a Markov chain is that no matter how the process arrived at its present
state, the possible future states are fixed. In other words, the probability of transitioning to any particular state is dependent
solely on the current state and time elapsed
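A minimal NumPy sketch of a two-state Markov chain; the states and probabilities are illustrative. It shows the transition probability matrix, n-step probabilities as matrix powers, a simulated path, and the stationary distribution:

```python
import numpy as np

# Two-state weather chain (illustrative): state 0 = Sunny, 1 = Rainy
# Row i holds the probabilities of moving from state i to each state; rows sum to 1
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# n-step transition probabilities are the entries of P^n
P2 = np.linalg.matrix_power(P, 2)
print(P2)

# Simulate a path: the next state depends only on the current state
rng = np.random.default_rng(42)
state, path = 0, [0]
for _ in range(10):
    state = int(rng.choice(2, p=P[state]))
    path.append(state)
print(path)

# Stationary distribution pi satisfies pi P = pi: left eigenvector for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()
print(pi)   # approximately [0.8333 0.1667]
```

For this matrix, the first row of P² is (0.86, 0.14), and the stationary distribution is (5/6, 1/6).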
Markov Chain Classification -> this is given as Classification of States so not 100% sure whether correct or not
DS Unit 4 Essay Answers
1. What do you mean by well posed learning problem? Explain with example
Answer :
Example :
Answer :
3. What is Machine learning? Discuss different approaches in Machine Learning with suitable examples
Answer :
Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data.
It is seen as a part of artificial intelligence
Three Approaches of ML :
1. Supervised Learning :
The supervised learning approach can be adopted when a dataset contains the records of the response variable
values (or labels). Depending on the context,
this data with labels is usually referred to as “labeled data” and “training data.”
Eg :
We can mark e-mails as ‘spam’ or ‘not-spam’ based on the differentiating features of the previously seen
spam and not-spam e-mails, such as the lengths of the e-mails and use of particular keywords in the e-
mails.
Learning from training data continues until the machine learning model achieves a high level of accuracy
on the training data.
Types : There are two main supervised learning problems:
1. Classification Problems
2. Regression Problems.
2. Unsupervised Learning :
Unsupervised learning is a learning approach used in ML algorithms to draw inferences from datasets, which do
not contain labels.
There are two main unsupervised learning problems:
1. Clustering
2. Dimensionality Reduction
Eg :
Recommender systems, which involve grouping together users with similar viewing patterns in order to
recommend similar content.
3. Reinforcement Learning :
Reinforcement learning is one of the primary approaches to machine learning concerned with finding optimal
agent actions that maximize the reward within a particular environment. The agent learns to perfect its actions
to gain the highest possible cumulative reward.
There are four main elements in reinforcement learning:
1. Agent: The trainable program which exercises the tasks assigned to it
2. Environment: The real or virtual universe where the agent completes its tasks.
3. Action: A move of the agent which results in a change of status in the environment
4. Reward: A negative or positive remuneration based on the action.
Eg :
self-driving cars
5. Prove that the LMS weight update rule performs a gradient descent to minimize the squared error
Answer :
6. Illustrate general-to-specific ordering of hypothesis in concept learning
Answer :
7. Implement an algorithm demonstrating tic-tac-toe learning approach
Answer :
8. Write FIND-S algorithm. Apply the FIND-S algorithm on given training samples to find the maximally specific
hypothesis.
Answer :
Find - S Algorithm :
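FIND-S starts from the most specific hypothesis, ignores negative examples, and minimally generalizes each attribute that disagrees with a positive example. The given training samples are not reproduced here, so the sketch below assumes the standard EnjoySport examples from Mitchell's Machine Learning textbook:

```python
def find_s(examples):
    """FIND-S: return the maximally specific hypothesis consistent with the positive examples."""
    h = None                                  # most specific hypothesis: matches nothing yet
    for attributes, label in examples:
        if label != "Yes":                    # FIND-S ignores negative examples
            continue
        if h is None:
            h = list(attributes)              # first positive example: copy it
        else:
            # generalize every attribute that disagrees with the example to '?'
            h = [hv if hv == av else "?" for hv, av in zip(h, attributes)]
    return h

# Assumed training samples: the EnjoySport example from Mitchell's "Machine Learning"
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```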
The Key Properties of Find S Algorithm :
Issues in find-S
9. Define Version space? Find the hypothesis for given training examples using Candidate elimination algorithm
Answer :
Version Space :
Problem :
10. Consider the given below training example which finds malignant tumours from MRI Scans
Show the specific and general boundaries of the version space after applying candidate Elimination algorithm (Note:
Malignant:+Ve and Benign : -Ve)
Answer :
11. Explain candidate-elimination learning algorithm using version spaces
Answer :
The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are consistent with
an observed sequence of training examples.
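A compact sketch of candidate elimination for conjunctive hypotheses, where '?' accepts any value and '0' accepts none. It assumes the EnjoySport examples and attribute domains from Mitchell's textbook, since the question's data are not reproduced, and keeps S as a list even though it holds a single hypothesis for this conjunctive hypothesis space:

```python
def matches(h, x):
    """True if hypothesis h covers instance x ('?' matches anything, '0' nothing)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """Attribute-wise test that h1 is at least as general as h2."""
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [("0",) * n]                 # most specific boundary
    G = [("?",) * n]                 # most general boundary
    for x, label in examples:
        if label == "Yes":
            # drop members of G that do not cover the positive example
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                # minimal generalization of s that covers x
                h = tuple(xv if sv == "0" else (sv if sv == xv else "?")
                          for sv, xv in zip(s, x))
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            S = new_S
        else:
            # drop members of S that wrongly cover the negative example
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # minimal specializations of g that exclude x
                for i, gv in enumerate(g):
                    if gv != "?":
                        continue
                    for value in domains[i]:
                        h = g[:i] + (value,) + g[i + 1:]
                        if value != x[i] and any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            new_G = list(dict.fromkeys(new_G))   # remove duplicates, keep order
            # keep only the maximally general members
            G = [g for g in new_G
                 if not any(o != g and more_general_or_equal(o, g) for o in new_G)]
    return S, G

# Assumed attribute domains and training data (EnjoySport)
domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data, domains)
print("S:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G:", G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```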
Answer :
DS Unit 5 Essay Answers
1. What is an Artificial Neural Network? Describe the architecture of ANN and explain the topological units of ANN.
Answer :
2. Types of ANN
Answer :
3. Activation functions
Answer :
Activation function is a function that is added into an artificial neural network in order to help the network learn complex
patterns in the data. When comparing with a neuron-based model that is in our brains, the activation function is at the end
deciding what is to be fired to the next neuron
The activation functions help the network use the important information and suppress the irrelevant data points.
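Minimal NumPy definitions of four commonly used activation functions (sigmoid, tanh, ReLU and leaky ReLU); the leaky-ReLU slope of 0.01 is a common default, not a fixed rule:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # squashes input to (0, 1)

def tanh(z):
    return np.tanh(z)                        # squashes input to (-1, 1), zero-centred

def relu(z):
    return np.maximum(0.0, z)                # passes positives, zeroes negatives

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)     # small slope alpha for negative inputs

z = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(z), 3))
```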
Purpose of Activation Function