2CSOE03-O IR December 2023

The document outlines the Semester End Examination for Data Analytics at Nirma University, detailing various questions and tasks related to data analysis techniques such as dissimilarity matrices, regression analysis, K-means clustering, and Naive Bayes classification. It includes practical exercises on statistical measures, data cleaning methods, normalization, and decision tree algorithms. Additionally, it covers the application of the Apriori algorithm for finding frequent item sets and strong association rules.

Uploaded by

xxyz
2CSOE03 Data Analytics

Nirma University
Institute of Technology
Semester End Examination (IR), December 2023
B.Tech. in CL / CH / ME / EE / IC / EC, Semester VI
2CSOE03 Data Analytics

Roll / Exam No.: [      ]          Supervisor's initial with date: [      ]
Time: 3 Hours                      Max. Marks: 100

Instructions:
1. Attempt all questions.
2. Figures to right indicate full marks.
3. Use section-wise separate answer book.
4. Draw neat sketches wherever necessary.
5. Assume necessary data wherever required.

SECTION - I

Q.1 Do as directed. [15]

Compute the dissimilarity matrix between the objects of mixed attribute types given in the following table:

Exam Performance | Hair Colour | Weight
Better           | Gray        | 39
Good             | White       | 64
Good             | Gray        | 19
Best             | Black       | 28

Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 2, 12, 13, 13, 14, 15, 15, 16, 20, 29.
a) Find out the mean, median and mode of the data.
b) Find out the outliers.
c) Draw the box plot.
d) Draw and comment on the distribution of the data (symmetric or skewed).

List down the causes of incomplete, noisy and inconsistent data.

Q.2 Answer the following questions. [15]

Find out the confusion matrix, precision, error rate and recall for CLASS-1 only from the following multiclass confusion matrix.
[Multiclass confusion matrix: not recoverable from the source.]

80% of people who purchase pet insurance are women. If 9 pet insurance owners are randomly selected, find the probability that exactly 6 are women. [5]

What is Regression Analysis? How can regression analysis be used for prediction, and what are the limitations? Differentiate between linear regression models and non-linear regression models. [5]

Q.3 Answer the following questions. [20]

Consider a dataset with the following points in a two-dimensional space: (2, 3), (8, 3), (3, 4), (5, 4), (5, 6), (6, 5), (8, 7), ([illegible], 8), (10, 7), (10, 9). Perform K-means clustering on this dataset with K = 2 and initial clusters as (3, 4) and (8, 7). Walk through one iteration of the K-means algorithm and show how the clusters are formed. [10]

OR

What is dimensionality reduction? Explain the need for it in data preprocessing. Apply Principal Component Analysis (PCA) on the following data and find the final principal component value. [10]
Data (x, y) = {(4, 11), (8, 4), (13, 5), (7, 14)}

You are given a dataset with information about weather conditions and whether people played tennis or not (Yes, No). The dataset is as follows: [10]

Outlook  | Temperature | Humidity | Windy | Play Tennis
Sunny    | Hot         | High     | False | No
Sunny    | Hot         | High     | True  | No
Overcast | Hot         | High     | False | Yes
Rainy    | Mild        | High     | False | Yes
Rainy    | Cool        | Normal   | False | Yes
Rainy    | Cool        | Normal   | True  | No
Overcast | Cool        | Normal   | True  | Yes
Sunny    | Mild        | High     | False | No
Sunny    | Cool        | Normal   | False | Yes
Rainy    | [illegible] | Normal   | False | Yes
Sunny    | [illegible] | Normal   | True  | Yes
Overcast | [illegible] | High     | True  | Yes
Overcast | Hot         | Normal   | False | Yes
Rainy    | Mild        | High     | True  | No

Using Naive Bayes, predict whether people will play tennis or not on this day for the data tuple (Sunny, Cool, Normal, True).

SECTION - II

Q.4 Do as directed. [15]

What is noise? Explain data smoothing methods as a noise removal technique. Consider the following data, apply the equal-frequency binning method and smooth the data with bin means for bin size [illegible]. [5]
Data: 10, 2, 19, 18, 20, 18, 25, 28, 22

Write down pros and cons for the following data cleaning methods.
a) Ignoring the tuple
b) Replacing with mean and median

Find out the chi-square value and degrees of freedom for gender from a random sample of 395 people who were surveyed. Each person was asked to report the highest education level they obtained, as follows:

Gender | High School | Bachelors   | Masters | Ph.D.
Female | 60          | [illegible] | 46      | [illegible]
Male   | 40          | [illegible] | 33      | 57

Q.5 Answer the following questions. [15]

What is the role of normalization in data preprocessing? Suppose that the data for analysis includes the attribute age. The age values for the data tuples are: 13, 15, 16, 19, 20, 21, 22, 25, 30, 33, 35, 36, 40, 45, 46, 52, 70.
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 15.99 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.

Define "Entropy" for the calculation of purity in decision tree algorithms. Draw a figure and explain the fact "When entropy is zero the sample is completely homogeneous" (meaning that each instance belongs to the same class).

Write down the equation of the probability distribution function for the following:
1. Normal distribution
2. Geometric distribution
3. Binomial distribution
4. Multinomial distribution
5. Poisson distribution

OR

Given two objects represented by the tuples (20, 2, 41, 10) and (10, 0, 36, 8): compute the Euclidean, Manhattan, Minkowski (using h = 3) and supremum distance between the two objects.

Q.6 Do as directed. [20]

You have a friend whose Saturday afternoon activity is either going shopping or staying in. You have observed your friend's behaviour over 11 different weekends. On each of these weekends you have noted the weather (sunny, windy, or rainy), whether her parents visit (visit or no-visit), whether she has drawn cash from an ATM machine (rich or poor), and whether she had an exam during the coming week (exam or no-exam). You have built the following data table: [10]

Weekend | Weather     | Parents' Visit | Withdrawal  | Exam Schedule | Activity
1       | [illegible] | No-visit       | Rich        | No-exam       | Shopping
2       | [illegible] | Visit          | Poor        | No-exam       | Shopping
3       | [illegible] | [illegible]    | Rich        | No-exam       | Staying in
4       | [illegible] | [illegible]    | Poor        | No-exam       | Staying in
5       | [illegible] | [illegible]    | [illegible] | Exam          | Shopping
6       | [illegible] | [illegible]    | Poor        | No-exam       | Staying in
7       | [illegible] | [illegible]    | Rich        | No-exam       | Shopping
8       | [illegible] | [illegible]    | Poor        | Exam          | Staying in
9       | [illegible] | [illegible]    | Rich        | Exam          | Shopping
10      | [illegible] | [illegible]    | Poor        | Exam          | Shopping
11      | Windy       | No-visit       | Rich        | No-exam       | Staying in

Identify the decision tree root node to predict the activity from the observed values.

A database has five transactions. Let min-support = 60% and min-confidence = 80%. Find all frequent item sets by using the Apriori algorithm. Also find strong association rules. [10]

TID   | Items bought
T1000 | M, O, N, K, E, Y
T1001 | D, O, N, K, E, Y
T1002 | M, A, K, E
T1003 | M, U, C, K, Y
T1004 | C, O, L, K, E, Y

OR

Apply the Apriori algorithm and find out the frequent itemsets. Minimum support threshold = 50%. Also find out strong association rules. Minimum confidence = 70%. [10]
[Transaction table (columns: Transaction, List of items): item lists not recoverable from the source.]
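For readers checking their work on the distance question above, the following is a minimal Python sketch, not a model answer. It assumes the two tuples read exactly as (20, 2, 41, 10) and (10, 0, 36, 8); the helper names minkowski and supremum are chosen here for illustration and do not come from the paper. Euclidean and Manhattan distance are treated as the h = 2 and h = 1 cases of the Minkowski distance.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length tuples."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def supremum(x, y):
    """Supremum (Chebyshev) distance: the largest per-attribute gap."""
    return max(abs(a - b) for a, b in zip(x, y))


# Tuples as they read in the question (assumed to be transcribed correctly).
x = (20, 2, 41, 10)
y = (10, 0, 36, 8)

euclidean = minkowski(x, y, 2)   # h = 2
manhattan = minkowski(x, y, 1)   # h = 1
mink3 = minkowski(x, y, 3)       # h = 3, as the question asks
sup = supremum(x, y)

print(f"Euclidean: {euclidean:.4f}")
print(f"Manhattan: {manhattan:.4f}")
print(f"Minkowski (h=3): {mink3:.4f}")
print(f"Supremum: {sup}")
```

The per-attribute differences are 10, 2, 5 and 2, so under these assumptions the Manhattan distance is 19, the Euclidean distance is the square root of 133 (about 11.53), the Minkowski distance with h = 3 is the cube root of 1141 (about 10.45), and the supremum distance is 10.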
