Rdatascience - Problem Statements
2.
Perform the following operations using Python by creating a student performance dataset.
1. Display Missing Values
2. Replace missing values using any 2 suitable techniques
3. Identify outliers using a box plot and a scatter plot
4. Handle outliers using any technique
5. Perform any 2 data normalization techniques
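A minimal sketch of steps 1 and 2, using only the Python standard library so it runs without extra packages. The records, the column names (`math`, `reading`) and the choice of mean and median imputation as the two techniques are assumptions for illustration. With pandas, the same steps are usually written with `DataFrame.isnull().sum()` and `fillna`.

```python
import statistics

# Hypothetical student performance records; None marks a missing value.
students = [
    {"name": "A", "math": 78, "reading": 85},
    {"name": "B", "math": None, "reading": 90},
    {"name": "C", "math": 64, "reading": None},
    {"name": "D", "math": 92, "reading": 70},
    {"name": "E", "math": 70, "reading": 88},
]


def count_missing(rows, columns):
    """Step 1: count missing (None) entries per column."""
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


def impute(rows, column, strategy):
    """Step 2: return new rows with missing values in `column` replaced by
    the mean or median of the observed values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


print(count_missing(students, ["math", "reading"]))  # {'math': 1, 'reading': 1}
filled = impute(students, "math", "mean")      # math gap -> 76
filled = impute(filled, "reading", "median")   # reading gap -> 86.5
for row in filled:
    print(row)
```

The median is usually the safer of the two when a column contains outliers, because a single extreme value shifts the mean but not the median.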
3.
Perform the following operations on the Titanic dataset
1. Check how the ticket price (column name: 'fare') for each passenger is distributed by plotting a histogram.
2. Plot a box plot of the distribution of age for each gender, along with information about whether each passenger survived or not (column names: 'sex' and 'age').
3. Write down your observations and the inferences drawn from the above statistics.
5.
Perform the following operations on the iris dataset
6. Create a linear regression model using Python/R to predict home prices using the Boston Housing dataset. Evaluate the performance of your model.
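A minimal sketch of fitting and evaluating a regression model, assuming a single feature and a handful of hypothetical (rooms, price) pairs rather than the real Boston Housing data. It fits ordinary least squares in closed form and reports RMSE and R². The full exercise would normally use all features, for example with scikit-learn's `LinearRegression`; note that `sklearn.datasets.load_boston` was removed in scikit-learn 1.2, so the dataset has to be loaded from another source.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def evaluate(y_true, y_pred):
    """Return (RMSE, R^2) for a set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return math.sqrt(ss_res / n), 1 - ss_res / ss_tot


# Hypothetical toy data: average rooms per dwelling vs. price (in $1000s).
rooms = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
price = [15.0, 18.5, 21.0, 24.5, 27.0, 31.0]

intercept, slope = fit_simple_ols(rooms, price)
predictions = [intercept + slope * r for r in rooms]
rmse, r2 = evaluate(price, predictions)
print(f"price = {intercept:.2f} + {slope:.2f} * rooms, RMSE={rmse:.2f}, R^2={r2:.3f}")
```

Evaluating on the training points, as here, overstates performance; a held-out test split gives a fairer estimate.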
7. Create a logistic regression model on social network ads.csv to perform classification on the given dataset. Compute the confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate, Precision, and Recall.
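The confusion-matrix quantities listed in problems 7 and 8 can be computed directly from true and predicted labels. This is a sketch: the label vectors are hypothetical and `1` is assumed to be the positive class (for example, "purchased"). For binary labels, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts in the order tn, fp, fn, tp.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, error rate, precision and recall from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels from some classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 3 2 2 1
print(metrics(tp, fp, tn, fn))
```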
8. Create a Naïve Bayes classification model using Python on the social network ads.csv dataset. Compute the confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate, Precision, and Recall on the given dataset.
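A from-scratch sketch of Gaussian Naive Bayes, the variant commonly used for continuous features such as age and salary. The two features and the eight training rows are hypothetical stand-ins for the dataset's columns; scikit-learn's `GaussianNB` implements the same model. Each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log posterior (up to a shared constant).

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the largest log prior + log likelihood."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rows: (age, salary in $1000s) -> purchased (0/1).
X = [(22, 20), (25, 35), (28, 30), (30, 40), (45, 90), (50, 110), (48, 95), (55, 120)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, (24, 25)), predict(model, (52, 100)))  # 0 1
```

The predictions can then be passed to a confusion-matrix routine to obtain TP, FP, TN, FN and the derived metrics.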
9.
For the given text, apply the following preprocessing methods:
1. Tokenization
2. POS Tagging
3. Stop word Removal
4. Lemmatization
5. Stemming
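A standard-library sketch of steps 1 and 3 (tokenization and stop-word removal). The regular expression and the tiny stop-word set are assumptions for illustration only. POS tagging, lemmatization and stemming rely on trained resources and are normally done with NLTK (`nltk.pos_tag`, `WordNetLemmatizer().lemmatize`, `PorterStemmer().stem`, and `nltk.corpus.stopwords.words("english")` for a full stop-word list) after downloading the corresponding NLTK data.

```python
import re

# Hypothetical, deliberately small stop-word set for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens
    (runs of letters, digits and apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


text = "The students are preparing for the exams in the library."
tokens = tokenize(text)
print(tokens)
print(remove_stop_words(tokens))  # ['students', 'preparing', 'exams', 'library']
```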
10. Calculate Term Frequency and Inverse Document Frequency, treating each sentence as a document.
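A minimal sketch that treats each sentence as one document. It uses term frequency tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t)), where N is the number of sentences and df(t) the number of sentences containing t. This is one common variant; libraries such as scikit-learn's `TfidfVectorizer` apply idf smoothing and normalization by default, so their numbers differ. The three sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_idf(documents):
    """Return the idf table and one {term: tf*idf} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return idf, scores


sentences = [
    "Data science uses statistics",
    "Statistics uses data",
    "Python is used in data science",
]
idf, scores = tf_idf(sentences)
print(idf["data"])  # 0.0, since 'data' occurs in every sentence
print(scores[2])
```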
11. Write a Scala program to find the average temperature, average dew point, and average wind speed for the given weather dataset.
12.
Perform the following operations using Python by creating a student performance dataset.
1. Display Missing Values
2. Replace missing values using any 2 suitable techniques
3. Identify outliers using IQR and Z-score
4. Handle outliers using any technique
5. Perform data normalization using Min-Max scaling
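A standard-library sketch of steps 3 to 5: IQR and Z-score detection, capping values to the IQR fences as one possible handling technique, and Min-Max scaling to [0, 1]. The score list is hypothetical, and k = 1.5 for the IQR fences and a Z threshold of 2.5 are conventional choices rather than requirements.

```python
import statistics

# Hypothetical exam scores with one suspiciously high value.
scores = [55, 60, 62, 65, 68, 70, 72, 75, 78, 150]


def iqr_bounds(values, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Values lying outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values whose absolute Z-score (population SD) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]


def cap(values, k=1.5):
    """Handle outliers by clipping every value to the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


print(iqr_outliers(scores))                    # [150]
print(zscore_outliers(scores, threshold=2.5))  # [150]
capped = cap(scores)                           # 150 becomes 97.125
print([round(v, 3) for v in min_max(capped)])
```

With only ten values, the outlier also inflates the standard deviation, so its Z-score (about 2.9) stays below the common cutoff of 3; that is why the call above uses 2.5.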
13.
Perform the following operations using Python by creating a student performance dataset.
1. Display Missing Values
2. Replace missing values using any 2 suitable techniques
3. Identify outliers using IQR and Z-score
4. Handle outliers using any technique
5. Perform data normalization using decimal scaling
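Decimal scaling divides each value by a power of ten: v' = v / 10^j, where j is the smallest non-negative integer such that max |v'| < 1. The sketch below finds j with an integer loop rather than a logarithm to avoid floating-point edge cases at exact powers of ten; the scores are hypothetical.

```python
def decimal_scaling(values):
    """Divide every value by 10**j, where j is the smallest non-negative
    integer that makes the largest absolute scaled value less than 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


# Hypothetical exam scores.
scores = [55, 60, 78, 97]
scaled, j = decimal_scaling(scores)
print(j)       # 2
print(scaled)  # [0.55, 0.6, 0.78, 0.97]
```

Unlike Min-Max scaling, decimal scaling keeps the sign and relative ratios of the values and does not force the minimum to 0.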
14.
For the given text, apply the following preprocessing methods:
1. Tokenization
2. POS Tagging
3. Stop word Removal