The document lists 9 experiments in data analysis and visualization to be performed by students. The experiments cover data wrangling, preprocessing, summarization, outlier detection, transformations, and visualization, using libraries such as Pandas, Seaborn and scikit-learn. Open datasets mentioned include Iris, Titanic and stock price data.
List of Experiment - Data Analysis Lab
Department of AI & DS Engineering
List of Experiments
SUBJECT: Data Analysis Lab
CLASS: SY (A & B) SEMESTER: 4th
Academic Year : 2023-24
Sr. No.   Title of Experiment   CO   PO   PSO

1. Study and implementation of Pandas Profiling, Sweetviz, Autoviz. (CO4)

2. Data Wrangling I (CO4, CO5)
   Perform the following operations using Python on any open source dataset (e.g., data.csv):
   1. Import all the required Python libraries.
   2. Locate open source data from the web (e.g., https://www.kaggle.com). Provide a clear description of the data and its source (i.e., URL of the web site).
   3. Load the dataset into a pandas DataFrame.
   4. Data preprocessing: check for missing values in the data using the pandas isnull() function, and use describe() to get some initial statistics. Provide variable descriptions, types of variables, etc. Check the dimensions of the DataFrame.

3. (CO4, CO5)
   Create an "Academic performance" dataset of students and perform the following operations using Python:
   1. Scan all variables for missing values and inconsistencies. If there are missing values and/or inconsistencies, use any suitable technique to deal with them.
   2. Scan all numeric variables for outliers. If there are outliers, use any suitable technique to deal with them.
   3. Apply a data transformation to at least one of the variables. The purpose of this transformation should be one of the following: to change the scale for better understanding of the variable, to convert a non-linear relation into a linear one, or to decrease the skewness and bring the distribution closer to a normal distribution. Give reasons for your approach and document it properly.

4. (CO4, CO5)
   Perform the following operations on any open source dataset (e.g., data.csv):
   1. Provide summary statistics (mean, median, minimum, maximum, standard deviation) for a dataset (age, income, etc.) with numeric variables grouped by one of the qualitative (categorical) variables. For example, if your categorical variable is age group and your quantitative variable is income, provide summary statistics of income grouped by age group. Create a list that contains a numeric value for each response to the categorical variable.
   2. Write a Python program to display some basic statistical details such as percentiles, mean and standard deviation for the species 'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica' of the iris.csv dataset. Provide the code with outputs and explain everything that you do in this step.

5. (CO4, CO5)
   1. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows with information about the passengers who boarded the ill-fated Titanic. Use the Seaborn library to look for patterns in the data.
   2. Write code to check how the ticket price (column name: 'fare') is distributed across passengers by plotting a histogram.

6. (CO4, CO5)
   Use the inbuilt dataset 'titanic' as in the problem above. Plot a box plot of the distribution of age for each gender, along with whether the passengers survived or not (column names: 'sex' and 'age'). Write down your observations and the inferences drawn from the plot.

7. Data Visualization III (CO4, CO5)
   Download the Iris flower dataset or any other dataset into a DataFrame (e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
   a. List the features available in the dataset and their types (e.g., numeric, nominal).
   b. Create a histogram for each feature in the dataset to illustrate the feature distributions.
   c. Create a boxplot for each feature in the dataset. Compare distributions and identify outliers.

8. Implement a mini project on predicting stock prices using Pandas and scikit-learn. (CO4, CO5)

9. Implement color detection using Pandas and Autoviz, Sweetviz. (CO4, CO5)
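As a starting point for Experiments 2 to 4, the following is a minimal sketch, not a model solution. It assumes a small made-up DataFrame (the column names age_group and income, and every value in it, are hypothetical stand-ins for an open dataset such as data.csv). It also assumes median imputation for missing values, the 1.5 x IQR rule for outliers, and a log transform to reduce skew; any other suitable technique is equally acceptable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for an open dataset such as data.csv.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-40", "26-40", "41-60", "41-60"],
    "income": [21000.0, 24000.0, 52000.0, np.nan, 61000.0, 400000.0],
})

# Experiment 2: dimensions, missing values and initial statistics.
print(df.shape)
missing_counts = df.isnull().sum()
print(missing_counts)
print(df.describe())

# Experiment 3: fill missing values with the column median (one possible choice).
df["income"] = df["income"].fillna(df["income"].median())

# Flag outliers with the 1.5 * IQR rule (one common convention).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)

# Log transform to reduce the right skew of income.
df["log_income"] = np.log1p(df["income"])

# Experiment 4: summary statistics of income grouped by the categorical variable.
summary = df.groupby("age_group")["income"].agg(
    ["mean", "median", "min", "max", "std"]
)
print(summary)
```

The same groupby pattern applies to iris.csv by grouping on the species column, whose exact name depends on the copy of the dataset you download.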