
List of Experiment - Data Analysis Lab

The document lists 9 experiments related to data analysis and visualization to be performed by students. The experiments cover topics like data wrangling, preprocessing, summarization, outlier detection, transformations, visualizations using libraries like Pandas, Seaborn and scikit-learn. Open datasets mentioned include Iris, Titanic and stock price data.

Uploaded by

Atharva Digambar

Department of AI & DS Engineering

List of Experiments

SUBJECT: Data Analysis Lab

CLASS: SY (A & B) SEMESTER: 4th

Academic Year : 2023-24

Sr. No.   Title of Experiment   CO   PO   PSO

1. Study and implementation of Pandas Profiling, Sweetviz, Autoviz. CO4
2. Data Wrangling I CO4, CO5
Perform the following operations using Python on any open source dataset (e.g., data.csv):
   1. Import all the required Python libraries.
   2. Locate open source data from the web (e.g., https://www.kaggle.com). Provide a clear description of the data and its source (i.e., URL of the website).
   3. Load the dataset into a pandas DataFrame.
   4. Data preprocessing: check for missing values in the data using the pandas isnull() function, and use the describe() function to get some initial statistics. Provide variable descriptions and the types of variables. Check the dimensions of the DataFrame.
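
The loading and preprocessing steps of Experiment 2 can be sketched roughly as follows. This is only an illustration: the inline CSV text and its column names (name, age, income, city) are invented stand-ins for data.csv, so that the sketch runs without downloading anything.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv. With a real dataset, replace the
# StringIO object with the file path or URL, e.g. pd.read_csv("data.csv").
CSV_TEXT = """name,age,income,city
Asha,23,42000,Pune
Ravi,,51000,Mumbai
Meera,31,,Pune
John,45,78000,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

missing_per_column = df.isnull().sum()  # number of missing values per column
summary = df.describe()                 # initial statistics for numeric columns
variable_types = df.dtypes              # pandas dtype of each variable
n_rows, n_cols = df.shape               # dimensions of the DataFrame

print(missing_per_column)
print(summary)
print(variable_types)
print(f"{n_rows} rows x {n_cols} columns")
```

describe() covers only the numeric columns by default; passing include="all" would also summarise the text columns.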
3. CO4, CO5
Create an “Academic performance” dataset of students and perform the following operations using Python.
   1. Scan all variables for missing values and inconsistencies. If there are missing values and/or inconsistencies, use any of the suitable techniques to deal with them.
   2. Scan all numeric variables for outliers. If there are outliers, use any of the suitable techniques to deal with them.
   3. Apply data transformations to at least one of the variables. The purpose of this transformation should be one of the following: to change the scale for better understanding of the variable, to convert a non-linear relation into a linear one, or to decrease the skewness and convert the distribution into a normal distribution.
   Reason and document your approach properly.
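
One possible way to approach the three steps of Experiment 3 is sketched below. The dataset, its column names (student, math_score, study_hours) and all values are invented for illustration. Median imputation, IQR-based clipping and a log transform are only one choice among the "suitable techniques" the experiment leaves open.

```python
import numpy as np
import pandas as pd

# Hypothetical "Academic performance" data; names and values are made up.
df = pd.DataFrame({
    "student": ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"],
    "math_score": [72, 65, np.nan, 80, 58, 77, 69, 300],   # 300 is an entry error
    "study_hours": [2.0, 1.5, 3.0, 40.0, 1.0, 2.5, np.nan, 2.0],
})
numeric_cols = ["math_score", "study_hours"]

# 1. Missing values: fill each numeric gap with that column's median.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())


# 2. Outliers: clip values lying outside the 1.5 * IQR fences.
def clip_iqr(series):
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)


for col in numeric_cols:
    df[col] = clip_iqr(df[col])

# 3. Transformation: log(1 + x) compresses large values, which is a common
#    way to reduce right skew in a non-negative variable.
df["log_study_hours"] = np.log1p(df["study_hours"])

print(df)
print(df[["study_hours", "log_study_hours"]].skew())
```

Comparing the skew() values before and after the transform is one way to document whether it had the intended effect.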
4. CO4, CO5
Perform the following operations on any open source dataset (e.g., data.csv).
   1. Provide summary statistics (mean, median, minimum, maximum, standard deviation) for a dataset (age, income, etc.) with numeric variables grouped by one of the qualitative (categorical) variables. For example, if your categorical variable is age groups and the quantitative variable is income, then provide summary statistics of income grouped by the age groups. Create a list that contains a numeric value for each response to the categorical variable.
   2. Write a Python program to display some basic statistical details like percentile, mean, standard deviation, etc. of the species 'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica' of the iris.csv dataset.
   Provide the code with outputs and explain everything that you do in this step.
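
The grouped statistics in Experiment 4 could be computed along these lines. The age_group and income values are toy data invented for this sketch. The final describe() call shows the pattern that would also apply to iris.csv, assuming that file has a species column to group by.

```python
import pandas as pd

# Toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-40", "26-40", "26-40", "41-60", "41-60"],
    "income": [20000, 24000, 40000, 52000, 46000, 61000, 75000],
})

# Summary statistics of the quantitative variable for each category.
stats_by_group = df.groupby("age_group")["income"].agg(
    ["mean", "median", "min", "max", "std"]
)
print(stats_by_group)

# A list holding one numeric value per response of the categorical variable.
# pd.factorize assigns integer codes in order of first appearance.
codes, categories = pd.factorize(df["age_group"])
numeric_codes = codes.tolist()
print(numeric_codes, list(categories))

# Count, mean, std, min, quartiles and max per group in a single call.
print(df.groupby("age_group")["income"].describe())
```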
5. CO4, CO5
   1. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows and contains information about the passengers who boarded the unfortunate Titanic ship. Use the Seaborn library to see if we can find any patterns in the data.
   2. Write code to check how the price of the ticket (column name: 'fare') for each passenger is distributed by plotting a histogram.
6. CO4, CO5
Use the inbuilt dataset 'titanic' as used in the above problem. Plot a box plot for the distribution of age with respect to each gender along with the information about whether they survived or not. (Column names: 'sex' and 'age')
Write observations on the inference from the above statistics.
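
The box plot of Experiment 6 can be sketched as follows. The frame below is again a synthetic stand-in that only borrows the column names 'sex', 'age' and 'survived' from the built-in dataset, so any pattern it shows is meaningless. The real data would come from sns.load_dataset("titanic").

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Real data (downloads on first use): titanic = sns.load_dataset("titanic")
# Synthetic stand-in with the same column names; values are NOT real data.
rng = np.random.default_rng(1)
n = 200
titanic = pd.DataFrame({
    "sex": rng.choice(["male", "female"], size=n),
    "age": rng.normal(loc=30, scale=12, size=n).clip(1, 80).round(),
    "survived": rng.integers(0, 2, size=n),
})

# One box per gender, split by survival through the hue parameter.
fig, ax = plt.subplots(figsize=(7, 4))
sns.boxplot(data=titanic, x="sex", y="age", hue="survived", ax=ax)
ax.set_title("Age by sex, split by survival (0 = no, 1 = yes)")

# Numbers to back up the written observations.
age_summary = titanic.groupby(["sex", "survived"])["age"].describe()
print(age_summary)
```

The grouped describe() table gives the medians and quartiles behind each box, which is useful when writing the observations.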
7. Data Visualization III CO4, CO5
Download the Iris flower dataset or any other dataset into a DataFrame (e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
   a. List down the features and their types (e.g., numeric, nominal) available in the dataset.
   b. Create a histogram for each feature in the dataset to illustrate the feature distributions.
   c. Create a boxplot for each feature in the dataset.
   Compare distributions and identify outliers.
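
Steps a to c of Experiment 7 could be sketched as below. As an assumption of this example, the data comes from the copy of Iris bundled with scikit-learn and not from a download of the UCI file, so the column names follow scikit-learn's naming. The 1.5 * IQR rule used for outliers matches the default whiskers of a box plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris data and replace the integer target with species names.
bunch = load_iris(as_frame=True)
iris = bunch.frame.copy()
iris["species"] = iris["target"].map(dict(enumerate(bunch.target_names)))
iris = iris.drop(columns="target")

# a. Features and their types.
feature_types = {
    col: "numeric" if pd.api.types.is_numeric_dtype(iris[col]) else "nominal"
    for col in iris.columns
}
print(feature_types)
numeric_cols = [c for c, t in feature_types.items() if t == "numeric"]

# b. One histogram per numeric feature.
iris[numeric_cols].hist(bins=15, figsize=(8, 6))

# c. One box plot per numeric feature, drawn on shared axes.
fig, ax = plt.subplots(figsize=(8, 4))
iris[numeric_cols].boxplot(ax=ax)


# Outliers: values outside the 1.5 * IQR fences of each feature.
def iqr_outliers(series):
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)]


outlier_counts = {col: len(iqr_outliers(iris[col])) for col in numeric_cols}
print(outlier_counts)
```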
8. Implement mini project on Predicting Stock Prices Using Pandas and scikit-learn CO4, CO5
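
As a possible starting point for the mini project in Experiment 8, the sketch below fits a linear regression on lagged closing prices. Everything here is an assumption of the example: the prices are a synthetic random walk, not market data, and the lag features, the 80/20 chronological split and the choice of LinearRegression are only one simple baseline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic random-walk "closing prices"; NOT real market data.
rng = np.random.default_rng(42)
prices = pd.DataFrame({"close": 100 + rng.normal(0, 1, 300).cumsum()})

# Features: the closing prices of the previous three days.
for lag in (1, 2, 3):
    prices[f"lag_{lag}"] = prices["close"].shift(lag)
prices = prices.dropna().reset_index(drop=True)

feature_cols = ["lag_1", "lag_2", "lag_3"]

# Chronological split: train on the first 80%, test on the rest (no shuffling).
split = int(len(prices) * 0.8)
train, test = prices.iloc[:split], prices.iloc[split:]

model = LinearRegression().fit(train[feature_cols], train["close"])
predictions = model.predict(test[feature_cols])
mae = mean_absolute_error(test["close"], predictions)
print(f"Mean absolute error on the held-out period: {mae:.3f}")
```

Keeping the split chronological avoids training on prices that come after the ones being predicted.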
9. Implement color detection using Pandas and Autoviz, Sweetviz CO4, CO5

Mr. B. B. Kondbhar

Subject In-charge
