Python Project 2 Colab

The document outlines a data analysis project on the Wine Quality dataset, detailing the importation of necessary libraries, data preprocessing, and exploratory data analysis. Key findings include the most frequent wine quality, correlations between various features and wine quality, and the training of Decision Tree and Random Forest models to predict wine quality with accuracy scores of 0.675 and 0.725, respectively. The Random Forest model outperformed the Decision Tree model in terms of accuracy.

Uploaded by

Gaurav Rajula

1/22/25, 8:14 PM Finlatics project 2 .ipynb - Colab

FINLATICS Project 2

In this project we analyse the Wine Quality dataset.

# importing necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# importing the data set

df = pd.read_csv('/content/wine_data.csv')

# data preprocessing

df.head()

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  q
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4


# Checking info about the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

https://colab.research.google.com/drive/1LFtPIQYXP4WcNZ6Aik3xLZMMsvVO1HvQ#scrollTo=mG7oet9mwoPi&printMode=true 1/6
# Checking for missing values and duplicates
print(df.isnull().sum())

print("checking duplicate rows")

print(df.duplicated().sum())

print("describing data")

print(df.describe())

fixed acidity 0
volatile acidity 0
citric acid 0
residual sugar 0
chlorides 0
free sulfur dioxide 0
total sulfur dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64
checking duplicate rows
240
describing data
fixed acidity volatile acidity citric acid residual sugar \
count 1599.000000 1599.000000 1599.000000 1599.000000
mean 8.319637 0.527821 0.270976 2.538806
std 1.741096 0.179060 0.194801 1.409928
min 4.600000 0.120000 0.000000 0.900000
25% 7.100000 0.390000 0.090000 1.900000
50% 7.900000 0.520000 0.260000 2.200000
75% 9.200000 0.640000 0.420000 2.600000
max 15.900000 1.580000 1.000000 15.500000

chlorides free sulfur dioxide total sulfur dioxide density \

count 1599.000000 1599.000000 1599.000000 1599.000000
mean 0.087467 15.874922 46.467792 0.996747
std 0.047065 10.460157 32.895324 0.001887
min 0.012000 1.000000 6.000000 0.990070
25% 0.070000 7.000000 22.000000 0.995600
50% 0.079000 14.000000 38.000000 0.996750
75% 0.090000 21.000000 62.000000 0.997835
max 0.611000 72.000000 289.000000 1.003690

pH sulphates alcohol quality

count 1599.000000 1599.000000 1599.000000 1599.000000
mean 3.311113 0.658149 10.422983 5.636023
std 0.154386 0.169507 1.065668 0.807569
min 2.740000 0.330000 8.400000 3.000000
25% 3.210000 0.550000 9.500000 5.000000
50% 3.310000 0.620000 10.200000 6.000000
75% 3.400000 0.730000 11.100000 6.000000
max 4.010000 2.000000 14.900000 8.000000
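The check above reports 240 duplicated rows, and the notebook keeps them for the rest of the analysis. If one wanted to remove them first, a minimal sketch could look like the following. It uses a small made-up frame (`toy`, a hypothetical name) rather than the wine data, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame standing in for the wine data (values are made up).
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.4, 10.0],
    "quality": [5, 5, 5, 6],
})

# duplicated() marks every row that repeats an earlier row exactly.
n_duplicates = toy.duplicated().sum()

# drop_duplicates() keeps the first occurrence of each repeated row.
deduped = toy.drop_duplicates().reset_index(drop=True)

print("duplicate rows:", n_duplicates)
print("rows before:", len(toy), "rows after:", len(deduped))
```

Whether dropping is appropriate depends on whether identical rows are repeated records or genuinely distinct wines with the same measurements; the source does not say which.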

1. What is the most frequently occurring wine quality? What are the highest and the lowest values in the
quality column?

# Most frequently occurring wine quality

most_frequent_quality = df['quality'].mode()[0]
quality_count = df['quality'].value_counts()

# Highest and lowest values in the 'quality' column

highest_quality = df['quality'].max()
lowest_quality = df['quality'].min()


print("Most frequent wine quality : ",most_frequent_quality)

print("Frequency of each wine quality : ", quality_count)
print("Highest wine quality : " ,highest_quality)
print("Lowest wine quality : " ,lowest_quality)

Most frequent wine quality : 5

Frequency of each wine quality : quality
5 681
6 638
7 199
4 53
8 18
3 10
Name: count, dtype: int64
Highest wine quality : 8
Lowest wine quality : 3
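The counts above can also be read as shares of the dataset, which shows how unevenly the quality classes are represented. The sketch below rebuilds the counts by hand from the printed output; the variable names `quality_share` and `most_frequent` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output printed above.
quality_count = pd.Series(
    {5: 681, 6: 638, 7: 199, 4: 53, 8: 18, 3: 10}, name="count"
)

# Share of each quality score among all rows.
quality_share = quality_count / quality_count.sum()

# idxmax() returns the label with the largest count, i.e. the mode.
most_frequent = quality_count.idxmax()

print(quality_share.round(3))
print("most frequent quality:", most_frequent)
print("combined share of 5 and 6:", round(quality_share.loc[[5, 6]].sum(), 3))
```

On the real frame, `df['quality'].value_counts(normalize=True)` should give the same shares directly.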

2. How is fixed acidity correlated to the quality of the wine? How does the alcohol content affect the quality?
How is the free Sulphur dioxide content correlated to the quality of the wine?

# finding correlations between given features

corr_fixed_acidity = df['fixed acidity'].corr(df['quality'])

corr_alcohol = df['alcohol'].corr(df['quality'])
corr_free_sulfur_dioxide = df['free sulfur dioxide'].corr(df['quality'])

print("correlation between fixed acidity and quality of wine : ",corr_fixed_acidity)

print("correlation between alcohol and quality of wine : ",corr_alcohol)

print("correlation between free sulfur dioxide and quality of wine : ",corr_free_sulfur_dioxide)

correlation between fixed acidity and quality of wine : 0.12405164911322428

correlation between alcohol and quality of wine : 0.4761663239995365
correlation between free sulfur dioxide and quality of wine : -0.0506560572442763
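The values above come from `Series.corr`, which computes the Pearson correlation coefficient by default. As a rough illustration of what that number means, the sketch below writes the formula out by hand on made-up values (not the wine data) and compares it with the pandas result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not taken from the wine data.
x = pd.Series([9.4, 9.8, 10.5, 11.2, 12.0])
y = pd.Series([5, 5, 6, 6, 7])

# Pearson r: sum of products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity via pandas (method="pearson" is the default).
r_pandas = x.corr(y)

print("manual:", r_manual)
print("pandas:", r_pandas)
```

A value near 0, such as the one for free sulfur dioxide, indicates little linear association; it does not rule out a non-linear relationship.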

# visualizing the given correlations

import seaborn as sns

plt.figure(figsize=(8,8))

# Fixed acidity vs Quality

plt.subplot(1, 3, 1)
sns.scatterplot(x='fixed acidity', y='quality', data=df, alpha=0.5)
plt.title('Fixed Acidity vs Quality')
plt.xlabel('Fixed Acidity')
plt.ylabel('Quality')

# Alcohol vs Quality
plt.subplot(1, 3, 2)
sns.scatterplot(x='alcohol', y='quality', data=df, alpha=0.5, color='orange')
plt.title('Alcohol vs Quality')
plt.xlabel('Alcohol')
plt.ylabel('Quality')

# Free Sulfur Dioxide vs Quality

plt.subplot(1, 3, 3)
sns.scatterplot(x='free sulfur dioxide', y='quality', data=df, alpha=0.5, color='green')
plt.title('Free Sulfur Dioxide vs Quality')

plt.xlabel('Free Sulfur Dioxide')
plt.ylabel('Quality')

plt.tight_layout()
plt.show()

3. What is the average residual sugar for the best quality wine and the lowest quality wine in the dataset?

# average residual sugar for the best quality wine and the lowest quality wine

residual_sugar_best_quality = df[df['quality'] == df['quality'].max()]['residual sugar'].mean()

residual_sugar_lowest_quality = df[df['quality'] == df['quality'].min()]['residual sugar'].mean()

print("Average residual sugar for the best quality wine : ",residual_sugar_best_quality)

print("Average residual sugar for the lowest quality wine : ",residual_sugar_lowest_quality)

Average residual sugar for the best quality wine : 2.5777777777777775

Average residual sugar for the lowest quality wine : 2.6350000000000002
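The two filtered means above could also be obtained in one pass by grouping on quality, which gives the average for every quality score at once. The following is a sketch on a made-up frame (`toy` and its values are assumptions for illustration, not wine data).

```python
import pandas as pd

# Hypothetical toy frame; values are made up.
toy = pd.DataFrame({
    "quality": [3, 3, 5, 5, 8, 8],
    "residual sugar": [2.0, 3.0, 1.8, 2.2, 2.4, 2.8],
})

# Mean residual sugar for each quality score.
mean_sugar = toy.groupby("quality")["residual sugar"].mean()

# Pick out the best and the lowest quality present in the data.
best = mean_sugar.loc[toy["quality"].max()]
lowest = mean_sugar.loc[toy["quality"].min()]

print(mean_sugar)
print("best quality mean:", best)
print("lowest quality mean:", lowest)
```

In the real data the extreme classes are small (18 rows with quality 8 and 10 with quality 3), so their means rest on few samples.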


4. Does volatile acidity have an effect on the quality of the wine samples in the dataset?

# correlation of volatile acidity and wine quality

corr_volatile_acidity = df['volatile acidity'].corr(df['quality'])

print("correlation between volatile acidity and wine quality : ",corr_volatile_acidity)

# Scatter plot to visualize the relationship

plt.figure(figsize=(8, 5))
sns.scatterplot(x='volatile acidity', y='quality', data=df, alpha=0.5, color='green')
plt.title('Volatile Acidity vs Wine Quality')
plt.xlabel('Volatile Acidity')
plt.ylabel('Wine Quality')
plt.show()

correlation between volatile acidity and wine quality : -0.390557780264007
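Because quality is an ordinal score, a rank-based (Spearman) correlation is sometimes reported alongside Pearson. The sketch below is not part of the original notebook; it uses made-up values and computes Spearman as the Pearson correlation of the ranks, which is its definition.

```python
import pandas as pd

# Hypothetical toy frame; values are made up.
toy = pd.DataFrame({
    "volatile acidity": [0.30, 0.45, 0.50, 0.70, 0.90],
    "quality": [7, 6, 6, 5, 4],
})

# Pearson correlation on the raw values (pandas default).
pearson = toy["volatile acidity"].corr(toy["quality"])

# Spearman correlation: Pearson correlation of the ranks (ties get average ranks).
spearman = toy["volatile acidity"].rank().corr(toy["quality"].rank())

print("pearson:", pearson)
print("spearman:", spearman)
```

Either coefficient describes association only. A negative value is consistent with higher volatile acidity going with lower quality, but by itself it does not show that one causes the other.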

5. Train a Decision Tree model and Random Forest Model separately to predict the Quality of the given samples
of wine. Compare the Accuracy scores for both models.

# for this we need to train two models and compare the accuracy score of both models
# for this we need to import needed models and split the data into training and testing

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Splitting data into features (X) and target (y)

X = df.drop(columns=['quality'])
y = df['quality']

# Train-test split (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
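The split above is purely random. With classes as uneven as the quality scores, one option (not used in the original notebook) is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both parts. A minimal sketch on made-up labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# stratify=y_toy asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=3, stratify=y_toy
)

print("class 1 share in train:", y_tr.mean())
print("class 1 share in test:", y_te.mean())
```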


# Decision Tree Model

dt_model = DecisionTreeClassifier(random_state=3)
# fitting and training model
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
# accuracy score of decision tree model
dt_accuracy = accuracy_score(y_test, y_pred_dt)

print("accuracy score of decision tree model for the wine data : ",dt_accuracy)

# Random Forest Model

rf_model = RandomForestClassifier(random_state=3)
# fitting and training model
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
# accuracy score of random forest model
rf_accuracy = accuracy_score(y_test, y_pred_rf)

print("accuracy score of random forest model for the wine data : ",rf_accuracy)

# comparing accuracy score of both models

print("for the given wine data")

print("accuracy score of decision tree model : ",dt_accuracy)
print("accuracy score of random forest model : ",rf_accuracy)

if dt_accuracy > rf_accuracy:
    print("Decision Tree model performs better.")
elif dt_accuracy < rf_accuracy:
    print("Random Forest model performs better.")
else:
    print("Both models have the same accuracy.")

accuracy score of decision tree model for the wine data : 0.675
accuracy score of random forest model for the wine data : 0.725
for the given wine data
accuracy score of decision tree model : 0.675
accuracy score of random forest model : 0.725
Random Forest model performs better.
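The comparison above rests on a single train-test split with `random_state=3`, so the 0.05 gap could shift with a different split. One common way to check this is k-fold cross-validation. The sketch below is an assumption-laden illustration: it uses synthetic data from `make_classification` as a stand-in, not the wine dataset, and the names `X_syn`, `y_syn` and `cv_scores` are introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 11 features; NOT the wine dataset.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=11, n_informative=5, n_classes=3, random_state=3
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=3),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=3),
}

# 5-fold cross-validated accuracy for each model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_syn, y_syn, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

On the real data one would pass `X` and `y` from the notebook instead of the synthetic arrays; the mean and spread across folds give a steadier basis for comparing the two models than one split.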
