Assignment 8

The document is a Jupyter Notebook that analyzes the Boston housing dataset using Python libraries such as pandas, seaborn, and scikit-learn. It includes data loading, exploration, correlation analysis, and a linear regression model to predict median home prices. The model achieved an R^2 score of 0.6688 and a mean squared error of 24.2911.

Uploaded by

Purva Kamat

05/03/2025, 12:20 Untitled18 - Jupyter Notebook

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Load the Boston housing data from a local CSV file and preview the first rows.
df = pd.read_csv('/home/pict/Downloads/data_boston_housing.csv')
df.head(5)

      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio       b  lstat  medv
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98  24.0
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14  21.6
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03  34.7
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94  33.4
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90   5.33  36.2

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   crim     506 non-null    float64
 1   zn       506 non-null    float64
 2   indus    506 non-null    float64
 3   chas     506 non-null    int64
 4   nox      506 non-null    float64
 5   rm       506 non-null    float64
 6   age      506 non-null    float64
 7   dis      506 non-null    float64
 8   rad      506 non-null    int64
 9   tax      506 non-null    int64
 10  ptratio  506 non-null    float64
 11  b        506 non-null    float64
 12  lstat    506 non-null    float64
 13  medv     506 non-null    float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB

In [4]:
df.describe()

             crim          zn       indus        chas         nox          rm         age         dis         rad         ta
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.00000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   68.574901    3.795043    9.549407  408.23715
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   28.148861    2.105710    8.707259  168.53711
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000    1.129600    1.000000  187.00000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   45.025000    2.100175    4.000000  279.00000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   77.500000    3.207450    5.000000  330.00000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   94.075000    5.188425   24.000000  666.00000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000   12.126500   24.000000  711.00000

localhost:8888/notebooks/Untitled18.ipynb?kernel_name=python3# 1/4

In [5]:
df.isnull().sum()

crim 0
zn 0
indus 0
chas 0
nox 0
rm 0
age 0
dis 0
rad 0
tax 0
ptratio 0
b 0
lstat 0
medv 0
dtype: int64

In [6]:
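# fillna returns a new DataFrame, so assign the result to keep it.
# The data has no missing values, so this leaves the contents unchanged.
df = df.fillna(df.mean())
df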

        crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio       b  lstat  medv
0    0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98  24.0
1    0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14  21.6
2    0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03  34.7
3    0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94  33.4
4    0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90   5.33  36.2
..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...    ...   ...
501  0.06263   0.0  11.93     0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99   9.67  22.4
502  0.04527   0.0  11.93     0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90   9.08  20.6
503  0.06076   0.0  11.93     0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90   5.64  23.9
504  0.10959   0.0  11.93     0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45   6.48  22.0
505  0.04741   0.0  11.93     0  0.573  6.030  80.8  2.5050    1  273     21.0  396.90   7.88  11.9

506 rows × 14 columns
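
`fillna` returns a new DataFrame and leaves the original untouched, so the result has to be assigned if it is meant to persist. The sketch below illustrates this on a small hypothetical frame; `toy` and its values are made up for illustration and are not taken from the housing data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value (not the housing data).
toy = pd.DataFrame({"rm": [6.0, np.nan, 7.0], "medv": [24.0, 21.0, 35.0]})

# fillna returns a new DataFrame; `toy` itself is left unchanged.
filled = toy.fillna(toy.mean())

missing_before = int(toy["rm"].isnull().sum())    # still 1 in the original
missing_after = int(filled["rm"].isnull().sum())  # 0 in the returned copy
imputed_value = float(filled.loc[1, "rm"])        # column mean of 6.0 and 7.0

print(missing_before, missing_after, imputed_value)
```

On the housing data itself the call changes nothing, since `df.isnull().sum()` reports zero missing values in every column.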


In [7]:
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt='.2f')
plt.title("Correlation Heatmap")
plt.show()
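
A heatmap of all pairwise correlations can be hard to read when the main interest is the target column. One possible complement, sketched here on a hypothetical toy frame (the values are invented for illustration), is to pull out the `medv` column of the correlation matrix and rank features by the magnitude of their correlation with it.

```python
import pandas as pd

# Hypothetical toy values, chosen only to illustrate the ranking step.
toy = pd.DataFrame({
    "rm": [5, 6, 7, 8],
    "lstat": [20, 15, 12, 4],
    "chas": [0, 1, 0, 1],
    "medv": [15, 21, 28, 40],
})

# Correlation of every feature with the target, excluding the target itself.
target_corr = toy.corr()["medv"].drop("medv")

# Order features by the magnitude of the correlation, strongest first.
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.loc[order]

print(ranked)
```

Replacing `toy` with `df` would apply the same ranking to the housing data.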

In [8]:
plt.scatter(df['rm'], df['medv'])
plt.xlabel('Average Number of Rooms')
plt.ylabel('Median Home Price')
plt.title('Rooms vs. Home Price')
plt.show()


In [9]:
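from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Features: every column except the target; target: median home value.
x = df.drop('medv', axis=1)
y = df['medv']
# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(x_train, y_train)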

LinearRegression()
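
After fitting, the learned weights are available as `coef_` and `intercept_` on the model. The sketch below shows one way to label them by column name, using hypothetical noise-free toy data (`X`, `y`, and `toy_model` are illustrative names, not part of the notebook) so that the recovered values are known in advance.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 2*a + 3*b + 1 (no noise).
X = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 5]})
y = 2 * X["a"] + 3 * X["b"] + 1

toy_model = LinearRegression().fit(X, y)

# Pair each fitted coefficient with its column name.
coefficients = pd.Series(toy_model.coef_, index=X.columns)
intercept = float(toy_model.intercept_)

print(coefficients)
print(intercept)
```

The same pattern with `model.coef_` and `x.columns` would list the housing coefficients. Raw coefficients depend on each feature's scale, so they are not directly comparable across columns such as `tax` and `nox`.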

In [10]:
from sklearn.metrics import mean_squared_error, r2_score
y_pred = model.predict(x_test)
print(f"R^2 score:{r2_score(y_test,y_pred):.4f}")
print(f"Mean squared Error:{mean_squared_error(y_test,y_pred):.4f}")
# Visualize actual vs. predicted values; the dashed red line marks perfect predictions.
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Home prices")
plt.show()

R^2 score:0.6688
Mean squared Error:24.2911
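
To make these two numbers easier to interpret, the sketch below computes both metrics by hand on hypothetical toy values and converts the reported MSE to a root mean squared error. The helper names `mean_squared_error_manual` and `r2_score_manual` are illustrative, not library functions, and the interpretation assumes `medv` is expressed in thousands of dollars, as in the standard Boston housing data.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical toy values, not taken from the notebook.
toy_true = [3, 5, 7, 9]
toy_pred = [2, 5, 8, 9]
toy_mse = mean_squared_error_manual(toy_true, toy_pred)
toy_r2 = r2_score_manual(toy_true, toy_pred)

# RMSE for the MSE reported above, in the same units as medv.
rmse = math.sqrt(24.2911)

print(toy_mse, toy_r2, round(rmse, 2))
```

Under that assumption, an RMSE of about 4.93 means a typical prediction error of roughly $4,900, and an R^2 of 0.6688 means the model explains about two thirds of the variance in the test-set prices.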
