Air Bnb data analysis project
In [23]: df = pd.read_csv("Airbnb_Open_Data.csv")
C:\Users\swati\AppData\Local\Temp\ipykernel_1952\3424017332.py:1: DtypeWarning: Columns (25) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv("Airbnb_Open_Data.csv")
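The warning names column 25, which the `df.info()` listing further down shows is `license`. The following is a minimal sketch of the two remedies the warning itself suggests. It uses a tiny in-memory CSV with made-up values as a stand-in for `Airbnb_Open_Data.csv`, and the choice of `str` for `license` is an assumption, not something the notebook confirms.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for Airbnb_Open_Data.csv.
csv_text = "id,price,license\n1,100,\n2,200,41662/AL\n3,300,7\n"

# Option 1: declare the dtype of the ambiguous column up front,
# so every value in it is read as text (missing values stay missing).
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"license": str})

# Option 2: have pandas process the file in one pass instead of in
# chunks, so the dtype is inferred from the whole column at once.
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df_typed["license"].tolist())
```

On the real file, the same keyword arguments would be passed to the existing `pd.read_csv("Airbnb_Open_Data.csv")` call.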
In [24]: df
Out[24]:
        id       NAME                                              host id      host_identity_verified  host name    ...
1       1002102  Skylit Midtown Castle                             52335172823  verified                Jenna        ...
2       1002403  THE VILLAGE OF HARLEM....NEW YORK !               78829239556  NaN                     Elise        ...
4       1003689  Entire Apt: Spacious Studio/Loft by central park  92037596077  verified                Lyndon       ...
...
102594  6092437  Spare room in Williamsburg                        12312296767  verified                Krik         ...
102595  6092990  Best Location near Columbia U                     77864383453  unconfirmed             Mifan        ...
102596  6093542  Comfy, bright room in Brooklyn                    69050334417  unconfirmed             Megan        ...
102597  6094094  Big Studio-One Stop from Midtown                  11160591270  unconfirmed             Christopher  ...
102598  6094647  585 sf Luxury Studio                              68170633372  unconfirmed             Rebecca      ...
id 0
NAME 250
host id 0
host_identity_verified 289
host name 406
neighbourhood group 29
neighbourhood 16
lat 8
long 8
country 532
country code 131
instant_bookable 105
cancellation_policy 76
room type 0
Construction year 214
price 247
service fee 273
minimum nights 409
number of reviews 183
last review 15893
reviews per month 15879
review rate number 326
calculated host listings count 319
availability 365 448
house_rules 52131
license 102597
dtype: int64
In [36]: print(df.isnull().sum())
id 0
NAME 0
host id 0
host_identity_verified 276
host name 0
neighbourhood group 26
neighbourhood 16
lat 8
long 8
country 526
country code 122
instant_bookable 96
cancellation_policy 70
room type 0
Construction year 200
price 239
service fee 268
minimum nights 403
number of reviews 182
last review 0
reviews per month 0
review rate number 314
calculated host listings count 318
availability 365 420
house_rules 51867
license 101947
dtype: int64
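Between the two null-count listings, `NAME`, `host name`, `last review`, and `reviews per month` all fall to 0, while most other counts drop only slightly. The cells that produced this change are not shown, so the following is only a sketch of one way such a result could arise: dropping rows that lack a name, then filling the review columns. The demo values and the specific fill choices (earliest date, 0) are assumptions.

```python
import pandas as pd

# Hypothetical four-row frame using the notebook's column names.
df_demo = pd.DataFrame({
    "NAME": ["Loft", None, "Studio", "Room"],
    "host name": ["Ann", "Bo", None, "Cy"],
    "last review": ["10/19/2021", None, "5/21/2022", None],
    "reviews per month": [0.21, None, 0.38, None],
})
before = df_demo.isnull().sum()

# Drop rows missing either name; this also removes whatever nulls
# those rows carried in other columns.
cleaned = df_demo.dropna(subset=["NAME", "host name"]).copy()

# Parse dates, then fill the gaps (assumed fill values).
cleaned["last review"] = pd.to_datetime(cleaned["last review"], format="%m/%d/%Y")
cleaned["last review"] = cleaned["last review"].fillna(cleaned["last review"].min())
cleaned["reviews per month"] = cleaned["reviews per month"].fillna(0)

after = cleaned.isnull().sum()
print(before, after, sep="\n\n")
```

Dropping rows changes every column's null count a little, which would be consistent with the small decreases seen in columns such as `price` and `service fee`.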
Remove Duplicates
In [42]: df.drop_duplicates(inplace=True)
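As a small illustration of what this call does, the sketch below counts fully duplicated rows before removing them. The frame and its values are made up for the example.

```python
import pandas as pd

# Hypothetical frame in which the third row repeats the second.
df_demo = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [100.0, 250.0, 250.0, 80.0],
})

# duplicated() flags every repeat of an earlier row.
n_dupes = int(df_demo.duplicated().sum())

# drop_duplicates() keeps the first occurrence and preserves the
# original index labels, so the index can have gaps afterwards.
deduped = df_demo.drop_duplicates()

print(n_dupes, len(df_demo), len(deduped), list(deduped.index))
```

Because the original labels are kept, a frame can report fewer entries than its index range suggests; `reset_index(drop=True)` would renumber the rows if that is wanted.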
<class 'pandas.core.frame.DataFrame'>
Index: 101410 entries, 0 to 102057
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 101410 non-null int64
1 NAME 101410 non-null object
2 host id 101410 non-null int64
3 host_identity_verified 101134 non-null object
4 host name 101410 non-null object
5 neighbourhood group 101384 non-null object
6 neighbourhood 101394 non-null object
7 lat 101402 non-null float64
8 long 101402 non-null float64
9 country 100884 non-null object
10 country code 101288 non-null object
11 instant_bookable 101314 non-null object
12 cancellation_policy 101340 non-null object
13 room type 101410 non-null object
14 Construction year 101210 non-null float64
15 price 101171 non-null float64
16 service fee 101142 non-null float64
17 minimum nights 101016 non-null float64
18 number of reviews 101228 non-null float64
19 last review 101410 non-null datetime64[ns]
20 reviews per month 101410 non-null float64
21 review rate number 101103 non-null float64
22 calculated host listings count 101092 non-null float64
23 availability 365 100990 non-null float64
24 house_rules 49831 non-null object
25 license 2 non-null object
dtypes: datetime64[ns](1), float64(11), int64(2), object(12)
memory usage: 20.9+ MB
None
In [49]: df
Out[49]:
        id        NAME                                              host id      host_identity_verified  host name    ...
1       1002102   Skylit Midtown Castle                             52335172823  verified                Jenna        ...
2       1002403   THE VILLAGE OF HARLEM....NEW YORK !               78829239556  NaN                     Elise        ...
4       1003689   Entire Apt: Spacious Studio/Loft by central park  92037596077  verified                Lyndon       ...
5       1004098   Large Cozy 1 BR Apartment In Midtown East         45498551794  verified                Michelle     ...
...
102053  57365208  Cozy bright room near Prospect Park               77326652202  unconfirmed             Mariam       ...
102054  57365760  Private Bedroom with Amazing Rooftop View         45936254757  verified                Trey         ...
102055  57366313  Pretty Brooklyn One-Bedroom for 2 to 4 people     23801060917  verified                Michael      ...
102056  57366865  Room & private bathroom in historic Harlem        15593031571  unconfirmed             Shireen      ...
102057  57367417  Rosalee Stewart                                   93578954226  verified                Stanley      ...
Descriptive Statistics
3/22/25, 3:13 PM Air Bnb data analysis project
In [54]: df.describe()
Out[54]:
        id  host id  lat  long  Construction year  ...
Visualization
Distribution of Prices
Plot the distribution of listing prices.
plt.figure(figsize=(10, 6))
sns.histplot(df['price'], bins=50, kde=True, color='red') # Set histogram color
plt.title('Distribution of Listing Prices')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()
Neighborhood Analysis
Examine how listings are distributed across different neighborhoods.
C:\Users\swati\AppData\Local\Temp\ipykernel_1952\1262699844.py:6: UserWarning: No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
  plt.legend(title='Room Type')
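The warning means that `plt.legend(title='Room Type')` ran on a plot with no labelled artists, so there was nothing to list in the legend. The plotting cell itself is not shown, so the following is only a sketch of one way to get a labelled chart: cross-tabulate `neighbourhood group` against `room type` and let pandas draw one labelled bar series per room type. The demo rows are made up, and the use of `pd.crosstab` is an assumption rather than the notebook's method.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows using the notebook's column names.
df_demo = pd.DataFrame({
    "neighbourhood group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "room type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Private room"],
})

# Rows: neighbourhood groups; columns: room types; cells: listing counts.
counts = pd.crosstab(df_demo["neighbourhood group"], df_demo["room type"])

fig, ax = plt.subplots(figsize=(12, 6))
counts.plot(kind="bar", ax=ax)  # each column becomes a labelled series
ax.set_title("Listings by Neighbourhood Group")
ax.set_xlabel("Neighbourhood Group")
ax.set_ylabel("Number of Listings")
legend = ax.legend(title="Room Type")  # labels exist, so no warning
legend_labels = [text.get_text() for text in legend.get_texts()]
```

With labelled series on the axes, the same `legend(title=...)` call has entries to show.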
In [79]: df.head()
Out[79]:
   id       NAME                                              host id      host_identity_verified  host name  ...
1  1002102  Skylit Midtown Castle                             52335172823  verified                Jenna      ...
2  1002403  THE VILLAGE OF HARLEM....NEW YORK !               78829239556  NaN                     Elise      ...
4  1003689  Entire Apt: Spacious Studio/Loft by central park  92037596077  verified                Lyndon     ...
5  1004098  Large Cozy 1 BR Apartment In Midtown East         45498551794  verified                Michelle   ...
5 rows × 24 columns
plt.figure(figsize=(12, 6))
reviews_over_time.plot(kind='line',color='red')
plt.title('Number of Reviews Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Reviews')
plt.show()
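The cell that defines `reviews_over_time` is not shown. The following is a sketch of one plausible definition, assuming it is the total of `number of reviews` grouped by the month of `last review`. The monthly grouping, the use of a sum, and the demo values are all assumptions.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names.
df_demo = pd.DataFrame({
    "last review": pd.to_datetime(
        ["2021-10-19", "2021-10-25", "2022-05-21", "2022-05-30", "2022-06-02"]
    ),
    "number of reviews": [9.0, 45.0, 0.0, 270.0, 3.0],
})

# Bucket each listing by the month of its last review, then add up
# the review counts within each month.
month = df_demo["last review"].dt.to_period("M")
reviews_over_time = df_demo.groupby(month)["number of reviews"].sum()

print(reviews_over_time)
```

A series built this way has one value per month and can be passed to `.plot(kind='line')` as in the cell above.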