Laptop Price Predictor

A Machine Learning Project.

import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('laptop_data.csv')

df.head()

   Unnamed: 0 Company   TypeName  Inches                    ScreenResolution  \
0           0   Apple  Ultrabook    13.3  IPS Panel Retina Display 2560x1600
1           1   Apple  Ultrabook    13.3                            1440x900
2           2      HP   Notebook    15.6                   Full HD 1920x1080
3           3   Apple  Ultrabook    15.4  IPS Panel Retina Display 2880x1800
4           4   Apple  Ultrabook    13.3  IPS Panel Retina Display 2560x1600

Cpu Ram Memory \


0 Intel Core i5 2.3GHz 8GB 128GB SSD
1 Intel Core i5 1.8GHz 8GB 128GB Flash Storage
2 Intel Core i5 7200U 2.5GHz 8GB 256GB SSD
3 Intel Core i7 2.7GHz 16GB 512GB SSD
4 Intel Core i5 3.1GHz 8GB 256GB SSD

Gpu OpSys Weight Price


0 Intel Iris Plus Graphics 640 macOS 1.37kg 71378.6832
1 Intel HD Graphics 6000 macOS 1.34kg 47895.5232
2 Intel HD Graphics 620 No OS 1.86kg 30636.0000
3 AMD Radeon Pro 455 macOS 1.83kg 135195.3360
4 Intel Iris Plus Graphics 650 macOS 1.37kg 96095.8080

df.shape

(1303, 12)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 1303 non-null int64
1 Company 1303 non-null object
2 TypeName 1303 non-null object
3 Inches 1303 non-null float64
4 ScreenResolution 1303 non-null object
5 Cpu 1303 non-null object
6 Ram 1303 non-null object
7 Memory 1303 non-null object
8 Gpu 1303 non-null object
9 OpSys 1303 non-null object
10 Weight 1303 non-null object
11 Price 1303 non-null float64
dtypes: float64(2), int64(1), object(9)
memory usage: 122.3+ KB

df.duplicated().sum()

df.isnull().sum()

Unnamed: 0 0
Company 0
TypeName 0
Inches 0
ScreenResolution 0
Cpu 0
Ram 0
Memory 0
Gpu 0
OpSys 0
Weight 0
Price 0
dtype: int64

df.drop(columns=['Unnamed: 0'],inplace=True)

df.head()

Company TypeName Inches ScreenResolution \


0 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600
1 Apple Ultrabook 13.3 1440x900
2 HP Notebook 15.6 Full HD 1920x1080
3 Apple Ultrabook 15.4 IPS Panel Retina Display 2880x1800
4 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600

Cpu Ram Memory \


0 Intel Core i5 2.3GHz 8GB 128GB SSD
1 Intel Core i5 1.8GHz 8GB 128GB Flash Storage
2 Intel Core i5 7200U 2.5GHz 8GB 256GB SSD
3 Intel Core i7 2.7GHz 16GB 512GB SSD
4 Intel Core i5 3.1GHz 8GB 256GB SSD

Gpu OpSys Weight Price


0 Intel Iris Plus Graphics 640 macOS 1.37kg 71378.6832
1 Intel HD Graphics 6000 macOS 1.34kg 47895.5232
2 Intel HD Graphics 620 No OS 1.86kg 30636.0000
3 AMD Radeon Pro 455 macOS 1.83kg 135195.3360
4 Intel Iris Plus Graphics 650 macOS 1.37kg 96095.8080

df['Ram'] = df['Ram'].str.replace('GB','')
df['Weight'] = df['Weight'].str.replace('kg','')

df.head()

Company TypeName Inches ScreenResolution \


0 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600
1 Apple Ultrabook 13.3 1440x900
2 HP Notebook 15.6 Full HD 1920x1080
3 Apple Ultrabook 15.4 IPS Panel Retina Display 2880x1800
4 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600

Cpu Ram Memory \


0 Intel Core i5 2.3GHz 8 128GB SSD
1 Intel Core i5 1.8GHz 8 128GB Flash Storage
2 Intel Core i5 7200U 2.5GHz 8 256GB SSD
3 Intel Core i7 2.7GHz 16 512GB SSD
4 Intel Core i5 3.1GHz 8 256GB SSD

Gpu OpSys Weight Price


0 Intel Iris Plus Graphics 640 macOS 1.37 71378.6832
1 Intel HD Graphics 6000 macOS 1.34 47895.5232
2 Intel HD Graphics 620 No OS 1.86 30636.0000
3 AMD Radeon Pro 455 macOS 1.83 135195.3360
4 Intel Iris Plus Graphics 650 macOS 1.37 96095.8080

df['Ram'] = df['Ram'].astype('int32')
df['Weight'] = df['Weight'].astype('float32')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Company 1303 non-null object
1 TypeName 1303 non-null object
2 Inches 1303 non-null float64
3 ScreenResolution 1303 non-null object
4 Cpu 1303 non-null object
5 Ram 1303 non-null int32
6 Memory 1303 non-null object
7 Gpu 1303 non-null object
8 OpSys 1303 non-null object
9 Weight 1303 non-null float32
10 Price 1303 non-null float64
dtypes: float32(1), float64(2), int32(1), object(7)
memory usage: 101.9+ KB

import seaborn as sns

sns.histplot(df['Price'], kde=True, stat='density')

<AxesSubplot:xlabel='Price', ylabel='Density'>

df['Company'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['Company'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

df['TypeName'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['TypeName'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

sns.histplot(df['Inches'], kde=True, stat='density')

<AxesSubplot:xlabel='Inches', ylabel='Density'>

sns.scatterplot(x=df['Inches'],y=df['Price'])

<AxesSubplot:xlabel='Inches', ylabel='Price'>

df['ScreenResolution'].value_counts()
Full HD 1920x1080 507
1366x768 281
IPS Panel Full HD 1920x1080 230
IPS Panel Full HD / Touchscreen 1920x1080 53
Full HD / Touchscreen 1920x1080 47
1600x900 23
Touchscreen 1366x768 16
Quad HD+ / Touchscreen 3200x1800 15
IPS Panel 4K Ultra HD 3840x2160 12
IPS Panel 4K Ultra HD / Touchscreen 3840x2160 11
4K Ultra HD / Touchscreen 3840x2160 10
Touchscreen 2560x1440 7
IPS Panel 1366x768 7
4K Ultra HD 3840x2160 7
IPS Panel Quad HD+ / Touchscreen 3200x1800 6
Touchscreen 2256x1504 6
IPS Panel Retina Display 2304x1440 6
IPS Panel Retina Display 2560x1600 6
IPS Panel Touchscreen 2560x1440 5
IPS Panel 2560x1440 4
IPS Panel Retina Display 2880x1800 4
IPS Panel Touchscreen 1920x1200 4
1440x900 4
Quad HD+ 3200x1800 3
IPS Panel Quad HD+ 2560x1440 3
1920x1080 3
Touchscreen 2400x1600 3
IPS Panel Touchscreen 1366x768 3
2560x1440 3
IPS Panel Full HD 2160x1440 2
IPS Panel Touchscreen / 4K Ultra HD 3840x2160 2
IPS Panel Quad HD+ 3200x1800 2
Touchscreen / Full HD 1920x1080 1
IPS Panel Retina Display 2736x1824 1
IPS Panel Full HD 1920x1200 1
IPS Panel Full HD 1366x768 1
Touchscreen / 4K Ultra HD 3840x2160 1
IPS Panel Touchscreen 2400x1600 1
IPS Panel Full HD 2560x1440 1
Touchscreen / Quad HD+ 3200x1800 1
Name: ScreenResolution, dtype: int64

df['Touchscreen'] = df['ScreenResolution'].apply(lambda x: 1 if 'Touchscreen' in x else 0)

df.sample(5)

Company TypeName Inches \


1154 Dell Notebook 15.6
750 Lenovo Netbook 11.6
1246 Dell Notebook 14.0
879 HP Notebook 15.6
1021 Toshiba Ultrabook 13.3

ScreenResolution \
1154 IPS Panel Touchscreen / 4K Ultra HD 3840x2160
750 Touchscreen 1366x768
1246 1366x768
879 Full HD 1920x1080
1021 Full HD 1920x1080

Cpu Ram Memory \


1154 Intel Core i5 6300HQ 2.3GHz 8 256GB SSD
750 Intel Celeron Dual Core N3060 1.6GHz 4 128GB SSD
1246 Intel Core i5 7200U 2.5GHz 4 500GB HDD
879 Intel Core i5 7200U 2.5GHz 4 256GB SSD
1021 Intel Core i5 6200U 2.3GHz 8 256GB SSD

                        Gpu       OpSys  Weight        Price  Touchscreen
1154    Nvidia GeForce 960M  Windows 10    2.04  119916.2304            1
750   Intel HD Graphics 400  Windows 10    1.40   25308.0000            1
1246  Intel HD Graphics 620  Windows 10    1.60   46620.0000            0
879   Intel HD Graphics 620  Windows 10    2.04   44701.9200            0
1021  Intel HD Graphics 520  Windows 10    1.20   84715.2000            0

df['Touchscreen'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['Touchscreen'],y=df['Price'])

<AxesSubplot:xlabel='Touchscreen', ylabel='Price'>

df['Ips'] = df['ScreenResolution'].apply(lambda x: 1 if 'IPS' in x else 0)

df.head()
Company TypeName Inches ScreenResolution \
0 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600
1 Apple Ultrabook 13.3 1440x900
2 HP Notebook 15.6 Full HD 1920x1080
3 Apple Ultrabook 15.4 IPS Panel Retina Display 2880x1800
4 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600

Cpu Ram Memory \


0 Intel Core i5 2.3GHz 8 128GB SSD
1 Intel Core i5 1.8GHz 8 128GB Flash Storage
2 Intel Core i5 7200U 2.5GHz 8 256GB SSD
3 Intel Core i7 2.7GHz 16 512GB SSD
4 Intel Core i5 3.1GHz 8 256GB SSD

                            Gpu  OpSys  Weight        Price  Touchscreen  Ips
0  Intel Iris Plus Graphics 640  macOS    1.37   71378.6832            0    1
1        Intel HD Graphics 6000  macOS    1.34   47895.5232            0    0
2         Intel HD Graphics 620  No OS    1.86   30636.0000            0    0
3            AMD Radeon Pro 455  macOS    1.83  135195.3360            0    1
4  Intel Iris Plus Graphics 650  macOS    1.37   96095.8080            0    1

df['Ips'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['Ips'],y=df['Price'])

<AxesSubplot:xlabel='Ips', ylabel='Price'>

new = df['ScreenResolution'].str.split('x',n=1,expand=True)

df['X_res'] = new[0]
df['Y_res'] = new[1]

df.sample(5)

Company TypeName Inches ScreenResolution \


141 Lenovo Notebook 14.0 IPS Panel Full HD 1920x1080
1055 HP Notebook 15.6 1366x768
75 Asus Gaming 15.6 Full HD 1920x1080
984 Toshiba Notebook 14.0 1366x768
337 HP Notebook 15.6 Full HD 1920x1080

                              Cpu  Ram     Memory                      Gpu  \
141    Intel Core i5 8250U 1.6GHz    8  256GB SSD        AMD Radeon RX 550
1055   Intel Core i3 6100U 2.3GHz    4  500GB HDD    Intel HD Graphics 520
75    Intel Core i7 7700HQ 2.8GHz    8    1TB HDD  Nvidia GeForce GTX 1050
984    Intel Core i5 6200U 2.3GHz    4  500GB HDD    Intel HD Graphics 520
337    Intel Core i5 7200U 2.5GHz    8  256GB SSD    Intel HD Graphics 620

OpSys Weight Price Touchscreen Ips \


141 Windows 10 1.75 59461.5456 0 1
1055 Windows 10 2.31 37570.3920 0 0
75 Windows 10 2.20 50562.7200 0 0
984 Windows 10 1.75 48751.2000 0 0
337 Windows 10 1.84 60952.3200 0 0

X_res Y_res
141 IPS Panel Full HD 1920 1080
1055 1366 768
75 Full HD 1920 1080
984 1366 768
337 Full HD 1920 1080

df['X_res'] = df['X_res'].str.replace(',','').str.findall(r'(\d+\.?\d+)').apply(lambda x:x[0])

df.head()

Company TypeName Inches ScreenResolution \


0 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600
1 Apple Ultrabook 13.3 1440x900
2 HP Notebook 15.6 Full HD 1920x1080
3 Apple Ultrabook 15.4 IPS Panel Retina Display 2880x1800
4 Apple Ultrabook 13.3 IPS Panel Retina Display 2560x1600

Cpu Ram Memory \


0 Intel Core i5 2.3GHz 8 128GB SSD
1 Intel Core i5 1.8GHz 8 128GB Flash Storage
2 Intel Core i5 7200U 2.5GHz 8 256GB SSD
3 Intel Core i7 2.7GHz 16 512GB SSD
4 Intel Core i5 3.1GHz 8 256GB SSD

                            Gpu  OpSys  Weight        Price  Touchscreen  Ips  \
0  Intel Iris Plus Graphics 640  macOS    1.37   71378.6832            0    1
1        Intel HD Graphics 6000  macOS    1.34   47895.5232            0    0
2         Intel HD Graphics 620  No OS    1.86   30636.0000            0    0
3            AMD Radeon Pro 455  macOS    1.83  135195.3360            0    1
4  Intel Iris Plus Graphics 650  macOS    1.37   96095.8080            0    1

X_res Y_res
0 2560 1600
1 1440 900
2 1920 1080
3 2880 1800
4 2560 1600
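
The split-then-findall steps above can probably be collapsed into a single regular expression. The sketch below is an alternative, not the notebook's method: it uses `Series.str.extract` on a small hand-made sample (the `res` series is an assumption for illustration) and relies on the pattern `(\d+)x(\d+)` skipping the "4" in "4K" because that digit is not followed by an `x`.

```python
import pandas as pd

# Small illustrative sample in the same format as the ScreenResolution column.
res = pd.Series([
    "IPS Panel Retina Display 2560x1600",
    "1440x900",
    "IPS Panel Touchscreen / 4K Ultra HD 3840x2160",
])

# One capture group per dimension; extract() returns one column per group.
parts = res.str.extract(r"(\d+)x(\d+)").astype(int)
parts.columns = ["X_res", "Y_res"]

print(parts)
```

If this route were used on the real frame, the two columns could be assigned to `df['X_res']` and `df['Y_res']` directly.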

df['X_res'] = df['X_res'].astype('int')
df['Y_res'] = df['Y_res'].astype('int')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Company 1303 non-null object
1 TypeName 1303 non-null object
2 Inches 1303 non-null float64
3 ScreenResolution 1303 non-null object
4 Cpu 1303 non-null object
5 Ram 1303 non-null int32
6 Memory 1303 non-null object
7 Gpu 1303 non-null object
8 OpSys 1303 non-null object
9 Weight 1303 non-null float32
10 Price 1303 non-null float64
11 Touchscreen 1303 non-null int64
12 Ips 1303 non-null int64
13 X_res 1303 non-null int32
14 Y_res 1303 non-null int32
dtypes: float32(1), float64(2), int32(3), int64(2), object(7)
memory usage: 132.5+ KB

df.select_dtypes('number').corr()['Price']

Inches 0.068197
Ram 0.743007
Weight 0.210370
Price 1.000000
Touchscreen 0.191226
Ips 0.252208
X_res 0.556529
Y_res 0.552809
Name: Price, dtype: float64

df['ppi'] = (((df['X_res']**2) +
(df['Y_res']**2))**0.5/df['Inches']).astype('float')

df.select_dtypes('number').corr()['Price']

Inches 0.068197
Ram 0.743007
Weight 0.210370
Price 1.000000
Touchscreen 0.191226
Ips 0.252208
X_res 0.556529
Y_res 0.552809
ppi 0.473487
Name: Price, dtype: float64
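
The `ppi` column is the diagonal resolution in pixels divided by the diagonal size in inches. As a quick sanity check, the sketch below recomputes it by hand for row 0 (2560x1600 on a 13.3 inch screen); the variable names are illustrative only.

```python
import math

# Values taken from row 0 of the data: 2560x1600 pixels, 13.3 inches.
x_res, y_res, inches = 2560, 1600, 13.3

# Diagonal length in pixels: sqrt(x_res**2 + y_res**2).
diagonal_pixels = math.hypot(x_res, y_res)
ppi = diagonal_pixels / inches

print(round(ppi, 3))
```

The result should agree with the `ppi` value of 226.983005 that the notebook reports for the same row.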

df.drop(columns=['ScreenResolution'],inplace=True)

df.head()

Company TypeName Inches Cpu Ram \


0 Apple Ultrabook 13.3 Intel Core i5 2.3GHz 8
1 Apple Ultrabook 13.3 Intel Core i5 1.8GHz 8
2 HP Notebook 15.6 Intel Core i5 7200U 2.5GHz 8
3 Apple Ultrabook 15.4 Intel Core i7 2.7GHz 16
4 Apple Ultrabook 13.3 Intel Core i5 3.1GHz 8

Memory Gpu OpSys Weight \


0 128GB SSD Intel Iris Plus Graphics 640 macOS 1.37
1 128GB Flash Storage Intel HD Graphics 6000 macOS 1.34
2 256GB SSD Intel HD Graphics 620 No OS 1.86
3 512GB SSD AMD Radeon Pro 455 macOS 1.83
4 256GB SSD Intel Iris Plus Graphics 650 macOS 1.37

Price Touchscreen Ips X_res Y_res ppi


0 71378.6832 0 1 2560 1600 226.983005
1 47895.5232 0 0 1440 900 127.677940
2 30636.0000 0 0 1920 1080 141.211998
3 135195.3360 0 1 2880 1800 220.534624
4 96095.8080 0 1 2560 1600 226.983005

df.drop(columns=['Inches','X_res','Y_res'],inplace=True)

df.head()

  Company   TypeName                         Cpu  Ram               Memory  \
0   Apple  Ultrabook        Intel Core i5 2.3GHz    8            128GB SSD
1   Apple  Ultrabook        Intel Core i5 1.8GHz    8  128GB Flash Storage
2      HP   Notebook  Intel Core i5 7200U 2.5GHz    8            256GB SSD
3   Apple  Ultrabook        Intel Core i7 2.7GHz   16            512GB SSD
4   Apple  Ultrabook        Intel Core i5 3.1GHz    8            256GB SSD

                            Gpu  OpSys  Weight        Price  Touchscreen  Ips  \
0  Intel Iris Plus Graphics 640  macOS    1.37   71378.6832            0    1
1        Intel HD Graphics 6000  macOS    1.34   47895.5232            0    0
2         Intel HD Graphics 620  No OS    1.86   30636.0000            0    0
3            AMD Radeon Pro 455  macOS    1.83  135195.3360            0    1
4  Intel Iris Plus Graphics 650  macOS    1.37   96095.8080            0    1

ppi
0 226.983005
1 127.677940
2 141.211998
3 220.534624
4 226.983005

df['Cpu'].value_counts()

Intel Core i5 7200U 2.5GHz 190


Intel Core i7 7700HQ 2.8GHz 146
Intel Core i7 7500U 2.7GHz 134
Intel Core i7 8550U 1.8GHz 73
Intel Core i5 8250U 1.6GHz 72
...
Intel Celeron Quad Core N3710 1.6GHz 1
Intel Core i5 7200U 2.7GHz 1
Intel Pentium Dual Core N4200 1.1GHz 1
AMD FX 8800P 2.1GHz 1
Intel Atom x5-Z8300 1.44GHz 1
Name: Cpu, Length: 118, dtype: int64

df['Cpu Name'] = df['Cpu'].apply(lambda x:" ".join(x.split()[0:3]))

df.head()

  Company   TypeName                         Cpu  Ram               Memory  \
0   Apple  Ultrabook        Intel Core i5 2.3GHz    8            128GB SSD
1   Apple  Ultrabook        Intel Core i5 1.8GHz    8  128GB Flash Storage
2      HP   Notebook  Intel Core i5 7200U 2.5GHz    8            256GB SSD
3   Apple  Ultrabook        Intel Core i7 2.7GHz   16            512GB SSD
4   Apple  Ultrabook        Intel Core i5 3.1GHz    8            256GB SSD

                            Gpu  OpSys  Weight        Price  Touchscreen  Ips  \
0  Intel Iris Plus Graphics 640  macOS    1.37   71378.6832            0    1
1        Intel HD Graphics 6000  macOS    1.34   47895.5232            0    0
2         Intel HD Graphics 620  No OS    1.86   30636.0000            0    0
3            AMD Radeon Pro 455  macOS    1.83  135195.3360            0    1
4  Intel Iris Plus Graphics 650  macOS    1.37   96095.8080            0    1

ppi Cpu Name


0 226.983005 Intel Core i5
1 127.677940 Intel Core i5
2 141.211998 Intel Core i5
3 220.534624 Intel Core i7
4 226.983005 Intel Core i5

def fetch_processor(text):
    if text == 'Intel Core i7' or text == 'Intel Core i5' or text == 'Intel Core i3':
        return text
    else:
        if text.split()[0] == 'Intel':
            return 'Other Intel Processor'
        else:
            return 'AMD Processor'

df['Cpu brand'] = df['Cpu Name'].apply(fetch_processor)

df.head()

  Company   TypeName                         Cpu  Ram               Memory  \
0   Apple  Ultrabook        Intel Core i5 2.3GHz    8            128GB SSD
1   Apple  Ultrabook        Intel Core i5 1.8GHz    8  128GB Flash Storage
2      HP   Notebook  Intel Core i5 7200U 2.5GHz    8            256GB SSD
3   Apple  Ultrabook        Intel Core i7 2.7GHz   16            512GB SSD
4   Apple  Ultrabook        Intel Core i5 3.1GHz    8            256GB SSD

                            Gpu  OpSys  Weight        Price  Touchscreen  Ips  \
0  Intel Iris Plus Graphics 640  macOS    1.37   71378.6832            0    1
1        Intel HD Graphics 6000  macOS    1.34   47895.5232            0    0
2         Intel HD Graphics 620  No OS    1.86   30636.0000            0    0
3            AMD Radeon Pro 455  macOS    1.83  135195.3360            0    1
4  Intel Iris Plus Graphics 650  macOS    1.37   96095.8080            0    1

ppi Cpu Name Cpu brand


0 226.983005 Intel Core i5 Intel Core i5
1 127.677940 Intel Core i5 Intel Core i5
2 141.211998 Intel Core i5 Intel Core i5
3 220.534624 Intel Core i7 Intel Core i7
4 226.983005 Intel Core i5 Intel Core i5

df['Cpu brand'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['Cpu brand'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

df.drop(columns=['Cpu','Cpu Name'],inplace=True)

df.head()

  Company   TypeName  Ram               Memory                           Gpu  \
0   Apple  Ultrabook    8            128GB SSD  Intel Iris Plus Graphics 640
1   Apple  Ultrabook    8  128GB Flash Storage        Intel HD Graphics 6000
2      HP   Notebook    8            256GB SSD         Intel HD Graphics 620
3   Apple  Ultrabook   16            512GB SSD            AMD Radeon Pro 455
4   Apple  Ultrabook    8            256GB SSD  Intel Iris Plus Graphics 650

   OpSys  Weight        Price  Touchscreen  Ips         ppi      Cpu brand
0  macOS    1.37   71378.6832            0    1  226.983005  Intel Core i5
1  macOS    1.34   47895.5232            0    0  127.677940  Intel Core i5
2  No OS    1.86   30636.0000            0    0  141.211998  Intel Core i5
3  macOS    1.83  135195.3360            0    1  220.534624  Intel Core i7
4  macOS    1.37   96095.8080            0    1  226.983005  Intel Core i5

df['Ram'].value_counts().plot(kind='bar')

<AxesSubplot:>

sns.barplot(x=df['Ram'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

df['Memory'].value_counts()

256GB SSD 412


1TB HDD 223
500GB HDD 132
512GB SSD 118
128GB SSD + 1TB HDD 94
128GB SSD 76
256GB SSD + 1TB HDD 73
32GB Flash Storage 38
2TB HDD 16
64GB Flash Storage 15
512GB SSD + 1TB HDD 14
1TB SSD 14
256GB SSD + 2TB HDD 10
1.0TB Hybrid 9
256GB Flash Storage 8
16GB Flash Storage 7
32GB SSD 6
180GB SSD 5
128GB Flash Storage 4
16GB SSD 3
512GB SSD + 2TB HDD 3
256GB SSD + 256GB SSD 2
128GB SSD + 2TB HDD 2
256GB SSD + 500GB HDD 2
512GB Flash Storage 2
1TB SSD + 1TB HDD 2
32GB HDD 1
64GB SSD 1
1.0TB HDD 1
512GB SSD + 256GB SSD 1
512GB SSD + 1.0TB Hybrid 1
8GB SSD 1
240GB SSD 1
128GB HDD 1
1TB HDD + 1TB HDD 1
512GB SSD + 512GB SSD 1
256GB SSD + 1.0TB Hybrid 1
508GB Hybrid 1
64GB Flash Storage + 1TB HDD 1
Name: Memory, dtype: int64

# Normalise sizes: "1.0TB" -> "1TB", drop "GB", and turn "TB" into "000" (1TB -> 1000 GB)
df['Memory'] = df['Memory'].astype(str).replace(r'\.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '')
df["Memory"] = df["Memory"].str.replace('TB', '000')

# Split "A + B" into the first drive and the optional second drive
new = df["Memory"].str.split("+", n = 1, expand = True)

df["first"] = new[0]
df["first"] = df["first"].str.strip()

df["second"] = new[1]

# Flag the storage type of the first drive
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)

# Keep only the digits (the size in GB); regex=True is stated explicitly for the \D pattern
df['first'] = df['first'].str.replace(r'\D', '', regex=True)

# Laptops with a single drive have no second part
df["second"] = df["second"].fillna("0")

# Flag the storage type of the second drive
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)

df['second'] = df['second'].str.replace(r'\D', '', regex=True)

df["first"] = df["first"].astype(int)
df["second"] = df["second"].astype(int)

# Total capacity per storage type across both drives
df["HDD"] = (df["first"]*df["Layer1HDD"] + df["second"]*df["Layer2HDD"])
df["SSD"] = (df["first"]*df["Layer1SSD"] + df["second"]*df["Layer2SSD"])
df["Hybrid"] = (df["first"]*df["Layer1Hybrid"] + df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"] = (df["first"]*df["Layer1Flash_Storage"]
                       + df["second"]*df["Layer2Flash_Storage"])

# Drop the helper columns
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
                 'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
                 'Layer2Flash_Storage'],inplace=True)

df.sample(5)

     Company  TypeName  Ram              Memory                      Gpu  \
1247    Asus    Gaming   16  256 SSD + 1000 HDD  Nvidia GeForce GTX 1070
505   Lenovo  Notebook    8             256 SSD    Intel HD Graphics 620
820   Lenovo  Notebook    4             500 HDD    Intel HD Graphics 520
21    Lenovo    Gaming    8  128 SSD + 1000 HDD  Nvidia GeForce GTX 1050
301     Asus    Gaming   16  256 SSD + 1000 HDD  Nvidia GeForce GTX 1070

OpSys Weight Price Touchscreen Ips ppi \


1247 Windows 10 2.34 123876.000 0 1 141.211998
505 Windows 10 1.44 50562.720 0 0 165.632118
820 Windows 10 2.10 26101.872 0 0 100.454670
21 Windows 10 2.50 53226.720 0 1 141.211998
301 Windows 10 2.90 113060.160 0 0 127.335675

Cpu brand HDD SSD Hybrid Flash_Storage


1247 Intel Core i7 1000 256 0 0
505 Intel Core i5 0 256 0 0
820 Intel Core i3 500 0 0 0
21 Intel Core i5 1000 128 0 0
301 Intel Core i7 1000 256 0 0
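
The same per-type totals can also be computed with one small parsing function instead of ten helper columns. The sketch below is one possible alternative, not the notebook's approach: `parse_storage` is a hypothetical helper, it works on the raw `Memory` strings (before "GB" and "TB" are stripped), and it assumes 1 TB is counted as 1000 GB, as in the cell above.

```python
import re


def parse_storage(memory: str) -> dict:
    """Return total GB per storage type for a string such as '128GB SSD + 1TB HDD'."""
    totals = {"HDD": 0, "SSD": 0, "Hybrid": 0, "Flash Storage": 0}
    for part in memory.split("+"):
        # Each part looks like '<size><GB|TB> <type>', e.g. '1.0TB Hybrid'.
        match = re.match(r"([\d.]+)\s*(GB|TB)\s+(.+)", part.strip())
        if match is None:
            continue  # unrecognised format: leave the totals unchanged
        size, unit, kind = match.groups()
        gigabytes = int(float(size) * (1000 if unit == "TB" else 1))
        if kind in totals:
            totals[kind] += gigabytes
    return totals


print(parse_storage("128GB SSD + 1TB HDD"))
print(parse_storage("1.0TB Hybrid"))
```

On the real frame this could be applied with something like `df['Memory'].apply(parse_storage).apply(pd.Series)`, which yields one column per storage type.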

df.drop(columns=['Memory'],inplace=True)

df.head()
Company TypeName Ram Gpu OpSys Weight
\
0 Apple Ultrabook 8 Intel Iris Plus Graphics 640 macOS 1.37

1 Apple Ultrabook 8 Intel HD Graphics 6000 macOS 1.34

2 HP Notebook 8 Intel HD Graphics 620 No OS 1.86

3 Apple Ultrabook 16 AMD Radeon Pro 455 macOS 1.83

4 Apple Ultrabook 8 Intel Iris Plus Graphics 650 macOS 1.37

         Price  Touchscreen  Ips         ppi      Cpu brand  HDD  SSD  Hybrid  \
0   71378.6832            0    1  226.983005  Intel Core i5    0  128       0
1   47895.5232            0    0  127.677940  Intel Core i5    0    0       0
2   30636.0000            0    0  141.211998  Intel Core i5    0  256       0
3  135195.3360            0    1  220.534624  Intel Core i7    0  512       0
4   96095.8080            0    1  226.983005  Intel Core i5    0  256       0

Flash_Storage
0 0
1 128
2 0
3 0
4 0

df.select_dtypes('number').corr()['Price']

Ram 0.743007
Weight 0.210370
Price 1.000000
Touchscreen 0.191226
Ips 0.252208
ppi 0.473487
HDD -0.096441
SSD 0.670799
Hybrid 0.007989
Flash_Storage -0.040511
Name: Price, dtype: float64

df.drop(columns=['Hybrid','Flash_Storage'],inplace=True)

df.head()
Company TypeName Ram Gpu OpSys Weight
\
0 Apple Ultrabook 8 Intel Iris Plus Graphics 640 macOS 1.37

1 Apple Ultrabook 8 Intel HD Graphics 6000 macOS 1.34

2 HP Notebook 8 Intel HD Graphics 620 No OS 1.86

3 Apple Ultrabook 16 AMD Radeon Pro 455 macOS 1.83

4 Apple Ultrabook 8 Intel Iris Plus Graphics 650 macOS 1.37

Price Touchscreen Ips ppi Cpu brand HDD SSD

0 71378.6832 0 1 226.983005 Intel Core i5 0 128

1 47895.5232 0 0 127.677940 Intel Core i5 0 0

2 30636.0000 0 0 141.211998 Intel Core i5 0 256

3 135195.3360 0 1 220.534624 Intel Core i7 0 512

4 96095.8080 0 1 226.983005 Intel Core i5 0 256

df['Gpu'].value_counts()

Intel HD Graphics 620 281


Intel HD Graphics 520 185
Intel UHD Graphics 620 68
Nvidia GeForce GTX 1050 66
Nvidia GeForce GTX 1060 48
...
Intel HD Graphics 540 1
AMD FirePro W6150M 1
AMD Radeon R5 M315 1
AMD Radeon R7 M360 1
AMD FirePro W5130M 1
Name: Gpu, Length: 110, dtype: int64

df['Gpu brand'] = df['Gpu'].apply(lambda x:x.split()[0])

df.head()

Company TypeName Ram Gpu OpSys Weight


\
0 Apple Ultrabook 8 Intel Iris Plus Graphics 640 macOS 1.37

1 Apple Ultrabook 8 Intel HD Graphics 6000 macOS 1.34

2 HP Notebook 8 Intel HD Graphics 620 No OS 1.86


3 Apple Ultrabook 16 AMD Radeon Pro 455 macOS 1.83

4 Apple Ultrabook 8 Intel Iris Plus Graphics 650 macOS 1.37

Price Touchscreen Ips ppi Cpu brand HDD SSD


\
0 71378.6832 0 1 226.983005 Intel Core i5 0 128

1 47895.5232 0 0 127.677940 Intel Core i5 0 0

2 30636.0000 0 0 141.211998 Intel Core i5 0 256

3 135195.3360 0 1 220.534624 Intel Core i7 0 512

4 96095.8080 0 1 226.983005 Intel Core i5 0 256

Gpu brand
0 Intel
1 Intel
2 Intel
3 AMD
4 Intel

df['Gpu brand'].value_counts()

Intel 722
Nvidia 400
AMD 180
ARM 1
Name: Gpu brand, dtype: int64

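# .copy() makes the filtered frame independent, so later in-place edits
# do not trigger SettingWithCopyWarning
df = df[df['Gpu brand'] != 'ARM'].copy()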

df['Gpu brand'].value_counts()

Intel 722
Nvidia 400
AMD 180
Name: Gpu brand, dtype: int64

sns.barplot(x=df['Gpu brand'],y=df['Price'],estimator=np.median)
plt.xticks(rotation='vertical')
plt.show()

df.drop(columns=['Gpu'],inplace=True)


df.head()

  Company   TypeName  Ram  OpSys  Weight        Price  Touchscreen  Ips  \
0   Apple  Ultrabook    8  macOS    1.37   71378.6832            0    1
1   Apple  Ultrabook    8  macOS    1.34   47895.5232            0    0
2      HP   Notebook    8  No OS    1.86   30636.0000            0    0
3   Apple  Ultrabook   16  macOS    1.83  135195.3360            0    1
4   Apple  Ultrabook    8  macOS    1.37   96095.8080            0    1

ppi Cpu brand HDD SSD Gpu brand


0 226.983005 Intel Core i5 0 128 Intel
1 127.677940 Intel Core i5 0 0 Intel
2 141.211998 Intel Core i5 0 256 Intel
3 220.534624 Intel Core i7 0 512 AMD
4 226.983005 Intel Core i5 0 256 Intel

df['OpSys'].value_counts()

Windows 10 1072
No OS 66
Linux 62
Windows 7 45
Chrome OS 26
macOS 13
Windows 10 S 8
Mac OS X 8
Android 2
Name: OpSys, dtype: int64

sns.barplot(x=df['OpSys'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

def cat_os(inp):
    if inp == 'Windows 10' or inp == 'Windows 7' or inp == 'Windows 10 S':
        return 'Windows'
    elif inp == 'macOS' or inp == 'Mac OS X':
        return 'Mac'
    else:
        return 'Others/No OS/Linux'

df['os'] = df['OpSys'].apply(cat_os)

df.head()

  Company   TypeName  Ram  OpSys  Weight        Price  Touchscreen  Ips  \
0   Apple  Ultrabook    8  macOS    1.37   71378.6832            0    1
1   Apple  Ultrabook    8  macOS    1.34   47895.5232            0    0
2      HP   Notebook    8  No OS    1.86   30636.0000            0    0
3   Apple  Ultrabook   16  macOS    1.83  135195.3360            0    1
4   Apple  Ultrabook    8  macOS    1.37   96095.8080            0    1

ppi Cpu brand HDD SSD Gpu brand os


0 226.983005 Intel Core i5 0 128 Intel Mac
1 127.677940 Intel Core i5 0 0 Intel Mac
2 141.211998 Intel Core i5 0 256 Intel Others/No OS/Linux
3 220.534624 Intel Core i7 0 512 AMD Mac
4 226.983005 Intel Core i5 0 256 Intel Mac
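
The grouping done by `cat_os` can also be written as a lookup table, which may be easier to extend. The sketch below is an alternative under stated assumptions: `OS_GROUPS` and the sample `opsys` series are illustrative names, and anything missing from the table falls into the same catch-all label the function uses.

```python
import pandas as pd

# Hypothetical lookup table mirroring the branches of cat_os.
OS_GROUPS = {
    "Windows 10": "Windows",
    "Windows 7": "Windows",
    "Windows 10 S": "Windows",
    "macOS": "Mac",
    "Mac OS X": "Mac",
}

# Small illustrative sample of OpSys values.
opsys = pd.Series(["Windows 10", "macOS", "No OS", "Linux", "Mac OS X"])

# map() returns NaN for keys not in the table; fillna() supplies the catch-all group.
os_group = opsys.map(OS_GROUPS).fillna("Others/No OS/Linux")

print(os_group.tolist())
```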

df.drop(columns=['OpSys'],inplace=True)

sns.barplot(x=df['os'],y=df['Price'])
plt.xticks(rotation='vertical')
plt.show()

sns.histplot(df['Weight'], kde=True, stat='density')

<AxesSubplot:xlabel='Weight', ylabel='Density'>

sns.scatterplot(x=df['Weight'],y=df['Price'])

<AxesSubplot:xlabel='Weight', ylabel='Price'>

df.select_dtypes('number').corr()['Price']

Ram 0.742905
Weight 0.209867
Price 1.000000
Touchscreen 0.192917
Ips 0.253320
ppi 0.475368
HDD -0.096891
SSD 0.670660
Name: Price, dtype: float64

sns.heatmap(df.select_dtypes('number').corr())

<AxesSubplot:>

sns.histplot(np.log(df['Price']), kde=True, stat='density')

<AxesSubplot:xlabel='Price', ylabel='Density'>

X = df.drop(columns=['Price'])
y = np.log(df['Price'])

     Company            TypeName  Ram  Weight  Touchscreen  Ips         ppi  \
0      Apple           Ultrabook    8    1.37            0    1  226.983005
1      Apple           Ultrabook    8    1.34            0    0  127.677940
2         HP            Notebook    8    1.86            0    0  141.211998
3      Apple           Ultrabook   16    1.83            0    1  220.534624
4      Apple           Ultrabook    8    1.37            0    1  226.983005
...      ...                 ...  ...     ...          ...  ...         ...
1298  Lenovo  2 in 1 Convertible    4    1.80            1    1  157.350512
1299  Lenovo  2 in 1 Convertible   16    1.30            1    1  276.053530
1300  Lenovo            Notebook    2    1.50            0    0  111.935204
1301      HP            Notebook    6    2.19            0    0  100.454670
1302    Asus            Notebook    4    2.20            0    0  100.454670

Cpu brand HDD SSD Gpu brand os
0 Intel Core i5 0 128 Intel Mac
1 Intel Core i5 0 0 Intel Mac
2 Intel Core i5 0 256 Intel Others/No OS/Linux
3 Intel Core i7 0 512 AMD Mac
4 Intel Core i5 0 256 Intel Mac
... ... ... ... ... ...
1298 Intel Core i7 0 128 Intel Windows
1299 Intel Core i7 0 512 Intel Windows
1300 Other Intel Processor 0 0 Intel Windows
1301 Intel Core i7 1000 0 AMD Windows
1302 Other Intel Processor 500 0 Intel Windows

[1302 rows x 12 columns]

0 11.175755
1 10.776777
2 10.329931
3 11.814476
4 11.473101
...
1298 10.433899
1299 11.288115
1300 9.409283
1301 10.614129
1302 9.886358
Name: Price, Length: 1302, dtype: float64

from sklearn.model_selection import train_test_split


X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.15,random_state=2)

X_train

      Company            TypeName  Ram  Weight  Touchscreen  Ips         ppi  \
183   Toshiba            Notebook    8    2.00            0    0  100.454670
1141      MSI              Gaming    8    2.40            0    0  141.211998
1049     Asus             Netbook    4    1.20            0    0  135.094211
1020     Dell  2 in 1 Convertible    4    2.08            1    1  141.211998
878      Dell            Notebook    4    2.18            0    0  141.211998
...       ...                 ...  ...     ...          ...  ...         ...
466      Acer            Notebook    4    2.20            0    0  100.454670
299      Asus           Ultrabook   16    1.63            0    0  141.211998
493      Acer            Notebook    8    2.20            0    0  100.454670
527    Lenovo            Notebook    8    2.20            0    0  100.454670
1193    Apple           Ultrabook    8    0.92            0    1  226.415547

Cpu brand HDD SSD Gpu brand os


183 Intel Core i5 0 128 Intel Windows
1141 Intel Core i7 1000 128 Nvidia Windows
1049 Other Intel Processor 0 0 Intel Others/No OS/Linux
1020 Intel Core i3 1000 0 Intel Windows
878 Intel Core i5 1000 128 Nvidia Windows
... ... ... ... ... ...
466 Intel Core i3 500 0 Nvidia Windows
299 Intel Core i7 0 512 Nvidia Windows
493 AMD Processor 1000 0 AMD Windows
527 Intel Core i3 2000 0 Nvidia Others/No OS/Linux
1193 Other Intel Processor 0 0 Intel Mac

[1106 rows x 12 columns]

from sklearn.compose import ColumnTransformer


from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import r2_score,mean_absolute_error

from sklearn.linear_model import LinearRegression,Ridge,Lasso


from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    AdaBoostRegressor,
    ExtraTreesRegressor,
)
from sklearn.svm import SVR
from xgboost import XGBRegressor

Linear Regression

step1 = ColumnTransformer(transformers=[
    ('col_tnf',OneHotEncoder(sparse=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = LinearRegression()

pipe = Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8073277448418521
MAE 0.21017827976429174
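
Because the target is `np.log(df['Price'])`, the MAE reported above is in log units rather than in currency. The sketch below shows one way to read it and how to bring values back to the original price scale with `np.exp`; the two sample prices are taken from the first rows of the data, and the variable names are illustrative.

```python
import numpy as np

# MAE of the linear regression on the log-price scale, as printed above.
log_mae = 0.21017827976429174

# An absolute error of log_mae in log space corresponds to a multiplicative
# factor of exp(log_mae) between predicted and actual price.
typical_ratio = np.exp(log_mae)

# np.exp undoes the log transform applied to the target.
y_log = np.log(np.array([71378.6832, 47895.5232]))
prices = np.exp(y_log)

print(typical_ratio)
print(prices)
```

The same `np.exp` call would be applied to `pipe.predict(...)` to obtain price predictions instead of log-price predictions.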

Ridge Regression

step1 = ColumnTransformer(transformers=[
    ('col_tnf',OneHotEncoder(sparse=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = Ridge(alpha=10)

pipe = Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8127331031311811
MAE 0.20926802242582954
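
Each model in this notebook repeats the same encode, fit, predict and score steps. A small helper can capture that pattern once. The sketch below is only an illustration under stated assumptions: `evaluate` is a hypothetical function, the data is a tiny synthetic frame rather than the laptop data, and the encoder uses `handle_unknown='ignore'` with default settings instead of the `sparse=False, drop='first'` arguments used in the notebook, so that it does not depend on a particular scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def evaluate(model, X_train, X_test, y_train, y_test, categorical):
    """Fit one-hot encoding plus `model`; return (R2, MAE) on the test split."""
    encoder = ColumnTransformer(
        transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )
    pipe = Pipeline([("encode", encoder), ("model", model)])
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    return r2_score(y_test, y_pred), mean_absolute_error(y_test, y_pred)


# Tiny synthetic frame, for illustration only (not the laptop data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "brand": rng.choice(["A", "B", "C"], size=200),
    "ram": rng.choice([4, 8, 16], size=200),
})
target = 0.1 * demo["ram"] + (demo["brand"] == "A") * 0.5 + rng.normal(0, 0.05, 200)

X_tr, X_te, y_tr, y_te = train_test_split(demo, target, test_size=0.15, random_state=2)

# Score several models with the same preprocessing.
results = {}
for name, model in {"linear": LinearRegression(), "ridge": Ridge(alpha=10)}.items():
    results[name] = evaluate(model, X_tr, X_te, y_tr, y_te, ["brand"])

for name, (r2, mae) in results.items():
    print(name, "R2 score", r2, "MAE", mae)
```

With the notebook's data, the call would pass `X_train, X_test, y_train, y_test` and the categorical columns by name or position.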

Lasso Regression

step1 = ColumnTransformer(transformers=[
    ('col_tnf',OneHotEncoder(sparse=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = Lasso(alpha=0.001)

pipe = Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8071853945317105
MAE 0.21114361613472565
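
The MAE values reported in this section (roughly 0.16 to 0.23) are far smaller than the prices shown in `df`, which suggests the target was transformed before training. This section does not show how `y_train` was built, so the sketch below assumes a natural-log transform; under that assumption `exp(MAE)` is the geometric-mean ratio between predicted and actual price. The helper name `log_mae_to_ratio` is illustrative.

```python
import math


def log_mae_to_ratio(log_mae):
    """Multiplicative error implied by an MAE measured in log units.

    Assumption: the model was trained on log(Price), not on Price itself.
    """
    return math.exp(log_mae)


# MAE values printed above for the three linear models (rounded).
for name, log_mae in [("Linear regression", 0.2102),
                      ("Ridge", 0.2093),
                      ("Lasso", 0.2111)]:
    ratio = log_mae_to_ratio(log_mae)
    print(f"{name}: typical error factor of about {ratio:.2f}")
```

If the assumption holds, predictions would also need `np.exp(...)` applied to get back to the original price units.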

KNN
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = KNeighborsRegressor(n_neighbors=3)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8021984604448553
MAE 0.19319716721521116

Decision Tree
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = DecisionTreeRegressor(max_depth=8)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8466456692979233
MAE 0.1806340977609143

SVM
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = SVR(kernel='rbf',C=10000,epsilon=0.1)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8083180902257614
MAE 0.20239059427481307
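
The cells in this section repeat the same four steps for every estimator: build the encoder, build the pipeline, fit, and score. One way to avoid the repetition is a small helper that does all four and returns both metrics. This is a sketch rather than the notebook's code: the `evaluate` function and the synthetic data are hypothetical, and only three of the estimators are included.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X_train, X_test, y_train, y_test, cat_cols):
    """Fit encoder + model as one pipeline and return (R2, MAE)."""
    pipe = Pipeline([
        ("step1", ColumnTransformer(
            [("col_tnf", OneHotEncoder(drop="first"), cat_cols)],
            remainder="passthrough")),
        ("step2", model),
    ])
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    return r2_score(y_test, y_pred), mean_absolute_error(y_test, y_pred)


# Synthetic stand-in data (not the laptop dataset).
rng = np.random.default_rng(0)
n = 200
brand = rng.choice(["A", "B", "C"], size=n)
ram = rng.choice([4, 8, 16], size=n)
y_demo = 0.1 * ram + 0.5 * (brand == "C") + rng.normal(0, 0.1, size=n)
X_demo = pd.DataFrame({"brand": brand, "ram": ram})
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=2)

models = {
    "Linear regression": LinearRegression(),
    "Ridge": Ridge(alpha=10),
    "Decision tree": DecisionTreeRegressor(max_depth=8),
}
# Column 0 ("brand") is the only categorical column in the stand-in data.
results = {name: evaluate(m, X_tr, X_te, y_tr, y_te, [0])
           for name, m in models.items()}
for name, (r2_val, mae_val) in results.items():
    print(f"{name}: R2 {r2_val:.3f}, MAE {mae_val:.3f}")
```

With the real data, the call would pass `X_train`, `X_test`, `y_train`, `y_test` and the column positions `[0,1,7,10,11]` used throughout this section.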

Random Forest
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = RandomForestRegressor(n_estimators=100,
random_state=3,
max_samples=0.5,
max_features=0.75,
max_depth=15)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8873402378382488
MAE 0.15860130110457718

ExtraTrees
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

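# max_samples requires bootstrap=True; ExtraTrees defaults to bootstrap=False,
# and recent scikit-learn raises an error if max_samples is set without it
step2 = ExtraTreesRegressor(n_estimators=100,
random_state=3,
bootstrap=True,
max_samples=0.5,
max_features=0.75,
max_depth=15)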

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8753793123440623
MAE 0.15979519126758127

AdaBoost
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = AdaBoostRegressor(n_estimators=15,learning_rate=1.0)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.7929652659237908
MAE 0.23296532406396742

Gradient Boost
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = GradientBoostingRegressor(n_estimators=500)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8823244736036472
MAE 0.15929506744611283

XGBoost
step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

step2 = XGBRegressor(n_estimators=45,max_depth=5,learning_rate=0.5)

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8811773435850243
MAE 0.16496203512600974

Voting Regressor
from sklearn.ensemble import VotingRegressor,StackingRegressor

step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

rf = RandomForestRegressor(n_estimators=350,random_state=3,max_samples=0.5,max_features=0.75,max_depth=15)
gbdt = GradientBoostingRegressor(n_estimators=100,max_features=0.5)
xgb = XGBRegressor(n_estimators=25,learning_rate=0.3,max_depth=5)
# max_samples requires bootstrap=True (ExtraTrees defaults to bootstrap=False)
et = ExtraTreesRegressor(n_estimators=100,random_state=3,bootstrap=True,max_samples=0.5,max_features=0.75,max_depth=10)

step2 = VotingRegressor([('rf', rf), ('gbdt', gbdt), ('xgb',xgb), ('et',et)],weights=[5,1,1,1])

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8901036732986811
MAE 0.15847265699907628
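
With `weights=[5,1,1,1]`, `VotingRegressor` returns the weighted mean of the four base predictions, so the random forest accounts for 5/8 of every prediction. A minimal sketch with made-up base predictions (the numbers and the names `base_preds` and `ensemble` are illustrative):

```python
import numpy as np

weights = np.array([5, 1, 1, 1])   # rf, gbdt, xgb, et

# Made-up predictions from the four base models for two laptops.
base_preds = np.array([
    [10.8, 11.2],   # rf
    [10.6, 11.0],   # gbdt
    [11.0, 11.4],   # xgb
    [10.4, 11.6],   # et
])

# Weighted mean across models (axis 0), giving one value per laptop.
ensemble = np.average(base_preds, axis=0, weights=weights)
print(ensemble)
```

Because the forest dominates the average, the ensemble's score stays close to the random forest's own score reported earlier.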

Stacking
from sklearn.ensemble import VotingRegressor,StackingRegressor

step1 = ColumnTransformer(transformers=[
('col_tnf',OneHotEncoder(sparse_output=False,drop='first'),[0,1,7,10,11])
],remainder='passthrough')

estimators = [
('rf', RandomForestRegressor(n_estimators=350,random_state=3,max_samples=0.5,max_features=0.75,max_depth=15)),
('gbdt',GradientBoostingRegressor(n_estimators=100,max_features=0.5)),
('xgb', XGBRegressor(n_estimators=25,learning_rate=0.3,max_depth=5))
]

step2 = StackingRegressor(estimators=estimators,
final_estimator=Ridge(alpha=100))

pipe = Pipeline([
('step1',step1),
('step2',step2)
])

pipe.fit(X_train,y_train)

y_pred = pipe.predict(X_test)

print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

R2 score 0.8816958647512341
MAE 0.1663048975120589
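
Collecting the scores printed in this section in one place makes the comparison easier. The values below are copied from the outputs above and rounded to four decimals; the variable names are illustrative.

```python
# (R2, MAE) as printed in this section, rounded to four decimals.
reported = {
    "Linear regression": (0.8073, 0.2102),
    "Ridge":             (0.8127, 0.2093),
    "Lasso":             (0.8072, 0.2111),
    "KNN":               (0.8022, 0.1932),
    "Decision tree":     (0.8466, 0.1806),
    "SVM":               (0.8083, 0.2024),
    "Random forest":     (0.8873, 0.1586),
    "ExtraTrees":        (0.8754, 0.1598),
    "AdaBoost":          (0.7930, 0.2330),
    "Gradient boost":    (0.8823, 0.1593),
    "XGBoost":           (0.8812, 0.1650),
    "Voting regressor":  (0.8901, 0.1585),
    "Stacking":          (0.8817, 0.1663),
}

# Rank by R2, highest first.
ranked = sorted(reported.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (r2_val, mae_val) in ranked:
    print(f"{name:<18} R2 {r2_val:.4f}  MAE {mae_val:.4f}")
```

On these numbers the voting regressor has both the highest R2 and the lowest MAE, with the random forest a close second.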

Exporting the Model


import pickle

with open('df.pkl','wb') as f:
    pickle.dump(df,f)
with open('pipe.pkl','wb') as f:
    pickle.dump(pipe,f)
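
Note that `pipe` at this point refers to the last pipeline fitted, the stacking regressor, so that is the model written to `pipe.pkl`. Below is a sketch of the reverse step, reading a pickled object back. To keep it runnable on its own, it round-trips a small stand-in dictionary through a temporary file instead of the real `pipe.pkl`; the names `stand_in`, `path` and `restored` are illustrative.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted pipeline; any picklable object behaves the same.
stand_in = {"model": "stacking", "n_features": 12}

path = os.path.join(tempfile.mkdtemp(), "pipe.pkl")

# Context managers close the file handles even if an error occurs.
with open(path, "wb") as f:
    pickle.dump(stand_in, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in)
```

With the real file, the restored pipeline expects input with the same 12 columns, in the same order, as `X_train`. Pickles of scikit-learn models are also not guaranteed to load across different library versions, so the loading environment should match the training one.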

df

      Company            TypeName  Ram  Weight        Price  Touchscreen  Ips  \
0       Apple           Ultrabook    8    1.37   71378.6832            0    1
1       Apple           Ultrabook    8    1.34   47895.5232            0    0
2          HP            Notebook    8    1.86   30636.0000            0    0
3       Apple           Ultrabook   16    1.83  135195.3360            0    1
4       Apple           Ultrabook    8    1.37   96095.8080            0    1
...       ...                 ...  ...     ...          ...          ...  ...
1298   Lenovo  2 in 1 Convertible    4    1.80   33992.6400            1    1
1299   Lenovo  2 in 1 Convertible   16    1.30   79866.7200            1    1
1300   Lenovo            Notebook    2    1.50   12201.1200            0    0
1301       HP            Notebook    6    2.19   40705.9200            0    0
1302     Asus            Notebook    4    2.20   19660.3200            0    0

             ppi              Cpu brand   HDD  SSD  Gpu brand  \
0     226.983005          Intel Core i5     0  128      Intel
1     127.677940          Intel Core i5     0    0      Intel
2     141.211998          Intel Core i5     0  256      Intel
3     220.534624          Intel Core i7     0  512        AMD
4     226.983005          Intel Core i5     0  256      Intel
...          ...                    ...   ...  ...        ...
1298  157.350512          Intel Core i7     0  128      Intel
1299  276.053530          Intel Core i7     0  512      Intel
1300  111.935204  Other Intel Processor     0    0      Intel
1301  100.454670          Intel Core i7  1000    0        AMD
1302  100.454670  Other Intel Processor   500    0      Intel

                      os
0                    Mac
1                    Mac
2     Others/No OS/Linux
3                    Mac
4                    Mac
...                  ...
1298             Windows
1299             Windows
1300             Windows
1301             Windows
1302             Windows

[1302 rows x 13 columns]

X_train

      Company            TypeName  Ram  Weight  Touchscreen  Ips         ppi  \
183   Toshiba            Notebook    8    2.00            0    0  100.454670
1141      MSI              Gaming    8    2.40            0    0  141.211998
1049     Asus             Netbook    4    1.20            0    0  135.094211
1020     Dell  2 in 1 Convertible    4    2.08            1    1  141.211998
878      Dell            Notebook    4    2.18            0    0  141.211998
...       ...                 ...  ...     ...          ...  ...         ...
466      Acer            Notebook    4    2.20            0    0  100.454670
299      Asus           Ultrabook   16    1.63            0    0  141.211998
493      Acer            Notebook    8    2.20            0    0  100.454670
527    Lenovo            Notebook    8    2.20            0    0  100.454670
1193    Apple           Ultrabook    8    0.92            0    1  226.415547

                  Cpu brand   HDD  SSD  Gpu brand                  os
183           Intel Core i5     0  128      Intel             Windows
1141          Intel Core i7  1000  128     Nvidia             Windows
1049  Other Intel Processor     0    0      Intel  Others/No OS/Linux
1020          Intel Core i3  1000    0      Intel             Windows
878           Intel Core i5  1000  128     Nvidia             Windows
...                     ...   ...  ...        ...                 ...
466           Intel Core i3   500    0     Nvidia             Windows
299           Intel Core i7     0  512     Nvidia             Windows
493           AMD Processor  1000    0        AMD             Windows
527           Intel Core i3  2000    0     Nvidia  Others/No OS/Linux
1193  Other Intel Processor     0    0      Intel                 Mac

[1106 rows x 12 columns]
