Lab 2 Solved

10/24/24, 8:31 AM Lab 2 Solved

2022F-BSE-014

Lab 2
DATASET PREPARATION WITH EXCEL SPREADSHEET AND DATASET PREPROCESSING AND SCALING TECHNIQUES

OBJECTIVE
Prepare a dataset in Excel by selecting all relevant features for the given scenario. Load the
designated dataset into the working environment. Check the dataset for missing values and outliers.
Apply normalization and standardization techniques to scale the values.
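
The missing-value and outlier checks named in the objective can be sketched as follows. This is a minimal sketch on made-up data: the frame, its values, and the choice of the 1.5 * IQR rule for outliers are assumptions for illustration, not part of the lab handout.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing age and one implausibly large salary.
df = pd.DataFrame({
    "Age": [25, 32, np.nan, 29, 38, 41],
    "Salary": [50000, 70000, 65000, 80000, 72000, 900000],
})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# IQR rule (an assumed convention): flag values more than
# 1.5 * IQR below the first quartile or above the third quartile.
q1 = df["Salary"].quantile(0.25)
q3 = df["Salary"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["Salary"] < q1 - 1.5 * iqr) | (df["Salary"] > q3 + 1.5 * iqr)]
print(outliers)
```

Other outlier rules (for example a z-score threshold) would work equally well here; the IQR rule is used only because it needs no distributional assumption.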

Lab Tasks:

1. Write Python code to load an Excel spreadsheet containing two different sheets and print both of
them.

In [15]: import pandas as pd

# Read each sheet of the workbook by name.
df1 = pd.read_excel('Book1.xlsx', sheet_name='Sheet1')
print('Sheet 1 \n', df1)
df2 = pd.read_excel('Book1.xlsx', sheet_name='Sheet2')
print('Sheet 2 \n', df2)

Sheet 1
Name age
0 person1 20
1 person2 30
2 person3 40
Sheet 2
Name work
0 person3 software
1 person 4 tuitor
2 person5 student

2. Write Python code to generate a pandas data frame having 4 columns and 5 rows. Column 1 must
contain the index values like Ali, Amir, Kamran, etc., and Row 1 must contain the subject names.

In [19]: import pandas as pd

# Columns are subjects; the index holds the student names.
df = pd.DataFrame({
    'Math': [85, 90, 80, 70, 85],
    'Science': [75, 88, 85, 65, 90],
    'English': [95, 92, 88, 75, 92],
    'History': [60, 78, 70, 80, 88]},
    index=['Ali', 'Amir', 'Kamran', 'Sara', 'Zain'])
print(df)

        Math  Science  English  History
Ali       85       75       95       60
Amir      90       88       92       78
Kamran    80       85       88       70
Sara      70       65       75       80
Zain      85       90       92       88

3. Write Python code to read an Excel spreadsheet and print only the first two columns using a pandas
data frame.

In [21]: import pandas as pd

# Column positions 0 and 1 are the first two columns of the sheet.
required_columns = [0, 1]
df = pd.read_excel('Student_Score.xlsx', usecols=required_columns)
print(df)

Unnamed: 0 Math
0 Ali 85
1 Amir 90
2 Kamran 80
3 Sara 70
4 Zain 85

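4. Write Python code to skip the first two rows of an Excel spreadsheet and print the output using a
pandas data frame.

In [29]: import pandas as pd

# An integer skiprows drops that many lines from the top of the sheet,
# so the header row is skipped too and the next row becomes the header.
df = pd.read_excel('Student_Score.xlsx', skiprows=2)
print(df)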

Amir 90 88 92 78
0 Kamran 80 85 88 70
1 Sara 70 65 75 80
2 Zain 85 90 92 88
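
Passing an integer to `skiprows` drops lines from the very top, header included, which is why the "Amir" row ends up as the column labels above. If the intent is to keep the header and drop the first two data rows, a list of row positions can be passed instead. The following is a minimal sketch under two assumptions: the sheet holds the Task 2 table, and an in-memory CSV read with `read_csv` stands in for the Excel file (`read_excel` accepts the same `skiprows` argument). The `Name` header is invented for illustration; the original sheet leaves that cell blank.

```python
import io
import pandas as pd

# Stand-in for Student_Score.xlsx (assumed contents, taken from Task 2).
csv_text = """Name,Math,Science,English,History
Ali,85,75,95,60
Amir,90,88,92,78
Kamran,80,85,88,70
Sara,70,65,75,80
Zain,85,90,92,88
"""

# skiprows=[1, 2] skips file lines 1 and 2 (0-based), so line 0
# is still read as the header.
df_kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df_kept_header)
```

With the list form, the subject names stay as column labels and the frame starts at Kamran.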

5. Write Python code to fill all the null values in the Gender column of employees.csv with “No Gender”.
Print rows 10 to 30 of the data frame for visualization.

In [31]: import pandas as pd

df = pd.read_csv('employee.csv')
# fillna() targets missing (NaN) entries; assign the result back to keep it.
df['Gender'] = df['Gender'].fillna('No Gender')
# Rows 10 to 30 by position; the end of an iloc slice is exclusive.
print(df.iloc[10:31])

Employee_ID Name Age Gender Department
10 11 Mark Fox 50 Male Marketing
11 12 Amy Adams 38 Female Sales
12 13 Ben Price 42 No Gender IT
13 14 Lily Chen 23 Female HR
14 15 Robert Z 36 Male Marketing
15 16 Anna Brown 31 No Gender Sales
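
Missing entries are stored as NaN rather than as a string, which is why `fillna` rather than `replace` with a literal value is the tool for this task. A minimal sketch on a small in-memory frame, where the names and values are hypothetical stand-ins for the CSV file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file; values are illustrative only.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Gender": ["Male", np.nan, "Female", None],
})

missing_before = int(df["Gender"].isna().sum())

# replace() looks for a literal value, so NaN/None entries are untouched.
replaced = df.replace(to_replace="no", value="No Gender")
still_missing = int(replaced["Gender"].isna().sum())

# fillna() fills the missing entries themselves; assign the result back.
df["Gender"] = df["Gender"].fillna("No Gender")
missing_after = int(df["Gender"].isna().sum())

print(missing_before, still_missing, missing_after)
# Positional slice: rows 1 and 2 (the end index 3 is exclusive).
print(df.iloc[1:3])
```

Both `replace` and `fillna` return a new object by default, so the result has to be assigned back for the change to persist.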

6. Write Python code to scale the values of the features (Age and Salary) using the Min-Max Normalization
technique. Verify your answers by applying the formula mentioned above.

In [46]: import numpy as np
from sklearn import preprocessing

# Rows are samples and columns are features (Age, Salary);
# MinMaxScaler scales each column independently.
x = np.array([[25.0, 50000.0],
              [32.0, 70000.0],
              [45.0, 120000.0],
              [29.0, 65000.0],
              [38.0, 80000.0]])
minmax = preprocessing.MinMaxScaler(feature_range=(0, 1))
print(minmax.fit_transform(x))

[[0.         0.        ]
 [0.35       0.28571429]
 [1.         1.        ]
 [0.2        0.21428571]
 [0.65       0.42857143]]
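
The task also asks for the result to be checked against the formula. A minimal sketch of that check, assuming the usual min-max formula x' = (x - min) / (max - min) applied to each feature on its own; the helper name `min_max` is made up for illustration:

```python
# Min-max normalization by hand: x' = (x - min) / (max - min).
# `min_max` is an illustrative helper, not part of any library.
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

age = [25.0, 32.0, 45.0, 29.0, 38.0]
salary = [50000.0, 70000.0, 120000.0, 65000.0, 80000.0]

age_scaled = min_max(age)
salary_scaled = min_max(salary)

# One line per sample: scaled age, scaled salary.
for a, s in zip(age_scaled, salary_scaled):
    print(round(a, 4), round(s, 4))
```

For example, an age of 32 maps to (32 - 25) / (45 - 25) = 0.35, and the smallest and largest value of each feature map to 0 and 1.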

7. Write Python code to scale the values of the features (Age and Salary) using the Standardization
technique. Verify your answers by applying the formula mentioned above.

Age Salary
25 42000
36 50000
30 45000
27 43000
38 51000
42 62000
34 48000

In [50]: import pandas as pd
import numpy as np
from sklearn import preprocessing

df = pd.DataFrame({'Age': [25, 36, 30, 27, 38, 42, 34],
                   'Salary': [42000, 50000, 45000, 43000, 51000, 62000, 48000]})
# One row per sample, one column per feature.
x = np.array(df)
sd = preprocessing.StandardScaler()
print(sd.fit(x).transform(x))

[[-1.43672117 -1.07039567]
[ 0.50411269 0.20496938]
[-0.55452396 -0.59213377]
[-1.08384229 -0.91097504]
[ 0.85699158 0.36439001]
[ 1.56274934 2.11801696]
[ 0.15123381 -0.11387188]]
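
To verify these values by formula, the following is a minimal sketch assuming the usual z-score definition z = (x - mean) / std. It uses the population standard deviation (dividing by n), which is the convention `StandardScaler` follows; the helper name `standardize` is made up for illustration.

```python
import statistics

# Standardization by hand: z = (x - mean) / std, with the population
# standard deviation (divide by n, not n - 1).
# `standardize` is an illustrative helper, not part of any library.
def standardize(values):
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

age = [25, 36, 30, 27, 38, 42, 34]
salary = [42000, 50000, 45000, 43000, 51000, 62000, 48000]

age_z = standardize(age)
salary_z = standardize(salary)

# One line per sample: standardized age, standardized salary.
for a, s in zip(age_z, salary_z):
    print(f"{a: .8f} {s: .8f}")
```

Using the sample standard deviation (`statistics.stdev`, dividing by n - 1) instead would give slightly smaller magnitudes and would not match the scaler's output.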

8. Given this dictionary, create a data frame from the dictionary and interpolate the missing values using
backward interpolation. Hint: use interpolate().

dict = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

In [6]: import pandas as pd
import numpy as np

# Named `scores` rather than `dict` to avoid shadowing the built-in type.
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)
# Linear interpolation, filling gaps in the backward direction.
print(df.interpolate(method='linear', limit_direction='backward'))

First Score Second Score Third Score
0 100.0 30.0 40.0
1 90.0 45.0 40.0
2 92.5 56.0 80.0
3 95.0 NaN 98.0
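
With linear interpolation on an evenly spaced index, a gap that is one row wide is filled with the value halfway between its two neighbours. A small sketch of that arithmetic for the 'First Score' column; `fill_interior_gaps` is an illustrative helper, not a pandas function, and it deliberately handles only single-row interior gaps:

```python
import math

# Fill a one-row interior gap with the midpoint of its two neighbours.
# Leading and trailing gaps are left untouched by this helper.
def fill_interior_gaps(values):
    out = list(values)
    for i in range(1, len(out) - 1):
        neighbours_known = not math.isnan(out[i - 1]) and not math.isnan(out[i + 1])
        if math.isnan(out[i]) and neighbours_known:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

first_score = [100, 90, float("nan"), 95]
filled = fill_interior_gaps(first_score)
print(filled)
```

Here the missing entry becomes (90 + 95) / 2 = 92.5, which matches the value in row 2 of the output above.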
