Lab 2 Solved
2022F-BSE-014
Lab 2
DATASET PREPARATION WITH EXCEL SPREADSHEET AND DATASET PREPROCESSING AND SCALING TECHNIQUES
OBJECTIVE
Prepare a dataset in Excel by selecting all relevant features for the given scenario. Load the
designated dataset into the working environment. Check the dataset for missing values and outliers.
Apply Normalization and Standardization techniques to scale the values.
Lab Tasks:
1. Write Python code to load an Excel spreadsheet containing two different sheets and print both of
them.
Sheet 1
Name age
0 person1 20
1 person2 30
2 person3 40
Sheet 2
Name work
0 person3 software
1 person4 tutor
2 person5 student
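A minimal sketch of one way to approach Task 1. The lab's workbook is not included here, so the
example first writes a stand-in file (the file name people.xlsx and the sheet names Sheet1 and
Sheet2 are assumptions) and then reads every sheet back with pandas. It assumes the openpyxl
package is installed for .xlsx support.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook path; the lab's real file name is not given.
path = os.path.join(tempfile.mkdtemp(), "people.xlsx")

# Build a two-sheet workbook that mirrors the tables shown above.
sheet1 = pd.DataFrame({"Name": ["person1", "person2", "person3"],
                       "age": [20, 30, 40]})
sheet2 = pd.DataFrame({"Name": ["person3", "person4", "person5"],
                       "work": ["software", "tutor", "student"]})
with pd.ExcelWriter(path) as writer:
    sheet1.to_excel(writer, sheet_name="Sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="Sheet2", index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel(path, sheet_name=None)
for name, frame in sheets.items():
    print(name)
    print(frame)
```

Passing a single name or position to sheet_name instead would load just that one sheet.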
2. Write Python code to generate a pandas data frame having 4 columns and 5 rows. Column 1 must
contain the index values like Ali, Amir, Kamran, etc., and Row 1 must contain the subject names.
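One possible sketch for Task 2. The student names follow the task statement, while the subject
names and every mark except the English column are illustrative assumptions, not values taken
from the lab.

```python
import pandas as pd

# Row labels requested by the task.
names = ["Ali", "Amir", "Kamran", "Sara", "Zain"]

# Assumed subject names and marks (illustrative only).
marks = {
    "English": [95, 92, 88, 75, 92],
    "Math": [85, 90, 80, 70, 85],
    "Physics": [80, 88, 85, 65, 90],
    "Chemistry": [72, 78, 70, 80, 88],
}

# The dict keys become the column header row; names become the index.
df = pd.DataFrame(marks, index=names)
print(df)
```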
3. Write Python code to read an Excel spreadsheet and print only the first two columns using a pandas
data frame.
Unnamed: 0 English
0 Ali 95
1 Amir 92
2 Kamran 88
3 Sara 75
4 Zain 92
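A hedged sketch of Task 3 using the usecols parameter of pandas.read_excel. The workbook is
recreated in a temporary directory under the assumed name marks.xlsx; only the English marks come
from the output above, the other columns are illustrative. openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file; the lab's real spreadsheet is not included.
path = os.path.join(tempfile.mkdtemp(), "marks.xlsx")
marks = pd.DataFrame(
    {
        "English": [95, 92, 88, 75, 92],
        "Math": [85, 90, 80, 70, 85],     # illustrative values
        "Physics": [80, 88, 85, 65, 90],  # illustrative values
    },
    index=["Ali", "Amir", "Kamran", "Sara", "Zain"],
)
# The unnamed index is written as a first column with an empty header.
marks.to_excel(path)

# usecols=[0, 1] keeps only the first two columns of the sheet.
first_two = pd.read_excel(path, usecols=[0, 1])
print(first_two)
```

Because the first header cell is empty, pandas labels that column "Unnamed: 0", which matches the
header seen in the output above.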
4. Write Python code to skip the first two rows of an Excel spreadsheet and print the output using a
pandas data frame.
Amir 90 88 92 78
0 Kamran 80 85 88 70
1 Sara 70 65 75 80
2 Zain 85 90 92 88
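A sketch of Task 4 with the skiprows parameter of pandas.read_excel. The file name, the subject
names, and Ali's marks are assumptions; the remaining rows are taken from the output above.
openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file; the lab's real spreadsheet is not included.
path = os.path.join(tempfile.mkdtemp(), "marks.xlsx")
marks = pd.DataFrame(
    [
        [95, 90, 91, 85],  # Ali: illustrative values
        [90, 88, 92, 78],
        [80, 85, 88, 70],
        [70, 65, 75, 80],
        [85, 90, 92, 88],
    ],
    index=["Ali", "Amir", "Kamran", "Sara", "Zain"],
    columns=["Math", "Physics", "English", "Chemistry"],  # assumed names
)
marks.to_excel(path)

# skiprows=2 drops the header row and Ali's row, so the next
# remaining row (Amir's) is treated as the new header.
skipped = pd.read_excel(path, skiprows=2)
print(skipped)
```

This explains why Amir's row appears as the header in the output above. If the original column
names should be kept, passing skiprows=[1, 2] would skip two data rows instead of the header.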
5. Write Python code to fill all the null values in the Gender column of employees.csv with “No Gender”.
Print rows 10 to 30 of the data frame for visualization.
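A minimal sketch of Task 5. Since employees.csv is not available here, a small stand-in data frame
is built in memory; the "First Name" column and all values are hypothetical. With the real file,
the frame would come from pd.read_csv("employees.csv") instead.

```python
import numpy as np
import pandas as pd

# Stand-in for employees.csv (hypothetical contents, 40 rows).
df = pd.DataFrame({
    "First Name": [f"Employee{i}" for i in range(40)],
    "Gender": ["Male", "Female", np.nan, "Female"] * 10,
})

# Replace every missing Gender value with the requested label.
df["Gender"] = df["Gender"].fillna("No Gender")

# iloc slicing excludes the end position, so 10:31 covers rows 10..30.
subset = df.iloc[10:31]
print(subset)
```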
6. Write Python code to scale the values of features (Age and Salary) using the Min-Max Normalization
technique. Verify your answers by applying the formula mentioned above.
[[0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1.]]
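A sketch of Min-Max Normalization applied by hand to the Age and Salary values listed under
Task 7, assuming the usual formula x' = (x - min) / (max - min) computed per column. NumPy is used
here so the formula stays visible; this is not necessarily the code that produced the output above.

```python
import numpy as np

# Age and Salary values from the lab (one row per person).
data = np.array([
    [25, 42000],
    [36, 50000],
    [30, 45000],
    [27, 43000],
    [38, 51000],
    [42, 62000],
    [34, 48000],
], dtype=float)

# Per-column minimum and maximum.
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# x' = (x - min) / (max - min), so each column ends up in [0, 1].
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```

scikit-learn's MinMaxScaler applies the same formula with its default feature range of (0, 1), so
its result can be checked against this manual calculation.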
7. Write Python code to scale the values of features (Age and Salary) using the Standardization
technique. Verify your answers by applying the formula mentioned above.
Age Salary
25 42000
36 50000
30 45000
27 43000
38 51000
42 62000
34 48000
[[-1.43672117 -1.07039567]
[ 0.50411269 0.20496938]
[-0.55452396 -0.59213377]
[-1.08384229 -0.91097504]
[ 0.85699158 0.36439001]
[ 1.56274934 2.11801696]
[ 0.15123381 -0.11387188]]
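The output above can be checked by hand with a short sketch, assuming the usual formula
z = (x - mean) / std with the population standard deviation (ddof=0), which is also what
scikit-learn's StandardScaler uses.

```python
import numpy as np

# Age and Salary values from the task statement.
data = np.array([
    [25, 42000],
    [36, 50000],
    [30, 45000],
    [27, 43000],
    [38, 51000],
    [42, 62000],
    [34, 48000],
], dtype=float)

# Per-column mean and population standard deviation (ddof=0).
mean = data.mean(axis=0)
std = data.std(axis=0)

# z = (x - mean) / std gives each column mean 0 and unit variance.
z = (data - mean) / std
print(z)
```

Using the sample standard deviation (ddof=1) instead would give slightly smaller magnitudes and
would not reproduce the values shown above.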
8. Given this dictionary, create a data frame from the dictionary and interpolate the missing values
using backward interpolation. Hint: use interpolate().
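A hedged sketch of Task 8. The dictionary below is hypothetical, since the lab's original is not
reproduced here, and "backward interpolation" is read as linear interpolation with
limit_direction="backward".

```python
import numpy as np
import pandas as pd

# Hypothetical dictionary with missing values (not the lab's data).
data = {
    "A": [np.nan, 2.0, np.nan, 4.0],
    "B": [1.0, np.nan, np.nan, 7.0],
}
df = pd.DataFrame(data)

# Interior gaps are filled linearly between their neighbours; with the
# backward direction, a leading gap takes the next valid value.
filled = df.interpolate(method="linear", limit_direction="backward")
print(filled)
```

With this direction, any missing values after the last valid entry of a column would be left as
NaN; limit_direction="both" is one way to fill those as well.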