Ids Final Sol
Ids Final Sol
Ids Final Sol
Question#1 [6+10+5=21]
x1 x2 y Predicted y
1 2 1
-2 -2 0
1 -1 0
3 1 1
1
Roll#:
2
Roll#:
3
Roll#:
Acual
legitimate fraudulent
Predicted
legitimate 345 15
fraudulent 6 135
Accuracy = 95.8
Precision = 95.74
Recall/Sensitivity = 90
Specificity = 98.29
F1 Score = 0.927
4
Roll#:
Question#3 [10]
1 100 500 7
2 150 470 5
3 200 700 8
4 120 300 4
5 180 800 9
6 250 400 6
7 90 600 7
8 300 250 3
9 170 900 8
10 130 520 6
11 160 630 7
12 220 550 5
13 110 480 4
14 190 730 9
15 140 720 8
16 260 200 2
17 230 950 10
18 120 380 5
19 200 830 9
20 180 700 8
5
Roll#:
6
Roll#:
Question#4: [10]
In this dataset:
OR
You are given a set of sentences, and your task is to calculate the Term Frequency-Inverse Document
Frequency (TF-IDF) for each word in these sentences. [For section A]
1. Inflation has increased unemployment
2. The company has increased its sales
3. Fear increased his pulse
7
Roll#:
8
Roll#:
9
Roll#:
Question#5 [10+5=15]
In this scenario, you are provided with data on several locations where accidents occurred in the
past month, along with the coordinates of three hospitals. Your objective is to analyze accident
hotspots and strategically cluster accident locations to determine an optimal allocation of hospitals.
Point
1 31.5764 74.3118
2 31.5769 74.3131
3 31.5774 74.3149
4 31.5775 74.3177
5 31.5779 74.3194
6 31.5763 74.3173
7 31.5759 74.3199
8 31.5745 74.3175
9 31.5730 74.3199
10
Roll#:
C1 = {1,2,3,11,12}
C2 = {4,5,6,7,8,9,10}
C1 = {1,2,3,11,12}
C2 = {4,5,6,7,8,9,10}
No change
11
Roll#:
12
Roll#:
1- imdb_movies.csv:
Attributes: {Title, Genre, Rating, IMDb Rating, Year, Duration, RegionCode}
2- imdb_regions.csv:
Attributes: {RegionCode, Country, Language}
3- crime_rate.csv:
Attributes: {Year, Crime rate, Country}
a) Load the csv files into pandas dataframes and perform basic preprocessing to clean the
dataset.
#drop duplicates
imdb_movies .drop_duplicates(inplace=True)
imdb_regions .drop_duplicates(inplace=True)
crime_rate .drop_duplicates(inplace=True)
b) Merge the dataframes where necessary and filter the data that belongs to United Kingdom
during 2000-2022
13
Roll#:
c) Perform EDA and come up with at least two comprehensive graphs/visualizations that can
help understand the data.
# Time Series Plot for Crime Rate Over the Years (2000-2022)
plt.figure(figsize=(12, 6))
sns.lineplot(x='Year', y='Crime rate', hue='RegionCode', data=crime_rate)
plt.title('Crime Rate Over the Years (2000-2022) in the United Kingdom')
plt.xlabel('Year')
plt.ylabel('Crime Rate')
plt.legend(title='Region Code', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
d) Your task is to find any relationship between movie Genre and crime rate over the years.
How would you achieve this task? Your answer should contain you strategy accompanied
with python code.
# Group by Genre and calculate the average crime rate for each genre
genre_crime_avg = merged_data.groupby('Genre')['IMDb Rating'].mean().reset_index()
14
Roll#:
plt.ylabel('Crime Rate')
plt.show()
# Correlation analysis
correlation_matrix = merged_genre_crime[['IMDb Rating', 'Crime rate']].corr()
print("Correlation Matrix:\n", correlation_matrix)
Rough Sheet:
15
Roll#:
—---------------------Good Luck!----------------------------
16