DSREPORT
APPROACH
Name: RegNo.: RA22110030204
Date: 21/2/2025 Class: CSE – 3G
Course: Data Science
1. Introduction
Crime rates are influenced by various socioeconomic factors such as education levels,
employment rates, median income, poverty rates, and population density. This report analyzes
a dataset containing crime rates and socioeconomic indicators across different regions. Using
the Think-Pair-Share approach, I first independently examined trends in the dataset, then
discussed insights with a peer, and finally shared collective observations. This collaborative
method enhances data-driven decision-making and critical thinking.
2. Dataset Description
The dataset contains information on crime rates and socioeconomic factors across multiple
regions, with the following key indicators:
Crime Rate: The highest crime rate is observed in Region_181 (1484), while the
lowest belongs to Region_28 (71). Regions with higher population density tend to
have higher crime rates.
Education Level: Regions with higher education levels, such as Region_27 (99.5),
generally have lower crime rates compared to regions with lower education levels like
Region_4 (54.4).
Median Income: Regions with higher median income, such as Region_1 (116,664),
generally have lower crime rates compared to regions with lower median income like
Region_2 (21,401).
Poverty Rate: Regions with higher poverty rates, such as Region_5 (26.5), tend to
have higher crime rates, whereas regions with lower poverty rates like Region_11
(8.3) experience lower crime rates.
To predict future crime rates, I applied two different approaches: the 3-day moving average
method and Linear Regression to compare their effectiveness.
$$MA_t = \frac{P_{t-1} + P_{t-2} + P_{t-3}}{3}$$
For example, with the three most recent values 1200, 1180, and 1145:
$$\frac{1200 + 1180 + 1145}{3} = 1175$$
This technique helps in identifying short-term trends and mitigating daily fluctuations.
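The moving-average forecast described above can be sketched in a few lines of Python. This is a minimal illustration rather than the report's actual implementation: the function name `moving_average_forecast` and its `window` parameter are hypothetical, and the input reuses the three values from the worked example.

```python
def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the last `window` observations.

    Hypothetical helper for illustration; not taken from the report's code.
    """
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    recent = values[-window:]
    return sum(recent) / window


# The three most recent values from the worked example
history = [1200, 1180, 1145]
forecast = moving_average_forecast(history)
print(forecast)  # 1175.0
```

Because only the last three observations enter the average, a sudden jump in the series affects the forecast for at most three steps.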
A linear regression model was used to predict crime rates based on multiple factors such as
education level, employment rate, median income, poverty rate, and population density. The
model follows the equation:
$$\text{CrimeRate} = \beta_0 + \beta_1(\text{EducationLevel}) + \beta_2(\text{EmploymentRate}) + \beta_3(\text{MedianIncome}) + \beta_4(\text{PovertyRate}) + \beta_5(\text{PopulationDensity}) + \varepsilon$$
where βi are coefficients learned from the data, and ε is the error term.
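To make the role of the coefficients concrete, the sketch below fits an equation of the same form by ordinary least squares using NumPy. The data are synthetic and noise-free, generated from made-up coefficients purely for illustration; they are not values from the crime dataset, and only two predictors are used to keep the example short.

```python
import numpy as np

# Synthetic predictors (illustrative only, not from the report's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 2))

# Made-up "true" coefficients: beta0 = 500, beta1 = -3, beta2 = 8
true_beta = np.array([500.0, -3.0, 8.0])

# Prepend a column of ones so beta0 acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])
y = X_design @ true_beta  # no error term, so the fit should be exact

# Ordinary least squares: minimise ||X_design @ beta - y||^2
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(beta_hat, 3))
```

With real data the error term is not zero, so the estimated coefficients only approximate the underlying relationship.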
Region | Actual Crime Rate | Moving Average Prediction | Linear Regression Prediction
Regions with higher education levels and employment rates generally have lower
crime rates.
Economic stability, indicated by higher median income, correlates with lower crime
rates.
Higher poverty rates and population density are associated with higher crime rates.
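Associations like these are usually quantified with a correlation coefficient. As a sketch, the Pearson correlation can be computed as below; the helper `pearson` is hypothetical, and the two short lists are toy numbers chosen for illustration, not dataset values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy numbers (illustrative only): poverty rate rising in step with crime rate
poverty = [5.0, 10.0, 15.0, 20.0]
crime = [100.0, 200.0, 300.0, 400.0]
r = pearson(poverty, crime)
print(round(r, 3))
```

A value near +1 indicates that the two indicators rise together, a value near -1 that one falls as the other rises. On a loaded DataFrame, pandas offers `DataFrame.corr` for the same purpose.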
5.3 Limitations
The moving average method does not consider external shocks such as economic
crises or policy changes.
The linear regression model assumes a linear relationship, which may not fully
capture complex interactions.
Crime rates are influenced by complex interactions that require advanced predictive
modeling.
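import pandas as pd
from sklearn.linear_model import LinearRegression

# Load dataset
data = pd.read_csv("crime_vs_socioeconomic_factors.csv")
# Target is the crime rate; features are assumed to be the remaining numeric
# columns (the original feature-selection line is missing from the source)
target = data["Crime_Rate"]
features = data.drop(columns=["Crime_Rate"]).select_dtypes(include="number")

# Fit the linear regression model and predict on the same data
model = LinearRegression()
model.fit(features, target)
predictions = model.predict(features)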
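One common way to compare two sets of predictions, offered here as an assumption rather than the procedure used in this report, is to compute error metrics such as the mean absolute error (MAE) and root mean squared error (RMSE) against the actual values. The helpers `mae` and `rmse` are hypothetical, and the numbers below are placeholders, not results from the dataset.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Placeholder values for illustration only (not taken from the dataset)
actual = [100.0, 200.0, 300.0]
moving_avg_pred = [110.0, 190.0, 330.0]
regression_pred = [105.0, 195.0, 310.0]

for name, pred in [("moving average", moving_avg_pred), ("linear regression", regression_pred)]:
    print(f"{name}: MAE={mae(actual, pred):.2f}, RMSE={rmse(actual, pred):.2f}")
```

Lower values indicate predictions closer to the actual figures; RMSE penalises large individual errors more heavily than MAE.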
QUALITY OF LIFE ANALYSIS USING THINK-PAIR-SHARE
APPROACH
Name: RegNo.: RA22110030204
Date: 21/2/2025 Class: CSE - 3G
Course: Data Science
1. Introduction
2. Dataset Description
The dataset contains information on 88 countries, with the following key indicators:
Indicator | Description | Range
Quality of Life Index | Composite score reflecting overall well-being | 128.5 - 220.1
Health Care Index | Quality and accessibility of healthcare services | 39.8 - 79.3
3. Trend Analysis
Quality of Life Index: The highest QoL index is observed in Luxembourg (220.1),
while the lowest belongs to Bangladesh (128.5). Developed countries such as the
Netherlands (211.3) and Denmark (209.9) consistently score high.
Purchasing Power: Countries like the USA (177.4) and Switzerland (164.8) exhibit
strong purchasing power, whereas countries like Venezuela (31.5) and Egypt (39.2)
face economic constraints.
Safety Index: The safest country in the dataset is Oman (81.7), while South Africa
(23.4) has the lowest safety score, reflecting high crime rates.
Health Care: The Netherlands (79.3) and Denmark (78.4) rank highest in healthcare
quality, whereas developing nations like India (39.8) lag behind.
Cost of Living: Switzerland has the highest cost of living (98.4), while Pakistan
records the lowest (23.1), reflecting affordability differences.
Pollution and Climate: The most polluted country is Bangladesh (89.6), while
Finland (12.6) enjoys the cleanest air. Climate favorability is highest in Spain (87.2)
and lowest in Russia (37.2).
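Extremes like these can be read off programmatically. A small sketch, using only the four Quality of Life Index values quoted above rather than the full 88-country dataset:

```python
# Quality of Life Index values quoted in the trend analysis
qol_index = {
    "Luxembourg": 220.1,
    "Netherlands": 211.3,
    "Denmark": 209.9,
    "Bangladesh": 128.5,
}

# Country with the highest and lowest index among the quoted values
highest = max(qol_index, key=qol_index.get)
lowest = min(qol_index, key=qol_index.get)
print(highest, qol_index[highest])  # Luxembourg 220.1
print(lowest, qol_index[lowest])    # Bangladesh 128.5
```

On the full DataFrame, `idxmax` and `idxmin` on the relevant column would do the same job.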
To predict future QoL indicators, I applied two different approaches: the 3-day moving
average method and Linear Regression to compare their effectiveness.
This technique helps in identifying short-term trends and mitigating daily fluctuations.
A linear regression model was used to predict QoL based on multiple factors such as
purchasing power, healthcare, safety, and pollution index. The model follows the equation:
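$$\text{QoL} = \beta_0 + \beta_1(\text{PurchasingPower}) + \beta_2(\text{HealthCare}) + \beta_3(\text{Safety}) + \beta_4(\text{Pollution}) + \varepsilon$$
where βi are coefficients learned from the data, and ε is the error term.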
Developed countries generally score high across all indices, particularly in purchasing
power (above 150), healthcare (above 70), and safety (above 60).
Economic stability and strong governance correlate with higher QoL, evident in
European nations consistently scoring above 200 in QoL index.
5.3 Limitations
The moving average method does not consider external shocks such as pandemics,
economic crises, or policy changes.
6. Code Implementation
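import pandas as pd
from sklearn.linear_model import LinearRegression

# Load dataset
data = pd.read_csv("quality_of_life_indices_by_country.csv")

# Predictors named in the model description
features = data[["Purchasing Power Index", "Safety Index", "Health Care Index", "Pollution Index"]]
# Target column name is an assumption; the source omits the line defining it
target = data["Quality of Life Index"]

# Fit the linear regression model and predict on the same data
model = LinearRegression()
model.fit(features, target)
predictions = model.predict(features)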