
Final SSD

This document outlines a project focused on analyzing New York City's 311 Customer Service Requests dataset to enhance public service efficiency through data analytics. It details the stages of data understanding, preparation, analysis, exploration, and statistical testing, ultimately aiming to identify trends and improve resource allocation for service requests. The findings highlight significant insights into complaint types, response times, and regional service issues, providing actionable recommendations for city officials.


1. Introduction
Public service agencies increasingly rely on data analytics to improve operational efficiency and service delivery. This coursework focuses on New York City's 311 Customer Service Requests dataset, which contains millions of resident complaints and service requests. The project's main goal is to understand, prepare, explore, and analyze the dataset for significant insights using Python and standard data analysis techniques. The process begins with understanding the data and proceeds through data preparation, exploration, statistical analysis, and visualization. We investigate several features of the dataset, including complaint categories, service response times, and regional trends in service requests. A large portion of the study involves cleaning the dataset, eliminating unnecessary columns and handling missing values, to prepare it for further mining and statistical testing. By calculating summary statistics, examining correlations, and visualizing complaint trends, we seek to learn where and how services are requested and how well those requests are handled. Statistical tests are also conducted to determine whether response times vary by complaint type and whether complaint types are linked to particular locations. The findings can help local authorities allocate resources more effectively and improve response effectiveness.

2. Data Understanding
The NYC 311 Customer Service Requests dataset tracks non-emergency complaints and service requests from New York City residents. It contains numerous columns, including "Created Date," "Closed Date," "Complaint Type," "Descriptor," "Agency," and "Borough," and each row represents a distinct service request. Meaningful analysis requires understanding the contents and structure of this collection. The initial investigation covers the number of rows and columns, the data type of each column, and the meaning of the important variables. For instance, "Complaint Type" describes the nature of the issue reported, while "Created Date" and "Closed Date" mark the timeline of the service request. Additional fields provide geographical and administrative context, such as the borough or the agency responsible. A crucial part of this phase is identifying missing or inconsistent values and determining which columns are relevant or redundant. Several columns, such as those related to school details, taxi information, and bridge or highway names, do not contribute meaningfully to the analysis and can be dropped in the data preparation stage. This understanding lays the foundation for effective data cleaning, transformation, and deeper exploration in subsequent steps.
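
A minimal sketch of this first inspection, assuming the CSV filename used in the Appendix:

import pandas as pd

# Load the raw 311 dataset; low_memory=False avoids mixed-dtype warnings
df = pd.read_csv("311_Service_Requests.csv", low_memory=False)

print(df.shape)   # number of service requests (rows) and fields (columns)
print(df.dtypes)  # data type of each column
print(df[["Created Date", "Closed Date", "Complaint Type", "Borough"]].head())
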
3. Data Preparation

Data preparation is the crucial step of cleaning and converting the raw data into a format suitable for analysis. For this project, the NYC 311 Customer Service Requests dataset is first imported using Python packages such as pandas. After loading, an initial review identifies the kinds of information the dataset contains, such as the different complaint types, timestamps, locations, and agency details. Several transformations then prepare the data properly. First, the "Created Date" and "Closed Date" columns are converted to the datetime format so that a new column named "Request_Closing_Time" can be computed, recording the amount of time that passed between the creation and closure of a complaint. This new feature is crucial for analyzing service efficiency. The next stage removes a list of superfluous columns that do not contribute to the study, such as detailed fields about schools, vehicles, and bridges; a short Python routine drops these columns to reduce noise. All missing (NaN) values are then removed to keep the dataset consistent. Finally, the unique values in each column are examined to gauge the variability of the data. These preparatory steps ensure that the dataset is clean, relevant, and ready for insightful analysis and statistical testing.
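
A sketch of these preparation steps, continuing from the loaded df above (the column selection is abbreviated here; the Appendix gives the full list of dropped columns):

# Parse timestamps; unparseable entries become NaT and are removed later
for col in ["Created Date", "Closed Date"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Elapsed time between creation and closure, in hours
df["Request_Closing_Time"] = (df["Closed Date"] - df["Created Date"]).dt.total_seconds() / 3600

# Abbreviated selection of unneeded columns; the Appendix lists them all
unused = ["School Name", "Vehicle Type", "Bridge Highway Name"]
df.drop(columns=[c for c in unused if c in df.columns], inplace=True)

df.dropna(inplace=True)   # remove rows with any missing values
print(df.nunique())       # unique values per column, to gauge variability
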
4. Data Analysis

The data analysis phase focuses on extracting significant statistical insights from the cleaned dataset. This entails generating summary statistics and investigating correlations between various factors. Key metrics, including the sum, mean, standard deviation, skewness, and kurtosis, are computed for the dataset's numerical columns using Python packages such as pandas, numpy, and scipy. These descriptive statistics clarify the distribution and variability of service request durations and other numerical features. The mean and standard deviation of "Request_Closing_Time", for instance, show how quickly complaints are typically handled and how much response times vary. Skewness and kurtosis describe the symmetry and peakedness of the distribution, which helps in spotting outliers or anomalous patterns. In addition to this univariate analysis, a correlation matrix is built to examine relationships between numerical variables, such as the potential relationship between location-related attributes and complaint duration. These findings form the foundation for more in-depth data exploration, visualization, and hypothesis testing, ensuring that the research is both data-driven and statistically valid.
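
A sketch of these computations, continuing from the prepared df:

import numpy as np
import pandas as pd

num = df.select_dtypes(include=np.number)   # numerical columns only
summary = pd.DataFrame({
    "sum": num.sum(),
    "mean": num.mean(),
    "std": num.std(),
    "skew": num.skew(),          # symmetry of each distribution
    "kurtosis": num.kurtosis(),  # peakedness / tail weight
})
print(summary)
print(num.corr())                # pairwise correlation matrix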

5. Data Exploration and Visual Insights

Data exploration involves visually examining the dataset to find patterns, trends, and insights that are not readily apparent from raw numbers. This project applies visualization techniques from Python libraries such as matplotlib, seaborn, and plotly to better understand the behavior of 311 service requests in New York City. The visual analysis yields four important findings. First, the distribution of complaints across the boroughs reveals which ones have the most service problems. Second, identifying the most common complaint categories highlights the issues most prevalent among the public, such as illegal parking, noise, or heating problems. Third, the relationship between time and complaint frequency is plotted to reveal daily or seasonal patterns in service requests. Fourth, differences in average request closing times by borough and complaint category point to regions with higher rates of service delays. Additionally, complaint types are ranked by their average response times, categorized by location, and compared in bar or box plots. These visualizations help city planners and public service departments prioritize issues and optimize response strategies. Overall, data exploration enhances the interpretability of the dataset and provides a foundation for more advanced statistical testing.
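
As one example, the first finding (complaint volume by borough) can be sketched as follows, continuing from the prepared df:

import matplotlib.pyplot as plt
import seaborn as sns

# Count requests per borough and plot them as a bar chart
borough_counts = df["Borough"].value_counts()
sns.barplot(x=borough_counts.index, y=borough_counts.values)
plt.title("311 Requests by Borough")
plt.ylabel("Number of requests")
plt.show()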

6. Statistical Testing

Statistical testing applies formal hypothesis tests to verify assumptions and find important relationships in the dataset. This study runs two tests. The first examines whether the average response time (Request_Closing_Time) differs notably across complaint types. A one-way ANOVA test is used, in which the null hypothesis (H₀) posits that all complaint types have the same average response time, while the alternative hypothesis (H₁) proposes that at least one complaint type has a different mean. The test's p-value determines whether to reject the null hypothesis: a low p-value, typically less than 0.05, indicates a statistically significant difference in service response times across complaint types. The second test asks whether the type of complaint is associated with the borough in which it was filed. For this, a Chi-square test of independence is employed: the null hypothesis (H₀) asserts that complaint type and location are independent, while the alternative hypothesis (H₁) proposes a link between them, and the resulting p-value decides between the two. These tests validate the patterns found during data exploration and offer statistical support for data-driven decision-making.
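
Minimal sketches of both tests, continuing from the prepared df (the 500-request cutoff matches the Appendix script):

from scipy import stats
import pandas as pd

# One-way ANOVA on Request_Closing_Time across frequent complaint types
counts = df["Complaint Type"].value_counts()
frequent = counts[counts >= 500].index
groups = [df.loc[df["Complaint Type"] == ct, "Request_Closing_Time"] for ct in frequent]
f_stat, p_anova = stats.f_oneway(*groups)
print("ANOVA p-value:", p_anova)     # p < 0.05 -> reject H0 of equal means

# Chi-square test of independence between complaint type and borough
sub = df[df["Complaint Type"].isin(frequent)]
table = pd.crosstab(sub["Complaint Type"], sub["Borough"])
chi2, p_chi, dof, _ = stats.chi2_contingency(table)
print("Chi-square p-value:", p_chi)  # p < 0.05 -> type and borough are associated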

7. Conclusions & Recommendations


Through the application of numerous data analytics techniques and statistical tools, this research offered a thorough approach to examining the NYC 311 Customer Service Requests dataset. The main goals were to prepare the data for further mining and to derive valuable insights into the trends of public service requests across New York City.

Data understanding was the initial stage, during which the dataset's structure and properties were carefully investigated. Key columns including "Complaint Type," "Created Date," "Closed Date," and "Borough" were identified as crucial for the analysis, giving the subsequent procedures a solid base. The dataset was then cleaned and converted during the data preparation stage: missing values were handled, extraneous columns were removed, and the "Request_Closing_Time" feature was created to gauge service responsiveness. These actions greatly enhanced the quality of the data and prepared it for analysis.

The data analysis step then used summary statistics, skewness, and correlations to investigate the statistical nature of the data, highlighting its distribution and the relationships between variables. Following that, data exploration used visualizations to reveal trends such as the most common complaint types, the boroughs with the highest request volumes, and the average resolution times across different areas. These insights were essential for finding service bottlenecks and efficiency gaps.

In the statistical testing phase, hypothesis tests assessed whether average response times differed significantly across complaint types and whether complaint types were associated with particular locations. Both tests yielded valid, fact-based conclusions that supported the exploratory findings.

All things considered, this investigation showed how Python programming can be applied to real-world problems through data wrangling, visualization, and statistical analysis. The project's insights can help city officials improve response strategies and service delivery for New Yorkers.

8. Appendix
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

plt.rcParams["figure.figsize"] = (10, 6)
sns.set_theme(style="whitegrid")

# 1 Data Understanding -------------------------------------------------


df = pd.read_csv("311_Service_Requests.csv", low_memory=False)
print("SHAPE:", df.shape); df.info()
print(df.head()); print(df.describe().T)

# 2 Data Preparation ---------------------------------------------------


# Parse timestamps; invalid entries become NaT
for c in ["Created Date", "Closed Date"]:
    df[c] = pd.to_datetime(df[c], errors="coerce")

# Elapsed time between creation and closure, in hours
df["Request_Closing_Time"] = (df["Closed Date"] - df["Created Date"]).dt.total_seconds() / 3600

# Columns that do not contribute to the analysis
cols_to_drop = ['Agency Name', 'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
                'Intersection Street 1', 'Intersection Street 2', 'Address Type', 'Park Facility Name',
                'Park Borough', 'School Name', 'School Number', 'School Region', 'School Code',
                'School Phone Number', 'School Address', 'School City', 'School State', 'School Zip',
                'School Not Found', 'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough',
                'Taxi Pick Up location', 'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
                'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction', 'Ferry Terminal Name',
                'Landmark', 'X Coordinate (State Plane)', 'Y Coordinate (State Plane)', 'Due Date',
                'Resolution Action Updated Date', 'Community Board', 'Facility Type', 'Location']
df.drop(columns=[c for c in cols_to_drop if c in df.columns], inplace=True)

print("Rows before dropna:", len(df)); df.dropna(inplace=True)


print("Rows after dropna :", len(df))

for c in df.columns: print(f"{c:<25} {df[c].nunique()}")

# 3 Data Analysis ------------------------------------------------------


num = df.select_dtypes(np.number)
summary = pd.DataFrame({"sum": num.sum(), "mean": num.mean(), "std": num.std(),
                        "skew": num.skew(), "kurtosis": num.kurtosis()})
print(summary)

corr = num.corr()          # pairwise correlations between numeric columns
sns.heatmap(corr)
plt.show()

# 4 Data Exploration ---------------------------------------------------


top10 = df["Complaint Type"].value_counts().nlargest(10)
sns.barplot(y=top10.index, x=top10.values).set(title="Top 10 Complaint Types")
plt.savefig("fig1_top10.png"); plt.clf()

borough = df["Borough"].value_counts()
sns.barplot(x=borough.index, y=borough.values).set(title="Requests by Borough")
plt.savefig("fig2_borough.png"); plt.clf()

sns.histplot(df["Request_Closing_Time"], bins=100, kde=True)
plt.xlim(0, df["Request_Closing_Time"].quantile(0.99))   # trim extreme outliers from view
plt.title("Distribution of Closing Time")
plt.savefig("fig3_closing_time.png"); plt.clf()

df["MonthCreated"] = df["Created Date"].dt.to_period("M").dt.to_timestamp()
df.groupby("MonthCreated").size().plot()
plt.title("Requests Over Time")
plt.savefig("fig4_trend.png"); plt.clf()

pivot = (df.groupby(["Borough", "Complaint Type"])["Request_Closing_Time"]
           .mean().reset_index())
sns.barplot(data=pivot.sort_values("Request_Closing_Time", ascending=False).head(30),
            x="Request_Closing_Time", y="Complaint Type", hue="Borough")
plt.title("Slowest Complaint-Type/Borough Combos")
plt.savefig("fig5_slowest_combos.png"); plt.clf()

# 5 Statistical testing ------------------------------------------------


# One-way ANOVA across complaint types with at least 500 requests
keep = df["Complaint Type"].value_counts()[lambda s: s >= 500].index
anova = [df.loc[df["Complaint Type"] == ct, "Request_Closing_Time"] for ct in keep]
f, p = stats.f_oneway(*anova)
print("ANOVA F,p:", f, p)
if p < 0.05:
    # Post-hoc Tukey HSD shows which complaint types differ
    subset = df[df["Complaint Type"].isin(keep)]
    tukey = pairwise_tukeyhsd(endog=subset["Request_Closing_Time"],
                              groups=subset["Complaint Type"], alpha=0.05)
    print(tukey.summary())

# Chi-square test of independence: complaint type vs. borough
top = df["Complaint Type"].value_counts().nlargest(10).index
table = pd.crosstab(df.loc[df["Complaint Type"].isin(top), "Complaint Type"],
                    df["Borough"])
chi, chi_p, dof, exp = stats.chi2_contingency(table)
print("Chi²,p,dof:", chi, chi_p, dof)
# ----------------------------------------------------------------------
print(" Finished – figures saved, console shows stats.")
