It's Not The Algorithm, It's The Data: in Risk Assessment and Predictive Policing, Biased Data Can Yield Biased Results
Crime in the U.S. has fallen dramatically over the past three decades, with 2014 statistics from the Federal Bureau of Investigation (FBI) noting the number of violent crimes committed per 100,000 people in 2013 (368) was less than half the level seen in 1991 (758).

Nevertheless, the debate continues over how to maintain these lower crime rates while addressing issues of fairness in the way communities are policed, as well as how to effectively and fairly use risk-assessment tools that can be relied upon by sentencing courts or parole boards.
There are two primary issues at stake: risk-assessment algorithms, which weigh a variety of factors related to recidivism, or the likelihood an individual will commit another crime and wind up back behind bars; and predictive policing, which has been described as using data analytics and algorithms to better pinpoint where and when a crime might occur, so police resources can be more efficiently deployed. Both issues are fraught with challenges (moral, logistical, and political), and opinions on whether they can be fairly and ethically utilized largely depend on how one views the nature of policing and the criminal justice system.

Figure: Predictive policing systems identify “hotspots” where crime risk is the highest. (Image by Birgitte Blandhoel, courtesy of PACITA Project EU.)

There is no debate that both of these types of technologies are being used on a fairly widespread basis in the U.S. According to a 2013 article published by Sonja B. Starr, a professor of law at the University of Michigan Law School, nearly every state has adopted some type of risk-based assessment tools to aid in sentencing. The primary concern related to these tools revolves around the use of computerized algorithms, which provide risk scores based on the result of questions that are either answered by defendants or pulled from criminal records, and whether such tools may ultimately penalize racial minorities by overpredicting the likelihood of recidivism in these groups.

The most widely known of these tools is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a software tool owned by Northpointe, Inc., which has been used by a number of jurisdictions, including Broward County, FL, the State of New York, the State of Wisconsin, and the State of California, among others. The tool is seen as a success by many jurisdictions, such as New York State, which issued a 2012 report highlighting the effectiveness of the recidivism scale, noting, “the Recidivism Scale worked effectively and achieved satisfactory predictive accuracy,” with an accuracy rate of 0.71 AUC (area under curve) value (the optimal AUC value is 1.0, which would indicate no false positives/all true positives were identified).

The report noted actual and expected rates for any re-arrest were closely aligned across scores, and that the tool was more effective with higher-risk cases (53.8% re-arrest rate for those deemed high-risk by the tool, versus 16.9% for those deemed low-risk by the tool).

Nevertheless, in recent years, there has been significant criticism from many in academia and a scathing investigative analysis from ProPublica (whose website describes it as “an independent, non-profit newsroom that
FEBRUARY 2017 | VOL. 60 | NO. 2 | COMMUNICATIONS OF THE ACM 21
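The 0.71 AUC figure cited from the New York State report can be made concrete with a small calculation. The sketch below is illustrative only: it is not taken from the report or from COMPAS, and the function name `auc`, the toy risk scores, and the labels are all hypothetical. It uses one standard reading of AUC: the probability that a randomly chosen person who was re-arrested received a higher risk score than a randomly chosen person who was not, with ties counted as half.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    scores: risk scores (higher means higher predicted risk)
    labels: 1 if the person was re-arrested, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            wins += 1.0   # re-arrested person was ranked as riskier
        elif pos_score == neg_score:
            wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, invented for illustration (not from any real tool).
toy_scores = [9, 8, 7, 6, 5, 4, 3, 2]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

print(auc(toy_scores, toy_labels))
```

On this toy data the function returns 0.75. A scale that ranked every re-arrested person above everyone else would score 1.0, and one that gave everyone the same score would score 0.5, the level expected from chance.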
data with more recent data to create a more accurate crime model and forecast, as opposed to simply relying on older data that may not be reflective of more recent activity.

Also, Seals says, CommandCentral introduces into the algorithm the concept of seasonality, which addresses crime patterns when temperatures rise or fall, further improving the granularity of the algorithm. Nonetheless, Seals agrees CommandCentral is a tool to help officers, not a replacement for the judgement of experienced officers.

“It takes a seasoned officer to look at the data, and say, ‘hey, I know what that is,’” Seals says. “It may be seemingly benign, but to that seasoned officer who knows the patterns, who knows the persons in that area, that sounds like ‘Bob.’ ‘Bob used to do that, and Bob just got out [of prison.]’”

Critics, however, say tools such as PredPol and CommandCentral are inherently biased since they rely heavily on reported crimes data, which is often concentrated in areas that are heavily policed, thereby skewing statistics to overrepresent poor or minority communities.

“We know that we have a history of racially biased policing in the United States, and that has fed into all the data that we have on where arrests have occurred, which crimes are more likely to occur in specific communities, and at which particular times,” says Jennifer Lynch, senior staff attorney at the Electronic Frontier Foundation. “That’s the data that’s being fed into predictive policing algorithms.”

Still, it is difficult to discount the value of event-based predictive policing, which relies on actual data on crimes that have been committed; ignoring this data could result in losing opportunities to prevent additional criminal acts.

“There has been a lot of research on near-repeat effects in crime,” PredPol’s Mohler says. “If someone breaks into a car in a certain neighborhood and is successful, they’ll often return to that same neighborhood a few days later, and break into another car.”

Systems such as PredPol and CommandCentral likely can spot such trends more quickly than relying on crunching historical crime statistics by hand, and allow law enforcement to target resources to address specific incidents.

Motorola’s Seals agrees, noting that CommandCentral does not just rely on data from years ago. “As we get closer to the time we’re predicting, we actually crunch another shorter term [algorithm],” Seals says. What’s more, as it employs a learning algorithm, CommandCentral will get more accurate over time, if the system is properly updated.

Ultimately, however, “The algorithm itself may not be biased, but the data used by predictive policing algorithms is colored by years of biased police practices,” the EFF’s Lynch says, citing government statistics that up to 15% of vehicle thefts and 65% of rapes or sexual assaults are not reported, and noting that these non-reported crimes may be occurring in areas that are not necessarily deemed “high crime.”

“An algorithm can only predict crime based on the data it already has,” Lynch says. “This means it will continue to predict crime that looks like the crime we already know about, and will miss crimes for which we don’t have data.”

What’s more, defenders of predictive policing admit it must be accompanied by better community police outreach and transparency, to engender greater trust in these types of systems. Writing in The Wall Street Journal in April 2016, Jennifer Bachner, director of the master of science in government analytics program at Johns Hopkins University and author of a paper that supports greater use of predictive policing, cited a need for both greater technology utilization and solid community policing strategies to reduce crime.

“Departments that adopt predictive-policing programs must at the same time re-emphasize their commitment to community policing,” Bachner wrote. “Officers won’t achieve substantial reductions in crime by holding up in patrol cars, generating real-time hot-spot maps. Effective policing still requires that officers build trust with the communities they serve.”

Most importantly, the tools put in place must be used. A RAND Corporation study focused on a predictive-policing pilot program deployed in 2013 and 2014 by the Chicago Police Department called Strategic Subjects List, which examined data on people with arrest records and generated a list of several hundred individuals deemed at elevated risk of being shot or committing a shooting.

While an analysis of the program found that people on the list were nearly three times as likely to be arrested for a shooting as those who did not get flagged by the system, the system resulted in very few arrests. This was due to the presence of no fewer than 11 other violence-reduction programs in use at the time, so officers simply ignored the data, and their superiors did not make utilizing the system a priority.

Further Reading

Starr, S. B.
Evidence-Based Sentencing and the Scientific Rationalization of Discrimination (September 1, 2013). Stanford Law Review, Forthcoming; U of Michigan Law & Econ Research Paper No. 13-014. Available at SSRN: http://ssrn.com/abstract=2318940

New York State COMPAS-Probation Risk and Need Assessment Study: Examining the Recidivism Scale’s Effectiveness and Predictive Accuracy
https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=269445

Wisconsin v. Loomis July 2016 Decision:
https://www.wicourts.gov/sc/opinion/DisplayDocument.pdf?content=pdf&seqNo=171690

Statistics on Non-Reported Crime:
Truman, J. and Langton, L.
Criminal Victimization. September 29, 2015. U.S. Department of Justice. http://www.bjs.gov/content/pub/pdf/cv14.pdf

Keith Kirkpatrick is principal of 4K Research & Consulting, LLC, based in Lynbrook, NY.

© 2017 ACM 0001-0782/17/2 $15.00
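Lynch’s argument that an algorithm “can only predict crime based on the data it already has” can be illustrated with a toy simulation. The sketch below is a deliberately simplified model, not a description of how PredPol, CommandCentral, or any real system works. The function name `simulate_patrol_feedback`, the two-district setup, and every number in it are assumptions made for illustration. It assumes patrols are allocated in proportion to each district’s cumulative recorded crime, and that crimes are recorded only in proportion to the patrol presence where they occur.

```python
def simulate_patrol_feedback(recorded, true_crimes, detect_rate=0.5, periods=10):
    """Return the patrol share per district for each simulated period.

    recorded:    initial recorded-crime counts per district (the historical data)
    true_crimes: actual crimes per period per district (unknown to the model)
    detect_rate: hypothetical fraction of crimes recorded under full patrol presence
    """
    recorded = list(recorded)  # copy so the caller's list is not modified
    history = []
    for _ in range(periods):
        total = sum(recorded)
        # Patrols follow the data: each district's share equals its share of records.
        shares = [count / total for count in recorded]
        history.append(shares)
        # New records depend on where officers are, not only on where crime is.
        for i, share in enumerate(shares):
            recorded[i] += true_crimes[i] * share * detect_rate
    return history


# Hypothetical scenario: both districts have the same true crime level,
# but the historical records are skewed 60/40 toward district 0.
history = simulate_patrol_feedback(recorded=[60, 40], true_crimes=[100, 100])

for period, shares in enumerate(history, start=1):
    print(period, [round(s, 3) for s in shares])
```

Under these assumptions, two districts with identical underlying crime but a 60/40 split in historical records keep that 60/40 patrol split in every period: the new records reproduce the old skew rather than correcting it.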