Crime - Data-Mining-And-K-Means 2018

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Published by : International Journal of Engineering Research & Technology (IJERT)

http://www.ijert.org ISSN: 2278-0181


Vol. 7 Issue 02, February-2018

Crime Detection Technique Using Data Mining


and K-Means
1
Khushabu A. Bokde, 2Tiksha P. Kakade, Prof. Deepa Bhattacharya
3
Dnyaneshwari S. Tumsare , 4Chetan G. Wadhai Assistant Professor
B. E. Student Department of CSE, Ballarpur Institute of Technology,
Department of CSE, Ballarpur Institute of Technology, Ballarpur, Chandrapur District, Maharashtra, India.
Ballarpur, Chandrapur District, Maharashtra, India.

Abstract—Crimes will somehow influence organizations and B. Clustering


institutions when occurred frequently in a society. Thus, it seems Division of a set of data or objects to a number of clusters is
necessary to study reasons, factors and relations between called clustering. Thereby, a cluster is composed of a set of
occurrence of different crimes and finding the most appropriate similar data which behave same as a group. It can be said that
ways to control and avoid more crimes. The main objective of this
paper is to classify clustered crimes based on occurrence
the clustering is equal to the classification, with only difference
frequency during different years. Data mining is used extensively that the classes are not defined and determined in advance, and
in terms of analysis, investigation and discovery of patterns for grouping of the data is done without supervision [2].
occurrence of different crimes. We applied a theoretical model
based on data mining techniques such as clustering and C. Clustering by K-means Algorithm
classification to real crime dataset recorded by police in England K-means is the simplest and most commonly used partitioning
and Wales within 1990 to 2011. We assigned weights to the algorithm among the clustering algorithms in scientific and
features in order to improve the quality of the model and remove industrial software [3] [4] [5]. Acceptance of the K-means is
low value of them. The Genetic Algorithm (GA) is used for mainly due to its being simple. This algorithm is also suitable
optimizing of Outlier Detection operator parameters using
RapidMiner tool.
for clustering of the large datasets since it has much less
computational complexity, though this complexity grows
Keywords— Crime; Clustering; K-Means Algorithm; linearly by increasing of the data points [5]. Beside simplicity
of this technique, it however suffers from some disadvantages
I. INTRODUCTION such as determination of the number of clusters by user,
A. Crime Analysis affectability from outlier data, high-dimensional data, and
Today, collection and analysis of crime-related data are sensitivity toward centers for initial clusters and thus
imperative to security agencies. The use of a coherent method possibility of being trapped into local minimum may reduce
to classify these data based on the rate and location of efficiency of the K-means algorithm [6].
occurrence, detection of the hidden pattern among the
committed crimes at different times, and prediction of their II. LETRATURE REVIEW
future relationship are the most important aspects that have to J. Agarwal, R. Nagpal and R. Sehgal in [1] have analyzed
be addressed. crime and considered homicide crime taking into account the
In this regard, the use of real datasets and presentation of a corresponding year and that the trend is descending from 1990
suitable framework that does not be affected by outliers should to 2011. They have used the k-means clustering technique for
be considered. Preprocessing is an important phase in data extracting useful information from the crime dataset using
mining in which the results are significantly affected by RapidMiner tool because it is solid and complete package with
outliers. Thus, the outlier data should be detected and flexible support options. Figure1 shows the proposed system
eliminated though a suitable method. Optimization of Outlier architecture.
Detection operator parameters through the GA and definition Priyanka Gera and Dr. Rajan Vohra in [11] have used a linear
of a Fitness function are both based on Accuracy and regression for prediction the occurrence of crimes in Delhi
Classification error. The weighting method was used to (India). They review a dataset of the last 59 years to predict
eliminate low-value features because such data reduce the occurrence of some crimes including murder, burglary, robbery
quality of data clustering and classification and, consequently, and etc. Their work will be helpful for the local police stations
reduce the prediction accuracy and increase the classification in decision making and crime supervision.
error. ―After training systems will predict data values for next
The main purposes of crime analysis are mentioned below [1]: coming fifteen years. The system is trained by applying linear
• Extraction of crime patterns by crime analysis and based regression over previous year data. This will produce a formula
on available criminal information, and squared correlation( ).
The formula is used to predict values for comong future years.
• Prediction of crimes based on spatial distribution of
The coefficent of determination, , is useful because is gives the
existing data and prediction of crime frequency using
proportion of variance of one variable that is predictable from
various data mining techniques,
• Crime recognition.

IJERTV7IS020110 www.ijert.org 223


(This work is licensed under a Creative Commons Attribution 4.0 International License.)
Published by : International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org ISSN: 2278-0181
Vol. 7 Issue 02, February-2018

other variable.‖ Figure 2 shows the proposed system


architecture.
In [12] an integrated system called PrepSearch have proposed
by L. Ding et al. It has been combined using two separate
categories of visualization tools: providing the geographic
view of crimes and visualization ability for social networks.
―It will take a given description of a crime, including its
location, type, and the physical description of suspects
(personal characteristics) as input.
To detect suspects, the system will process these inputs
through four integrated components: geographic profiling,
social network analysis, crime patterns and physical matching.‖ Fig. 3. System design and process of PrepSearch [12]
Figure 3 shows the system design and process of PrepSearch.
In [13] researches have introduced intelligent criminal
identification system called ICIS which can potentially
distinguish a criminal in accordance with the observations
collected from the crime location for a certain class of crimes.
The system uses existing evidences in situations for identifying
a criminal by clustering mechanism to segment crime data in to
subsets, and the Nave Bayesian classification has used for
identifying possible suspect of crime incidents. ICIS has been
used the communication power of multi agent system for
increasing the efficiency in identifying possible suspects. In
order to describe the system ICIS is divided to user interface,
managed bean, multi agent system and database. Oracle
Database is used for implementing of database, and
identification of crime patterns has been implemented using
Java platform.
In [14] an improved method of classification algorithms for
crime prediction has proposed by A. Babakura, N. Sulaiman
and M. Yusuf. They have compared Naïve Bayesian and Back
Propagation (BP) classification algorithms for predicting crime
category for distinctive state in USA. In the first step phase, the
model is built on the training and in the second phase the
model is applied. The performance measurements such as
Accuracy, Precision and Recall are used for comparing of the
classification algorithms. The precision and recall remain the
same when BP is used as a classifier.
In [15] researches have introduced crime analysis and
prediction using data mining. They have proposed an approach
between computer science and criminal justice to develop a
data mining procedure that can help solve crimes faster. Also
they have focused on causes of crime occurrence like criminal
background of offender, political, enmity and crime factors of
Fig. 1. Flow chart of crime analysis [1]
each day. Their method steps are data collection, classification,
pattern identification, prediction and visualizatiztion.
III. EXISTING SYSTEM
Data mining in the study and analysis of criminology can be
categorized into main areas, crime control and crime
suppression. De Bruin et. al. [1] introduced a framework for
crime trends using a new distance measure for comparing all
individuals based on their profiles and then clustering them
accordingly. Manish Gupta et. al. [2]. highlights the existing
systems used by Indian police as e-governance initiatives and
also proposes an interactive query based interface as crime
analysis tool to assist police in their activities. He proposed
interface which is used to extract useful information from the
vast crime database maintained by National Crime Record
Fig. 2. Predicting future crime trends [11] Bureau (NCRB) and find crime hot spots using crime data

IJERTV7IS020110 www.ijert.org 224


(This work is licensed under a Creative Commons Attribution 4.0 International License.)
Published by : International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org ISSN: 2278-0181
Vol. 7 Issue 02, February-2018

mining techniques such as clustering etc. The effectiveness of IV. PROPOSED SYSTEM
the proposed interface has been illustrated on Indian crime Architecture
records. Nazlena Mohamad Ali et al.[3] discuss on a After literature review there is need to used an open source
development of Visual Interactive Malaysia Crime News data mining tool which can be implemented easily and analysis
Retrieval System (i-JEN) and describe the approach, user can be done easily. So here crime analysis is done on crime
studies and planned, the system architecture and future plan. dataset by applying k means clustering algorithm using
Their main objectives were to construct crime-based event; rapid miner tool.
investigate the use of crime based event in improving the The procedure is given below:
classification and clustering; develop an interactive crime news 1. First we take crime dataset
retrieval system; visualize crime news in an effective and 2. Filter dataset according to requirement and create new
interactive way; integrate them into a usable and robust system dataset which has attribute according to analysis to be done
and evaluate the usability and system performance and the 3. Open rapid miner tool and read excel file of crime dataset
study will contribute to the better understanding of the crime and apply “Replace Missing value operator” on it and execute
data consumption in the Malaysian context as well as the operation
developed system with the visualization features to address 4. Perform “Normalize operator” on resultant dataset and
crime data and the eventual goal of combating the crimes execute operation
.Sutapat Thiprungsri [4] examines the application of cluster 5. Perform k means clustering on resultant dataset formed after
analysis in the accounting domain, particularly discrepancy normalization and execute operation
detection in audit. The purpose of his study is to examine the 6. From plot view of result plot data between crimes and get
use of clustering technology to automate fraud filtering during required cluster
an audit. He used cluster analysis to help auditors focus their 7. Analysis can be done on cluster formed.
efforts when evaluating group life insurance claims. A. Malathi
et al.[5] look at the use of missing value and clustering
algorithm for a data mining approach to help predict the crimes Take crime dataset
patterns and fast up the process of solving crime. Malathi. A et.
al.[6] used a clustering/classify based model to anticipate crime
trends. The data mining techniques are used to analyze the city Filter dataset according to
crime data from Police Department. The results of this data requirement
Perform k means clustering on
mining could potentially be used to lessen and even prevent
resultant dataset and execute
crime for the forth coming years.Dr. S. Santhosh Baboo and Perform Normalization operator
Malathi. A [7] research work focused on developing a crime Open
on Rapiddataset
resultant miner and
tool execute
and read
analysis tool for Indian scenario using different data mining Applyexcel file ofMissing
Replace crime dataset
Value
techniques that can help law enforcement department to Perform
operator kandmeans clustering on
execute
efficiently handle crime investigation. resultant
Open Rapid dataset andtool
miner execute
and read
Perform
excel fileNormalization
of crime datasetoperator
onApply Replace
resultant Missing Value
The proposed tool enables agencies to easily and economically Filter datasetdataset and execute
according to
Apply operator and execute
Replace Missing Value
clean, characterize and analyze crime data to identify requirement
operator
Perform and
crimeexecute
analysis on cluster
actionable patterns and trends .Kadhim B. Swadi Al-Janabi [8] Open Rapid miner tool and read
formed
presents a proposed framework for the crime and criminal data excel file of crime dataset
analysis and detection using Decision tree Algorithms for data Perform Normalization operator
Filter dataset according to
on resultant dataset and execute
classification and Simple K Means algorithm for data requirement
clustering. The paper tends to help specialists in discovering Perform crime analysis on cluster
patterns and trends, making forecasts, finding relationships and formed
possible explanations, mapping criminal networks and Perform k means clustering on
identifying possible suspects. Aravindan Mahendiran et al. [9] resultant dataset and execute
apply myriad of tools on crime data sets to mine for
information that is hidden from human perception. With the
help of state of the art visualization techniques we present the
patterns discovered through our algorithms in a neat and Perform plot view and get cluster
intuitive way that enables law enforcement departments to
channelize their resources accordingly. Sutapat
Thiprungsri[10] examine the possibility of using clustering Perform crime analysis on
technology for auditing. Automating fraud filtering can be of cluster formed
great value to continuous audits. The objective of their study is
to examine the use of cluster analysis as an alternative and Fig 1: Flow chart of crime analysis
innovative anomaly detection technique in the wire transfer
system. K. Zakir Hussain et al. [11] tried try to capture years of
human experience into computer models via data mining and
by designing a simulation model.

IJERTV7IS020110 www.ijert.org 225


(This work is licensed under a Creative Commons Attribution 4.0 International License.)
Published by : International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org ISSN: 2278-0181
Vol. 7 Issue 02, February-2018

V. CONCLUSION AND FUTURE SCOPE


This project focuses on crime analysis by implementing
clustering algorithm on crime dataset using rapid miner tool
and here we do crime analysis by considering crime homicide
and plotting it with respect to year and got into conclusion that
homicide is decreasing from 1990 to 2011 .From the clustered
results it is easy to identify crime trend over years and can be
used to design precaution methods for future
From the encouraging results, we believe that crime data
mining has a promising future for increasin the effectiveness
and efficiency of criminal and intelligence analysis. Visual and
intuitive criminal and intelligence investigation techniques can
be developed for crime pattern. As we have applied clustering
technique of data mining for crime analysis we can also
perform other techniques of data mining such as classification.
Also we can perform analysis on various dataset such as
enterprise survey dataset, poverty dataset, aid effectiveness
dataset, etc.

VI.REFERENCES
[1] J. Agarwal, R. Nagpal, and R. Sehgal, ―Crime analysis using k-means
clustering,‖ International Journal of Computer Applications, Vol. 83 –
No4, December 2013.
[2] J. Han, and M. Kamber, ―Data mining: concepts and techniques,‖ Jim
Gray, Series Editor Morgan Kaufmann Publishers, August 2000.
[3] P. Berkhin, ―Survey of clustering data mining techniques,‖ In: Accrue
Software, 2003.
[4] W. Li, ―Modified k-means clustering algorithm,‖ IEEE Congress on
Image and Signal Processing, pp. 616- 621, 2006.
[5] D.T Pham, S. Otri, A. Afifty, M. Mahmuddin, and H. Al-Jabbouli,
―Data clustering using the Bees algorithm,‖ proceedings of 40th CRIP
International Manufacturing Systems Seminar, 2006.
[6] J. Han, and M. Kamber, ―Data mining: concepts and techniques,‖ 2nd
Edition, Morgan Kaufmann Publisher, 2001.
[7] S. Joshi, and B. Nigam, ―Categorizing the document using multi class
classification in data mining,‖ International Conference on Computational
Intelligence and Communication Systems, 2011.
[8] T. Phyu, ―Survey of classification techniques in data mining,‖
Proceedings of the International Multi Conference of Engineers and
Computer Scientists Vol. IIMECS 2009, March 18 - 20, 2009, Hong
Kong.
[9] S.B. Kim, H.C. Rim, D.S. Yook, and H.S. Lim, ―Effective Methods for
Improving Naïve Bayes Text Classifiers,‖ In Proceeding of the 7th
Pacific Rim International Conference on Artificial Intelligence,
Vol.2417, 2002.
[10] S. Sindhiya, and S. Gunasundari, ―A survey on Genetic algorithm based
feature selection for disease diagnosis system,‖ IEEE International
Conference on Computer Communication and Systems(ICCCS), Feb 20-
21, 2014, Chermai, INDIA.
[11] P. Gera, and R. Vohra, ―Predicting Future Trends in City Crime Using
Linear Regression,‖ IJCSMS (International Journal of Computer Science
& Management Studies) Vol. 14, Issue 07Publishing Month: July 2014.
[12] L. Ding et al., ―PerpSearch: an integrated crime detection system,‖ 2009
IEEE 161-163 ISI 2009, June 8-11, 2009, Richardson, TX, USA.
[13] K. Bogahawatte, and S. Adikari, ―Intelligent criminal identification
system,‖ IEEE 2013 The 8th International Conference on Computer
Science & Education (ICCSE 2013) April 26-28, 2013. Colombo, Sri
Lanka.
[14] A. Babakura, N. Sulaiman, and M. Yusuf, ―Improved method of
calssification algorithms for crime prediction,‖ International Symposium
on Biometrics and Security Technologies (ISBAST) IEEE 2014. [15] S.
Sathyadevan, and S. Gangadharan, ―Crime analysis and prediction
using data mining,‖ IEEE 2014.

IJERTV7IS020110 www.ijert.org 226


(This work is licensed under a Creative Commons Attribution 4.0 International License.)

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy