Widm 1208
important for crime investigation.5 Extracting hidden network structures among criminals and inferring their respective roles from criminal data can help law enforcement and intelligence agencies develop effective strategies to prevent crimes from taking place. With big data analytics, key tasks in criminal network discovery, such as identifying central members and detecting subgroups, can be performed by automatically mining social media data from Twitter and Facebook.6 During the 1980s, different data mining techniques, machine-learning algorithms, neural networks, and intelligent agents were designed for classification, prediction, and profiling of human behavior to meet the objectives of tracking and deterring criminals. The applications of these techniques have demonstrated that it is feasible to analyze crime trends and criminals’ behavioral patterns automatically, without requiring constant human intervention to examine numerous criminal attributes. As criminal activities worldwide become increasingly digitized, most criminal monitoring and detection agencies are eager to apply data mining techniques to identify scams. Moreover, to address the challenge of data analytics for security and criminal investigation, researchers have explored different data mining techniques in the past few decades. However, a thorough study of data analytics for security and criminal detection is missing in the existing literature. One contribution of this paper is the systematic evaluation of state-of-the-art data mining technologies including intelligent agents, link analysis, text mining, machine learning (ML), and neural networks. Another contribution is the critical analysis of big data analytics for security and criminal investigations. After examining some important aspects of security intelligence discovery from criminal networks, we propose a framework to manage big data under such a context. From a practical perspective, this paper introduces security managers, law enforcement and intelligence agencies, fraud detection specialists, and information security analysts to the latest data mining techniques and shows how these techniques can be applied to enhance crime investigations.

DATA MINING

Data mining is a rapidly growing field positioned at the intersection of several sub-fields such as statistics, database research, high-performance computing, and ML. In data mining systems, the mining procedure is the fusion of statistical modeling, database storage, and Artificial Intelligence (AI) techniques.7 The main objective of data mining is learning from structured or unstructured data, and turning data into actionable knowledge. Statisticians have developed different theories and models to extract knowledge from data. These models enable us to analyze the links among variables for developing prediction models, quantifying effects, or suggesting causal paths.8

AI is one of the important sub-fields that lay the foundation of data mining methods. A large number of AI algorithms have already been developed to enable automated learning from data. These algorithms are the foundations for building different predictive models for detecting criminal activities, criminal behavioral profiling, and clustering of criminal data.9 Thus, security organizations, police departments, and intelligence agencies now rely on different data mining techniques to detect and deter crime and terrorism. Several breakthrough applications have already emerged in which link analysis, intelligent agents, text mining, neural networks, and ML are being used for criminal data analysis to prevent potential criminal activities.

DATA MINING TECHNIQUES

As the number of organized crimes has continued to rise in recent years, law enforcement and security agencies require new and advanced technologies to fight this battle. In this section, we identify five cutting-edge data mining technologies (link analysis, intelligent agents, text mining, neural networks, and ML) that have been used to combat crime. These technologies have developed over time to incorporate different tools and methods. We summarize these five techniques in Table 1 in terms of their tools, applications, purposes, and challenges.

Link Analysis

Link analysis is a data mining technique that reveals the structure of data by representing it as a set of interconnected, linked objects or entities.10 Link analysis starts with data that are represented as an interconnected network. By extrapolating the hidden relational patterns from the entities and their relationships, investigators and analysts can discover useful links among entities. This technique can correlate massive amounts of data about entities in regard to fraud, terrorism, narcotics, and others.11 This data mining technique is the first level of analysis by which networks of people, locations, groups, vehicles, contact addresses, bank accounts, and other tangible entities can be explored, assembled, detected, and analyzed (see Figure 1). Linkage data is
visualized as a graph with linked nodes where nodes represent suspects of interest for the investigators, and the link indicates a relationship or transaction between suspects and criminal artifacts.

Link analysis has been used in different studies to analyze social networks over the last two decades.12 Criminal link analysis tools that visualize relationships among suspects or offenders usually do not concern themselves with the exact mathematical density of entities.13 Their objective is mainly to depict who does what to whom, and with what frequency. Jennifer Schroeder et al. developed a prototype system named CrimeLink Explorer that enables automated link analysis.14 They also incorporated several techniques in their proposed system: the co-occurrence analysis approach and a heuristic approach for the identification of associations between crime entities, and a shortest path algorithm for association path search. In CrimeLink Explorer, the heuristic approach makes it possible to incorporate investigative domain knowledge into a link analysis approach for measuring association strength automatically.11 A criminal link analysis approach is proposed in Ref 15, where the system uses the time-based relationship, event similarity, time-related proximity, and document distributional proximity to identify the events of a terrorist attack incident. For a criminal network, the characteristics of a criminal are extracted via ties and links. The social capital associated with a network is also one of the key factors that facilitate the outbreak of crimes. By using social network analysis (SNA), a link analysis tool is introduced in Ref 5. In SNA, the link analysis approach is used to extract degree, betweenness, and closeness centralities16 to analyze a criminal network. Though link analysis is best suited to structured data, it can also be applied to analyze unstructured data by incorporating some text mining methods.17

On the basis of existing theories and practice, we find that link analysis can explore associations
among large numbers of entities with multiple attributes. Different individual instruments have already been developed by using these software packages. For instance, Analyst’s Notebook (www.i2.co.uk/home.html) assists investigators and analysts by supporting faster, more informed decision-making across and within organizations. It provides a centralized, aggregated view of information from different sources. Recently, a company named i2 has developed software that can be used on analysts’ notebooks to assist FBI investigations. Another link analysis tool is Crime Link (www.crimelink.com), which is being used to assist law enforcement investigators and analysts. ATAC (www.bairsoftware.com/atac.htm), Automated Tactical Analysis of Crime, is a criminal data analysis tool that can extrapolate potential criminal patterns and trends. Crime Workbench (CWB; www.memex.com/cwbover.html) is another mature link analysis tool for intelligence management. This tool offers comprehensive searching capabilities using the Memex information engine.

Intelligent Agent

Over the past two decades, the concept of agents has emerged as a new software paradigm, and it has become important in both AI and mainstream computer science. In a criminal investigation context, agents perform the function of software detectives, monitoring, identifying, and extracting information for analysis, development, and real-time response. Agents can perform tasks delegated by their developers, and so they are also called task-specific autonomous computing systems.18 By using software agents, a security model is developed in Ref 19 where game theory is applied to develop security games between a defender and an attacker. Here, the proposed security games are bi-level models20 that consider an attacker’s ability to gather information about the defense strategy before planning an attack. For monitoring bio-terrorist attacks, an agent-based real-time bio-surveillance system is developed in another study.21 A number of statistical methods are used for developing the proposed surveillance system. Moreover, in order to prevent natural hazards, Jaber et al.22 proposed an automated decision support system incorporating both a tele-geoprocessing approach and an intelligent software concept. Software agents are also used in cooperative information systems, where they collect and manipulate information obtained from relevant sources to answer the queries posed by other users or agents.23

By using intelligent agent technology, the FBI and the IRS (Internal Revenue Service) have developed some expert systems where younger agents automatically receive helpful advice from experienced agents who gained experience from previous solutions.24 The COPLINK agent is an example of such an expert system; it incorporates an intelligent software agent for delivering alert messages through a number of communication channels including e-mail and instant messaging.25 Šišlák et al. developed an air traffic control system where agent technology supports security
analysts by controlling the flow of air traffic.26 A verification service system is developed in Ref 27, where software agents are used as integrated security agents. Furthermore, a framework for a multi-agent based network security system is developed in another study.28 There are numerous software agent tools that are widely used in the security investigation context. For example, FinCEN is used to detect money laundering through the FinCEN Artificial Intelligence System (FIAS).29 InfoGist is an information retrieval software agent tool that can search and aggregate a set of webpages on the basis of some defined keywords. Doppelgaenger is another such intelligent tool, developed by the MIT Media Lab. This agent can evaluate criminal actions, monitor criminal behavior, create new rules dynamically, and then update these rules by itself. The agent can also alert investigators when a large irregularity is identified.

Text Mining

Text mining, also known as text data mining30 or knowledge discovery31 from unstructured datasets, refers to the process of extracting relevant information and knowledge from textual documents. Text mining explores the key concepts or themes in documents rather than any specific keywords, and the taxonomies help investigators and analysts to find relevant information in multiple linked documents.32 Text mining tools and techniques offer appropriate links among many relevant pieces of textual information for discovering new knowledge to determine the appropriate course of action. In Ref 32, the authors proposed a text mining method to explore criminal networks from the collected textual documents. The proposed system first discovers useful knowledge for criminal study, and then visualizes the extracted criminal network for investigation. Another similar research work is conducted in Ref 33, where relationships among criminals are extracted from news articles about crimes. For discovering new knowledge from unstructured documents of police records, a comparative study was performed in another work34 where the usability of the emergent self-organizing map (ESOM) and multi-dimensional scaling (MDS) was exploited as text exploration instruments to assist police investigations. Moreover, in Ref 35, Lee introduced an information extraction method for unstructured police records. He matched phrases or sentences to predefined templates. This method mainly incorporates Natural Language Processing (NLP) techniques for identifying the required entities (e.g., individuals, profession, places, and time) from collected textual data. It then compares the phrases or sentences for extracting associations and patterns within a criminal network. Using a text mining approach, it is possible to identify deceptive persons based on linguistic features. For instance, in Ref 36 text mining and data mining approaches were applied to determine the presence of false items in textual content. For capturing sentiment features, a rule-based system was developed by Yin et al., where contextual features were identified by comparing textual posts to a window of neighboring textual posts.37 They also incorporated SVM in the text mining approach to classify harassing posts on online social media.

NASA proposed a text mining tool called Perilog, which can retrieve and organize contextually relevant data from any sequence of terms.38 This tool was originally developed to support the FAA’s Aviation Safety Reporting System (ASRS).39 This text mining tool was used to investigate the dominant cause of airline crashes. Autonomy (www.autonomy.com) is another text mining tool which provides a unified view of disparate data sources. Autonomy offers a user profiling system which can automatically identify a user’s expertise through analyzing work patterns, records, and e-mail contents. Autonomy is used by UK police forces to categorize and tag stored criminal data. Police officers also use this tool as a central police information repository. Copernic is another text mining tool that combines intelligent agent technology for context-aware retrieval and text mining for theme aggregation; this tool is an effective forensic instrument for investigators and analysts. Clairvoyance (www.claritech.com) is a text analysis tool that applies NLP techniques to interpret and extract information from textual data such as written documents, newspaper reports, and other evidential documents.40 Similarly, ClearForest (www.clearforest.com) is a text mining tool that can discover forensic knowledge from unstructured information repositories. It allows investigators and analysts to quickly extract insightful information from textual files.

Artificial Neural Networks

An artificial neural network (ANN) is modeled as an aggregation of hundreds of thousands of basic computational units that mimic the function of neurons in a human brain. In ANNs, the functions of artificial neurons like learning, remembering, and decision-making are designed by some software systems that can mimic the cognitive neurological functions of the human brain.41 Through executing the learning process and using programmable memories, ANNs can
predict new trends based on existing samples, and so they are being used to predict potential criminal patterns based on observations of current criminal activities.42 In order to provide a possible psychological and behavioral profile of an unknown offender, a neural network approach can provide an investigative support tool. For example, an ANN-based psychological criminal profiling model is developed in Ref 43; the ANN model can conduct a more sophisticated link analysis than the traditional database-oriented approach. ANNs have been used for entity extraction, where ANN-based algorithms enable investigators to identify useful entities from textual data such as police narrative reports. An ANN-based named-entity extraction technique was developed from the COPLINK project.44 In a recent study,45 an ANN-based system was developed for shortest path computation enabling the extraction of associations in a criminal network.

The ANN-based shortest path algorithms were applied to identify the strongest association paths between entities within a criminal network. Several properties are frequently visible in criminal data, such as subjectivity, imprecision, noise, and incompleteness; data with these properties are most amenable to neural networks because the networks are highly fault-tolerant and, when properly trained, are capable of providing better solutions from degraded and erroneous raw data.46 Although the complex properties of criminal data impose constraints on conventional security and criminal investigation techniques, ANNs can offer a significant contribution to criminal network analytics as they can properly integrate the elusive qualities of human reasoning with the compulsive thoroughness, precise logic, and perfect memory of a computer.47,48 By using the neural network concept, an event initiator model was developed in Ref 49 where past behavior demonstrating the preference of offenders was used to infer both the time and venue of future crimes. For extracting potential patterns of serial crime with a high degree of accuracy, Dahbur and Muscarello used an ANN tool for the classification of criminal patterns.50 They designed a hybrid system by using neural networks and rule-based heuristic techniques. Though neural networks offer significant support in terms of clustering, classifying, and data summarizing, there are still some challenges in criminal investigation and forensic study. When processing a large volume of data, ANNs may generate false alarms, which may lead to the wrongful arrest of suspects.51 This type of problem is found in all application domains. Hence, new ANN algorithms should be developed such that the false alarms can be minimized, and law enforcers can effectively apply ANNs to assist criminal investigation.

Over the past two decades, a large number of ANN tools have been developed and applied to different areas. These tools have very intuitive interfaces such that users find it easy to apply them to a variety of applications. Some ANN tools can automatically adjust their internal structures with respect to a dynamic environment.52 Attrasoft (www.attrasoft.com) ImageFinder is an image and facial recognition ANN tool that can handle unlimited numbers of suspect photographs at a rate of 600 images per minute.53 BrainMaker (www.calsci.com) is a back propagation neural network tool that has the ability to handle more extensive automated training and tuning. Moreover, Ward Systems (www.wardsystems.com), STATISTICA Neural Networks (www.statsoftinc.com), and ProForma (www.proformacorp.com) are also supplementary neural network tools which discover knowledge from the internal features of large-scale criminal data.

Machine Learning

In an ML approach, a computer system learns as a human does, from past experience and different classes of tasks, and the acquired knowledge is then used to make decisions and predictions.54 ML techniques are used to adapt to dynamic and new circumstances and to detect and extrapolate internal and external structures as well.55 In recent years, a large number of research works have appeared in the security and criminal investigation literature, describing the application of ML techniques to identify potential anomalies (e.g., financial fraud) and to perform criminal profiling by means of different inference engines.50,56 Moreover, there has been a dramatic increase in the application of ML methods, tools, and techniques to facilitate diagnostic and prognostic investigation in forensic science.57 ML approaches offer several new possibilities and have found a place in criminal investigation and forensic science. Based on partial criminal data, ML algorithms can predict vital crime patterns from among a large number of criminal incidents. These data-driven tools enable investigators to better understand the patterns of crimes, leading to more specific attribution of previous crimes and the apprehension of suspects.58 For detecting specific patterns of crimes that are committed by the same offenders or gangsters, an ML algorithm is proposed in Ref 59; the algorithm can look for similarities between crimes in a growing pattern learning fashion from a database, and it tries to identify the behavioral codes that the offender follows.
When more criminal data are fed to this algorithm, behavioral codes become more fine-grained.

ML techniques are also used as an entity extractor that can identify useful entities from police narrative reports.60 Moreover, Yin et al. demonstrated that the ML approach was more feasible and achieved better document classification results with respect to local features, sentiment features, and contextual features.37 They also found that a supervised learning approach could accurately identify harassing posts in chat rooms and discussion forums. In some expert systems, both ML and neural network techniques are adopted to support police operations. For instance, in research work46 the AICMS prototype system was proposed to address the operational needs of police who encountered an increasing number of criminal activities. The AICMS system is a rule-based system supported by machine-learning and neural network techniques. Another rule-based expert system was developed in Ref 61, where the proposed system used fuzzy set theory to improve the effectiveness and the quality of the data analysis phase of criminal investigation. Furthermore, ML approaches deliver quick responses through analyzing various dimensions of active data in criminal investigation.62,63 Because criminals can intentionally alter their methods of operation, ML methods for criminal investigation face another great challenge: they must adapt continuously to evolving crime patterns. For criminal investigation, ML will perform as an adaptive process rather than just a technology solution.

AC2 (www.alice-soft.com) is an ML software tool that uses decision tree learning, and it provides graphical interfaces for both data preparation and decision tree building. This tool can test all possible groupings of attributes, and hence segment and extract the desired patterns that converge on a selected variable (e.g., criminals versus suspects). ANGOSS (www.angoss.com) is a predictive analytics and ML software suite that has some industrial-strength decision tree components for discovering and visualizing links among variables in which the relationships among criminals are identified. Moreover, Prudsys (www.prudsys.de), Quadstone (www.quadstone.com), and SAS (www.sas.com) are also ML software tools that can be used to analyze criminal data for profiling, automated segmentation, and modeling.

Table 2 summarizes 30 relevant studies with respect to six dimensions: technology, solution methods, data type, key contribution, scalability, and impact. The dimension of technology refers to the collection of techniques applied to the corresponding study. Though most of the studies use one of the five data mining techniques, some studies employ hybrid techniques. Solution refers to the specific methods or theories that are applied to solve the research problems, and data type refers to the type of experimental data used. Research impact is an important aspect for evaluating the success of any study.

BIG DATA FOR CRIMINAL INVESTIGATION

Big data analytics include numerous state-of-the-art technologies that reshape security intelligence, which concerns the discovery of vital security knowledge from large amounts of data. While the accessibility of big data creates different opportunities for police and intelligence agencies, it also brings some challenges to practitioners and researchers. Existing big data analytics is concerned with three important challenges: storage, management, and processing.4 Based on the characteristics of collected data, different methods are being applied to discover knowledge about criminal networks. When analysis models are developed based on a single data source, the analytical results may provide limited and biased knowledge. On the other hand, data from multiple sources provide a holistic view of the crime domain and allow more accurate and effective prediction of criminal patterns. Unfortunately, integrating heterogeneous data from multiple sources to discover criminal networks is not a trivial task. This calls for the exploration of advanced methods, platforms, tools, applications, and frameworks for effective data management in the context of security intelligence.

Big Data Management Framework

In this paper, we propose a framework to manage big data in the context of criminal investigation. For big data analytics, some modules are common to every task of a problem domain. For instance, big data extraction, big data transformation, big data integration, and big data analysis are among the common modules of a big data analytics system. On the other hand, data sources, methods, and applications are mainly domain-specific. In the proposed framework, we adopt the R-P (relational-positional) model76 that classifies the activities of criminal investigation under two broad perspectives, namely relation and position. There are four major steps in our proposed framework (see Figure 2). For the first step, data are collected from multiple sources, and then various big data tools and techniques are applied to transform data from a raw format to a suitable format for
TABLE 2 | A Summary of Previous Studies in Data Mining Enabled Security and Criminal Investigation

| Study | Technology | Solution Method(s) | Data | Key Contribution | Scalability | Impacts |
| --- | --- | --- | --- | --- | --- | --- |
| 11 | Link analysis | Shortest path algorithm, and heuristic method | Police records | Develop a criminal investigation prototype system named ‘CrimeLink Explorer’ | High | Facilitating crime investigation process |
| 64 | Neural network | Named-entity extraction method | Police narrative reports | Develop an automatic entity and link extraction technique | Low | Facilitating automatic crime investigation |
| 65 | Link analysis | Semantic analysis and topic analysis | Weblogs data | Social network analysis framework | High | Improve the weblog social network analysis |
| 33 | Text mining | Key word extraction algorithm | Crime news | Presenting an integrated term-relationship mining method for crime investigation | High | Offer network mining through criminal news |
| 66 | Link analysis | Encryption method | N/A | Develop a privacy protocol | High | Secure information sharing for social media analytics |
| 15 | Link analysis | Topic detection | News data (CNN) | Develop an event evaluation system | High | Facilitating large scale event evaluation |
| 67 | Machine learning | NLP approach | Wikipedia (English) | Incorporate ML technique for AVD; increase the efficiency of AVD expert system | High | Improve Wiki data authenticity and reliability |
| 5 | Link analysis | SNA method | N/A | Presents a tentative protocol for crime data handling and coding | N/A | Enable thorough understanding of criminal behavior |
| 36 | Data and text mining | Logistic regression and decision tree method | Criminal data | Develop a text-based deception detection approach | Low | Enable a system to detect deception technique |
| 60 | Machine learning | NLP approach | Corpus data | Develop an online reporting system | Low | Increase effectivity and efficiency of crime investigation |
| 68 | Text mining | NLP and fuzzy matching | News data | Develop an efficient and effective surveillance system for controlling potential violative market activity | High | Increases efficiency of review and investigation system |
| 37 | Text mining and machine learning | Supervised learning approach | Three datasets (Kongregate, Slashdot and MySpace) | Introduce contextual and similarity features in harassment detection | N/A | Control harassment in online community |
| 59 | Machine learning | Supervised learning method | housebroken data (USA) | Develop a crime pattern detection algorithm | High | Good impact for time management to find crime pattern |
| 56 | Machine learning | Bayesian network (BN) model | Structured criminal information | Develop a learning based decision-aid tools | Low | Support police investigations |
| 3 | Text mining | Multi-entity Bayesian network model | Heterogeneous criminal data | Discovering crime pattern in large scale datasets | High | Reduce and control organized crime |
| 69 | Intelligent agent | Intelligent agent theory | N/A (conceptual study) | Offer a solid framework for anti-money laundering system | High | Reduce and prevent financial crime practice |
| 70 | Intelligent agent | Monge’s detection method71 | Police records | Develop deceptive criminal detection system | High | Increase efficiency and accuracy in criminal investigation |
| 21 | Intelligent agent | Statistical modeling | Respiratory health data | Develop a statistical Process Control system for Early Detection of Bioterrorism | High | Improve public healthcare system |
| 19 | Software agent | Game theory | Police records (arrest) | Model security games between a defender and an attacker | High | Improve transportation security and effectivity |
| 72 | Intelligent agent and machine learning | RIPPER learning method | Calling record | Develop a distributed intelligent agent approach for intrusion detection | Low | Improve operating system and network |
| 32 | Text mining and data mining | NER model | Textual documents (pdf, e-mail) | Develop criminal community discovery method | High | Improve criminal investigation system |
| 73 | Neural network | Adaptive network models | Standard spreadsheet (structured) | Develop an advanced discriminant system | Low | Improve auditing mechanism |
| 45 | Link analysis | Shortest-path algorithms, priority-first-search (PFS) | Crime reports | Develop a link analysis technique to reveal strong associations among entities | Low | Discovering new era for investigation |
| 47 | Neural network and text mining | Self-organization map (SOM) algorithm and point pattern analysis method | Orion database | Develop a text mining approach to expand upon the spatial analytical capabilities in criminal investigation | High | Discovering complex behavior of the crime geography |
| 43 | Neural network | Inductive and deductive approaches | N/A | Incorporate neural network for psychological criminal profiling | High | Improve criminal investigation process |
| 46 | Machine learning and neural network | Rule base approach | Police reports and housebroken data (Hong Kong) | Develop a prototype system to support police investigation | High | Improving the operation of crime investigation and prevention |
| 74 | Machine learning | Unsupervised learning algorithm | Crime and forensic data | Develop a novel data mining approach to assist crime scene investigator (CSI) performance | High | Improve automated investigation mechanism |
| 75 | Text mining | Centrality measure | Crime news | Examines the social organization of a hacker community from a network perspective | High | Control hacking activities through revealing hackers’ community network |
| 61 | Machine learning (fuzzy system) | Fuzzy inference system | Crime data | Develop an expert system for network forensics | High | Enhancing intelligence-based approaches for crime investigation |
| 50 | Neural network | Kohonen neural networks, heuristic processing | Crime data | Develop a hybrid classification system | High | Data itself presents crime pattern (for serial offenders) |
subsequent analysis. For the third step, various analytical methods are applied to discover criminal knowledge from the pre-processed data. Finally, the automatically discovered criminal knowledge (e.g., criminal network) is applied to support criminal investigations.
Data Resources
For studying criminal networks, security intelligence agencies collect data from various types of data resources. For relational analysis, surveillance logs, telephone records, location-based social networks, financial transaction data, and crime incident reports are the main data resources.2,3 To disrupt criminal networks, law enforcement and intelligence agencies mostly utilize voice calling records, bank accounts, and transaction data.5,77 Recently, different security intelligence analysis projects have used textual data (e.g., e-mail, SMS messages) to find associations and relations among suspected entities.78 For positional analysis, different social media data such as Facebook, Twitter, LinkedIn, and blogs have been examined. As positional analysis generally examines only the connection pattern among network members, location-specific patterns are not the focus of investigation.2
Big Data Transformation, Platform, and Tools
At this stage, all collected data are raw data in heterogeneous formats. Accordingly, the raw big data should be transformed to a suitable format for further analysis. Many techniques are available to pre-process raw data. In the criminal investigation domain, various kinds of data can be fed to a big data analytics platform regardless of whether the data are in structured or unstructured format. Hadoop is the most remarkable platform for big data analytics. Hadoop is an open source distributed data processing platform that mainly belongs to the ‘NoSQL’ approaches, which manipulate data without using the classical SQL approach. Hadoop can handle voluminous datasets by distributing data to numerous servers (nodes), each of which is responsible for executing a specific task; the intermediate results are then synthesized into the final solution.79 Though Hadoop is a widely used and effective platform for big data analytics, it is somewhat challenging to install, configure, and administer.80 Moreover, it is also difficult to find individuals with technical skills for Hadoop. Therefore, organizations may not be ready to embrace the Hadoop platform. There are a number of surrounding big data ecosystems with additional platforms and
FIGURE 2 | A big data management framework for security and criminal investigation. Data resources shown: surveillance logs, telephone records, location-based social networks, financial transaction data, crime incident reports, social network data, multimedia data, and demographic data.
TABLE 3 | Tools and Platforms for Big Data Analytics in Security Intelligence
Big Data Platform/Tool | Description
Hadoop Distributed File System (HDFS) | HDFS provides the underlying storage for the Hadoop cluster. It splits datasets into smaller parts and distributes them across different servers.82
Pig | The Pig programming language is designed to integrate heterogeneous data. Pig comprises two key modules: (1) the language itself, called Pig Latin, and (2) the runtime module, which executes the language code.
Hive | Hive is a runtime Hadoop-based architecture that brings Structured Query Language (SQL) to the Hadoop platform.83
MapReduce | MapReduce is a programming model that allows distributed processing across many parallel nodes on the same system. MapReduce provides the interface for the distribution of sub-tasks and the aggregation of outputs.84
HBase | HBase is an Apache open-source project that provides a distributed, highly scalable, column-oriented database management system.
Cassandra | Cassandra is also a distributed database for managing a large volume of data spread out across many utility servers. Cassandra also offers continuous service with no single point of failure.85
Avro | Avro provides record abstraction and a data serialization service.86 It also offers features such as versioning and version control.
ZooKeeper | ZooKeeper supports big data infrastructure by coordinating the parallel computation of distributed applications.87
Lucene | The Lucene project is applied to text analytics and search and has been incorporated into different open source projects. It provides capabilities including full-text indexing and search as a library for use within a Java application.88
Oozie | Oozie was developed from the ground up for large-scale Hadoop workflows. Its architecture delivers scalability, multitenancy, and effective coordination among sub-systems.89
Mahout | Mahout is another Apache project whose focus is to produce free implementations of distributed and scalable artificial intelligence algorithms that support big data analytics on the Hadoop platform.
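To make the MapReduce entry in Table 3 concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases. It does not use Hadoop or any Hadoop API; the call records, the function names (map_phase, shuffle, reduce_phase, run_job), and the n_splits parameter are all hypothetical and chosen only for illustration. The sketch counts how often each pair of parties appears in a list of call records, with the input splits standing in for data blocks held on different nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical call records (caller, callee); illustrative only, not real data.
CALL_RECORDS = [
    ("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("A", "B"),
]

def map_phase(record):
    """Emit one (key, 1) pair per call, keyed by the unordered pair of parties."""
    caller, callee = record
    yield (tuple(sorted((caller, callee))), 1)

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key into a single contact count."""
    return key, sum(values)

def run_job(records, n_splits=2):
    """Run map, shuffle, and reduce over the records in a single process."""
    # Split the input into chunks standing in for blocks on different nodes.
    splits = [records[i::n_splits] for i in range(n_splits)]
    mapped = chain.from_iterable(map_phase(r) for split in splits for r in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

contact_counts = run_job(CALL_RECORDS)
print(contact_counts)
```

In a real deployment the splits would be processed in parallel on separate nodes and the shuffle would move data across the network; the sketch only shows how sub-tasks are distributed by key and how their outputs are aggregated.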
tools.81 These supporting tools and platforms are summarized in Table 3.
Methods
Security intelligence refers to developing insights from data for criminal network investigation. Data mining techniques can help achieve such a goal through extracting and detecting patterns of criminal networks or forecasting criminal behavior from large-scale datasets. Different data mining methods such as clustering, association rule mining, block modeling, network partitioning, and classification are employed to study organized crimes.2,3 In this paper, we classify SNA into two classes. One of them is relational analysis, where network partitioning, clustering, and association rule mining approaches are more suitable because they can estimate different kinds of centralities more effectively. In addition, clustering methods are applied to associate a person with an organization and/or vehicle in criminal investigation.64 Sequential clustering techniques have been used by Brown and Gunderson to discover the preferences of computer criminals.49 Recently, some neural network models such as Self-Organizing Maps (SOMs) have been applied to this domain. An association rule is a rule that implies a particular link between sets of objects within a dataset, in the form of ‘if precedent then incident.’90 The other class of SNA is positional analysis, where block modeling and clustering approaches are more suitable because these two approaches can effectively extract connection patterns among nodes in a network. To discover the overall structure of a network, the key approach is block modeling, which includes two steps: network partition and interaction pattern identification.91 However, appropriate data mining techniques should be selected based on the underlying data characteristics and business problems.90
Applications
Applications for Relational Analysis Purposes
Behavioral profiling is the capacity to extract patterns of criminal activities, to predict the probable time and place of a crime, and to identify the different members of a criminal network.24 Behavioral profiling has been widely used in marketing intelligence to provide personalization, that is,
making the right offer to the right person at the right place and at the right time.92,93 By the same token, behavioral profiling can be applied to launch the right investigation against the right suspect at the right time, and before the criminal commits a crime. However, behavioral profiling is becoming increasingly challenging because of the 3Vs of big data. For instance, identifying a suspect group is one challenge, and extracting deviations from the norm is another challenge. To design behavioral profiling, it is necessary to analyze huge amounts of telephone call data and bank transaction records in a variety of formats.24 Pattern analysis mainly focuses on three important factors: interaction pattern, relationship pattern, and financial transaction pattern.94 Pattern analysis defines some standards to identify deviations from groups.
Moreover, it is essential to identify similar groups of criminal networks and their nodes (members) based on analogous conversation and transaction data in inter or intra networks for the development of effective security measures. Segmentation techniques can be applied to identify various criminal groups who have similar interests. McCue proposed a behavioral segmentation approach for identifying communities within a criminal network.98 There are different approaches (e.g., hierarchical clustering, network partitioning, and matrix permutation approaches) which are widely used for segmenting networks.99 Segmentation is also related to various decision-making processes, and so it is becoming increasingly challenging under the big data environment. Therefore, new computational methods must be developed to cope with the 3Vs of big data.
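As a rough illustration of segmenting a network into groups, the sketch below partitions members by dropping weak ties and then collecting connected components. This is a deliberately simple stand-in for the network partitioning family mentioned above, not the approach of Ref 98 or Ref 99; the tie list, the segment function, and the min_strength threshold are hypothetical names and values introduced only for this example.

```python
from collections import defaultdict, deque

# Hypothetical tie strengths: (member_a, member_b, number_of_contacts).
TIES = [
    ("p1", "p2", 9), ("p2", "p3", 7), ("p1", "p3", 5),
    ("p3", "p4", 1),  # weak bridge between the two groups
    ("p4", "p5", 8), ("p5", "p6", 6),
]

def segment(ties, min_strength):
    """Partition members into groups linked by ties of at least min_strength."""
    adjacency = defaultdict(set)
    members = set()
    for a, b, strength in ties:
        members.update((a, b))
        if strength >= min_strength:
            adjacency[a].add(b)
            adjacency[b].add(a)

    groups, seen = [], set()
    for start in sorted(members):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        groups.append(sorted(component))
    return groups

groups = segment(TIES, min_strength=3)
print(groups)
```

The threshold is the main design choice here: a higher value splits the network into more, tighter groups, while a low value merges everything connected by any contact. Hierarchical clustering or block modeling would replace the hard threshold with a similarity measure over connection patterns.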
big data are first collected, described, and transformed to a format suitable for subsequent analysis. Then, the team should develop a basic understanding of the structure and contents of the datasets. After the data have been characterized and cleaned up, they are prepared for subsequent analysis. Although exploring and preparing big data is time-consuming, it is a fundamental step of a data mining methodology.103 Moreover, apart from data cleaning, data aggregation and the production of training and test data should be conducted at this stage. Then, the team should proceed to step 3 of the proposed methodology if the quality of the pre-processed data is above the desired threshold; otherwise, it must go back to step 1 for further refinement.
In step 3, different big data analytic platforms and tools are chosen to analyze the big data. This step is more challenging and much more complicated than the other steps. Different big data analytics tools are deployed at this stage. The deployment of big data analytics tools and platforms tends to be complicated, and it usually involves a team of experts for its execution.79 At this stage, various big data analytics methods are applied to gain insights from the big data. Unlike routine analytical methods, big data analytics methods can scale up to high volumes of heterogeneous data. Finally, the analytical models are designed for the big data through a series of iterative what-if analyses. The analytical models and their characteristics are tested and validated in step 4. During the evidence extraction stage, the big data enabled models are refined to improve their accuracy and scalability, as well as their ability to meet the originally stated business objectives (e.g., crime investigation).104 For the final application step, the discovered actionable knowledge is applied to support relevant activities and processes such as potential crime prediction.
In the next section, some real-world big data analytics projects for security intelligence are discussed. We collected these examples from numerous sources including organizations’ official Websites.
APPLICATIONS AND CHALLENGES
According to the White House report on big data,105 top administrators of security intelligence agencies agreed to launch a big data enabled safety and security initiative back in 2012. They have introduced two pilot projects named Neptune and Cerberus, respectively. Neptune mainly performed the security and privacy preservation processes. Its information architecture is designed in such a way that data confidentiality is strongly protected. For strengthening data security, Neptune, which belongs to the Sensitive
[Figure residue: methodology steps. Step 2, Big data: big data collection; big data integration and cleaning; big data description and formatting; big data management and quality. Step 4, Evidence extraction: validating the result; reviewing the process. Step 5, Application: planning deployment; reporting applications.]
but Unclassified (SBU) domain, tags the information by assigning both tag names and values to all consumed data. The pilot project of Neptune can ingest, tag, and transfer collected raw data to external systems such as the Electronic System for Travel Authorization (ESTA) system, Student and Exchange Visitor Information System (SEVIS), and Alien Flight Student Program (AFSP) component database systems. Cerberus includes protected cloud computing in big data settings on a Top Secret/Sensitive Compartmented Information (TS/SCI) network to allow real-time computing and analysis, and it has been used by the Department of Homeland Security (DHS) via the Neptune platform. This project is administrated by the Intelligence and Analysis (I&A) office and supervised by the Common Vetting Task Force (CVTF).
UK police and the US defense department have been using CWB, developed by a British company named Memex Technology Limited. They use large-scale textual databases to enhance the criminal data analysis process. They have employed an enhanced search engine with an intelligent database management system. By using big textual data analytics, they cluster and categorize criminal networks and criminals’ modus operandi. The LEAP network, developed by the technology community and law enforcement agencies, is an information network for analyzing large volumes of heterogeneous data which are shared across security agencies, geographies, and machines. This network provides users with a unique capability to combat different offences that transcend geography and time such as human trafficking, narcotics trafficking, and organized retail crime.
Operation Virtual Shield (OVS) is another example of a big data analytics project to enhance security intelligence discovery. This project is financed by Homeland Security grants amounting to $217 million. Chicago’s OVS includes at least 2,250 cameras, 250 of which have biometric equipment and technology. It also enables investigators to identify deceptive movement and harmful activity, and to extract events’ locations. Moreover, OVS includes facial recognition, which is a computer application for recognizing and validating a person from digital images or video data.105
Recently, the Apache project has developed another big data security platform named Metron that can detect cyber anomalies through analyzing big data. The Metron platform enables an organization to store and manipulate huge volumes of data. This platform not only identifies cyber anomalies but also enables organizations to constantly monitor their network traffic and activities.
A data mining project was conducted by Wolverhampton University and West Midlands Police (UK), where they used a SOM to predict and prevent sex- and homicide-related crime. The USA, UK, Canada, and South Africa police have used ANNs to identify fraud and money laundering. Their ANN application was empowered by the HNC falcon system.106 Auto trackXP, DARPA, COPLINK, and DolphinSearch have leveraged data mining techniques to combat different terrorist activities such as bioterrorism, drug enforcement, fraud, and money laundering. The aforementioned technologies have also been adopted by the Department of Homeland Security, USA. On the other hand, the UK police department has applied another text mining technology named Copernic for forensic investigation.
Big data analytics for security intelligence should support some key features that are common for analyzing security data. Big data platforms are assessed by key features such as availability, scalability, privacy, security enablement, ease of access, real-time output, and ability to process diverse levels of granularity.107,108 As the nature of criminal data has evolved, data analytic approaches should also be scaled up to carry out more complex and dynamic analytics that address the 3Vs challenges of big data. Criminal datasets are increasingly represented in unstructured multimedia formats. The heterogeneous nature of the collected criminal data makes the investigation process more complicated and challenging. Recently, another characteristic of big data named veracity has received a lot of attention from researchers and practitioners. Big data sources must be reliable, and big data analytics methods should be effective for extracting useful and relevant knowledge from a mixture of high quality and low quality data. In the criminal investigation context, veracity is a more challenging issue for the following reasons. First, investigative actions and law-enforcement decisions heavily depend on having the right information (high quality information). Second, the security and well-being of citizens depend heavily on correct and timely law-enforcement actions. To address the above challenges, we identify some general research guidelines to facilitate big data analytics projects for security and criminal investigations.
FUTURE RESEARCH DIRECTIONS
In this study, we identify the data sources, methods, big data platforms, tools, and applications from
different security perspectives. We also describe some challenging issues of big data analytics in the context of security and criminal investigation. We now highlight some future research directions for big data enhanced criminal investigation.
1. Studying the characteristics of different data and the strategies to select the most appropriate data sources for achieving particular investigation goals. As the volume of data is increasing continuously, classical techniques may not be efficient enough to process all available data in a timely manner.4,105 Thus, the selection of proper data sources is necessary for managing security intelligence.
2. Selection of appropriate analytical models. For big data analytics, there are numerous data analytical methods, some of which are more suitable for certain investigation purposes and datasets.1 Consistent with the HACE theorem in Ref 109, big data also introduce some new challenges such as non-uniform data circulation and distributed manipulation with a large number of variables that cannot be coped with by existing analytical methods. Therefore, the project team needs to examine the characteristics of different analytical models, and match the most appropriate methods with the specific nature of the investigation domains.
3. Matching between data mining techniques and methodology. After selecting the analytical methods, project teams should select a suitable methodology to guide the whole investigation work. Some techniques are applicable only to some methodologies. For example, the clustering technique is more suitable for the CRISP-DM methodology, but less suitable for the AMPA methodology.104 Therefore, proper matching between techniques and methodologies is required for effective and efficient investigative actions.
4. Exploring proper data integration methods to tackle complicated investigation problems. Most existing studies on security and criminal investigation use data from a single data source. However, more complicated investigation problems require aggregating data from multiple sources.3
5. Tackling the variety of data formats from multiple data sources. Owing to various structures, quality, granularity, and objectives of investigations, data collection and analysis methods tend to be different.110 Accordingly, project teams should continuously refine their frameworks, analytics techniques, and tools to meet the evolving characteristics of investigations, and the emerging formats of criminal data generated from possibly new data sources.
CONCLUSIONS
The purpose of this review article is to provide researchers and practitioners with a retrospective view of several methodologies and technologies, particularly big data analytics methodology for enhancing security and criminal investigations. We identify five major technologies, namely link analysis, intelligent agents, text mining, ANNs, and ML, which have been widely used in various domains for developing the technical foundations of an automated security and criminal investigation system. Under the big data environment, opportunities exist for tapping into big data to enhance security and criminal investigations. However, new methodologies and analytical techniques should be explored to address the fundamental challenges of big data, and hence to leverage big data to facilitate criminal investigations. In short, big data analytics has the potential to transform the way that law enforcement and security intelligence agencies extract vital knowledge (e.g., criminal networks) from multiple data sources in real-time to support their investigations. Such real-time knowledge could help security intelligence agencies to develop comprehensive strategies to prevent and respond to organized crimes such as terrorist attacks and human trafficking. Accordingly, we propose a big data analytics methodology to support security intelligence agencies in criminal investigations. With the continuous advancement of big data analytics techniques, platforms, tools, and methodologies, we will see the widespread utilization of big data across law enforcement and security intelligence agencies in the coming few years. The big data revolution leads to the vision that the weapons against crimes, extremism, and terrorism are no longer bullets or bombs, but big data enhanced criminal analytics.
ACKNOWLEDGMENTS
This work is supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region,
China (Projects: CityU 11502115 and CityU 11525716), the NSFC Basic Research Program (Project: 71671155), and
the Shenzhen Municipal Science and Technology R&D Funding - Basic Research Program (Project No.
JCYJ20160229165300897).
REFERENCES
1. McAfee A, Brynjolfsson E, Davenport TH, Patil DJ, Barton D. Big data. The management revolution. Harvard Bus Rev 2012, 90:61–67.
2. Xu J, Chen H. Criminal network analysis and visualization. Commun ACM 2005, 48:100–107.
3. Chibelushi C, Sharp B, Shah H. ASKARI: a crime text mining approach. Digital Crime Forensic Sci Cyberspace 2006, 30:155.
4. Kaisler S, Armour F, Espinosa JA, Money W. Big data: Issues and challenges moving forward. In: System sciences (HICSS), 2013 46th Hawaii International Conference, IEEE, 17 January, 2013, 995–1004.
5. Van der Hulst RC. Introduction to social network analysis (SNA) as an investigative tool. Trends Organ Crime 2009, 12:101–121.
6. Tan W, Blake MB, Saleh I, Dustdar S. Social-network-sourced big data analytics. IEEE Internet Comput 2013, 17:62–69.
7. Han J, Pei J, Kamber M. Data Mining: Concepts and Techniques. Amsterdam, Netherlands: Elsevier; 2011.
8. Tufféry S. Data mining and statistics for decision making.
9. Oatley G, Ewart B, Zeleznikow J. Decision support systems for police: lessons from the application of data mining techniques to “soft” forensic evidence. Artif Intell Law 2006, 14:35–100.
10. Getoor L. Link mining: a new data mining challenge. ACM SIGKDD Explor Newslett 2003, 5:84–89.
11. Schroeder J, Xu J, Chen H, Chau M. Automated criminal link analysis based on domain knowledge. J Am Soc Inf Sci Technol 2007, 58:842–855.
12. Gilbert E, Karahalios K. Predicting tie strength with social media. In: Proceedings of the SIGCHI conference on Human Factors in Computing Systems, ACM, 4 April, 2009, 211–220.
13. Klerks P. The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands. Connections 2001, 24:53–65.
14. Schroeder J, Xu J, Chen H. Crimelink explorer: Using domain knowledge to facilitate automated crime association analysis. In: International Conference on Intelligence and Security Informatics, 2 June, 2003, 168–180. Berlin, Heidelberg: Springer.
15. Yang CC, Shi X, Wei CP. Tracing the event evolution of terror attacks from on-line news. In: International Conference on Intelligence and Security Informatics, 23 May, 2006, 343–354. Berlin, Heidelberg: Springer.
16. Kim Y, Choi TY, Yan T, Dooley K. Structural investigation of supply networks: a social network analysis approach. J Oper Manage 2011, 29:194–211.
17. Zhao Z, Feng S, Wang Q, Huang JZ, Williams GJ, Fan J. Topic oriented community detection through social objects and link analysis in social networks. Knowl Syst 2012, 26:164–173.
18. Brenner W, Zarnekow R, Wittig H. Intelligent Software Agents: Foundations and Applications. Berlin, Germany: Springer Science & Business Media; 2012.
19. Jain M, Tsai J, Pita J, Kiekintveld C, Rathi S, Tambe M, Ordónez F. Software assistants for randomized patrol planning for the LAX airport police and the federal air marshal service. Interfaces 2010, 40:267–290.
20. Bard JF. Practical Bilevel Optimization: Algorithms and Applications. Berlin, Germany: Springer Science & Business Media; 2013:9.
21. Fricker RD Jr, Rolka H. Protecting against biological terrorism: statistical issues in electronic biosurveillance. Chance 2006, 19:4–14.
22. Jaber A, Guarnieri F, Wybo JL. Intelligent software agents for forest fire prevention and fighting. Safety Sci 2001, 39:3–17.
23. Taylor M, Haggerty J, Gresty D, Almond P, Berry T. Forensic investigation of social networking applications. Netw Security 2014, 2014:9–16.
24. Mena J. Investigative Data Mining for Security and Criminal Detection. Oxford, United Kingdom: Butterworth-Heinemann; 2003.
25. Lin C, Hu PJ, Chen H. Technology implementation management in law enforcement: COPLINK system usability and user acceptance evaluations. Soc Sci Comput Rev 2004, 22:24–36.
26. Šišlák D, Pechoucek M, Volf P, Pavlícek D, Samek J, Marík V, Losiewicz P. AGENTFLY: towards multi-agent technology in free flight air traffic control. In: Defence Industry Applications of Autonomous Agents and Multi-Agent Systems, 2007, 73–96. Basel: Birkhäuser.
27. He Q, Sycara K, Su Z. A solution to open standard of PKI. In: Australasian Conference on Information Security and Privacy, 13 July, 1998, 99–110. Berlin, Heidelberg: Springer.
28. Hegazy IM, Al-Arif T, Fayed ZT, Faheem HM. A multi-agent based system for intrusion detection. IEEE Potentials 2003, 22:28–31.
29. Goldberg HG, Wong RW. Restructuring transactional data for link analysis in the FinCEN AI system. In: AAAI Fall Symposium, January 1998, 38–46.
30. Aggarwal CC, Zhai C, eds. Mining Text Data. Berlin, Germany: Springer Science & Business Media; 2012.
31. Zhong N, Li Y, Wu ST. Effective pattern discovery for text mining. IEEE Trans Knowl Data Eng 2012, 24:30–44.
32. Al-Zaidy R, Fung B, Youssef AM. Towards discovering criminal communities from textual data. In:
Proceedings of the 2011 ACM Symposium on Applied Computing, ACM, 21 March, 2011, 172–177.
33. Tseng YH, Ho ZP, Yang KS, Chen CC. Mining term networks from text collections for crime investigation. Expert Syst Appl 2012, 39:10082–10090.
34. Poelmans J, Van Hulle MM, Viaene S, Elzinga P, Dedene G. Text mining with emergent self organizing maps and multi-dimensional scaling: a comparative study on domestic violence. Appl Soft Comput 2011, 11:3870–3876.
35. Lee R. Automatic information extraction from documents: A tool for intelligence and law enforcement analysts. In: Proceedings of 1998 AAAI Fall Symposium on Artificial Intelligence and Link Analysis, 23 October, 1998. Menlo Park, CA: AAAI Press.
36. Fuller CM, Biros DP, Delen D. An investigation of data and text mining methods for real world deception detection. Expert Syst Appl 2011, 38:8392–8398.
37. Yin D, Xue Z, Hong L, Davison BD, Kontostathis A, Edwards L. Detection of harassment on web 2.0. Proc Content Anal Web 2009, 2:1–7.
38. McGreevy MW. Using Perilog to explore “Decision Making at NASA”.
39. Connell L. NASA Aviation Safety Reporting System (ASRS).
40. Bovbjerg KM. Personal development under market conditions: NLP and the emergence of an ethics of sensitivity based on the idea of the hidden potential of the individual. J Contemp Relig 2011, 26:189–205.
41. Haykin S, Network N. A comprehensive foundation. Neural Netw 2004, 2:41.
42. Chen H, Chung W, Xu JJ, Wang G, Qin Y, Chau M. Crime data mining: a general framework and some examples. Computer 2004, 37:50–56.
43. Strano M. A neural network applied to criminal psychological profiling: an Italian initiative. Int J Offender Ther Comp Criminol 2004, 48:495–503.
44. Chen H, Chung W, Qin Y, Chau M, Xu JJ, Wang G, Zheng R, Atabakhsh H. Crime data mining: an overview and case studies. In: Proceedings of the 2003 Annual National Conference on Digital Government Research, Digital Government Society of North America, 18 May, 2003, 1–5.
45. Xu JJ, Chen H. Fighting organized crimes: using shortest-path algorithms to identify associations in criminal networks. Decis Support Syst 2004, 38:473–487.
46. Brahan JW, Lam KP, Chan H, Leung W. AICAMS: artificial intelligence crime analysis and management system. Knowl Syst 1998, 11:355–361.
47. Helbich M, Hagenauer J, Leitner M, Edwards R. Exploration of unstructured narrative crime reports: an unsupervised neural network and point pattern analysis approach. Cartogr Geogr Inf Sci 2013, 40:326–336.
48. Maren AJ, Harston CT, Pap RM. Handbook of Neural Computing Applications: Academic Press; 2014.
49. Brown DE, Gunderson LF. Using clustering to discover the preferences of computer criminals. IEEE Trans Syst Man Cybern A Syst Hum 2001, 31:311–318.
50. Dahbur K, Muscarello T. Classification system for serial criminal patterns. Artif Intell Law 2003, 11:251–269.
51. Linda O, Vollmer T, Manic M. Neural network based intrusion detection system for critical infrastructures. In: Neural Networks. IJCNN 2009. International Joint Conference, IEEE, 14 June, 2009, 1827–1834.
52. Elder JF, Abbott DW. A comparison of leading data mining tools. In: Fourth International Conference on Knowledge Discovery and Data Mining, 28 August, 1998.
53. Veltkamp RC, Tanase M. Content-based image retrieval systems: a survey.
54. Mitchell TM. Machine Learning. Burr Ridge, IL: McGraw Hill; 1997.
55. Russell SJ, Norvig P, Canny JF, Malik JM, Edwards DD. Artificial Intelligence: A Modern Approach, vol. 2. Upper Saddle River, NJ: Prentice Hall; 2003.
56. Baumgartner K, Ferrari S, Palermo G. Constructing Bayesian networks for criminal profiling from limited data. Knowl Syst 2008, 21:563–572.
57. Franke K, Srihari SN. Computational forensics: an overview. In: International Workshop on Computational Forensics, 7 August 2008, 1–10. Berlin, Heidelberg: Springer.
58. Nath SV. Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops, WI-IAT 2006 Workshops, IEEE, December 2006, 41–44.
59. Wang T, Rudin C, Wagner D, Sevieri R. Learning to detect patterns of crime. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 23 September, 2013, 515–530. Berlin, Heidelberg: Springer.
60. Ku CH, Iriberri A, Leroy G. Crime information extraction from police and witness narrative reports. In: 2008 IEEE Conference on Technologies for Homeland Security, IEEE, 12 May, 2008, 193–198.
61. Stoffel K, Cotofrei P, Han D. Fuzzy methods for forensic data analysis. In: International Conference of Soft Computing and Pattern Recognition (SoCPaR), IEEE, 7 December, 2010, 23–28.
62. Gupta P, Sharma A, Jindal R. Scalable machine-learning algorithms for big data analytics: a comprehensive review. WIREs Data Min Knowl Discov 2016, 6:194–214.
63. Van Halteren H, Baayen H, Tweedie F, Haverkort M, Neijt A. New machine learning methods demonstrate
the existence of a human stylome. J Quant Linguist 2005, 12:65–77.
64. Chau M, Xu JJ, Chen H. Extracting meaningful entities from police narrative reports. In: Proceedings of the 2002 Annual National Conference on Digital Government Research, Digital Government Society of North America, 19 May, 2002, 1–5.
65. Yang CC, Ng TD. Terrorism and crime related weblog social network: link, content analysis and information visualization. In: Intelligence and Security Informatics, IEEE, 23 May, 2007, 55–58.
66. Kerschbaum F, Schaad A. Privacy-preserving social network analysis for criminal investigations. In: Proceedings of the 7th ACM workshop on Privacy in the electronic society, ACM, 27 October, 2008, 9–14.
67. Smets K, Goethals B, Verdonk B. Automatic vandalism detection in Wikipedia: towards a machine learning approach. In: AAAI workshop on wikipedia and artificial intelligence: an evolving synergy, 13 July, 2008, 43–48.
68. Goldberg HG, Kirkland JD, Lee D, Shyr P, Thakker D. The NASD Securities Observation, New Analysis and Regulation System (SONAR). In: IAAI, 12 August, 2003, 11–18.
69. Gao S, Xu D. Conceptual modeling and development of an intelligent agent-assisted decision support system for anti-money laundering. Expert Syst Appl 2009, 36:1493–1504.
70. Wang GA, Chen H, Xu JJ, Atabakhsh H. Automatically detecting criminal identity deception: an adaptive detection algorithm. IEEE Trans Syst Man Cybern A Syst Hum 2006, 36:988–999.
71. Monge AE. Adaptive detection of approximately duplicate database records and the database integration approach to information discovery. Doctoral dissertation, University of California, San Diego, 1997.
72. Helmer GG, Wong JS, Honavar V, Miller L. Intelligent agents for intrusion detection. In: Information Technology Conference, IEEE, 1 September, 1998, 121–124.
73. Fanning K, Cogger KO, Srivastava R. Detection of management fraud: a neural network approach. ISAFM 1995, 4:113–126.
74. Adderley R, Townsley M, Bond J. Use of data mining techniques to model crime scene investigator performance. Knowl Syst 2007, 20:170–176.
75. Lu Y, Luo X, Polgar M, Cao Y. Social network analysis of a criminal hacker community. J Comput Inform Syst 2010, 51:31–41.
76. Burt RS. Models of network structure. Annu Rev Sociol 1980, 6:79–141.
77. Pramanik MI, Zhang W, Lau RY, Li C. A framework for criminal network analysis using big data. In: 13th International Conference on e-Business Engineering (ICEBE), IEEE, 4 November, 2016, 17–23.
78. Popp R, Armour T, Numrych K. Countering terrorism through information technology. Commun ACM 2004, 47:36–43.
79. Zikopoulos P, Eaton C. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York: McGraw-Hill Osborne Media; 2011. Chapter 3.
80. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014, 2:3.
81. Zikopoulos P, Parasuraman K, Deutsch T, Giles J, Corrigan D. Harness the Power of big data. The IBM Big Data Platform; McGraw Hill Professional 2012.
82. Borthakur D. HDFS architecture guide. Hadoop Apache Project; 2008: 53.
83. Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive: a warehousing solution over a map-reduce framework. Proc VLDB Endowment 2009, 2:1626–1629.
84. Fernández A, del Río S, López V, Bawakid A, del Jesús MJ, Benítez JM, Herrera F. Big data with cloud computing: an insight on the computing environment, MapReduce, and programming frameworks. WIREs Data Min Knowl Discov 2014, 4:380–409.
85. Lakshman A, Malik P. Cassandra: a decentralized structured storage system. ACM SIGOPS Operat Syst Rev 2010, 44:35–40.
86. Floratou A, Patel JM, Shekita EJ, Tata S. Column-oriented storage techniques for MapReduce. Proc VLDB Endowment 2011, 4:419–429.
87. Hunt P, Konar M, Junqueira FP, Reed B. ZooKeeper: Wait-free Coordination for Internet-scale Systems. In: USENIX annual Technical Conference, 23 June, 2010; 8: 9.
88. McCandless M, Hatcher E, Gospodnetic O. Lucene in Action: Covers Apache Lucene 3.0. Greenwich, CT: Manning Publications Co; 2010.
89. Islam M, Huang AK, Battisha M, Chiang M, Srinivasan S, Peters C, Neumann A, Abdelnur A. Oozie: towards a scalable workflow management system for Hadoop. In: Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, ACM, 20 May, 2012, 4.
90. Liao SH, Chu PH, Hsiao PY. Data mining techniques and applications–a decade review from 2000 to 2011. Expert Syst Appl 2012, 39:11303–11311.
91. Liben-Nowell D, Kleinberg J. The link-prediction problem for social networks. J Assoc Inf Sci Technol 2007, 58:1019–1031.
92. Abbasoglu MA, Gedik B, Ferhatosmanoglu H. Aggregate profile clustering for telco analytics. Proc VLDB Endowment 2013, 6:1234–1237.
93. Fan S, Lau RY, Zhao JL. Demystifying big data analytics for business intelligence through the lens of marketing mix. Big Data Res 2015, 2:28–32.
94. Sarvari H, Abozinadah E, Mbaziira A, McCoy D. Constructing and analyzing criminal networks. In: Security and Privacy Workshops (SPW), IEEE, 2014, 84–91.
95. Rogers M. The role of criminal profiling in the computer forensics process. Comput Secur 2003, 22:292–298.
96. Loh S, Wives LK, de Oliveira JP. Concept-based knowledge discovery in texts extracted from the web. ACM SIGKDD Explor Newslett 2000, 2:29–39.
97. Westphal C. Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies. Boca Raton, FL: CRC Press; 2008.
98. McCue C. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis. Oxford, United Kingdom: Butterworth-Heinemann; 2014.
99. Drewes B. Integration of text and data mining. WIT Trans Inform Commun Technol 2002, 10:28.
100. McCue C. Data Mining and Predictive Analysis. Oxford: Elsevier, Butterworth-Heinemann; 2007.
101. Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R. CRISP-DM 1.0 Step-by-step data mining guide.
102. Giraud-Carrier C, Povel O. Characterising data mining software. Intell Data Anal 2003, 7:181–192.
103. Sherman R. Data integration advisor: set the stage with data preparation. DM Rev 2005, 15:54–55.
104. Ozgul F, Atzenbeck C, Celik A, Erdem Z. Incorporating data sources and methodologies for crime data mining. In: IEEE International Conference on Intelligence and Security Informatics (ISI), IEEE, 2011, 176–180.
105. House W. Big data: seizing opportunities, preserving values (Report for the President), 2014. Washington, DC: Executive Office of the President. [WWW document]. Available at: http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
106. Piatetsky-Shapiro G. Knowledge discovery in databases: 10 years after. ACM SIGKDD Explor Newslett 2000, 1:59–61.
107. Bollier D, Firestone CM. The promise and peril of big data. Washington, DC: Aspen Institute, Communications and Society Program; 2010.
108. Ramírez-Gallego S, García S, Mouriño-Talín H, Martínez-Rego D, Bolón-Canedo V, Alonso-Betanzos A, Benítez JM, Herrera F. Data discretization: taxonomy and big data challenge. WIREs Data Min Knowl Discov 2016, 6:5–21.
109. Wu X, Zhu X, Wu GQ, Ding W. Data mining with big data. IEEE Trans Knowl Data Eng 2014, 26:97–107.
110. Sagiroglu S, Sinanc D. Big data: a review. In: International Conference on Collaboration Technologies and Systems (CTS), IEEE, 2013, 42–47.