SOUVENIR
National Conference on Emerging Trends in Information Technology-
Advances in High Performance Computing, Data Sciences & Cyber Security
Volume 8 Issue 1 January-June, 2017
CONTENTS
Page No.
● Intelligent Cyber Security Solutions through High Performance 3-9
Computing and Data Sciences : An Integrated Approach
- Sandhya Maitra, Dr. Sushila Madan
● Applications of Machine Learning and Data Mining for Cyber Security 10-16
- Ruby Dahiya, Anamika
● Fingerprint Image Enhancement Using Different Enhancement Techniques 17-20
- Upender Kumar Agrawal, Pragati Patharia, Swati Kumari, Mini Priya
● Data Mining in Credit Card Frauds: An Overview 21-26
- Vidhi Khurana, Ramandeep Kaur
● Review of Text Mining Techniques 27-31
- Priya Bhardwaj, Priyanka Khosla
● Security Vulnerabilities of Websites and Challenges in Combating these Threats 32-36
- Dhananjay, Priya Khandelwal, Kavita Srivastava
● Security Analytics: Challenges and Future Directions 37-41
- Ganga Sharma, Bhawana Tyagi
● A Survey of Multicast Routing Protocols in MANET 42-50
- Ganesh Kumar Wadhwani, Neeraj Mishra
● Relevance of Cloud Computing in Academic Libraries 51-55
- Dr. Prerna Mahajan, Dr. Dipti Gulati
● A Brief Survey on Metaheuristic-based Techniques for Optimization Problems 56-62
- Kumar Dilip, Suruchi Kaushik
● Cross-Language Information Retrieval on Indian Languages: A Review 63-66
- Nitin Verma, Suket Arora, Preeti Verma
● Enhancing the Efficiency of Web Data Mining using Cloud Computing 67-70
- Tripti Lamba, Leena Chopra
IITM Journal of Management and IT
enormous data. Advances in networking, high-end computers, distributed and grid computing, large-scale visualization and data management, systems reliability, high-performance software tools and techniques, and compilation techniques are ushering in a new era of high-performance, parallel and distributed computing. Over the past few decades, security concerns have become increasingly important and extremely critical in the realm of communication and information systems as these systems become more indispensable to society. With the continuous growth of cyber connectivity and the ever increasing number of applications, remotely delivered services and networked systems, digital security has become the need of the hour. Today government agencies, financial institutions and business enterprises are experiencing security incidents and cyber-crimes through which attackers can generate fraudulent financial transactions, commit crimes, perform industrial espionage and disrupt business processes. The sophistication and the borderless nature of the intrusion techniques used in cyber security incidents have generated the need to design new active cyber defense solutions and develop efficient incident response plans. With the number of cyber threats escalating worldwide, there is a need for comprehensive security analysis, assessment and actions to protect our critical infrastructures and sensitive information[1].

II. Cyber Security

With the spectacular growth of cyber connectivity and the monumental increase in the number of networked systems, applications and remotely delivered services, cyber security has taken top precedence among other issues. Attackers are able to effect fraudulent financial transactions, perform industrial espionage, disrupt business processes and commit crimes with much ease. Additionally, government agencies are experiencing security incidents and cyber-crimes of dangerous proportions which can compromise national security. The sophisticated intrusion techniques used in cyber security incidents and their borderless nature have provided the impetus to design new active cyber defense solutions and to develop efficient and novel incident response plans. The number of cyber threats is escalating globally, necessitating comprehensive security analysis, assessment and action plans for protecting our critical infrastructures and sensitive information[1].

Cyber security in recent times demands secure systems which help in detection of intrusions, identification of attacks, confinement of sensitive information to security zones, data encryption, time stamping and validation of data and documents, and protection of intellectual property, among others. Current security solutions require a mix of software and hardware to augment the power of security algorithms: real-time analysis of voluminous data, rapid encryption and decryption of data, identification of abnormal patterns, checking of identities, simulation of attacks, validation of software security proofs, patrol systems, analysis of video material and innumerable other actions [2].

Analysis of new and diverse digital data streams can reveal potentially new sources of economic value and fresh insights into customer behavior and market trends. But this influx of new data creates challenges for the IT industry. We need information security measures to ensure a safe, secure and reliable cyber network for the transmission and flow of information[1].

III. High Performance Computing

The re-emergence of the need for supercomputers in cyber security stems from their computing capacity: the ability to perform a large number of checks in an extremely short time, particularly in the case of financial transactions, for the identification of cyber crimes using techniques featuring cross-analysis of data coming from several different sources[2]. The knowledge gained through HPC analysis and evaluation can be instrumental in providing comprehensive cyber security, as it helps interpret the multifaceted complexities involved in cyber space, which comprises complex technical, organizational and human systems[3].

A combined system of distributed sensor networks and HPC cyber security systems, such as exascale computing, enables real-time, fast-I/O, HPC-accelerated processing. This covers issues such as data collection, analysis and response, and takes care of the
issues of data locality, transport, throughput, latency, processing time and return of information to defenders and defense devices.

An important set of HPC jobs has involved analytics: discovering patterns in the data itself, as in cryptography. The data explosion fueling the growth of high-performance data analysis originates from the following factors:

1. The efficiency of HPC systems in running data-intensive modeling.
2. The advent of larger, more complex scientific instruments and sensor networks, such as "smart" power grids.
3. The growth of stochastic modeling (financial services), parametric modeling (manufacturing) and iterative problem-solving methods, whose cumulative results are large volumes of data.
4. The availability of newer advanced analytics methods and tools (MapReduce/Hadoop, graph analytics, semantic analysis, knowledge discovery algorithms and others), and the escalating need of commercial applications to perform advanced analytics in near-real-time, as in the cloud.

Data-driven research necessitates high-performance computing; Big Data fuels the growth of high-performance data analysis[3]. Research on high-performance computing mainly includes networks, parallel and high-performance algorithms, programming paradigms and run-time systems for data science, apart from other areas. High-performance computing (HPC) refers to systems that can rapidly solve difficult computational problems across a diverse range of scientific, engineering and business fields by virtue of their processing capability and storage capacity. HPC, being at the forefront of scientific discovery and commercial innovation, holds a leading competitive edge for nations and their enterprises[4]. India, in an endeavour to meet its stated research and education goals, is making every effort towards doubling its high-performance computing capacity and is exploring opportunities to integrate with global research and education networks.

Cyber Security and Data Sciences

The challenge of protecting sensitive data has increased exponentially in recent times because of the non-existence of a secure perimeter: where data was once confined to secure data centers, it now leaks out of massive data centers into the cloud, mobile devices and individual PCs. Most companies do not have policies prohibiting storage of data on mobiles, while people, on the other hand, prefer storing data on their mobiles, with their huge computing and storage power, for convenience and efficiency of operations.

Cloud-based data mostly exists in commercial data centers, on shared networks, on multiple disk devices in the data center, and in multiple data centers for the purpose of replication. The extremely difficult task of developing cloud security is now made possible with new technologies such as HPC and machine learning.

Data should be moved from data centers to the cloud only for business reasons, with the benefits outweighing the costs of providing cloud security to protect it. Data inventories should be maintained in encrypted form, tracked and managed well on mobile devices to prevent theft of data. Additionally, cloud networks should be subjected to thorough penetration testing[5].

The value of cyber security data plays a major role in constructing machine learning models. The value of data is the predictive power of a given data model as well as the type of hidden trends revealed as a result of meticulous data analysis. The value of cyber security data refers to the nature of the data, which can be positive or negative. Positive data, such as malicious network traffic data from malware or a varied set of cyber attacks, holds higher value for data science problems, as it can be used to build machine-learning-based network security models. From the cyber security viewpoint, the predictive power of effective data models lies in the ability to differentiate normal network traffic from abnormal malicious traffic indicating an active cyber attack. Machine learning builds classifiers to identify network traffic as good or bad based on this analysis. Spam filters are based on these techniques to distinguish normal emails from ads, phishing and other types of spam. Big Data helps build classifiers to train a machine learning algorithm and also helps evaluate the classifier's performance. The positive data that a spam classifier needs to detect is behavior exhibited
by a spam email. Similarly, network traffic exhibiting the behavior of real cyber attacks is positive data for a network security model. Negative data refers to normal data, such as legitimate emails in the case of a spam classifier and normal traffic data for a network security model. In both cases the classifier should be able to detect bad behavior without incorrectly classifying genuine mails or network traffic as harmful. Cyber security problems differ on the basis of the quick availability of positive data. In the case of spam emails, positive data is easily available in abundance for building a classifier. On the other hand, despite increased cyber attacks across various organizations, positive data from real cyber attacks and malware infections can seldom be accessed. This is especially true for targeted attacks. The pace at which hackers modify their techniques to create increasingly sophisticated attacks renders libraries of malware samples quickly obsolete. In targeted attacks, malware is custom built to steal or destroy data covertly. The predictive power of a machine learning model relies on the high value of positive samples, in terms of their generality, for identifying potentially new cyber attacks. Additionally, the performance of these models is highly influenced by the choice of features used to build them. Feature selection and appropriate training techniques are prerequisites for interpreting huge amounts of positive samples. Training data for a machine learning model is highly unbalanced, owing to negative samples always being many orders of magnitude more abundant than positive samples. The application of proper evaluation metrics, sophisticated sampling methods and proper balancing of the training set helps us find out whether we have an appropriate quantity of positive samples. The lengthy process of collecting positive samples is one of the first and most important tasks in building machine-learning-based cyber security models. This is how big data is relevant to cyber security[6].

Intelligent Cyber Security Solutions Powered by HPC and Data Sciences

The advances in data sciences and HPC have extended innumerable benefits and conveniences to our day-to-day activities and transformed the ongoing digitization to deeply impact the social and economic aspects of our lives. At the same time, these dependencies have also given rise to many security issues. Attackers in the cyber world are getting more creative and ambitious in their exploitation techniques, causing real-world damage of major dimensions and making proprietary as well as personally identifiable information equally vulnerable. The problem is further compounded because designing effective security measures in a globally expanding digital world is a demanding task. The issues to be addressed include defining the core elements of cyber security, virtual private network security solutions, security of wireless devices, protocols and networks, security of key internet protocols, protection of information infrastructure, and database security. The advent of the Internet of Things (IoT) has also increased the need to step up cyber security. The IoT is a network of physical objects with embedded technology to communicate, sense or interact with their internal states or the external environment, where a digitally represented object becomes something greater than the object by itself or possesses ambient intelligence. Despite its manifold advantages, the rapid adoption of IoT by various types of organizations has escalated the importance of security and vulnerability. The computing world underwent a major transformation in terms of increased reliability, scalability, quality of service and economy with the emergence of cloud computing. Nevertheless, remote storage of data in a cloud away from its owner can lead to loss of control over the data. The success and widespread usage of cloud computing in the future depends on effective handling of data security issues such as accountability, data provenance, identity and risk management. The face of cyber security has changed in recent times with the advent of new technologies such as the cloud, the Internet of Things, mobile/wireless and wearable technology[1].

Static data once contained within systems has now become dynamic and travels through a number of routers, hosts and data centers. Cyber criminals have started using Man-in-the-Middle attacks to eavesdrop on entire data conversations, spying software and Google Glass to track fingerprint movements on touch screens, memory-scraping malware on point-of-sale systems, and bespoke attacks for the theft of specific data.
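The class-imbalance point raised earlier in this section, that negative samples outnumber positive ones by orders of magnitude, is why proper evaluation metrics matter. A minimal pure-Python sketch (the flow counts and labels below are invented for illustration, not taken from the paper) shows how raw accuracy flatters a useless detector while precision and recall expose it:

```python
def precision_recall_accuracy(y_true, y_pred):
    """Precision, recall and accuracy for binary labels
    (1 = positive/malicious, 0 = negative/benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

# A heavily unbalanced set: 990 benign flows, only 10 malicious ones.
y_true = [0] * 990 + [1] * 10

# A useless model that labels every flow benign still scores 99% accuracy,
# yet it has zero precision and zero recall on the attacks that matter.
all_benign = [0] * 1000
print(precision_recall_accuracy(y_true, all_benign))

# A model that actually catches 8 of the 10 attacks, at the cost of
# 5 false alarms, looks only marginally better on accuracy but has
# meaningful precision (8/13) and recall (0.8).
detector = [0] * 985 + [1] * 5 + [1] * 8 + [0] * 2
print(precision_recall_accuracy(y_true, detector))
```

The same reasoning motivates the sampling and training-set balancing methods mentioned above: they change what the model sees, while precision and recall change how it is judged.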
Context-aware behavioral analytics treats unusual behavior as a symptom of an ongoing nefarious activity in the computer system. These cases can no longer be handled by tool-based approaches such as firewalls or antivirus engines. The previous solutions no longer succeed in managing risk in recent technologies; there is an imperative need for brand new solutions. Analytics helps in identifying unusual or abnormal behaviors. Behavior-based analytics approaches include bio-printing, mobile location tracking, behavioral profiles, third-party Big Data and external threat intelligence. Nowadays hackers carefully analyze a system's defenses and use Trojan horses, and due to the velocity, volume and variety of big data, security breaches cannot be identified in time. Solutions based on new technologies combining machine learning and behavioral analytics help detect breaches and trace the source. User profiles are built and machine behavior patterns are studied to detect new types of cyber attacks; the emphasis is on providing rich user interfaces which help in interactive exploration and investigation. These tools can detect strange behavior and changes in data.

This problem can be solved by virtual dispersive networking (VDN) technologies, which split a message into several encrypted parts routed over different independent servers, computers and/or mobile phones depending on the protocol. The traditional bottlenecks are thus completely avoided. The data dynamically travels on optimum random paths, also taking network congestion and other issues into consideration, so hackers find it difficult to locate the data parts. Furthermore, to prevent cyber criminals from exploiting the weak point of the technology, the place where two endpoints must connect to a switch to enable secure communication, VDN uses hidden switches, making them hard to find.

Critical infrastructures can be protected by the security measures and standards provided by Smart Grid technologies. Cloud-based applications, which are beyond the realm of firewalls and traditional security measures, can be secured by using a combination of encryption and intrusion detection technologies to gain control of corporate traffic. Cloud data can be protected by Security Assertion Markup Language (SAML), an XML-based open standard format, augmented with encryption and intrusion detection technologies; this also helps control corporate traffic.

Proxy-based systems designed around SAML secure access and traffic, log activity, watermark files by embedding security tags into documents and other files to track their movement, and redirect traffic through service providers. Such solutions require neither software to be loaded on endpoints nor changes to end-user configurations. Any kind of suspicious activity, such as failed or unexpected logins, triggers alert notifications, and security administrators can instantaneously erase corporate information without affecting users' personal data.

Active defense measures such as counter-intelligence gathering, sinkholing, honeypots and retaliatory hacking can be adopted to track and attack hackers. Counter-intelligence gathering is a kind of reverse malware analysis in which a cyber expert secretly finds information about hackers and their techniques. Sinkholing servers hand out non-routable addresses for all domains within the sinkhole; malicious traffic is intercepted and blocked for later analysis by experts. Isolated systems called honeypots, such as computer, data or network sites, are set up to attract hackers and help cyber security analysts catch spammers and prevent attacks. Retaliatory hacking is the most dangerous security measure and is usually considered illegal, as it may require infiltrating a hacker community and building a hacking reputation to prove your credentials to the hacking group. None of these things being legal raises debate over active defense measures.

Early warning systems forecast the sites and servers likely to be hacked. These systems are created with the help of machine learning and data mining techniques. Most of the algorithms take into account a website's software, traffic statistics, file system structure or webpage structure, and use a variety of other signature features to determine the presence of known hacked and malicious websites.
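Of the active defense measures above, sinkholing is the easiest to sketch. The toy Python resolver below (the domain names, addresses and blocklist are hypothetical; a real sinkhole operates at the DNS or routing layer, and 192.0.2.1 is simply a documentation-range address standing in for the capture host) redirects blocklisted domains to the sinkhole and logs each interception for later analysis:

```python
SINKHOLE_ADDR = "192.0.2.1"  # stand-in sinkhole address (RFC 5737 documentation range)
BLOCKLIST = {"malware-c2.example", "phish.example"}  # hypothetical malicious domains
REAL_DNS = {"intranet.example": "10.0.0.5"}  # stand-in for a normal resolver

captured = []  # interceptions kept for later expert analysis


def resolve(domain):
    """Resolve a domain, redirecting blocklisted names to the sinkhole."""
    if domain in BLOCKLIST:
        captured.append(domain)  # log the intercepted lookup
        return SINKHOLE_ADDR
    return REAL_DNS.get(domain, "0.0.0.0")


print(resolve("intranet.example"))    # normal lookup, answered as usual
print(resolve("malware-c2.example"))  # sinkholed: traffic never reaches the attacker
print(captured)                       # record of what was intercepted
```

The essential property is that infected machines keep "working" (their lookups succeed) while their command-and-control traffic is quietly diverted to infrastructure the defenders control.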
Notifications can be sent to website operators and search engines to exclude the results. Classifiers should be designed to adapt to emerging threats. Such security measures are growing in scope: the more data absorbed, the better the accuracy[7].

The cyber threats of recent times necessitate a state-of-the-art, dynamic approach to threat management. Cyber security threats change rapidly with technological advancements: an application free of vulnerabilities today may be exposed to a major unanticipated attack tomorrow. A few recent examples are the Adobe Flash vulnerability allowing remote code execution, the NTP (Network Time Protocol) issue allowing denial-of-service attacks, the Cisco ASA firewall exposure allowing denial-of-service attacks, and Apple, thought for a long time to be invulnerable, releasing iOS 9 quickly followed by additional releases to correct newly discovered exposures. Such dynamic threats are the key challenges to information security and necessitate dynamic security approaches for their mitigation. These incidents were neither the result of negligence on the part of the affected parties nor the result of changes made by those parties to the products. Information security programs should be proactive, agile and adaptive. A few strategies for moving from static to dynamic are: making vulnerability checks a regular and frequent task, with monthly external scans and internal scans conducted on the same schedule or whenever software or configuration changes are made, whichever happens first; and paying attention to fundamentals such as checking logs and auditing access rights. Firmware updates should be a top priority, as many of the exposures we face today result from issues found in the firmware of devices attached to our networks: core devices such as routers and firewalls, or Internet of Things devices such as printers and copiers. Threat sources should be studied on a regular basis[8].

Data science techniques help in predicting the types of security threats besides reacting to these threats. Data sciences and cyber security were highly isolated disciplines until recent times. Cyber security solutions are usually based on signatures, which use pattern matching against previously identified malware to capture cyber attacks. But these signature-based solutions could not prevent zero-day attacks from unidentified malware, as they lack the predictive power of data science. Data science effectively uses scientific techniques to draw knowledge from data. The ongoing security breaches accentuate the need for new approaches to the identification and prevention of malware. The technological advances in data science which help develop contemporary cyber security solutions are storage, computing and behavior. The storage aspect eases the collection and storage of the huge data on which analytic techniques are applicable. High-performance computing power, in turn, assists machine learning techniques in building novel models for the identification of malware. The behavioral aspect has shifted from identifying malware with signatures to identifying the specific kinds of behavior exhibited by an infected computer. Big data plays a key role in the analytical models which identify cyber attacks. Any rule-based model built on machine learning requires a large number of data samples to be analyzed in order to unearth the set of characteristics of the model. Subsequently, data is required to cross-check and assess the performance of the model.

Application of machine learning tools to enterprise security gives rise to a new set of solutions. These tools can analyze networks, learn about them, detect anomalies and protect enterprises from threats[9].

Machine learning has increased in popularity with the advent of high-performance computing resources. This has resulted in the development of off-the-shelf machine learning packages which allow complex machine learning algorithms to be trained and tested on huge data samples. These characteristics render machine learning an indispensable tool for developing cyber security solutions. Machine learning is a broader data science solution for detecting cyber attacks. Minor changes in malware can render Intrusion Prevention Systems and next-generation firewall perimeter security solutions, which perform signature matching on network traffic, ineffective. The rigorous analytical methods of data science differentiate the abnormal behavior defining an infected machine after identifying normal behavior through repeated use. Contemporary cyber security solutions therefore require big data samples and advanced analytical methods to build data-driven solutions for malware identification and detection of cyber attacks. This results in a spectacular improvement in cyber security efficacy[10].
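The brittleness of signature matching against minor malware changes can be made concrete. In this deliberately simplified sketch (the payloads, event names and threshold are invented for illustration), a one-byte variant escapes an exact-hash signature, while a behavioral count of the kind described above still flags both variants:

```python
import hashlib

# Signature matching: the hash of a known sample no longer matches after
# a one-byte change, which is why minor variants slip past signature-based
# IPS and firewall checks.
known_bad_sig = hashlib.sha256(b"EVIL-PAYLOAD-v1").hexdigest()
variant_sig = hashlib.sha256(b"EVIL-PAYLOAD-v2").hexdigest()
print(known_bad_sig == variant_sig)  # the signature misses the variant

# A toy behavioral score instead counts suspicious runtime actions;
# both variants exhibit the same behavior, just in a different order.
SUSPICIOUS = {"disable_av", "encrypt_user_files", "contact_c2"}


def behavior_score(events):
    """Number of suspicious actions observed in an event trace."""
    return sum(1 for e in events if e in SUSPICIOUS)


events_v1 = ["open_file", "disable_av", "contact_c2", "encrypt_user_files"]
events_v2 = ["open_file", "contact_c2", "disable_av", "encrypt_user_files"]
print(behavior_score(events_v1), behavior_score(events_v2))
```

Real behavioral models use far richer features and learned decision boundaries, but the contrast is the same: the signature keys on the artifact, the behavioral score keys on what the artifact does.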
Conclusions

● Cyber Security Solutions should be more proactive and dynamic.
● Effective Cyber Security Solutions for future threats can be achieved by exploiting the processing and storage power of High Performance Computing.
● Intelligent Cyber Security Solutions can be built by exploring the predictive power of machine learning and data mining approaches.
● Machine learning approaches require Big Data for training models.
● Big Data can be efficiently processed in real time using High Performance Computing.
● Cloud Computing and IoT can be highly risk prone in the absence of an effective security framework.
● The solution to future security needs lies in integrating the processing and storage power of High Performance Computing with the predictive power of machine learning and data mining techniques.
References
1. S. Maitra, "NCETIT'2017", iitmipu.ac.in, 2017. [Online]. Available: http://iitmipu.ac.in/wp-content/uploads/2017/02/NCETIT-2017-Brochure.pdf. [Accessed: 14-Feb-2017].
2. "HPC solutions for cyber security", Eurotech.com, 2017. [Online]. Available: https://www.eurotech.com/en/hpc/industry+solutions/cyber+security. [Accessed: 11-Feb-2017].
3. C. Keliiaa and J. Hamlet, "National Cyber Defense High Performance Computing and Analysis: Concepts, Planning and Roadmap", Sandia National Laboratories, New Mexico, 2010.
4. S. Tracy, "Big Data Meets HPC", Scientific Computing, 2014. [Online]. Available: http://www.scientificcomputing.com/article/2014/03/big-data-meets-hpc. [Accessed: 11-Feb-2017].
5. R. Covington, "Risk Awareness: The risk of data theft — here, there and everywhere", IDG Contributor Network, 2016.
6. D. Pegna, "Cybersecurity, data science and machine learning: Is all data equal?", Cybersecurity and Data Science, 2015.
7. "Hot-technologies-cyber-security", cyberdegrees, 2017. [Online]. Available: http://www.cyberdegrees.org/resources/hot-technologies-cyber-security/. [Accessed: 04-Feb-2017].
8. R. Covington, "Risk Awareness: Is your information security program giving you static?", IDG Contributor Network, 2015.
9. B. Violino, "Machine learning offers hope against cyber attacks", Network World, 2016.
10. D. Pegna, "Cybersecurity and Data Science: Creating cybersecurity that thinks", IDG Contributor Network, 2015.
The paper is organized as follows: Section II highlights the procedure of Machine Learning and Data Mining. Section III describes the techniques of ML and DM. Section IV presents and discusses the comparative analysis of the individual techniques and related work. Section V presents the conclusion.

II. Machine Learning and Data Mining Procedure

ML and DM are two terms that are often confused because they generally share the same techniques. Machine Learning, a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn. Arthur Samuel in 1959 defined Machine Learning as a "field of study that gives computers the ability to learn without being explicitly programmed"[3]. An ML algorithm applies classification followed by prediction, based on known properties learned from the training data. ML algorithms need a well-defined problem from the domain, whereas DM focuses on properties of the data unknown beforehand: DM focuses on finding new and interesting knowledge. An ML approach consists of two phases, training and testing, which include classification of the training data, feature selection, training of the model, and use of the model for testing unknown data.

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. The following are areas in which data mining technology may be applied or further developed for intrusion detection:

● Development of data mining algorithms for intrusion detection: Data mining algorithms can be used for misuse detection and anomaly detection. The techniques must be efficient and scalable, and capable of handling network data of high volume, dimensionality and heterogeneity.

● Association and correlation analysis and aggregation to help select and build discriminating attributes: Association and correlation mining can be applied to find relationships between the system attributes describing the network data. Such information can provide insight regarding the selection of useful attributes for intrusion detection.

● Analysis of stream data: Due to the transient and dynamic nature of intrusions and malicious attacks, it is crucial to perform intrusion detection in the data stream environment. It is necessary to study what sequences of events are frequently encountered together, to find sequential patterns, and to identify outliers.

● Distributed data mining: Intrusions can be launched from several different locations and targeted to many different destinations. Distributed data mining methods may be used to analyze network data from several network locations in order to detect these distributed attacks.
● Visualization and querying tools: Visualization tools should be available for viewing any anomalous patterns detected. Intrusion detection systems should also have a graphical user interface that allows security analysts to pose queries regarding the network data or intrusion detection results.

III. Techniques of ML and DM

This section focuses on the various ML/DM techniques for cyber security. Each technique is elaborated with references to the seminal work, along with a few papers on the application of that technique to cyber security.

A. Artificial Neural Networks: Neural networks follow a predictive model based on a biological modeling capability and predict data through a learning process. An Artificial Neural Network (ANN) is composed of connected artificial neurons capable of certain computations on their inputs [4]. When an ANN is used as a classifier, each layer passes its output as input to the next layer, and the output of the last layer generates the final classification category.

ANNs are widely accepted classifiers based on the perceptron [5], but they suffer from local minima and a lengthy learning process. This technique was used as a multi-category classifier for signature-based detection by Cannady [6]. He detected 3,000 simulated attacks from a dataset of events. The findings of the paper reported almost 93% accuracy and a root-mean-square error rate of 0.070. This technique was also used by Lippmann and Cunningham [27] for anomaly detection. They used keyword selection based on statistics and fed it to an ANN, which provides the posterior probability of attack as output. This approach showed an 80% detection rate and hardly one false alarm per day. Also, a five-stage approach for intrusion detection was proposed by Bivens et al. [8] that fully detected normal behavior, but the FAR is 76% only for some attacks.

B. Association Rules and Fuzzy Association Rules: Association Rule Mining was introduced by Agrawal et al. [9] as a way to find interesting co-occurrences in supermarket data by finding frequent sets of items. Association rule mining works only on binary data, i.e., an item present in a transaction is represented by 1, and 0 otherwise. But in real-world applications, data are either quantitative or categorical, for which Boolean rules are unsatisfactory. To overcome this limitation, Fuzzy Association Rule Mining was introduced [10], which can process numerical and categorical variables.

An algorithm based on the Signature Apriori method was proposed by Zhengbing et al. [11] that can be applied to any signature-based system for the inclusion of new signatures. The work of Brahmi [12] using multidimensional association rule mining is also very promising for creating signatures for attacks. It showed detection rates for the attack types DOS, Probe, U2R and R2L of 99%, 95%, 75% and 87% respectively. Association rule mining is used in NETMINE [35] for anomaly detection. It applied generalized association rule extraction based on the Genio algorithm for the identification of recurring items. Fuzzy association rule mining was used by Tajbakhsh et al. [38] to find related patterns in the KDD 1999 dataset. The result showed good performance, with 100 percent accuracy and a false positive rate of 13%; but the accuracy falls drastically with a fall in FPR.

C. Bayesian Networks: A Bayesian network is a graphical model based on probabilities which represents variables and their relationships [15], [16]. The network is designed with nodes as the continuous or discrete variables, and the relationships between them are represented by the edges, establishing a directed acyclic graph. Each node holds the states of its random variable and the conditional probability form.

Livadas et al. [17] presented comparative results of various approaches to the DOS attack. The anomaly detection approach is mainly reactive, whereas the signature-based approach is proactive. They tried to detect botnets in Internet Relay Chat (IRC) traffic data. The analysis reported the performance of Bayesian networks as 93% precision and a very low FP rate of 1.39%. Another IDS based on Bayesian network classifiers was proposed by Jemili et al. [18] with performances of 89%, 99%, 21% and 7% for DOS,
bought together. The traditional association rule Probe, U2R and R2L respectively. Benferhat [19] also
used this approach to build IDS for DOS attack.
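The Boolean association rules described above reduce to counting co-occurrences over 0/1 transaction data. A minimal sketch of computing support and confidence for candidate rules (the item names and thresholds below are purely illustrative, not taken from any cited system):

```python
from itertools import combinations

# Toy 0/1 transaction data: each row is the set of "items" observed in one
# event (e.g., discretized network-traffic features); entirely illustrative.
transactions = [
    {"syn_flood", "high_rate"},
    {"syn_flood", "high_rate", "port_scan"},
    {"port_scan"},
    {"syn_flood", "high_rate"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def rules(txns, min_support=0.5, min_confidence=0.8):
    """Enumerate single-item rules X => Y meeting both thresholds."""
    items = set().union(*txns)
    out = []
    for x, y in combinations(sorted(items), 2):
        for a, b in ((x, y), (y, x)):
            sup = support({a, b}, txns)
            if sup >= min_support:
                conf = sup / support({a}, txns)  # confidence = P(b | a)
                if conf >= min_confidence:
                    out.append((a, b, sup, conf))
    return out

for a, b, sup, conf in rules(transactions):
    print(f"{a} => {b}  support={sup:.2f} confidence={conf:.2f}")
```

Fuzzy association rule mining replaces the crisp set membership test with degrees of membership, so the same counting scheme extends to numerical attributes.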
D. Clustering: Clustering is an unsupervised technique to find patterns in high-dimensional unlabeled data. It is used to group data items into clusters, which are not predefined, based on a similarity measure.

This technique was applied by Blowers and Williams [20] to detect anomalies in the KDD dataset at the packet level. They used the DBSCAN clustering technique, and the study highlighted various machine learning techniques for cyber security. Sequeira and Zaki [21] performed detection over shell command data to identify whether a user is legitimate or an intruder. Of the various approaches for sequence matching, the longest common subsequence was the most appropriate one. They reported performance of 80% accuracy with a 15% false alarm rate.

E. Decision Trees: A decision tree is a tree-like structure where each leaf node represents or predicts a decision and each non-leaf node represents the various possible conditions that can occur. The decision tree technique offers simple implementation, high accuracy and intuitive knowledge expression; the expressiveness is clear for small trees and less so for deeper and wider trees. The common algorithms for creating decision trees are ID3 [22] and C4.5 [23].

Kruegel and Toth [24] proposed clustering along with a decision tree approach to build a signature detection system and compared its performance to SNORT 2.0; the speed-up varies from 105% to 5%, depending on the traffic. This paper showed that the combination of decision trees with a clustering technique can prove an efficient IDS approach. The decision tree approach using the WEKA J48 program was also used in EXPOSURE [25] to detect malicious domains such as botnet command hosts, scam hosts, phishing sites, etc. Its performance is satisfactory in terms of accuracy and false alarm rate.

F. Ensemble Learning: This is a supervised machine learning paradigm where multiple learners are trained to solve the same problem. Compared with ordinary machine learning approaches, which try to learn one hypothesis from the training data, ensemble methods try to construct a set of hypotheses and combine them for use.

An outlier detector was designed by Zhang et al. [26] to classify data as anomalous as well as to assign it to one of the attack labels of the KDD dataset with the use of Random Forests, where the random forest served as the proximity measure. The accuracies for the DOS, Probe, U2R and R2L attacks were 95%, 93%, 90% and 87% respectively, with a false alarm rate of 1%.

G. Evolutionary Computation: This is the collective name for a range of problem-solving techniques, such as genetic algorithms, genetic programming, particle swarm optimization, ant colony optimization and evolution strategies, based on principles of biological evolution.

A signature-based model was developed by Li [27] with genetic algorithms used for evolving rules. Abraham et al. [28] also used genetic programming techniques to classify attacks in the DARPA 1998 intrusion detection dataset.

H. Inductive Learning: This is a learning method where the learner starts with specific observations and measures, begins to detect patterns and regularities, formulates tentative hypotheses to be explored, and ends with the development of general conclusions and theories. Inductive learning thus moves bottom-up, from specific observations to broader generalizations and theories. Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [29] applies a separate-and-conquer approach to induce rules in two-class problems. Lee et al. [30] provided a framework for signature-based models using various machine learning and data mining techniques such as inductive learning, association rules, sequential pattern mining, etc.

I. Naïve Bayes: This is a simple probabilistic classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Panda and Patra [31] presented a comparison of Naïve Bayes with a neural network classifier and stated that Naïve Bayes performed better in terms of accuracy but not false alarm rate. Amor et al. [32] used a Bayesian network as a Naïve Bayes classifier; the paper stated an accuracy of 98% with less than a 3% false alarm rate.

J. Support Vector Machine: A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given
labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

An SVM classifier was built to classify the KDD 1999 dataset by Li et al. [33] using ant colony optimization for training. This study showed 98% accuracy; however, it does not perform well for U2R attacks. A Robust Support Vector Machine (RSVM) was used as an anomaly classifier by Hu et al. [34], which showed better performance under noise, with 75% accuracy and no false alarms.

IV. Comparative Analysis and Discussion

The analysis of work using ML and DM for cyber security highlights a few facts about this growing research area. From the comparative analysis presented in Table 1, it is obvious that DARPA 1998, DARPA 1999, DARPA 2000, KDD 1998 and KDD 1999 are the favorite dataset choices of most researchers for IDS. Most researchers have used accuracy, detection rate and false alarm rate as the evaluation criteria. There have been multiple approaches applied to both anomaly and signature-based detection; several approaches are appropriate for signature-based detection, others for anomaly detection. The answer to the question of which approach is most appropriate, however, depends on multiple factors such as the quality of the training data, the properties of that data, and the working of the system (online or offline).

V. Conclusions

In this paper, we survey a wide spectrum of existing studies on machine learning and data mining techniques applied to cyber security. Based on this analysis, we outline key factors that need to be considered while choosing the technique to develop an IDS: the quality and properties of the training data, the system type for which the IDS has to be devised, and the working nature and environment of the system. There is a strong need to develop strong representative datasets augmented at the network data level. There is also a need to regularly update the models for cyber detection using fast incremental learning methods.
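The evaluation criteria recurring throughout the comparative analysis (accuracy, detection rate, false alarm rate) all derive from the confusion matrix of a detector over labeled traffic. A small sketch, with made-up counts purely for illustration:

```python
def ids_metrics(tp, fp, tn, fn):
    """Standard IDS evaluation measures from confusion-matrix counts.

    tp: attacks flagged as attacks       fp: normal traffic flagged as attack
    tn: normal traffic passed as normal  fn: attacks missed
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    detection_rate = tp / (tp + fn)      # a.k.a. recall / true positive rate
    false_alarm_rate = fp / (fp + tn)    # false positives among normal traffic
    return accuracy, detection_rate, false_alarm_rate

# Hypothetical counts for one detector on a labeled trace
acc, dr, far = ids_metrics(tp=950, fp=10, tn=990, fn=50)
print(f"accuracy={acc:.3f} detection_rate={dr:.3f} FAR={far:.3f}")
```

Note that accuracy alone can be misleading when attacks are rare, which is why the surveyed papers report detection rate and false alarm rate alongside it.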
References
1. M. Bhuyan, D. Bhattacharyya, and J. Kalita, “Network anomaly detection:Methods, systems and tools,”
IEEE Commun. Surv. Tuts., vol. 16, no. 1, pp. 303–336, First Quart. 2014.
2. Y. Zhang, L. Wenke, and Y.-A. Huang, “Intrusion detection techniques for mobile wireless networks,” Wireless
Netw., vol. 9, no. 5, pp. 545–556, 2003.
3. J. McCarthy, “Arthur Samuel: Pioneer in Machine Learning,” AI Magazine, vol. 11, no. 3, pp. 10-11, 1990.
4. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,”
Neural Netw., vol. 2, pp. 359–366, 1989.
5. F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,”
Psychol. Rev., vol. 65, no. 6,pp. 386–408, 1958.
6. J. Cannady, “Artificial neural networks for misuse detection,” in Proc 1998 Nat. Inf. Syst. Secur. Conf.,
Arlington, VA, USA, 1998, pp. 443–456.
7. R. P. Lippmann and R. K. Cunningham, “Improving intrusion detection performance using keyword selection
and neural networks,” Comput.Netw., vol. 34, pp. 597–603, 2000.
8. A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection
using neural networks,” Intell. Eng.Syst. Artif. Neural Netw., vol. 12, no. 1, pp. 579–584, 2002.
9. R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,”
in Proc. Int. Conf. Manage. Data Assoc. Comput. Mach. (ACM), 1993, pp. 207–216.
10. C. M. Kuok, A. Fu, and M. H. Wong, “Mining fuzzy association rules in databases,” ACM SIGMOD Rec.,
vol. 27, no. 1, pp. 41–46, 1998.
11. H. Brahmi, B. Imen, and B. Sadok, “OMC-IDS: At the cross-roads of OLAP mining and intrusion detection,”
in Advances in Knowledge Discovery and Data Mining. New York, NY, USA: Springer, 2012, pp. 13–24.
12. H. Zhengbing, L. Zhitang, and W. Junqi, “A novel network intrusion detection system (NIDS) based on
signatures search of data mining,” in Proc. 1st Int. Conf. Forensic Appl. Techn. Telecommun. Inf. Multimedia
Workshop (e-Forensics ‘08), 2008, pp. 10–16.
13. D. Apiletti, E. Baralis, T. Cerquitelli, and V. D’Elia, “Characterizing network traffic by means of the NetMine
framework,” Comput. Netw., vol. 53, no. 6, pp. 774–789, Apr. 2009.
14. A. Tajbakhsh, M. Rahmati, and A. Mirzaei, “Intrusion detection using fuzzy association rules,” Appl. Soft
Comput., vol. 9, pp. 462–469, 2009.
15. D. Heckerman, A Tutorial on Learning with Bayesian Networks. New York, NY, USA: Springer, 1998.
16. F. V. Jensen, Bayesian Networks and Decision Graphs. New York, NY, USA: Springer, 2001.
17. C. Livadas, R. Walsh, D. Lapsley, and W. Strayer, “Using machine learning techniques to identify botnet
traffic,” in Proc. 31st IEEE Conf. Local Comput. Netw., 2006, pp. 967–974.
18. F. Jemili, M. Zaghdoud, and A. Ben, “A framework for an adaptive intrusion detection system using Bayesian
network,” in Proc. IEEE Intell. Secur. Informat., 2007, pp. 66–70.
19. S. Benferhat, T. Kenaza, and A. Mokhtari, “A Naïve Bayes approach for detecting coordinated attacks,” in
Proc. 32nd Annu. IEEE Int. Comput. Software Appl. Conf., 2008, pp. 704–709.
20. M. Blowers and J. Williams, “Machine learning applied to cyber operations,” in Network Science and
Cybersecurity. New York, NY, USA: Springer, 2014, pp. 55–175.
21. K. Sequeira and M. Zaki, “ADMIT: Anomaly-based data mining for intrusions,” in Proc 8th ACM SIGKDD
Int. Conf. Knowl. Discov. Data Min., 2002, pp. 386–395.
22. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
23. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA, USA: Morgan Kaufmann, 1993.
24. C. Kruegel and T. Toth, “Using decision trees to improve signature based intrusion detection,” in Proc. 6th
Int. Workshop Recent Adv. Intrusion Detect., West Lafayette, IN, USA, 2003, pp. 173–191.
25. L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, “EXPOSURE: Finding malicious domains using passive
DNS analysis,” presented at the 18th Annu. Netw. Distrib. Syst. Secur. Conf., 2011.
26. J. Zhang, M. Zulkernine, and A. Haque, “Random-forests-based network intrusion detection systems,”
IEEE Trans. Syst. Man Cybern. C: Appl. Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.
27. W. Li, “Using genetic algorithms for network intrusion detection,” in Proc. U.S. Dept. Energy Cyber Secur.
Group 2004 Train. Conf., 2004, pp. 1–8.
28. A. Abraham, C. Grosan, and C. Martin-Vide, “Evolutionary design of intrusion detection programs,” Int. J.
Netw. Secur., vol. 4, no. 3, pp. 328–339, 2007.
29. W. W. Cohen, “Fast effective rule induction,” in Proc. 12th Int. Conf. Mach. Learn., Lake Tahoe, CA, USA,
1995, pp. 115–123.
30. W. Lee, S. Stolfo, and K. Mok, “A data mining framework for building intrusion detection models,” in Proc.
IEEE Symp. Secur. Privacy, 1999, pp. 120–132.
31. M. Panda and M. R. Patra, “Network intrusion detection using Naïve Bayes,” Int. J. Comput. Sci. Netw.
Secur., vol. 7, no. 12, pp. 258–263, 2007.
32. N. B. Amor, S. Benferhat, and Z. Elouedi, “Naïve Bayes vs. decision trees in intrusion detection systems,” in
Proc ACMSymp. Appl. Comput., 2004, pp. 420–424.
33. Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, “An efficient intrusion detection system based on support
vector machines and gradually feature removal method,” Expert Syst. Appl., vol. 39, no. 1, pp. 424–430,
2012.
34. W. J. Hu, Y. H. Liao, and V. R. Vemuri, “Robust support vector machines for anomaly detection in computer
security,” in Proc. 20th Int. Conf. Mach. Learn., 2003, pp. 282–289.
I. Introduction

Image enhancement is one of the necessary steps for better analysis, and there are various methods to improve the contrast of images [1-3]. Fingerprints are unique patterns, made by friction ridges (raised) and furrows (recessed), which appear on the pads of the fingers and thumbs. They form under pressure on a baby's tiny, developing fingers in the womb. Fingerprints are unique: no two persons have been found to have the same fingerprints. Fingerprints are even more distinctive than DNA, the genetic material in each of our cells; although identical twins can share the same DNA, or at least most of it, they cannot have the same fingerprints. Friction ridge patterns are grouped into three distinct types — loops, whorls, and arches — each with unique variations, depending on the shape and relationship of the ridges:

Loops - prints that recurve back on themselves to form a loop shape. Divided into radial loops (pointing toward the radius bone, or thumb) and ulnar loops (pointing toward the ulna bone, or pinky), loops account for approximately 60 percent of pattern types.

Whorls - form circular or spiral patterns, like tiny whirlpools. There are four groups of whorls: plain (concentric circles), central pocket loop (a loop with a whorl at the end), double loop (two loops that create an S-like pattern) and accidental (irregularly shaped). Whorls make up about 35 percent of pattern types.

Arches - create a wave-like pattern and include plain arches and tented arches. Tented arches rise to a sharper point than plain arches. Arches make up about five percent of all pattern types.

Upender Kumar Agrawal* (upeagrawal@gmail.com), Pragati Patharia** (pathariapragati@gmail.com), Swati Kumari*** (swati.kumari3661@gmail.com), Mini Priya (minipriya9496@gmail.com) — Guru Ghasidas Vishwavidyalaya, Bilaspur

2. Histogram Equalization

Histogram equalization (HE) is one of the popular techniques for contrast enhancement of images. It is one of the well-known methods for enhancing the
contrast of a given image in accordance with the sample distribution. HE is a simple and effective contrast enhancement technique which distributes pixel values uniformly such that the enhanced image has a linear cumulative histogram. HE has been widely applied where images need enhancement, such as in medical image processing, radar image processing, texture synthesis and speech recognition.

It stretches the contrast of high-histogram regions and compresses the contrast of low-histogram regions. The goal of histogram equalization is to remap the image grey levels so as to obtain a uniform (flat) histogram, in other words to enhance the image quality. HE-based methods are reviewed and compared with image quality measurement (IQM) tools such as Peak Signal to Noise Ratio (PSNR) to evaluate contrast enhancement.

Peak Signal to Noise Ratio (PSNR)

Let X(i,j) be a source image that contains M by N pixels and Y(i,j) a reconstructed image, where Y is reconstructed by decoding the encoded version of X(i,j). In this method, errors are computed only on the luminance signal, so the pixel values X(i,j) range between black (0) and white (255) [6-7]. First, the mean squared error (MSE) of the reconstructed image is calculated, and the root mean square error (RMSE) is computed as its square root. Then the PSNR in decibels (dB) is computed as:

PSNR = 20 log10 (Max(Y(i,j)) / RMSE)

The greater the value of PSNR, the better the contrast enhancement of the image.

3. Adaptive Histogram Equalization

Adaptive histogram equalization (AHE) is an image processing technique used to improve contrast in images [1-3]. It differs from ordinary histogram equalization in that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast and enhancing the definition of edges in each region of an image. However, AHE has a tendency to over-amplify noise in relatively homogeneous regions of an image. A variant of adaptive histogram equalization called contrast limited adaptive histogram equalization (CLAHE) prevents this by limiting the amplification. The size of the neighbourhood region is a parameter of the method: it constitutes a characteristic length scale, so contrast at smaller scales is enhanced while contrast at larger scales is reduced [4-5]. Due to the nature of histogram equalization, the resulting value of a pixel under AHE is proportional to its rank among the pixels in its neighbourhood. This allows an efficient implementation on specialist hardware that can compare the centre pixel with all other pixels in the neighbourhood.
Fig 1: Sample variations of individual left hand thumb impression showing arches, loops and whorls.
5. Results and Comparison

The above discussed methodologies have been implemented using Matlab. For testing purposes we created two image databases. First we captured fingerprint images using a mobile camera, then we enhanced the fingerprint images using the histogram and adaptive histogram equalization techniques. Results from the above implementation are described in the following section.
Fig 2. Original image and its histogram, Histogram equalization and its histogram,
Adaptive histogram equalization and its histogram.
Comparison of PSNR
visual images. In conclusion, the proposed technique produces fine fingerprint image quality. The graph shows the comparison of PSNR; the output shows that the PSNR of adaptive histogram equalization is higher than that of histogram equalization.
References
1. Z. M. Win and M. M. Sein, “Fingerprint recognition system for low quality images,” presented at the SICE
Annual Conference, Waseda University, Tokyo, Japan, Sep. 13-18, 2011.
2. Dr. Muna F. Al-Samaraie, “A New Enhancement Approach for Enhancing Image of Digital Cameras by
Changing the Contrast”, International Journal of Advanced Science and Technology Vol. 32, July, 2011.pp.-
13-22.
3. Mustafa Salah Khalefa 1, Zaid Amin Abduljabar 2 and Huda Ameer Zeki, “Fingerprint Image Enhancement
by Develop Mehtre Technique”, Advanced Computing: An International Journal ( ACIJ ), Vol.2, No.6,
November 2011,pp.-171-182.
4. D. Ezhilmaran and M. Adhiyaman, “A Review Study on Fingerprint Image Enhancement Techniques”,
International Journal of Computer Science & Engineering Technology (IJCSET)Vol. 5 No. 06 Jun 2014,
ISSN : 2229-3345,625-631.
5. Darshan Charan Nayak, “Comparative Study of Various Enhancement Techniques for Finger Print Images”,
(IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (2) , 2015, ISSN
:0975-9646, 1900-1905.
6. C.Nandini and C.N.Ravikumar, “Improved fingerprint image representation for recognition,” International
journal of computer science and information technology, MIT Publication, Vol. 01-no.2, 2011,pp.59-64.
7. J.Choudhary, Dr.S.Sharma, J.S.Verma, “A new framework for improving low quality fingerprint images,”
international journal of computer technology and application. Vol.2, no.6, pp.1859 -1866,2011.
categories: (1) schemes that defraud numerous victims out of comparatively small amounts, such as several hundred dollars, per victim; and (2) schemes that defraud comparatively less numerous victims out of large amounts, such as thousands or millions of dollars per victim.

The objective of this paper is to describe a generalized architecture of financial fraud detection as well as techniques for preventing fraud, with special focus on credit card financial frauds. The remainder of the paper is divided into the following sections: Section II deals with a detailed review of the literature; Section III with a framework for Financial Fraud Detection; Section IV with fraud detection in credit cards; and Section V gives concluding remarks on the review carried out.

II. Literature Review

Vast research has been carried out in the field of data mining and fraud detection, but the challenge of dealing with the increasing number of frauds remains the same. Data mining enables a user to seek valuable information and its interesting relationships [24]. A number of data mining techniques are available, such as decision trees, neural networks (NN), Bayesian belief networks, case-based reasoning, fuzzy rule-based reasoning, hybrid methods, logistic regression, text mining, feature selection, etc. Financial fraud is a serious problem worldwide, and more so in fast-growing countries like China [21]. According to Kirkos et al. [7], some estimates state that fraud costs US business more than $400 billion annually. An innovative fraud detection mechanism was developed on the basis of
Zipf’s Law with the purpose of assisting auditors in reviewing voluminous datasets while at the same time intending to identify any potentially fraudulent records [26]. The study of Bolton and Hand [22] provides a very good summary of the literature on fraud detection problems. Some researchers used methods such as the ID3 decision tree, Bayesian belief networks and back-propagation neural networks to detect and report financial frauds [7, 12]. Fuzzy logic based techniques built on soft computing were also incorporated to deal with frauds [15, 16]. Panigrahi et al. [25] suggested a four-component fraud detection solution with the idea of determining a set of suspicious transactions and then predicting frauds by running a Bayesian learning algorithm. Further, a set of fuzzy association rules was extracted from a data set containing genuine and fraudulent credit card transactions to analyze and compare the frauds. It was suggested that a novel combination of meta-heuristic approaches, namely genetic algorithms and scatter search, when applied to real-time data, may yield fraudulent transactions which are classified correctly [5]. Padhy et al. [18] provided a detailed survey of data mining applications and their feature scope. A number of researchers also discussed the application of data mining in anomaly detection [17, 19, 20, 23].

III. Framework of FFD

The methodological framework for the review is a three-step process: i) research definition, ii) research methodology and iii) research analysis. Research definition is the phase in which the research goal and scope are defined. The goal of the research is to create a classification framework for data mining techniques applicable to FFD. The research scope here is the literature comprising applications of data
mining techniques to FFD published from 1997 to 2008. Phase two is the research methodology. In this phase the online academic databases are searched for FFD; in each iteration these databases are filtered to obtain the articles that were published in academic journals (1997-2008) and that present data mining techniques along with their application to FFD. A detailed process for FFD has been depicted in Fig 1. All the obtained articles are verified for consistency, and the final result of the classification is passed to the third phase of the framework. The research analysis phase includes the analysis of the selected articles, where the topic or area of research is identified for formulating the research goal and defining the scope of the performed research. The research area identified here is the academic research on FFD that applies data mining techniques, formulating conclusions and results based on the analysis of the papers [8].

IV. Fraud Detection in Credit Cards

Credit card fraud is a sort of identity theft in which an unauthorized person makes fraudulent transactions. It can be classified into application fraud and behaviour fraud. Application fraud occurs when a fraudster gets a credit card issued from a company by providing false information [3]. It is very serious because the victim may learn about the fraud too late.

Various data mining techniques used in credit card fraud detection are logistic regression, support vector machines and random forests. A credit card fraud detection scheme scans all transactions, inclusive of fraudulent ones [10]. Data obtained from the data warehouse is divided into various datasets. A dataset comprises primary attributes (account number, sale, purchase, date, name and many others) and derived attributes (for instance, transactions grouped monthly). Derived attributes are not precise, which causes approximation of results and therefore inaccurate information; derived attributes are thus a limitation of the credit card fraud detection scheme. The implemented architecture [Fig 2] comprises a database interface subsystem and a credit card fraud (CCF) detection engine. The former enables the reading of transactions, i.e., it acts as an interface to the banking software.

In the CCF detection subsystem, the host server checks every transaction rendered to it using neural networks and transaction business rules.

V. Conclusion

Data mining has gained weightage in areas where finding patterns, forecasting, discovery of knowledge, etc., is required, and has become obligatory in
different industrial domains. Various techniques and algorithms such as feature selection, classification, memory-based reasoning, clustering, etc., aid in fraud detection in arenas such as insurance and financial frauds. The financial sector has been majorly affected by fraudulent activities due to the increase in the conversion rate of non-internet users to internet users. A detailed review was conducted to understand how these financial frauds can be detected and avoided using data mining techniques. A special reference to credit card frauds was made to explain the architecture of credit card fraud detection.
References
1. Bose, R.K. Mahapatra, “Business data mining — a machine learning perspective”, Information Management,
vol.39, no.3, pp.211–225, 2001.
2. Coalition against Insurance Fraud, “Learn about fraud,” http://www.insurancefraud.org/
learn_about_fraud.htm, Last accessed 23 January 2017.
3. Credit Card Fraud: An Overview, Legal Information Institute, web: https://www.law.cornell.edu/wex/
credit_card_fraud, Last Accessed: 23 January 2017.
4. D. Sánchez, M. A. Vila, L. Cerda, and J. M. Serrano, “Association rules applied to credit card fraud detection,”
Expert Syst. Appl., vol. 36, no. 2 PART 2, pp. 3630–3640, 2009.
5. E. Duman and M. H. Ozcelik, “Detecting credit card fraud by genetic algorithm and scatter search,” Expert
Syst. Appl., vol. 38, no. 10, pp. 13057–13063, 2011.
6. E. Joyner, “Enterprisewide Fraud Management”, Banking, Financial Services and Insurance, Paper 029, 2011
7. E. Kirkos, C. Spathis and Y. Manolopoulos, “Data mining techniques for the detection of fraudulent financial
statement”, Expert Systems with Applications, vol.32, pp.995–1003, 2007.
8. E. W. T. Ngai, L. Xiu, and D. C. K. Chau, “Application of data mining techniques in customer relationship
management: A literature review and classification,” Expert Syst. Appl., vol. 36, no. 2 PART 2, pp. 2592–
2602, 2009.
9. FBI, Federal Bureau of Investigation, Financial Crimes Report to the Public Fiscal Year, Department of
Justice, United States, 2007, http://www.fbi.gov/publications/financial/fcs_report2007/
financial_crime_2007.htm.
10. F. N. Ogwueleka, “Data Mining Application In Credit Card Fraud Detection System”, Journal of Engineering
Science and Technology, vol. 6, no. 3, pp.311 – 322, 2011.
11. F.N. Ogwueleka, and H.C. Inyiama, “Credit card fraud detection using artificial neural networks with a
rule-based component’, The IUP Journal of Science and Technology, vol.5, no.1, pp.40-47, 2009.
12. J.E. Sohl and A.R. Venkatachalam, “A neural network approach to forecasting model Selection”, Information
& Management, vol.29, no.6, pp. 297–303, 1995.
13. J.L. Kaminski, “Insurance Fraud”, OLR Research Report, http://www.cga.ct.gov/2005/rpt/2005-R-0025.htm.
2004
14. “Mass Marketing Fraud(MMF)”, Strategy, Policy & Training Unit, Department of Justice, http://
www.justice.gov/criminal-fraud/mass-marketing-fraud, Last Accessed: 23 January 2017.
15. M. Delgado, D. Sa´nchez, and M.A. Vila, “Fuzzy cardinality based evaluation of quantified sentences”,
International Journal of Approximate Reasoning, vol.23, pp.23–66, 2000.
16. M. Delgado, N. Marý´n, D. Sa´nchez, and M.A.Vila, “Fuzzy association rules: General model and
applications”, IEEE Transactions on Fuzzy Systems, vol.11, no.2, pp.214–225, 2003.
17. N. Kaur, “Survey paper on Data Mining techniques of Intrusion Detection”, International Journal of Science,
Engineering and Technology Research, vol. 2, no. 4, pp. 799-804, 2013.
18. N. Padhy, P. Mishra, and R. Panigrahi, “The Survey of Data Mining Applications and Feature Scope”,
International Journal of Computer Science, Engineering and Information Technology, vol. 2, no. 3,pp. 43-58,
2012.
19. P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava and P.N.Tan, “Data mining for network intrusion
detection”, Proceedings of NSF Workshop on Next Generation Data Mining, pp. 21-30, 2002.
20. P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández and E. Vázquez, “Anomaly-based network intrusion
detection: Techniques, systems and challenges”, Computers and security, vol.28, no. 1, pp. 18-28, 2009.
21. P. Ravisankar, V. Ravi, G. Raghava Rao, and I. Bose, “Detection of financial statement fraud and feature
selection using data mining techniques,” Decision Support Systems, vol. 50, no. 2, pp. 491–500, 2011.
22. R. Bolton, and D. Hand, ‘Statistical fraud detection: A review”, Statistical Science, vol.17, pp.235–255,
2002.
23. S. Agrawal and J. Agrawal, “Survey on anomaly detection using data mining techniques,” Procedia Comput.
Sci., vol. 60, no. 1, pp. 708–713, 2015.
24. S. H. Weiss and N. Indurkhya, Predictive Data Mining: A Practical Guide. CA: Morgan Kaufmann
Publishers, 1998.
25. S. Panigrahi, A. Kundu, S. Sural, and A. Majumdar, “Credit card fraud detection a fusion approach using
Dempster–Shafer theory and bayesian learning”, Information Fusion, pp.354–363, 2009.
26. S.-M. Huang, D.C. Yen, L.-W. Yang and J.-S. Hua, “An investigation of Zipf ’s Law for fraud Detection”,
Decision Support Systems, vol.46, no. 1, pp. 70–83, 2008.
27. Tutorialspoint, “Data mining Tasks”, http://www.tutorialspoint.com/ data_mining/ dm_tasks.htm, Last
Accessed: 24 January 2017.
sentiment analysis using self organizing maps and ant clustering [17]; and text mining in various other fields such as stock prediction [18], web mining [19], digital libraries [20] and so on.

III. Text Mining Models
Generally, text mining is a four-step process consisting of text pre-processing, data selection, data mining and post-processing.

A. Data Cleaning
The textual data available for mining is generally collected over the web from tweets, discussion forums and blogs. The data set available from these sources is in various formats, i.e. "unstructured". We need to "clean" the data by parsing it, treating missing values and removing inconsistencies. After performing these operations the data set should be consistent with the system.

B. Data Selection and Transformation
The textual data available for mining is generally collected over the web from tweets, discussion forums and blogs. The data set available from these sources is in various formats, i.e. "unstructured". We need to "clean" the data by parsing it, treating missing values and removing inconsistencies. After performing these operations the data set should be consistent with the system.

C. Data Mining
After the document has been converted into the intermediate form, data mining techniques can be applied to the different types of data (structured, semi-structured and unstructured) to recognize relationships and patterns. The various data mining techniques are discussed in detail in Section IV.

D. Data Post-processing
This includes the tasks of evaluating and visualizing the knowledge obtained after performing text mining operations.

IV. Techniques Used in Data Mining
The progress of Information Technology has produced large amounts of data and data repositories in diverse areas. Research in databases has further given rise to techniques for storing and processing that data for decision making. Data mining is thus the process of finding useful patterns in large amounts of data; it is also termed knowledge discovery, i.e. the mining or extraction of knowledge from large amounts of data.

Machine Learning Algorithms
● Unsupervised Machine Learning: It is a type of machine learning algorithm that is used to draw
conclusions from datasets that consist of input data without labeled responses. The most familiar unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.

● Supervised Machine Learning: It is a type of machine learning algorithm that uses a labeled dataset (called the training dataset) in order to make predictions. The training dataset comprises input data and response values. From this dataset, the supervised learning algorithm searches for a model that can predict the response values for a new dataset. A test dataset is often used to validate the model. Larger training datasets often yield models with higher predictive power that generalize well to new datasets.

A. Classification Technique:
Classification is a commonly used data mining technique that employs a training dataset of pre-classified data to generate a model that is used to classify records according to rules. This technique is used to find out to which group each data instance belongs within a given dataset, using the training dataset. It is used for classifying data into different classes according to some constraints. Credit risk analysis and fraud detection are applications of this technique. It typically employs decision tree or neural network-based classification algorithms. Classification is supervised learning that involves the following steps:

Step 1: Rules are extracted using the learning algorithm from the training data (i.e. a model of the training data is created). The training data are pre-classified examples (the class label is known for each example).

Step 2: The rules are evaluated on test data. Usually the known data are split into a training sample (2/3) and a test sample (1/3).

Step 3: The generated rules are applied to new data.

Thus, the classifier-training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier. Rules are generated from it that further help in making decisions.

Types of classification models:
● Classification by decision tree induction
● Bayesian classification
● Neural networks
● Support Vector Machines (SVM)
● Classification based on associations

B. Clustering Technique:
It is the task of grouping objects in such a way that objects in the same group or cluster are more similar, in one sense or another, to each other than to objects in other groups. It is thus an identification of similar classes of objects. Using clustering techniques we can further identify dense and sparse regions in the object space, and discover the overall distribution pattern and correlations among data attributes. Types of clustering methods include:
● Partitioning methods
● Hierarchical agglomerative (divisive) methods
● Density-based methods
● Grid-based methods
● Model-based methods

C. Association Rules Technique:
Association is a data mining technique that discovers the probability of the co-occurrence of items in a collection. The relationships between co-occurring items are expressed as association rules. These rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase milk." Eggs and milk are therefore associated with each other and are likely to be placed together to increase the sales of both products. Association rules thus help industries and businesses make certain decisions, such as cross marketing, customer shopping analysis and catalogue design. Association rule algorithms should be able to generate rules with confidence values less than one. Although the number of possible
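The support and confidence figures behind a rule like the egg-and-milk example can be computed directly from transaction counts. A minimal sketch in Python (the transactions and item names are illustrative, not from the paper):

```python
# Support/confidence of an association rule over a tiny transaction set
# (illustrative data, not from the paper).

transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"eggs", "milk"}, transactions))       # 3 of 5 transactions -> 0.6
print(confidence({"eggs"}, {"milk"}, transactions))  # 3 of the 4 egg-buyers -> 0.75
```

Here the rule "eggs ⇒ milk" would be reported with support 0.6 and confidence 0.75, illustrating a rule whose confidence is below one, as noted above.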
References
1. Ah Hwee Tan et al., “Text Mining: The state of the art and the challenges”, Proceedings of the PAKDD Workshop on
Knowledge Discovery from Advanced Databases, pp. 65-70, 2000.
2. R. Feldman and I. Dagan. Kdt - knowledge discovery in texts. In Proc. of the First Int. Conf. on Knowledge
Discovery (KDD), pages 112–117, 1995.
3. Marti A. Hearst, Untangling text data mining, pp. 3-10, 1999, University of Maryland.
4. S. Grimes, “Unstructured data and the 80 percent rule”, Clarabridge Bridgepoints, 2008.
5. H. P. Luhn, “A Business Intelligence System”, Ibm Journal of Research & Development, vol. 2, no. 4, pp. 314-319,
1958.
6. M. E. Maron, J. L. Kuhns, “On Relevance, Probabilistic Indexing and Information Retrieval”, Journal of the ACM,
vol. 7, no. 3, pp. 216-244, 1960.
7. Larsen, Bjornar, and Chinatsu Aone. “Fast and effective text mining using linear-time document clustering.”
Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
ACM, 1999.
8. Jiang, Chuntao, et al. “Text classification using graph mining-based feature extraction.” Knowledge-Based Systems
23.4 (2010): 302-308.
9. Liu, Wei, and Wilson Wong. “Web service clustering using text mining techniques.” International Journal of
Agent-Oriented Software Engineering 3.1 (2009): 6-26.
10. Ronen Feldman, I. Dagan, H. Hirsh, “Mining Text Using Keyword Distributions”, Journal of Intelligent
Information Systems, vol. 10, no. 3, pp. 281-300, 1998.
11. J. Mothe, C. Chrisment, T. Dkaki, B. Dousset, D. Egret, “Information mining: use of the document dimensions
to analyse interactively a document set”, European Colloquium on IR Research: ECIR, pp. 66-77, 2001.
12. M. Ghanem, A. Chortaras, Y. Guo, A. Rowe, J. Ratcliffe, “A Grid Infrastructure For Mixed Bioinformatics Data
And Text Mining”, Computer Systems and Applications 2005. The 3rd ACS/IEEE International Conference,
vol. 29, pp. 41-1, 2005.
13. Haralampos Karanikas, C. Tjortjis, B. Theodoulidis, “An Approach to Text Mining using Information Extraction”,
Proc. Workshop on Knowledge Management Theory and Applications (KMTA 2000), 2000.
14. Qinghua Hu et al., “A novel weighting formula and feature selection for text classification based on rough set
theory”, Natural Language Processing and Knowledge Engineering 2003. Proceedings. 2003 International Conference
on IEEE, pp. 638-645, 2003.
15. Nahm, Un Yong, and Raymond J. Mooney. “Mining soft-matching association rules.” Proceedings of the eleventh
international conference on Information and knowledge management. ACM, 2002.
16. Li, Nan, and Desheng Dash Wu. “Using text mining and sentiment analysis for online forums hotspot detection
and forecast.” Decision support systems 48.2 (2010): 354-368.
17. Chifu, Emil Şt., Tiberiu Şt. Leţia, and Viorica R. Chifu. “Unsupervised aspect level sentiment analysis using Ant
Clustering and Self-organizing Maps.” Speech Technology and Human-Computer Dialogue (SpeD), 2015
International Conference on. IEEE, 2015.
18. Nikfarjam, Azadeh, Ehsan Emadzadeh, and Saravanan Muthaiyah. “Text mining approaches for stock market
prediction.” Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on.
Vol. 4. IEEE, 2010.
19. Kosala, Raymond, and Hendrik Blockeel. “Web mining research: A survey.” ACM Sigkdd Explorations Newsletter
2.1 (2000): 1-15.
20. Fuhr, Norbert, et al. “Digital libraries: A generic classification and evaluation scheme.” International Conference
on Theory and Practice of Digital Libraries. Springer Berlin Heidelberg, 2001.
(1) Poor Access Grant and Lack of Sufficient Authorization
Authorization is a process where a requester is allowed to perform an authorized action or to receive a service. Often a web application grants access to some of its features to specified users only. The web application verifies the credentials of users trying to access these features through a login page. This type of vulnerability exists in an application if users can access these features without verification through certain links or tabs, and can access other users' accounts as well.

(2) Poorly Implemented Functionality
This kind of vulnerability exists in a website due to its own code, which results in harmful consequences such as password leaks, consumption of large amounts of resources and access to administrative features. The security breaches may lead to the disclosure of confidential or sensitive data from the web application.

(3) Inadequate Exception and Error Handling Mechanisms
Error messages and exception handling code should return only a limited amount of information, which prevents an attacker from identifying a place for SQL injection. For instance, consider the following code:

…catch(Exception e) {Console.WriteLine(e.Message);}

If it is an SQL exception, this code can display information related to the database.

(4) Brute Force Attack
This is a process of trial and error used to guess users' credentials such as user name, password or security questions, for the purpose of hacking a user's account.

(5) Data/Information Leak
This kind of security breach may lead to the disclosure of confidential or sensitive data from the web application. This vulnerability exists in web applications as a result of improper use of technology while developing the application. It can reveal developer comments, source code, etc., and can give a hacker enough information to exploit the system.

(6) Inadequate Authentication
Authentication involves confirming the identity of an entity/person claiming to be a trusted one. Sometimes a developer doesn't provide a link for administrative access, yet administrative access is provided through another folder on the server. If a hacker identifies its path, it becomes very easy to exploit the application.

(7) Spoofing
This is an attack where an attacker tries to masquerade as another program or user by falsifying content/data. The hacker injects a malicious piece of code to replace the original content.

(8) Cross-Site Scripting
This type of attack is possible when a website containing input fields accepts scripts as well, and it can lead to phishing attacks. The script gets stored in the database and executed every time the page is attacked. For example, <script>alert(message)</script>, where the message could also be a cookie. When any user visits the page and the application searches for a username or password, the script will be executed.

(9) Denial of Service Attack
This kind of attack prevents normal users from accessing a website. The attacker attempts to access the database server and performs SQL injections on it so that database information becomes inaccessible. The attacker may also try to gain access as a normal user with a wrong password; after a few attempts the user is locked out. The attacker may also gain access to the web server and send specially crafted requests so that the web server crashes.

(10) SQL Injection
It is an attack where a malicious script/code is inserted into an instance of an SQL server/database for execution, which will eventually try to fetch database information.

(11) Poor Session Management
If an attacker can predict a unique value that identifies a particular user or session (session hijacking), he can use it to enter the system as a genuine user. This problem also occurs when the logout activity just redirects
the user to the home page without terminating the current session. The old session IDs can then be used for authorization.

(12) Application Configuration Settings
Certain configuration settings exist in a web application by default, such as debug settings, permissions, hardcoded user names, passwords and admin account information. An attacker may use this information to obtain unauthorized access.

(13) Cross-site request forgery [6, 7]:
It is a vulnerability which involves exploitation of a website by transmitting unauthorized commands from a user that the website trusts. Thus it exploits the trust that a website has in its user's browser.

(14) XML injection [1]:
It is an attack where an attacker tries to inject XML code with the aim of modifying the XML structure, thus violating the integrity of the application.

(15) Malicious file execution [3]:
Web applications are often vulnerable to malicious file execution; it usually occurs when code execution happens from a non-trusted source.

(16) Cookie cloning [11]:
An attacker, after cloning the user/browser cookies, tries to change the user's files or data, or may even cause harm through injected code.

(17) XPath injection [3]:
It occurs whenever a website uses information provided by the user to construct an XML query for XML data.

(18) Cookie sniffing [11]:
It is a session hijacking vulnerability with the aim of intercepting unencrypted cookies from web applications.

(19) Cookie manipulation [5]:
Here an attacker tries to manipulate or change the content of cookies and thus can harm or alter the data.

(20) Sidejacking [11]:
It is a hacking vulnerability where an attacker tries to capture all the cookies and may even get access to the user's mailboxes, etc.

(21) Social vulnerability (hacking), session hijacking [4, 5, 10, 11]:
It is a popular hijacking mechanism where an attacker gains unauthorized access to information.

Mis-configuration [24]: Inappropriate or inadequate configuration of the web application may also lead to security breaches.

(22) Absence of secure network infrastructure [9]:
Absence of any intrusion detection or protection system, failover systems, etc., may also lead to security breaches.

(23) Off-the-shelf components [9, 11]:
These components are purchased from third-party vendors, so there is suspicion about their security aspects.

(24) Firewall intrusion detection system [8, 9, 10]:
A firewall builds a secured wall between the outside/external network and the internal network, which is taken to be trusted.

(25) Path traversal [3]:
It is a vulnerability where malicious untrusted input causes undesirable changes to the path.

(26) Command injection [3]:
It is the injection of an input value which is then embedded into the command to be executed.

(27) Parameter manipulation [5]:
It is similar to XSS, where an invader inserts malicious code/script into the web application.

(28) LDAP injection [3]:
It is similar to SQL and XPath injection, where the queries are targeted at an LDAP server.

(29) Bad code or fault in implementation [2]:
Improper coding or a fault in the implementation of the web application may also lead to violation of the security of the web application.
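The injection flaws in this family (SQL, XPath, LDAP and command injection) share one core mitigation: never splice untrusted input into a query string; pass it as data instead. A minimal sketch using Python's sqlite3 module (the table, rows and payload are illustrative, not from the paper):

```python
import sqlite3

# Illustrative schema and data; in a real application these already exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
conn.execute("INSERT INTO users VALUES ('bob', 'hunter2')")

user_input = "alice' OR '1'='1"  # classic SQL injection payload

# Vulnerable: string concatenation lets the payload rewrite the query,
# so the WHERE clause becomes always-true and matches every row.
vulnerable = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name = '" + user_input + "'"
).fetchone()[0]

# Safe: the ? placeholder passes the input purely as data, never as SQL.
safe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name = ?", (user_input,)
).fetchone()[0]

print(vulnerable, safe)  # 2 0
```

The same discipline applies to the other injection variants listed above: build XPath, LDAP and shell commands from fixed templates plus escaped or parameterized values, not by concatenating user input.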
(30) Clickjacking [6]:
It is an attack where a user's click may be hijacked so that the user is directed to some other link which may contain malicious code.

(31) Content injection [8, 6]:
It is a vulnerability where an attacker loads static content, which may be false content, into the web page.

(32) File injection [8]:
It refers to the inclusion of any unintended file and is a typical vulnerability often found in web applications. Example: remote file inclusion.

Challenges faced by security testing of web applications
One of the concerns of security testing of web applications is the development of automated tools for testing their security [3]. The increase in the usage of Rich Internet Applications (RIAs) also poses a challenge, because the crawling techniques used for exploring earlier web applications do not fulfil the requirements of RIAs [3], which are more user-friendly and responsive due to their usage of AJAX technologies. Another challenge is the usage of unintended invalid inputs, which may result in security attacks [1]; these security breaches may cause extensive damage to the integrity of the data.

While working with mutants, one should be careful when incorporating them, as injecting && (and) instead of || (or), or any other such modification, may lead to a fault injection which could result in a security vulnerability, since vulnerabilities do not take semantics into consideration [1]. This, too, poses a challenge to the security testing of such web applications. Usage of insecure cryptographic storage may likewise pose a challenge to web application security testing [1]. Security testing of web applications may also face repudiation attacks, where a receiver is not able to prove that the data received came from a specific sender rather than from some other unintended source [1]. In addition, the web development languages we use may fail to enforce the security policy, which may violate the integrity and confidentiality of the web application [11] and pose a security risk. At times it is also possible that an invader is able to extract more information than intended; in such a case, again, the integrity of the data suffers, which is another challenge for a security tester.

Conclusion
In this paper we have described various kinds of security vulnerabilities that may exist in a website if proper consideration is not taken during development. A website developer must employ all possible measures to combat known threats during the whole development cycle of a website, from design and implementation to testing. If any security loophole remains undetected, hackers can use it to exploit the system.
References
1. An Approach Dedicated for Web Service Security Testing, Sébastien Salva, Patrice Laurencot and Issam Rabhi.
2010 Fifth International Conference on Software Engineering Advances.
2. Security Testing of Web Applications: a Search Based Approach for Cross-Site Scripting Vulnerabilities,
Andrea Avancini, Mariano Ceccato , 2011- 11th IEEE International Working Conference on Source Code
Analysis and Manipulation.
3. Supporting Security Testers in Discovering Injection Flaws. Sven Türpe, Andreas Poller, Jan Trukenmüller,
Jürgen Repp and Christian Bornmann, Fraunhofer-Institute for Secure Information Technology SIT,
Rheinstrasse 75, 64295 Darmstadt, Germany. 2008 IEEE, Testing: Academic & Industrial Conference -
Practice and Research Techniques.
4. A Database Security Testing Scheme of Web Application, Yang Haixia, Business College of Shanxi University,
Nan Zhihong, School of Information Management, Shanxi University of Finance & Economics, China.
Proceedings of 2009 4th International Conference on Computer Science & Education.
5. Mapping software faults with web security vulnerabilities. Jose Fonseca and Marco Vieira. International
Conference on Dependable Systems & Networks: Anchorage, Alaska, June 2008 IEEE.
6. D-WAV: A Web Application Vulnerabilities Detection Tool Using Characteristics of Web Forms. Lijiu Zhang,
Qing Gu, Shushen Peng, Xiang Chen, Haigang Zhao, Daoxu Chen State Key Laboratory of Novel Software
Technology, Department of Computer Science and Technology, Nanjing University. 2010 Fifth International
Conference on Software Engineering Advances.
7. Enhancing web page security with security style sheets Terri Oda and Anil Somayaji (2011) IEEE.
8. Security Testing of Web Applications: a Search Based Approach for Cross-Site Scripting Vulnerabilities,
Andrea Avancini, Mariano Ceccato , 2011- 11th IEEE International Working Conference on Source Code
Analysis and Manipulation.
9. Assessing and Comparing Security of Web Servers. Naaliel Mendes, Afonso Araújo Neto, João Durães, Marco
Vieira, and Henrique Madeira, CISUC, University of Coimbra. 2008 14th IEEE Pacific Rim International
Symposium on Dependable Computing.
10. Firewall Security: Policies, Testing and Performance Evaluation. Michael R. Lyu and Lorrien K. Y. Lau.
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK. 2000
IEEE.
11. Top 10 Free Web-Mail Security Test Using Session Hijacking. Preecha Noiumkar, Thawatchai Chomsiri,
Mahasarakham University, Mahasarakham, Thailand. Third 2008 International Conference on Convergence
and Hybrid Information Technology.
12. Development of Security Engineering Curricula at US Universities. Mary Lynn Garcia, Sandia National
Laboratories. 1998 IEEE.
possible tracks that enable security and privacy in a malicious big data context. Cybenko and Landwehr [7] studied historical data from a variety of cyber- and national-security domains in the United States, such as computer vulnerability databases, offense and defense, co-evolution of wormbots such as Conficker, etc. They claim that security analytics can provide the ultimate solution for cyber-security. Cardenas et al. [9] provide details of how the security analytics landscape is changing with the introduction and widespread use of new tools to leverage large quantities of structured and unstructured data. They also outline some of the fundamental differences between security analytics and traditional analytics. Camargo et al. [10] research the use of big data analytics for security and analyze people's perception of security. They found that big data can indeed provide a long-term solution for citizens' security, in particular cyber security.

III. Big Data And The Basic BDA Process
Big data is data whose complexity hinders it from being managed, queried and analyzed efficiently by existing database architectures [4]. The "complexity" of big data is defined through the 4 V's: 1) volume, referring to terabytes, petabytes, or even exabytes (1000⁶ bytes) of stored information; 2) variety, referring to the co-existence of unstructured, semi-structured and structured data; 3) velocity, referring to the rapid pace at which big data is generated; and 4) veracity, stressing the importance of maintaining quality data within an organization.

The domain of Big Data Analytics (BDA) is concerned with the extraction of value from big data, i.e., insights which are nontrivial, previously unknown, implicit and potentially useful. These insights have a direct impact on deciding or revising the current business strategy [14]. The assumption is that patterns of usage, occurrences or behaviors exist in big data. BDA attempts to fit mathematical models to these patterns through different data mining techniques such as predictive analytics, cluster analysis, association rule mining, and prescriptive analytics [13]. Insights from these techniques are typically presented on interactive dashboards and help corporations maintain their competitive edge, increase profits, and enhance their CRM.

Fig. 1 shows the basic stages of the BDA process [14]. Initially, data to be analyzed is selected from real-time streams of big data and is pre-processed (i.e. cleaned). This is called ETL (Extract, Transform, Load). It can take up to 60% of the effort of BDA, e.g., catering for inconsistent, incomplete and missing values; normalizing, discretizing and reducing data; ensuring statistical quality of data through boxplots, cluster analysis, normality testing, etc.; and understanding data through descriptive statistics (correlations, hypothesis testing, histograms, etc.). Once data is cleaned, it is stored in BDA databases (cloud, mobile, network servers, etc.) and analyzed with analytics. The results are then shown in interactive dashboards using computer visualization.

IV. Challenges in Security Analytics
Big data is a recent technology and has been widely adopted to provide solutions for organisational decision making [11]. One of the most important areas to benefit from the advancements in big data analytics is cyber security; this area is now termed security analytics. An important goal for security analytics is to enable organisations to identify unknown indicators of attack, and to uncover things like when compromised credentials are being used to bypass defenses [2]. However, handling unstructured data and combining it with structured data to arrive at an accurate assessment is one of the big challenges in security analytics.

In the past, information security was largely based on event correlation designed for monitoring and detecting known attack patterns [9]. This model alone is no longer adequate, as multidimensional cyber-attacks are dynamic and can use different tactics and techniques to find their way into and out of an organization. In addition, the traditional set of security devices is designed and optimized to look at particular aspects of attacks: a network perspective, an attack perspective, a malware perspective, a host perspective, a web traffic perspective, etc. [12]. These different technologies see isolated aspects of an attack and lack the bigger picture.

1. Cyber-attacks are extremely difficult to distinguish or investigate, because until all the event data is combined, it is extremely hard to determine what an attacker is trying to accomplish [6, 8].
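The ETL-style pre-processing described in Section III (filling missing values, then normalizing) can be sketched in a few lines of Python; the record layout and field names here are illustrative, not from the paper:

```python
# Minimal ETL-style pre-processing sketch (illustrative data, not from the paper):
# fill missing values with the column mean, then min-max normalize to [0, 1].

records = [  # e.g. one row per network event; None marks a missing measurement
    {"bytes": 200.0, "duration": 1.0},
    {"bytes": None,  "duration": 3.0},
    {"bytes": 600.0, "duration": None},
]

def clean(records, field):
    """Replace missing values in one column with the column mean."""
    present = [r[field] for r in records if r[field] is not None]
    mean = sum(present) / len(present)
    return [r[field] if r[field] is not None else mean for r in records]

def normalize(values):
    """Min-max scale a column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

bytes_col = normalize(clean(records, "bytes"))        # [0.0, 0.5, 1.0]
duration_col = normalize(clean(records, "duration"))  # [0.0, 1.0, 0.5]
print(bytes_col, duration_col)
```

At production scale these steps run inside distributed ETL pipelines rather than in-memory lists, but the per-column logic is the same.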
thwarting an attack when it happens [4, 5]. According to a very reputed organization providing security solutions, "Organizations are failing at early breach detection, with more than 92 percent of breaches undetected by the breached organization." It is clear that we need to play a far more active role in protecting our organizations [8]. We need to constantly monitor what is going on within our infrastructure and have an established, cyclical means of responding before attacks wreak havoc on our networks and reputations. Therefore, some of the primary requirements for a security analytics solution are:

1. Secure sensitive data entering big database systems, and then control access to protected data by monitoring which applications and which users get access to which original data.

2. Protect sensitive data in a way that maintains usable, realistic values for accurate analytics and modeling on data in its encrypted form.

3. Assure global regulatory compliance. Securely capture, analyze and store data from global sources, and ensure compliance with international data security, residency and privacy regulations. Address compliance comprehensively, not system-by-system.

4. Optimize performance and scalability.

5. Integrate data security with a quick implementation and an efficient, low-maintenance solution that can scale up. Leverage IT investments by integrating with the existing IT environment and extending current controls and processes into big databases.

6. As far as possible, provide block-layer encryption, which will improve security while still enabling big data clusters to scale and perform [7, 8].

7. Leverage security tools or third-party products. Tools may include SSL/TLS for secure communication, Kerberos for node authentication, and transparent encryption for data-at-rest [13].

VI. Conclusion
Security analytics is the new technical foundation of an informed, reliable detection and response strategy for cyber attacks. Mature security organizations recognize this and are leading by building their security analytics capabilities today. A security analytics system combines and integrates the traditional ways of cyber threat detection to provide security analysts a platform with both enterprise-scale detection and investigative capabilities. It will not only help identify events that are happening now, but will also assess the state of security within the enterprise in order to predict what may occur in the future and enable more proactive security decisions.
References
1. Gahi, Y., Guennoun, M., & Mouftah, H. T. (2016, June). Big Data Analytics: Security and privacy challenges.
In Computers and Communication (ISCC), 2016 IEEE Symposium on (pp. 952-957). IEEE.
2. Verma, R., Kantarcioglu, M., Marchette, D., Leiss, E., & Solorio, T. (2015). Security analytics: essential data
analytics knowledge for cybersecurity professionals and students. IEEE Security & Privacy, 13(6), 60-65.
3. Oltsik, J. (2013). The Big Data Security Analytics Era Is Here. White Paper, Retrieved from https://
www.emc.com/collateral/analyst-reports/security-analytics-esg-ar.pdf on 30th December, 2016.
4. Shackleford, D. (2013). SANS Security Analytics Survey, White Paper, SANS Institute InfoSec Reading Room.
Downloaded on 30th December, 2016.
5. Gawron, M., Cheng, F., & Meinel, C. (2015, August). Automatic detection of vulnerabilities for advanced
security analytics. In Network Operations and Management Symposium (APNOMS), 2015 17th Asia-Pacific
(pp. 471-474). IEEE.
6. Gantsou, D. (2015, August). On the use of security analytics for attack detection in vehicular ad hoc networks.
In Cyber Security of Smart Cities, Industrial Control System and Communications (SSIC), 2015 International
Conference on (pp. 1-6). IEEE.
7. Cybenko, G., & Landwehr, C. E. (2012). Security analytics and measurements. IEEE Security & Privacy,
10(3), 5-8.
8. Cheng, F., Azodi, A., Jaeger, D., & Meinel, C. (2013, December). Multi-core Supported High Performance
Security Analytics. In Dependable, Autonomic and Secure Computing (DASC), 2013 IEEE 11th International
Conference on (pp. 621-626). IEEE.
9. Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE Security &
Privacy, 11(6), 74-76.
10. Camargo, J. E., Torres, C. A., Martínez, O. H., & Gómez, F. A. (2016, September). A big data analytics
system to analyze citizens’ perception of security. In Smart Cities Conference (ISC2), 2016 IEEE International
(pp. 1-5). IEEE.
11. Alsuhibany, S. A. (2016, November). A space-and-time efficient technique for big data security analytics. In
Information Technology (Big Data Analysis)(KACSTIT), Saudi International Conference on (pp. 1-6). IEEE.
12. Rao, S., Suma, S. N., & Sunitha, M. (2015, May). Security Solutions for Big Data Analytics in Healthcare.
In Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference
on (pp. 510-514). IEEE.
13. Marchetti, M., Pierazzi, F., Guido, A., & Colajanni, M. (2016, May). Countering Advanced Persistent
Threats through security intelligence and big data analytics. In Cyber Conflict (CyCon), 2016 8th International
Conference on (pp. 243-261). IEEE.
14. T. Mahmood and U. Afzal, “Security Analytics: Big Data Analytics for cyber-security: A review of trends,
techniques and tools,” 2nd National Conference on Information Assurance (NCIA), 2013
constructed (the underlying routing structure). According to this method, existing multicast routing approaches for MANETs can be divided into tree based multicast protocols, mesh based multicast protocols and hybrid multicast protocols.

In tree-based protocols, there is only one path between a source-receiver pair. This is efficient, but the main drawback of these protocols is that they are not robust enough to operate in highly mobile environments [2].

Depending on the number of trees per multicast group, tree based multicast can be further classified into source based multicast trees and group shared multicast trees. In source-tree-based multicast protocols, the tree is rooted at the source, whereas in shared-tree-based multicast protocols a single tree is shared by all the sources within the multicast group and is rooted at a node referred to as the core node. Source tree based multicast performs better than shared tree based protocols at heavy load because of efficient traffic distribution, but the latter type of protocol is more scalable. The main problem of a shared tree based multicast protocol is that it depends heavily on the core node; hence, a single point of failure at the core node affects the performance of the multicast protocol.

Some of the tree based multicast routing protocols are the bandwidth efficient multicast routing protocol (BEMRP) [3], multicast zone routing protocol (MZRP) [4], multicast core extraction distributed ad hoc routing protocol (MCEDAR) [5], differential destination based multicast protocol (DDM) [6], ad hoc multicast routing protocol utilizing increasing id numbers (AMRIS) [7], and ad hoc multicast routing protocol (AMRoute) [8].

Bandwidth-Efficient Multicast Routing Protocol (BEMRP)
It tries to find the nearest forwarding nodes, rather than the shortest path between source and receiver, and hence reduces the number of data packet transmissions. To maintain the multicast tree, it uses the hard state approach, in which control packets are transmitted (to maintain the routes) only when a link breaks, resulting in lower control overhead but at the cost of a lower packet delivery ratio. In BEMRP, the receiver initiates the multicast tree construction. When a receiver wants to join the group, it initiates flooding of Join control packets; the existing members of the multicast tree, on receiving these packets, respond with Reply packets. When many such Reply packets reach the requesting node, it chooses one of them and sends a Reserve packet on the path taken by the chosen Reply packet.

This process is repeated until a tree node is found (see Figure 2). If no reply message returns to P, a localized broadcast is used.
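The receiver-initiated join in BEMRP described above can be sketched as a small simulation. This is a hedged illustration, not BEMRP's actual packet handling: the toy topology, the set-based tree representation and the selection rule used here (choose the Reply whose path adds the fewest new forwarding nodes, i.e. the nearest tree node) are assumptions made for the example.

```python
from collections import deque

def flood_join(graph, receiver, tree_nodes):
    """Flood a Join packet from the receiver (BFS) and collect, for each
    multicast-tree member reached, the path a Reply packet would retrace."""
    replies, seen = [], {receiver}
    queue = deque([[receiver]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in tree_nodes and node != receiver:
            replies.append(path)          # a tree member answers with a Reply
            continue                      # tree members need not forward the Join
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return replies

def bemrp_join(graph, receiver, tree_nodes):
    """Pick the Reply whose path adds the fewest new forwarding nodes
    and graft that path onto the tree (the Reserve packet's job)."""
    replies = flood_join(graph, receiver, tree_nodes)
    best = min(replies, key=lambda p: sum(1 for n in p if n not in tree_nodes))
    tree_nodes |= set(best)               # Reserve the chosen path
    return best

graph = {                                  # toy topology (illustrative only)
    'S': ['A', 'B'], 'A': ['S', 'C'], 'B': ['S', 'D'],
    'C': ['A', 'R1'], 'D': ['B', 'R2'], 'R1': ['C'], 'R2': ['D'],
}
tree = {'S', 'A', 'C', 'R1'}               # current multicast tree
print(bemrp_join(graph, 'R2', tree))       # ['R2', 'D', 'B', 'S']
```

In the real protocol the Reply and Reserve packets are exchanged hop by hop rather than computed centrally; the sketch only captures the selection logic.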
packet to some receivers, and will partition the address list into distinct parts for each chosen next hop.

In order to reduce the packet size, DDM can operate in soft-state mode. Each node in soft-state mode records the set of receivers for which it has been the forwarder. Each multicast packet only describes the change of the address list since the last forwarding, by means of a special DDM block in the packet header. For instance, if R4 moves to another place and loses connection to R3, the DDM block in the packet header describes that R4 is removed. Then B knows that it only has to forward the packet to R3.

Multicast Core-Extraction Distributed Ad Hoc Routing (MCEDAR)
MCEDAR is a multicast extension to the CEDAR architecture which provides the robustness of mesh structures and the efficiency of tree structures. MCEDAR uses a mesh as the underlying infrastructure, but data forwarding occurs only on a sender-rooted tree. MCEDAR is particularly suitable for situations where multiple groups coexist in a MANET.

At first, MCEDAR partitions the network into disjoint clusters. Each node exchanges a special beacon with its one-hop neighbors to decide whether it becomes a dominator or chooses a neighbor as its dominator. A dominator and those neighbors that have chosen it as a dominator form a cluster. A dominator then becomes a core node and issues a message to nearby core nodes to build virtual links between them. All the core nodes form a core graph.

When a node intends to join a group, it delegates its dominating core node P to join the appropriate mgraph instead of itself. An mgraph is a subgraph of the core graph and is composed of those core nodes belonging to the same group. P joins the mgraph by broadcasting a join message which contains a joinID. Only those members with smaller joinIDs reply with an ACK message to P (see Figure 6). Other nodes receiving the join message forward it to their nearby core nodes. An intermediate node Q accepts at most R ACK messages, where R is a robustness factor. Q then puts the nodes from which it receives the ACK message into its parent set, and the nodes to which it forwards the ACK message into its child set.

When a node has fewer than R/2 parents, it periodically issues new join messages to get more parents. When a data packet arrives at an mgraph member, the member only forwards the packet to those nearby member core nodes that it knows.

Mesh-based protocols may have more than one path between a source-receiver pair and thereby provide redundant routes for maintaining connectivity to group members. Because of the availability of multiple paths between source and receiver, mesh based protocols are more robust compared to tree based ones [2].

On-Demand Multicast Routing Protocol (ODMRP)
ODMRP provides richer connectivity among group members and builds a mesh for providing a high data delivery ratio even at high mobility. It introduces a "forwarding group" concept to construct the mesh and a mobility prediction scheme to refresh the mesh only when necessary.

The first sender floods a join message with the data payload piggybacked. The join message is periodically flooded to the entire network to refresh the membership information and update the multicast paths. An interested node will respond to the join message. Note that the multicast paths built by this sender are shared with other senders. In other words, a forwarding node will forward the multicast packets not only from this sender but also from other senders in the same group (see Figure 7).

Due to the high overhead incurred by the flooding of join messages, a mobility prediction scheme is proposed to find the most stable path between a sender-receiver pair. The purpose is to flood join messages only when the paths indeed have to be refreshed. A formula based on the information provided by GPS (Global Positioning System) is used to predict the link expiration time between two connected nodes. A receiver sends the reply message back to the sender via the path having the maximum link expiration time.

A Dynamic Core Based Multicast Routing Protocol (DCMP)
DCMP aims at mitigating the high control overhead problem in ODMRP. DCMP dynamically classifies the senders into different categories, and only a portion of the senders need to issue control messages. In DCMP, senders are classified into three categories: active senders, core senders, and passive senders. Active senders flood join messages at regular intervals. Core senders are those active senders which also act as the core node for one or more passive senders. A passive sender does not flood join messages, but depends on a nearby core sender to forward its data packets. The mesh is created and refreshed by the join messages issued by active senders and core senders.

All senders are initially active senders. When a sender S has packets to send, it floods a join message. Upon receiving this message, an active sender P delegates S to be its core node if P is close to S and has a smaller ID than S. Afterwards, the multicast packets sent by S will be forwarded to P first, and P relays them through the mesh.

Adaptive Core Multicast Routing Protocol (ACMRP)
ACMRP presents an adaptive core mechanism in which the core node adapts to the network and group status. In general mesh-based protocols, the mesh provides overly rich connectivity and results in a high delivery cost. Hence, ACMRP forces only one core node to take responsibility for mesh creation and maintenance in a group. The adaptive core mechanism also handles any core failure caused by link failures, node failures, or network partitions.

A new core node of a group emerges when the first sender has multicast packets to send. The core node floods join messages, and each node stores this message in its local cache. Interested members reply with a JREP message to the core node. Forwarding nodes are those nodes which have received a JREP message. If a sender only desires to send packets (it is not interested in packets from other senders), it sends an EJREP message back to the core node. Those nodes receiving this EJREP message only forward data packets from this sender. If a new sender wishes to send a packet but has not connected to the mesh, it encapsulates the packet toward the core node. The first forwarding node strips the encapsulated packet and sends the original packet through the mesh.

ACMRP proposes a novel mechanism to regularly re-elect a new core node which is located near all members. The core node periodically floods a query message with the TTL set so as to acquire the group membership information and lifetime of its neighboring nodes. The core node will select, among its neighboring nodes, the node that has the minimum total hop count of routes toward group members as the new core node.

Multicast Protocol for Ad Hoc Networks with Swarm Intelligence (MANSI)
MANSI relies on only one core node to build and maintain the mesh and applies swarm intelligence to tackle metrics like load balancing and energy conservation. Swarm intelligence refers to complex behaviors that arise from very simple individual behaviors and interactions. Although each individual has little intelligence and simply follows basic rules using local information obtained from the environment, globally optimized behaviors emerge when they work collectively as a group. MANSI utilizes this characteristic to lower the total cost of the multicast session.

The sender that first starts sending data takes the role of the core node and informs all nodes in the network of its existence. Reply messages transmitted by interested nodes construct the mesh. Each forwarding node is associated with a height which is identical to the highest ID of the members that use it to connect to the core node. After the mesh creation, MANSI adopts the swarm intelligence metaphor to allow nodes to learn better connections that yield a lower forwarding cost. Each member P except the core node periodically deploys a small packet, called a FORWARD ANT, which opportunistically explores better paths toward the core.

A FORWARD ANT stops and turns into a BACKWARD ANT when it encounters a forwarding node whose height is higher than the ID of P. A BACKWARD ANT travels back to P via the reverse path. When the BACKWARD ANT arrives at each intermediate node, it estimates the cost of having the current node join the forwarding set via the forwarding node it previously found. The estimated cost, as well as a pheromone amount, is updated in the node's local data structure. The pheromone amounts are then used by subsequent FORWARD ANTs that arrive at this node to decide which node they will travel to next.

MANSI also incorporates a mobility-adaptive mechanism. Each node keeps track of the normalized link failure frequency (nlff), which reflects the dynamic condition of the surrounding area. If the nlff exceeds the threshold, the node will add another entry for the second best next hop into its join messages. The additional path to the core node then increases the reliability of MANSI.

Neighbor Supporting Ad Hoc Multicast Routing Protocol (NSMP)
NSMP utilizes the node locality concept to lower the overhead of mesh maintenance. For initial path establishment or network partition repair, NSMP occasionally floods control messages through the network. For routine path maintenance, NSMP uses local path recovery, which is restricted only to the mesh nodes and neighbor nodes of a group.

The initial mesh creation is the same as that in MANSI. Those nodes (except mesh nodes) that detect reply messages become neighbor nodes, and neighbor nodes do not forward multicast packets. After the mesh creation phase (see Figure 11), all senders transmit LOCAL_REQ messages to maintain the mesh at regular intervals. Only mesh nodes and neighbor nodes forward the LOCAL_REQ messages. In order to balance routing efficiency and path robustness, a receiver receiving several LOCAL_REQ messages replies with a message to the sender via the path with the largest weighted path length.

Since only mesh nodes and neighbor nodes accept LOCAL_REQ messages, a network partition may not be repaired. Hence, a group leader is elected among the senders and floods request messages through the network periodically. A network partition can then be recovered by the flooding of request messages. When a node P wishes to join a group as a receiver, it waits for a LOCAL_REQ message. If no LOCAL_REQ message is received, P locally broadcasts a MEM_REQ message.

The Core-Assisted Mesh Protocol (CAMP)
CAMP is a receiver-initiated protocol. It assumes that an underlying unicast routing protocol provides correct distances to known destinations. CAMP establishes a mesh composed of shortest paths from senders to receivers. One or multiple core nodes can be defined for each mesh; core nodes need not be part of the mesh, and nodes can join a group even if all associated core nodes are unreachable.

It is assumed that each node can reach at least one core node of the multicast group which it wants to join. If a joining node P has any neighbor that is a mesh node, then P simply tells its neighbors that it is a new member of the group. Otherwise, P selects its next hop toward the nearest core node as the relay of the join message. Any mesh node receiving the join message transmits an ACK message back to P. Then P connects to the mesh. If none of the core nodes of the group is reachable, P broadcasts the join message using an expanded ring search.

To ensure shortest paths, each node periodically looks up its routing table to check whether the neighbor that relays the packet is on the shortest path to the sender. The number of packets coming from the reverse path for a sender indicates whether the node is on the shortest path. A special message will be issued to search for a mesh node so that the shortest path can be re-established. At last, to ensure that two or more meshes eventually merge, all active core nodes periodically send messages to each other and force nodes along the path that are not members to join the mesh.

III. Present Status of Multicast Routing Protocols
Multicasting is a mechanism in which a source can send the same communication to multiple destinations. In multicast routing, a multicast tree to a group of destination nodes is to be found, along which the information will be disseminated to the different nodes in parallel. Multicast routing is more efficient than unicast because data is forwarded to many intended destinations in one go rather than being sent individually. At the same time, it is not as expensive as broadcasting, in which the data is flooded to all the nodes in the network. It is therefore extremely suitable for a bandwidth constrained network like a MANET.
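The efficiency ordering described above (multicast cheaper than per-receiver unicast, broadcast the most expensive) can be made concrete by counting link-level transmissions on a toy topology. The graph and the counting rules below are illustrative assumptions, not taken from the paper; the multicast structure is approximated by the union of the shortest paths from the source to each receiver.

```python
from collections import deque

def shortest_path(graph, src, dst):
    """BFS shortest path (by hop count) from src to dst."""
    prev, seen, queue = {}, {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                prev[nbr] = node
                queue.append(nbr)
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def transmissions(graph, source, receivers):
    """Count link transmissions for unicast, multicast and broadcast delivery."""
    paths = [shortest_path(graph, source, r) for r in receivers]
    unicast = sum(len(p) - 1 for p in paths)      # one copy per receiver per hop
    tree_edges = {frozenset(e) for p in paths for e in zip(p, p[1:])}
    multicast = len(tree_edges)                   # one copy per edge of the shared structure
    broadcast = len(graph)                        # flooding: every node transmits once
    return unicast, multicast, broadcast

graph = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
print(transmissions(graph, 1, [4, 5]))            # (6, 4, 5)
```

On this topology, delivering to receivers 4 and 5 costs 6 unicast transmissions but only 4 along the shared structure, since the common prefix 1-2-3 carries a single copy; the gap widens as the group grows.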
Traditional multicast routing protocols for wireless networks cannot be implemented as they are in a mobile ad hoc network, which poses new problems and challenges for the design of efficient algorithms for MANETs. The characteristics of a mobile Ad Hoc network are mainly shown in the following aspects:

Dynamic network topology structure: In a mobile Ad Hoc network, nodes have arbitrary mobility, the network topology structure may change at any time, and the mode and speed of these changes are difficult to predict.

Limited bandwidth transmission: A mobile Ad Hoc network applies wireless transmission technology as its communication means, and the wireless channel has a relatively low capacity. Furthermore, affected by multiple factors such as noise, jamming and signal interference, the actually available effective bandwidth for mobile terminals will be much smaller than the theoretical maximum bandwidth.

The limitation of mobile terminals: although the user terminals in a mobile Ad Hoc network are smart and portable, they run on exhaustible energy sources such as batteries, and have CPUs of lower performance and smaller memory; moreover, each host computer doubles as a router. Hence, there are quite high requirements on routing protocols.

Distributed control: there is no central control point in a mobile Ad Hoc network; all the user terminals are equal, and the network routing protocols always apply a distributed control mode, so the network has stronger robustness and survivability than a center-structured network.

Multihop communication: because of the restriction of wireless transceivers on signal transmission range, the mobile Ad Hoc network is required to support multihop communication, which also brings problems of hidden terminals, exposed terminals, fairness and so on.

Security: owing to the use of wireless channels, limited power, distributed control and so on, the network is vulnerable to security threats such as eavesdropping, spoofing and denial-of-service attacks.

To date, many multicast routing protocols have been proposed, each with its own advantages and disadvantages suited to different environments. Therefore, the hope for a standard multicast routing protocol that will be suitable for all network scenarios is highly unrealistic.

At the same time, it is very difficult to identify multicast routing algorithms or protocols adapted to specific application fields for mobile Ad Hoc networks, because the applications of Ad Hoc networks require a combination and integration of the fixed network with the mobile environment. Deeper research on multicast applications in the mobile Ad Hoc network environment is therefore still needed.

IV. Comparison of Multicast Routing Protocols
The design goal of any multicast routing protocol is to transmit information to all intended nodes in an optimal way while incurring minimum redundancy in the process.

All the protocols try to deal with problems such as node mobility, looping, routing imperfections, on-demand construction, routing updates, and the control of packet transmission methods (network-wide flooding or broadcast restricted to member nodes).

In all tree based multicast routing protocols, a unique path is obtained between any pair of nodes, which saves the bandwidth required for initializing the multicast tree as compared to the bandwidth requirement of any other structure. The disadvantage of these protocols is the poor survivability of the communication system in case of link or node failure. For example, if any node moves out of transmission range, the tree is divided into two or more sub-trees, which makes communication among all the nodes in the tree difficult. In addition, the overhead involved in maintaining the multicast tree is relatively large as compared to other protocols.

The resource requirement of mesh based multicast routing protocols is much larger as compared to tree based protocols. They also suffer from routing loop problems, and the special measures taken to avoid such problems incur extra overhead on the overall communication system.

The biggest advantage of such protocols is their robustness: if one link fails, it does not affect the entire communication system. Therefore, such protocols are suitable for harsh environments where the topology of the network changes very rapidly.

A hybrid routing protocol is a combination of both tree and mesh and is suitable for an environment with moderate mobility. It is as efficient as tree based protocols, and at the same time it survives the frequent breaks in the network due to high node mobility.

A comparison of all the multicast routing protocols discussed above has been summarized in Table 1 at the end.

V. Conclusion
Mobile Ad Hoc networks face a variety of challenges, such as a dynamic network topology structure, limited transmission bandwidth, the limitations of mobile terminals, distributed control, multihop communication and security; routing is therefore more difficult in such a challenging environment as compared to other networks.

Multicast routing is a mode of communication in which data is sent to a group of users by using a single address. On the one hand, the users of a mobile Ad Hoc network need to form collaborative working groups; on the other hand, multicasting is also an important means of fully using the broadcast nature of wireless communication and effectively using the limited wireless channel resources.

This paper summarizes and comparatively analyzes the routing mechanisms of various existing multicast routing protocols according to the characteristics of mobile Ad Hoc networks.
References
1. T. Nadeem and S. Parthasarathy, "Mobility Control for Throughput Maximization in Ad hoc Networks," Wireless Communication and Mobile Computing, Vol. 6, pp. 951-967, 2006.
2. Chen-Che Huang and Shou-Chih Lo, "A Comprehensive Survey of Multicast Routing Protocols for Mobile Ad Hoc Networks".
3. T. Ozaki, J. B. Kim, and T. Suda, "Bandwidth Efficient Multicast Routing for Multi-hop Ad hoc Networks," in Proceedings of IEEE INFOCOM, Vol. 2, pp. 1182-1191, 2001.
4. X. Zhang and L. Jacob, "MZRP: An Extension of the Zone Routing Protocol for Multicasting in MANETs," Journal of Information Science and Engineering, Vol. 20, pp. 535-551, 2004.
5. P. Sinha, R. Sivakumar, and V. Bharghavan, "MCEDAR: Multicast Core Extraction Distributed Ad hoc Routing," IEEE Wireless Communications and Networking Conference (WCNC), pp. 1313-1317, 1999.
6. L. S. Ji and M. S. Corson, "Differential Destination Multicast: A MANET Multicast Routing Protocol for Multihop Ad hoc Networks," in Proceedings of IEEE INFOCOM, Vol. 2, pp. 1192-1201, 2001.
7. C. W. Wu, Y. C. Tay, and C. K. Toh, "Ad hoc Multicast Routing Protocol Utilizing Increasing Id-numberS (AMRIS) Functional Specification," Internet-Draft, draft-ietf-manet-amris-spec-00.txt, 1998.
8. J. Xie, R. Talpade, T. McAuley, and M. Liu, "AMRoute: Ad hoc Multicast Routing Protocol," ACM Mobile Networks and Applications (MONET) Journal, Vol. 7, No. 6, pp. 429-439, 2002.
9. E. M. Royer and C. E. Perkins, "Multicast Operation of the Ad-hoc On-demand Distance Vector Routing Protocol," in Proc. ACM MOBICOM, pp. 207-218, Aug. 1999.
10. C. E. Perkins and E. M. Royer, "Ad-hoc On-demand Distance Vector Routing," in Proc. IEEE WMCSA, pp. 90-100, Feb. 1999.
11. L.-S. Ji and M. S. Corson, "Explicit Multicasting for Ad Hoc Networks," Mobile Networks and Applications, Vol. 8, No. 5, pp. 535-549, Oct. 2003.
12. C. W. Wu and Y. C. Tay, "AMRIS: A Multicast Protocol for Ad Hoc Networks," in Proc. IEEE MILCOM, Vol. 1, pp. 25-29, Nov. 1999.
13. J. G. Jetcheva and D. B. Johnson, "Adaptive Demand-driven Multicast Routing in Multi-hop Wireless Ad Hoc Networks," in Proc. ACM MOBIHOC, pp. 33-44, Oct. 2001.
14. P. Sinha, R. Sivakumar, and V. Bharghavan, "CEDAR: A Core Extraction Distributed Ad Hoc Routing Algorithm," IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp. 1454-1466, Aug. 1999.
products that live in the cloud, which are secure, backed-up and accessible from any Internet connection. The best live example of this is Gmail, which is increasingly being used by organizations and individuals to run their e-mail services. Google Apps, being free for educational institutions, is widely used for running a variety of applications, especially the email services, which were earlier run on their own computer servers. This has proved to be cost effective for organizations, since they pay per use for applications and services, and it saves precious time for the computer staff, which they can invest in running other services without worrying about upgrading, backup, compatibility, and maintenance of servers, which is taken care of by Google. Libraries use computers for running services such as Integrated Library Management Software (ILMS), a website or portal, and a digital library or institutional repository. These are either maintained by the parent organization's computer staff or by library staff, which involves huge investments in hardware and software, and requires staff to maintain the
services and undertake backups and upgrades whenever a new version of the software is released.

Library professionals in most cases are not adequately trained in maintaining servers and often find it difficult to undertake some of these activities without the support of IT staff from within the organization or through external sources. In the present day, Cloud Computing has become the latest buzzword in the field of libraries, and it is a blessing in disguise for operating various ICT services without any problem, since third-party services will manage the servers, undertake upgrades and take back-ups of data. Currently, some libraries have adopted cloud computing services as an emerging technology to operate their services, despite the fact that there are certain areas of concern in using cloud services, such as privacy, security, etc.

Types of Cloud Computing
There are four types of Cloud Computing:
1. Private/Internal Cloud: A cloud operated internally for a single enterprise.
2. Public/External Cloud: Applications, storage and other resources that are made available to the general public by the service providers.
3. Community Cloud: A public cloud tailored to a particular community.
4. Hybrid Cloud: A combination of the internal and external cloud. The terms Community Cloud and Hybrid Cloud are sometimes used interchangeably.

Cloud Computing Models
Cloud Computing providers offer services which can be grouped into three categories:
1. Software as a Service (SaaS): In this model, a complete application is offered to the customer as a service on demand. A single instance of the service runs on the cloud and multiple end users are serviced. Today, SaaS is offered by companies such as Google, Salesforce, Microsoft and Zoho.
2. Platform as a Service (PaaS): In this model, a layer of software or a development environment is condensed and offered as a service, upon which other, higher levels of service can be built. The customer has the freedom to build his own applications, which run on the provider's infrastructure. To meet the manageability and scalability requirements of the applications, PaaS providers offer a predefined combination of OS and application servers, such as the LAMP platform (Linux, Apache, MySQL and PHP) or restricted J2EE and Ruby environments; Google's App Engine and Force.com are some of the popular PaaS examples.
3. Infrastructure as a Service (IaaS): IaaS provides basic storage and computing capabilities as standardized services over the network. Servers, storage systems, networking equipment and data center space are pooled and made available to manage workloads. The customer would typically deploy his own software on the infrastructure. Some of the common examples are Amazon, GoGrid and 3Tera.

Application of Cloud Computing in Libraries
Libraries are shifting their services to the cloud and the network, with facilities to access these services anywhere and anytime.

In libraries, the following possible areas were identified where cloud computing services and applications may be applied:
1. Building Digital Libraries/Repositories: In the present situation, every library requires a digital library to offer its resources, information and services efficiently and to ensure access via the network. Therefore, every library has a digital library that is developed through the use of some digital library software.
2. Searching Library Data: OCLC is one of the best examples of utilizing cloud computing for sharing library data for years together. The OCLC WorldCat service is one of the well-accepted services for searching library data that is now available on the cloud. OCLC is offering various services pertaining to circulation, cataloguing, acquisition and other library related services on the cloud platform through the web share management system. A Web share management
system facilitates the development of an open and collaborative platform in which each library can share its resources, services, ideas and problems with the library community on the cloud. On the other hand, the main objective of web-scale services is to provide cloud based platforms, resources and services cost-effectively, to share data and to build broader collaboration in the community.
3. Website Hosting: Website hosting is one of the earliest adoptions of cloud computing, as numerous organizations, including libraries, prefer to host their websites with third party service providers rather than hosting and maintaining their own servers. Google Sites serves as an example of a service for hosting websites externally to the library's servers while allowing multiple editors to access the site from varied locations.
4. Building Community Power: Cloud Computing technology offers tremendous opportunities for libraries to build networks among library and information science professionals, as well as other interested people including information seekers, by using social networking tools. Some of the most well-known social networking services, such as Twitter and Facebook, play a dominating role in building community power. This cooperative effort of libraries will create time saving efficiencies, wider recognition and cooperative intelligence for better decision-making, and provides a platform for innovation and for sharing intellectual conversations, ideas and knowledge.
5. Library Automation: For library automation purposes, Polaris offers various cloud-based services, such as acquisitions, cataloguing, process systems and digital contents, with provision for the inclusion of cutting edge technologies used in libraries, and it also supports various standards directly related to the library and information science area, such as MARC21, XML, Z39.50 and Unicode. Apart from this, nowadays a majority of software vendors, such as Ex-Libris and OSS Labs, are also offering this service on the cloud, with third party providers hosting the service (the SaaS approach) to save libraries from investing in hardware for this purpose. Besides the cost-benefit, libraries are freed from maintenance tasks such as software updates, backups and other facilities.
In the present situation of Indian libraries, cloud computing in libraries is in the development phase. Libraries are attempting to offer their users cloud-based services; however, in reality they are not fully successful, mainly due to the lack of good service providers and the limited technical skills of LIS professionals in the field of library management using advanced technology. Yet some of the services, such as digital libraries, web documentation and the use of Web 2.0 technologies, are operating successfully. Some of the excellent examples of successful cloud computing in libraries include DuraCloud, OCLC services and Google-based cloud services. In the current state, countless commercial as well as open source vendors (i.e., OSS) are clubbing cloud computing technology into their services and products. However, cloud computing technology is not totally accepted in Indian libraries, although they are trying to develop themselves in this area.

Conclusion
Cloud Computing represents an exciting opportunity to bring on-demand applications to the Digital Library in an environment of reduced risk and enhanced reliability. However, it is important to understand that existing applications cannot just be unleashed on the cloud as they exist. Careful attention to design detail will help in ensuring a successful deployment. Certainly, cloud computing can bring about strategic, transformational and even revolutionary benefits fundamental to digital libraries. As regards organizations providing digital libraries with significant investment in traditional software and hardware infrastructure, migration to the cloud will entail a considerable technology transition; for less-constrained organizations, or those with infrastructure nearing end-of-life, adoption of cloud computing technology may be more immediate.

No doubt, libraries are shifting towards cloud computing technology in the present times and taking advantage of these services, especially in building digital libraries, social networking and information communication with manifold flexibilities; yet some issues related to security, privacy, trustworthiness and legal matters are still not completely resolved. Therefore, it is high time for libraries to think seriously before clubbing library services with cloud-based technologies, and to provide reliable and rapid services to their users. Another responsibility of LIS professionals in this virtual era is to make cloud based services a reliable medium to disseminate library services to their target users with ease of use and trustworthiness.
References
1. Aravind Doss, and Rajeev Nanda. (2015). “Cloud Computing: A Practitioner’s Guide.” TMH. New Delhi.
P-265.
2. https://www.ibm.com/cloud-computing
3. Anna Kaushik and Ashok Kumar. (2013). “Application of Cloud Computing in Libraries.” International Journal
of Information Dissemination and Technology. 3 (4): 270-273.
4. Jadith Mavodza. “Impact of Cloud Computing on the Future of Academic Libraries and Services.” Proceedings
at the 34th Annual Conference of the International Association of Scientific and Technological University
Libraries (IATUL), Cape Town, South Africa.
5. Anthony T Velte. and Others. (2015). “Cloud Computing: A Practical Approach”. TMH: New Delhi.
P- 1-23.
6. Aravind Doss, and Rajeev Nanda. (2015). “Cloud Computing: A Practitioner’s Guide.” TMH. New Delhi.
P-265-268.
In recent years, popular metaheuristic techniques such as Evolutionary Algorithms, Genetic Algorithms, Ant Colony Optimization, Particle Swarm Optimization, Bee Colony Optimization, Simulated Annealing, Tabu Search, etc. have been widely used for different optimization problems [11,12,13,16,17,21,24,25,26]. All of these techniques have certain underlying working principles and various strategic constructs that may enable them to solve problems efficiently. However, in the last few years a new kind of metaheuristic has emerged which, unlike the above approaches, does not belong to a specific metaheuristic category but combines approaches from different areas such as computer science, biology, artificial intelligence and operations research. This new class of techniques is normally referred to as hybrid metaheuristics. In order to improve performance, the concept of quantum computing has also been applied to optimization problems, and various quantum-inspired metaheuristic techniques have been proposed in the literature [14].

The list of metaheuristic techniques is extensive and difficult to summarize in a brief survey, nor is this paper intended to do so. Rather, it attempts to give a brief introductory overview of a few popular metaheuristic techniques. In the next section a classification of metaheuristic techniques is described.

II. Classification of metaheuristic techniques
Many criteria can be found for the classification of metaheuristic techniques. However, the most common classification found in the literature is based on the use of a single solution versus a population of solutions. The popular single-solution-based techniques, also known as trajectory methods, include Simulated Annealing, Tabu Search, Variable Neighborhood Search, Guided Local Search and Iterated Local Search [27,28]. Single-solution-based approaches start with a single initial solution and gradually move away from it, depicting a trajectory through a large search space [27,28]. Unlike single-solution-based techniques, population-based metaheuristic techniques begin with a population of solutions and in every algorithmic iteration attempt to move towards better solutions. In recent years population-based metaheuristic techniques have been gaining comparatively more popularity, and more new population-based techniques are being reported in the literature [21,22,23]. Keeping this in mind, this paper focuses mainly on population-based techniques; details of single-solution or trajectory-based metaheuristic techniques can be found in the literature [21,22,23]. In the next section we describe two popular population-based metaheuristic techniques.

III. Population based metaheuristic techniques
The majority of population-based methods belong either to the class of Evolutionary Algorithms or to Swarm Intelligence based methods. The inherent mechanism of evolutionary algorithms is mainly based on Darwin's theory of the survival of the fittest: the population of solutions improves iteratively, generation after generation, with fitter solutions selected to reproduce better solutions for the next generation. In Swarm Intelligence based techniques, by contrast, instead of a single agent the collective intelligence of a group is exploited to find better solutions iteratively.

Evolutionary algorithms refer to a class of metaheuristic techniques whose underlying working mechanism is based on Darwin's theory of evolution. According to this theory, the fitter living beings which can better adapt to a changing environment survive and are selected to reproduce better offspring. This generic class of techniques includes evolutionary programming, Genetic Algorithms, Genetic Programming, evolution strategies, etc. [15,18,19,20,29]. Though these techniques differ in their algorithmic approach, their core underlying working is similar. Evolutionary algorithms are mainly characterized by three important aspects: first, the solution or individual representation; second, the evaluation function; and third, the population dynamics throughout the algorithmic run. In every generation or algorithmic iteration, all evolutionary techniques attempt to select the better solutions in terms of their objective function values. These selected solutions then undergo recombination and mutation to produce new solutions for the next generation.
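The generational cycle of selection, recombination and mutation described above can be sketched in code. The following is a minimal illustrative genetic algorithm in Python for a toy "one-max" objective (maximize the number of 1-bits); the population size, operator rates and the objective itself are assumptions chosen only for this example, not taken from the cited works.

```python
import random

def one_max(bits):
    """Objective (fitness) function: count of 1-bits; higher is fitter."""
    return sum(bits)

def tournament_select(pop, k=3):
    """Selection operator: the fittest of k randomly chosen individuals."""
    return max(random.sample(pop, k), key=one_max)

def crossover(a, b):
    """Single-point crossover: two parents exchange genetic material."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate=0.01):
    """Mutation operator: random bit flips maintain population diversity."""
    return [1 - b if random.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=30, pop_size=40, generations=100, cx_rate=0.9):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            p1, p2 = tournament_select(pop), tournament_select(pop)
            child = crossover(p1, p2) if random.random() < cx_rate else p1[:]
            nxt.append(mutate(child))
        pop = nxt
    return max(pop, key=one_max)

best = genetic_algorithm()
print(one_max(best))  # converges towards 30 on this easy objective
```

Tournament selection is used here as one concrete choice; any of the selection schemes discussed in the text could be substituted without changing the overall generational loop.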
The design of these variation operators may depend upon the considered problem and/or the solution representation. With the help of the crossover operator, two or more solutions may exchange their genetic material, or some parts of the solutions, and create new individuals. The crossover rate of the population indicates the total number of chromosomes, or solutions, that will undergo crossover or recombination. Each chromosome in the population has a fitness value determined by the objective function. This fitness value is used by the selection operator to evaluate the desirability of the chromosome for the next generation. Generally, fitter solutions are preferred by the selection operator, but some less fit chromosomes can also be considered in order to maintain population diversity. The crossover operator is applied to the selected chromosomes to recombine them and generate new chromosomes which might have better fitness. The mutation operator is applied to maintain population diversity throughout the optimization process by introducing random modifications into the population. Evolutionary algorithms have been applied to optimization problems from diverse areas, and have been successfully applied to different combinatorial and constrained optimization problems [7]. In recent years they have also been gaining popularity in the area of multi-criteria optimization, where finding the trade-off solutions of a multi-objective optimization problem is a complex task. Evolutionary techniques like NSGA-II have been successfully applied to several multi-objective optimization problems [1,3,8,9,10].

In recent years the quantum-inspired Genetic Algorithm has also been getting a lot of attention. It applies the principles of quantum computing combined with an evolutionary algorithm [14]. Instead of a binary, numeric or symbolic representation, the quantum-inspired algorithm applies a Q-bit representation, and a Q-gate operator is used as the variation operator.

Next we describe the swarm intelligence based technique, Ant Colony Optimization or ACO.

Ant Colony Optimization (ACO)
Ant Colony Optimization is a metaheuristic which is inspired by the behaviour of real ants. This approach was first applied to solving the Travelling Salesman Problem [5]. In the majority of cases where ACO is applied, the problem is represented with a graph. ACO is a population based metaheuristic. Real-world ants, in search of their food, work in a group and find the shortest path from the nest to the food source. This very behaviour of real ants has inspired Ant Colony Optimization, in which a group of simple agents work in co-operation in order to achieve a complex task. Real-world ants attempt to find quality food sources nearest to their colony; in this pursuit they deposit chemicals, known as pheromones, on the search path. Paths with good food sources and lesser distance from the nest are likely to receive a larger amount of pheromone, and paths with higher pheromone density are more likely to be selected by following ants. Such behaviour gradually leads to the emergence of the shortest path from the nest to a good food source. In other words, through indirect communication via the environment, using pheromone trails and without any central control, the ants are likely to find the shortest path from their colony to the food source. In addition, the artificial ants of Ant Colony Optimization have some extra characteristics which real ants do not have: the presence of memory, which helps in constructing feasible candidate solutions, and awareness of the environment, for better decision making during solution construction. In ACO, ants probabilistically construct solutions using two important pieces of information, known as pheromone information and heuristic information. The pheromone information τ(ij) represents the amount of pheromone on the edge or solution component (i,j), and η(ij) represents the preference for selecting node j from node i during solution construction. Both values are represented numerically, and both bias the search towards higher pheromone and heuristic information values. In addition, the pheromone information, or density, on a path is updated at every algorithmic iteration. The pheromone information represents the past search experience, while the heuristic information is problem specific and remains unchanged throughout the algorithmic run of ACO. The solution in each iteration is probabilistically constructed using the following formula:

P(ij) = [τ(ij)]^α · [η(ij)]^β / Σ_l [τ(il)]^α · [η(il)]^β

P(ij) represents the probability of selecting node j after node i in the partially constructed solution; l ranges over the available nodes for solution construction, i.e. the nodes which are not already part of the partially constructed solution. Here α and β indicate the relative importance of the pheromone information and heuristic information respectively.

After the completion of solution construction, a mechanism of evaporation is applied with the intent of forgetting the unattractive choices, so that no path becomes too dominating, as that may lead towards premature convergence. The path update at every iteration is performed using the following formula:

τ(ij) = (1 - ρ) · τ(ij) + ρ · τ(0)
In the above formula, ρ indicates the pheromone decay coefficient, and τ(0) indicates some initial pheromone value deposited on the edge (i,j).

In addition, daemon actions such as local search can be applied as an optional step to further improve the quality of the solution. The first ant colony based optimization technique was proposed in [6] to solve single objective optimization problems. After the initial work on the ant system, many variants of ant based optimization techniques have been proposed in the literature for solving various combinatorial optimization problems, such as the Travelling Salesman Problem, the vehicle routing problem, production scheduling and quadratic assignment problems, among others [4,5,6]. An abstract view of ACO is as follows:
Procedure ACO
    Initialize pheromone matrix τ
    Initialize heuristic factor η
    While stopping criteria not met do
        Perform ProbabilisticSolutionConstruction( )
        Perform LocalSearchProcess( )    // optional action
        Perform PheromoneUpdateProcess( )
    End While
    Return best solution
End Procedure
Figure 2. An ACO procedure [4,5,6]
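The loop of Figure 2 can be made concrete with a small example. The sketch below is a minimal illustrative ACO in Python for a toy 4-city travelling salesman instance: tours are built with the probabilistic selection rule (pheromone raised to α times heuristic desirability raised to β, normalized over the unvisited nodes), followed by evaporation and a pheromone deposit on the iteration-best tour. The distance matrix, colony size and parameter values are assumptions for this example only, not those of the cited works.

```python
import random

# Toy symmetric 4-city distance matrix (illustrative values only).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
n = len(dist)
alpha, beta, rho = 1.0, 2.0, 0.1       # pheromone weight, heuristic weight, evaporation rate
tau = [[1.0] * n for _ in range(n)]    # pheromone information, uniform initial value
eta = [[1.0 / dist[i][j] if i != j else 0.0 for j in range(n)]
       for i in range(n)]              # heuristic information: inverse distance

def tour_length(tour):
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def construct_tour():
    """ProbabilisticSolutionConstruction: choose each next city with
    probability proportional to tau^alpha * eta^beta over unvisited cities."""
    tour = [random.randrange(n)]
    while len(tour) < n:
        i = tour[-1]
        choices = [l for l in range(n) if l not in tour]
        weights = [(tau[i][l] ** alpha) * (eta[i][l] ** beta) for l in choices]
        r, acc = random.uniform(0, sum(weights)), 0.0
        for l, w in zip(choices, weights):
            acc += w
            if acc >= r:
                tour.append(l)
                break
        else:                          # guard against floating-point shortfall
            tour.append(choices[-1])
    return tour

best = None
for _ in range(50):                    # while stopping criteria not met
    tours = [construct_tour() for _ in range(10)]      # colony of 10 ants
    for i in range(n):                 # PheromoneUpdateProcess: evaporation first...
        for j in range(n):
            tau[i][j] *= 1 - rho
    iter_best = min(tours, key=tour_length)
    for k in range(n):                 # ...then deposit on the iteration-best tour
        a, b = iter_best[k], iter_best[(k + 1) % n]
        tau[a][b] += 1.0 / tour_length(iter_best)
        tau[b][a] = tau[a][b]
    if best is None or tour_length(iter_best) < tour_length(best):
        best = iter_best

print(tour_length(best))               # shortest tour of this instance has length 18
```

The optional LocalSearchProcess of Figure 2 is omitted here; a 2-opt exchange on each constructed tour would be a typical choice for it.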
An ant based system consists of multiple stages, as shown in Figure 2. In the first step, the evaluation function and the pheromone information (τ) are initialized. Then, at each algorithmic iteration, each ant in the colony incrementally constructs a solution by probabilistically selecting feasible components, or nodes, from the available nodes. As an optional action, local search can be performed to further improve the quality of the solution. Once each ant completes the solution construction, pheromone update using the evaporation mechanism is performed, and the best solution or solutions in terms of the given objective function are chosen to update the pheromone information. The algorithmic iteration of solution construction and pheromone update ends when some predefined condition is met and the best solution is returned. This condition could be a predefined number of generations, or stagnation, when no further improvement in the solution is found.

ACO has been widely and successfully applied to various problems, including the Travelling Salesman Problem, vehicle routing, sequential ordering, quadratic assignment, graph coloring, course timetabling, project scheduling, total weighted tardiness, open shop, set covering, multiple knapsack, maximum clique, constraint satisfaction, classification rules, Bayesian networks and protein folding, among others [4]. In recent years it has also been gaining popularity for solving various multi-objective optimization problems.

Conclusion
In this survey we have briefly described metaheuristic techniques for solving various optimization problems. Considering the distinction between single-solution-based and population-based metaheuristic techniques, we gave an introductory description of two popular and widely used population-based approaches: the Genetic Algorithm and Ant Colony Optimization.
References
1. Asllani, A., & Lari, A. (2007). ‘Using genetic algorithm for dynamic and multiple criteria web-site
optimizations’, European journal of operational research, Vol. 176, No. 3, pp. 1767-1777
2. Basseur, M., Talbi, E., Nebro, A. & Alba, E. (2006). ‘Metaheuristics for Multiobjective Combinatorial
Optimization Problems: Review and recent issues’, INRIA Report, September 2006, pp. 1-39
3. Coello-Coello, C. A., Lamont, G. B. & van Veldhuizen, D. A. (2007). ‘Evolutionary Algorithm for solving
multi-objective problems, Genetic and Evolutionary Computation Series’, Second Edition, Springer.
4. Dorigo, M. & Stützle, T. (2004). Ant Colony Optimization, Cambridge: MIT Press, 2004
5. Dorigo, M. & Gambardella, L.M.,(1997) ‘Ant colonies for the traveling salesman problem’, BioSystems, vol.
43, no. 2, pp. 73–81, 1997.
6. Dorigo, M., Maniezzo,V. & Colorni, A., (1996) ‘Ant System: Optimization by a colony of cooperating
agents,’ IEEE Transactions on Systems, Man, and Cybernetics—Part B, vol. 26, no. 1, pp. 29–41, 1996.
7. Kazarlis, S.A., Bakirtzis, A.G. & Petridis, V (1996). ‘A genetic algorithm solution to the unit commitment
problem’, IEEE Transactions on Power System, Volume 11, Number 1, pp. 82-92
8. Deb, K., Pratap, A., Agarwal, S & Meyarivan, T. (2002). ‘A fast and elitist multiobjective Genetic Algorithm:
NSGA-II’, IEEE Transaction on Evolutionary Computation, Vol. 6, No. 2. pp. 182-197
9. Deb, K. (2010). Multi-objective optimization using Evolutionary algorithms. Wiley India.
10. Doerner, K. F., Gutjahr, W. J., Hartl, R. F., Strauss, C. and Stummer, C (2004). “Pareto ant colony optimization:
A metaheuristic approach to multiobjective portfolio selection,” Annals of Operations Research, vol. 131,
pp. 79–99,2004.
11. T’kindt, V., Monmarché, N., Tercinet, F. & Laügt, D (2002). “An ant colony optimization algorithm to
solve a 2-machine bicriteria flowshop scheduling problem,” European Journal of Operational Research, vol.
142, no. 2, pp. 250–257, 2002
12. Wang L., Niu, Q. & Fei, M.(2007) ‘A Novel Ant Colony Optimization Algorithm’, Springer Verlag Berlin
Heidelberg. LNCS 4688, pp. 277– 286, 2007
13. Goldberg, D. E. (1989). Genetic Algorithm in Search, Optimization and Machine Learning, Pearson
Education, India
14. Han, K.–H. & Kim, J.–H., (2000)‘Genetic quantum algorithm and its application to combinatorial
optimization problem,’ in Proc. Congress on Evolutionary Computation, vol. 2, pp. 1354-1360, La Jolla,
CA,2000.
15. X. Yao, Y. Liu, Fast evolutionary programming, in: Evolutionary Programming, 1996, pp. 451–460.
16. F. Vandenbergh, A. Engelbrecht, A study of particle swarm optimization particle trajectories, Information
Sciences 176 (2006) 937–971.
17. S. Kirkpatrick, C. Gelatt, M. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.
18. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection
(Complex Adaptive Systems), first ed., The MIT Press, 1992.
19. T. Bäck, H.P. Schwefel, An overview of evolutionary algorithms for parameter optimization, Evolutionary
Computation 1 (1993) 1–23.
20. S. Baluja, Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function
Optimization and Competitive Learning, Technical Report, Carnegie Mellon University, Pittsburgh, PA,
USA, 1994.
21. F. Glover, Tabu search for nonlinear and parametric optimization (with links to genetic algorithms), Discrete
Applied Mathematics 49 (1994) 231– 255.
22. M. Birattari, L. Paquete, T. Stützle, K. Varrentrapp, Classification of Metaheuristics and Design of Experiments
for the Analysis of Components, Technical Report AIDA-01-05, FG Intellektik, FB Informatik, Technische
Universität Darmstadt, Darmstadt, Germany, 2001.
23. E.G. Talbi, Metaheuristics: From Design to Implementation, first ed., Wiley-Blackwell, 2009.
24. S. Jung, Queen-bee evolution for genetic algorithms, Electronics Letters 39 (2003) 575–576.
25. D. Karaboga, An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report TR06,
Erciyes University, 2005.
26. D. Karaboga, B. Akay, A survey: algorithms simulating bee swarm intelligence, Artificial Intelligence Review
31 (2009) 61–85.
27. N. Mladenovic, A variable neighborhood algorithm – a new metaheuristic for combinatorial optimization,
in: Abstracts of Papers Presented at Optimization Days, Montréal, Canada, 1995, p. 112.
28. N. Mladenovic, P. Hansen, Variable neighborhood search, Computers and Operations Research 24 (1997)
1097–1100.
29. X. Yao, Y. Liu, G. Lin, Evolutionary programming made faster, IEEE Transactions on Evolutionary
Computation 3 (1999) 82–102.
3. Efficiency: Mining rules from semi-structured and unstructured data, as in the semantic web, is a great challenge. Heavy time and memory consumption leads to decreased efficiency.

4. Security: The data on the web is accessed publicly. There is no data that is hidden, so this is another challenge in Web Mining.

B) Cloud Computing
Computer resources these days are consumed as a utility by various companies, in the same manner as one consumes electricity or a rented house. There is no need to fabricate and retain computing infrastructures in-house. There are three types of cloud: private, public and hybrid. Cloud services are mainly categorized into three types: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)[8]. There are various benefits of the Cloud, some of which are mentioned below:

● Self-service provisioning: It all depends on the end users, which type of services they yearn for. Users can provision computing assets for almost any type of workload on-demand.
● Elasticity: Companies can scale up as computing needs increase and then scale down again as demands decrease.
● Pay per use: There is flexibility to use the services and computing resources as per the demand of the user. This facility permits users to pay only for the resources and workloads they utilize.

Cloud computing is a most impressive technology because it is cost efficient and flexible. Cloud Mining's Software as a Service (SaaS) is used for implementing Web Mining, as it reduces the cost and increases the security. Compared to all the other web mining techniques, Web usage mining is immensely used and has known productive outcomes[7].

C) Web Mining and Cloud Computing
One of the most used technologies in Web Mining is Web Usage Mining[1]. Web Usage Mining using Cloud Computing is widely adopted these days due to its cost efficiency and flexibility[6]. However, in spite of this momentum and attention, there are considerable, continual concerns about cloud computing that ultimately compromise the vision of cloud computing as a new IT procurement model. Fundamentally, Cloud Mining is a novel approach offering a faceted search interface to your data. The security challenge of web mining is addressed by SaaS (Software-as-a-Service), which is also used for reducing cost; this is termed the cloud mining technique. It is targeted at changing the existing framework of web mining to generate an influential framework, through the Hadoop and MapReduce communities, for projecting analytics[9].

In the next section we discuss how to use the Map/Reduce Model in Cloud Computing and the various benefits of using this model.

II. Cloud Computing and Map/Reduce Model
The term cloud is a representation of the Internet, an abstraction of the Internet's fundamental infrastructure that helps to spot the point at which accountability moves from the user to an external provider. Cloud Computing is one of the most captivating areas where lots of services are being utilized. The main objective of Cloud Computing is to fully utilize the resources dispersed at various places[10]. The Map/Reduce model, a programming model proposed by Google, is used for processing voluminous data sets, and has been used to process around 20 petabytes of data in a single day. This model is gaining more popularity in cloud computing these days[11][12]. The Map/Reduce model is used for parallel and distributed processing of huge data sets on clusters[13]. Some of the applications of Map/Reduce are:

At Google:
● Index building for Google Search
● Article clustering for Google News
● Statistical machine translation

At Yahoo!:
● Index building for Yahoo! Search
● Spam detection for Yahoo! Mail

At Facebook:
● Ad optimization
● Spam detection
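The Map/Reduce flow behind these applications can be illustrated with the canonical word-count example. The sketch below simulates the map, shuffle and reduce phases of the model in a single Python process; real deployments such as Hadoop distribute these phases across a cluster, and the sample documents here are assumptions for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (key, value) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the runtime does
    between map and reduce workers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["web mining in the cloud", "cloud computing and web usage mining"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(intermediate))
print(counts["web"], counts["cloud"], counts["mining"])  # 2 2 2
```

The per-key grouping performed by the shuffle is the contract that lets reducers for different keys run in parallel on different cluster nodes.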
References
1. M. U. Ahmed and A. Mahmood, “Web usage mining:,” International Journal of Technology Diffusion, vol.
3, no. 3, pp. 1–12, Jul. 2012.
2. S. K. Pani, et al., “Web Usage Mining: A Survey On Pattern Extraction From Web Logs”, International
Journal Of Instrumentation, Control & Automation (IJICA), Volume 1, Issue 1, 2011.
3. Singh, Brijendra, and Hemant Kumar Singh. “Web data mining research: a survey.” In Computational
Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on, pp. 1-10. IEEE,
2010.
4. J Vellingiri, S.Chenthur Pandian, “A Survey on Web Usage Mining”, Global Journal of Computer Science
and Technology .Volume 11 Issue 4 Version 1.0 March 2011.
5. Li, J., Xu, C., Tan, S.-B, “A Web data mining system design and research”. Computer Technology and
Development 19: pp. 55-58, 2009
6. Robert Grossman , Yunhong Gu, “Data mining using high performance data clouds: experimental studies
using sector and sphere”, Proceedings of the 14th ACM SIGKDD international conference on Knowledge
discovery and data mining, August 24-27, 2008
7. J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, “Web usage mining,” ACM SIGKDD Explorations
Newsletter, vol. 1, no. 2, p. 12, Jan. 2000.
8. Khanna, Leena, and Anant Jaiswal. “Cloud Computing: Security Issues And Description Of Encryption
Based Algorithms To Overcome Them.” International Journal of Advanced Research in Computer Science
and Software Engineering 3 (2013): 279-283.
9. V. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White. Visualization of navigation patterns on a web
site using model-based clustering. In Proceedings of the Sixth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 280-284, Boston, Massachusetts, 2000.
10. Zhu, W., & Lee, C. (2014). A new approach to web data mining based on cloud computing. Journal of
Computing Science and Engineering, 8(4), 181–186. doi:10.5626/jcse.2014.8.4.181
11. “MapReduce.” Wikipedia. N.p.: Wikimedia Foundation, 11 Jan. 2017. Web. 2 Jan. 2017.
12. Divestopedia, and Securities Institute. What is MapReduce? - definition from Techopedia. Techopedia.com,
2017. Web. 2 Jan. 2017.
13. Margaret Rouse. What is MapReduce? - definition from WhatIs.com. SearchCloudComputing,
25 June 2014. Web. 2 Jan. 2017.
14. Hornung, T., Przyjaciel-Zablocki, M., & Schätzle, A. (2017). Giant data: MapReduce and Hadoop » ADMIN
magazine. Retrieved January 10, 2017, from http://www.admin-magazine.com/HPC/Articles/MapReduce-
and-Hadoop
15. Lee, K.-H., Lee, Y.-J., Choi, H., Chung, Y. D., & Moon, B. (2012). Parallel data processing with MapReduce.
ACM SIGMOD Record, 40(4), 11. doi:10.1145/2094114.2094118
3. Loss of control over end user actions
Employees can harm the company by downloading a report of all customer contacts, uploading the data to a personal cloud storage service, and then accessing that information after leaving the company to join a competitor. Data can be misused when companies are in the dark about the working patterns of their employees. This is one of the more common insider threats today.

4. Malware infections that unleash a targeted attack
Cloud services are a vector of data exfiltration. Studies reveal a novel data exfiltration technique in which attackers encoded sensitive data into video files and uploaded them to social media. There is also malware that exfiltrates sensitive data via a private social media account. In the case of the Dyre malware variant, cyber criminals used file sharing services to deliver the malware to targets through phishing attacks.

5. Contractual breaches with stakeholders
Contracts among business parties often restrict how data is used and who is authorized to access it. When employees move restricted data into the cloud without authorization, the business contracts may be violated and legal action could ensue. A cloud service that maintains the right, in its terms and conditions, to share all data uploaded to the service with third parties can thereby breach a confidentiality agreement the company made with a business partner.

6. Diminished trust of customers
Data breaches result in diminished trust of customers. One of the biggest breaches reported was that in which cyber criminals stole over 40 million customer credit and debit card numbers from Target. The breach led customers to stay away from Target stores, causing a loss of business for the company which ultimately impacted its revenue.

7. Data breach requiring disclosure and notification to victims
If sensitive or regulated data is put in the cloud and a breach occurs, the company may be required to disclose the breach and send notifications to potential victims. Certain regulations like the EU Data Protection Directive require these disclosures. Following legally mandated breach disclosures, regulators can levy fines against a company, and it is not uncommon for consumers whose data was compromised to file lawsuits.

8. Increased customer churn
If customers even suspect that their data is not fully protected by enterprise-grade security controls, they may take their business elsewhere, to a company they can trust. A growing chorus of critics is instructing consumers to avoid cloud companies who do not protect customer privacy.

9. Revenue losses
According to the Ponemon BYOC study, 64% of respondents confirmed that their companies cannot tell whether their employees are using their own cloud in the workplace. In order to reduce the risks of unmanaged cloud usage, companies first need visibility into the cloud services in use by their employees. They need to understand what data is being uploaded to which cloud services and by whom. With this information, IT teams can begin to enforce corporate data security, compliance, and governance policies to protect corporate data in the cloud. The cloud is here to stay, and companies must balance the risks of cloud services with the clear benefits they bring.

In this era of digitization, data security is paramount to every business. In the past, on-premise servers were the business technology model, but now there are more choices. For the last several years, a debate has flowed through businesses: how will cloud computing affect them? Should they adopt a public cloud approach, opt for a private cloud, or stick with their on-premise servers? The use of cloud computing is steadily rising; in fact, a recent study has shown that cloud services are set to reach over $130 billion by 2017. Before making any decisions, it is important to think about how this shift towards cloud computing will affect cyber security for your business.

Measures or models of cloud computing in cyber security
Boehm et al. posited that all dilemmas that arise in software engineering are of an economic nature rather than a technical nature, and that all decisions ought to be modeled in economic terms: maximizing benefit, minimizing cost and risk. Their work is perfectly compatible with the philosophy of value-based software engineering, as it models system security not by an arbitrary abstract scale but rather by an economic function (MFC), quantified in monetary terms (dollars per hour), in such a way as to enable rational decision making.

Brunette and Mogull (2009) discuss the promise and perils of cloud computing, and single out security as one of the main concerns of this new computing paradigm. They have cataloged and classified the types of security threat that arise in cloud computing; their work provides a comprehensive catalog of security threats classified according to their type.

Black et al. (2009) discussed the categorization of metrics and measures and the differences among types of metrics. These metrics can be used as a standard by an organization to compare between current situations

third party data control, which arises in cloud computing because user data is managed by the cloud provider and may potentially be exposed to malicious third parties. They also discuss strategies that may be used to mitigate these security concerns.

The Center for Internet Security (2009) used mean time to incident discovery, incident rate, mean time between security incidents, mean time to incident recovery, vulnerability scan coverage, percentage of systems without known severe vulnerabilities, mean time to mitigate vulnerabilities, number of known vulnerability instances, patch policy compliance, and mean time to patch, and proposed a set of MTTF-like metrics to capture the concept of cyber security.

Benefits of Cyber security in Cloud Computing
Cyber security has numerous benefits in cloud based applications, such as improvement in threat-information gathering and threat modeling, enhanced collaboration, and reduction of the lag time between detection and remediation. With the increase in cyber-attacks in the era of cloud computing
organization need to take precautions and adequate
and expected one. This provides the organization
measures to deal with threats. The four pillars of cloud
facility to raise the level in order to meet the goal.
based cyber security comprise updated Technologies,
Jonsson and Pirzadeh (2011) proposed a framework extremely protected platforms, skilled manpower and
to measure security by regrouping the security and high bandwidth connectivity. Learning collection can
dependability attributes on the basis of already existing support real time integrated security information.
conceptual model applicable on application areas Usage of cyber security ensures that security while
varying from small to large scale organization. They maintaining sensitive data. The concept of out-of-band
discussed how different matrices are related to each channels can be used to deal with cyber-attacks. 41%
other. They categorize the security metric into of business employ infrastructure-as-a-service (IaaS)
protective and behavior metrics. Choice of measures for mission-critical workloads. Cloud-based cyber
affect the results and accuracy of a metric. security solution developed by PwC and Google can
Carlin and Curran (2011) founded that using cloud provide advanced detection, analysis, collective
computing companies can decrease the budget by learning, high performance, scalability in analytic
18%. The findings comprise mainly three services processes to enable an advanced security operations
Software-as-a-service (SaaS), Platform-as-a-service capability (ASOC).This will create honeypots and
(PaaS) and Infrastructure-as-a-service (IaaS). Three dummies for maintaining connection to end point for
kinds of model public private and hybrid, encryption analysis and learning.
is not a way to fully protect the data. Conclusion
Chow et al. (2009) discusses the three types of security This paper discusses about numerous benefits of cloud
concern raised in cloud computing- provider-related based system and various risks related to it. We also
vulnerabilities, which represent traditional security discussed the various models which talks about how
concerns; availability, which arises in any shared to maximize the benefits, minimizing cost and risks.
system, and most especially in cloud computing; and On the basis of classification of metrics and measures
of cloud computing we can facilitate organization to concerns. At last we can say that usage of cyber security
raise the efficiency and to meet their goals. Various ensures security while maintaining sensitive data as
strategies maybe used to mitigate these security well.
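The incident-handling metrics listed above (mean time to incident discovery, mean time to incident recovery, and so on) are straightforward to compute from incident timestamps. A minimal sketch, using hypothetical incident records and field names:

```python
from datetime import datetime, timedelta

# Hypothetical incident log: when each incident occurred,
# was discovered, and was recovered from.
incidents = [
    {"occurred": datetime(2017, 1, 1, 9, 0),
     "discovered": datetime(2017, 1, 1, 21, 0),
     "recovered": datetime(2017, 1, 2, 9, 0)},
    {"occurred": datetime(2017, 2, 3, 8, 0),
     "discovered": datetime(2017, 2, 3, 14, 0),
     "recovered": datetime(2017, 2, 3, 20, 0)},
]

def mean_hours(deltas):
    """Average a list of timedeltas, expressed in hours."""
    total = sum(deltas, timedelta())
    return total.total_seconds() / 3600 / len(deltas)

# Mean time to incident discovery: occurrence -> discovery.
mttid = mean_hours([i["discovered"] - i["occurred"] for i in incidents])
# Mean time to incident recovery: discovery -> recovery.
mttir = mean_hours([i["recovered"] - i["discovered"] for i in incidents])
print(mttid, mttir)  # 9.0 9.0
```

The remaining CIS-style metrics (incident rate, patch policy compliance, and so on) are simple counts and ratios over the same kind of records.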
References
1. Rabia, L., Jouini, M., Aissa, A., Mili, A., 2013. A cybersecurity model in cloud computing environments. Journal of King Saud University – Computer and Information Sciences.
2. Boehme, R., Nowey, T., 2008. Economic security metrics. In: Irene, E., Felix, F., Ralf, R. (Eds.), Dependability Metrics, 4909, pp. 176–187.
3. Brunette, G., Mogull, R., 2009. Security guidance for critical areas of focus in cloud computing V 1.2. Cloud Security Alliance.
4. Black, P.E., Scarfone, K., Souppaya, M., 2009. Cyber Security Metrics and Measures. Wiley Handbook of Science and Technology for Homeland Security.
5. Jonsson, E., Pirzadeh, L., 2011. A framework for security metrics based on operational system attributes. In: International Workshop on Security Measurements and Metrics – MetriSec 2011, Banff, Alberta, Canada.
6. Carlin, S., Curran, K., 2011. Cloud computing security. International Journal of Ambient Computing and Intelligence.
7. Chow, R., Golle, P., Jakobsson, M., Shi, E., Staddon, J., Masuoka, R., Molina, J., 2009. Controlling data in the cloud: outsourcing computation without outsourcing control. In: ACM Workshop on Cloud Computing Security (CCSW).
8. The Center for Internet Security, The CIS Security Metrics v1.0.0, 2009. <https://www.cisecurity.org/tools2/
metrics/CIS_Security_Metrics_v1.0.0.pdf>.
victim’s computer(s), exceeding the limit that the victim’s servers can support and making the servers crash.
c) E-mail spoofing- A spoofed e-mail is one which misrepresents its origin: it shows its origin to be different from the one it actually originates from.
d) Phishing- Another criminally fraudulent process, in which a fake website resembling the original site is designed. Phishing is an attempt to acquire sensitive information such as usernames, passwords and credit card details by masquerading as a trustworthy entity in an electronic communication.
e) Salami Attack- An attack which is difficult to detect and trace, also known as penny shaving: the fraudulent practice of stealing money repeatedly in small quantities, usually by taking advantage of rounding to the nearest cent (or other monetary unit) in financial transactions.
f) Virus / Worm Attacks- Malicious programs are dangerous, be they viruses, worms, logic bombs, trap doors, Trojan horses, etc., as they are programs written to infect and harm data by altering or deleting information, or by making a backdoor entry for an unauthorized person.
g) Forgery- Counterfeit currency notes, postage and revenue stamps, mark sheets, etc. can be forged using sophisticated computers, printers, and scanners.
III. Security Measures
Security has become a necessity, and many techniques are available to keep data safe. By using these techniques, one can ensure the confidentiality, authentication, privacy and integrity of information. Information can be of any type: text, image, audio or video. The need for security means preventing unwanted access to confidential information, which can be attained in the following ways:
a) SSL- Secure Socket Layer is a protocol developed by Netscape. It was designed so that sensitive data can be transmitted safely via the Internet. SSL creates a secure connection between a client and a server, over which any amount of data can be sent securely. All browsers support SSL, and many Web sites use the protocol to obtain confidential user information, such as credit card numbers.
b) HTTPS- Hyper Text Transfer Protocol combined with SSL to ensure security. S-HTTP is designed to transmit individual messages securely. SSL and S-HTTP can be seen as complementary rather than competing technologies; both protocols have been approved by the Internet Engineering Task Force (IETF) as a standard.
c) Firewall- Firewalls can be implemented in hardware, in software, or in a combination of both to prevent unauthorized access. Firewalls are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which examines each message and blocks those that do not meet the specified security criteria.
d) SET- Secure Electronic Transaction is a standard developed jointly by Visa International, MasterCard, and other companies. The SET protocol uses digital certificates to protect credit card transactions conducted over the Internet. The SET standard is a significant step towards securing Internet transactions, paving the way for more merchants, financial institutions, and consumers to participate in electronic commerce.
e) PGP- Pretty Good Privacy provides confidentiality by encrypting messages to be transmitted, or data files to be stored, using an encryption algorithm. PGP uses the “public key” encryption approach: messages are encrypted using the publicly available key, but can only be deciphered by the intended recipient via the private key.
f) Anti-Virus- To secure a PC, laptop or smartphone from malicious attack, the user must install a good anti-virus and update the anti-virus software regularly for better security.
g) Steganography- The process of hiding a secret message within an ordinary message. An ordinary user will view the standard message and will fail to identify that it contains a hidden or encrypted message; the secret message can be extracted only by authentic users who are aware of the hidden message beneath the ordinary file. Steganography is now gaining popularity among the masses because of its ease of use and the abundant tools available.
h) Cryptography- The “scrambling” of data using mathematical calculations, such that only an authentic user with the key and algorithm can “unscramble” it. It allows secure transmission of private information over insecure channels.
IV. Cryptography
Cryptology is the study of reading, writing, and breaking of codes. It comprises cryptography (secret writing) and cryptanalysis (code breaking). Cryptography is the art of mangling information into apparent incomprehensibility in a way that permits a secret method of unscrambling [11]. Humans have a need to share private information with only the intended recipients, and cryptography gives a solution to this need.
Cryptographic algorithms play a significant role in the field of network security. To perform cryptography, one requires a secure algorithm which carries out the conversion efficiently and securely when used with a key. Encryption is the way to transform a message so that only the sender and recipient can read, see or understand it. The mechanism is based on mathematical procedures that scramble data so that it is tough for anyone else to recover the original message.
There are two basic types of cryptosystems: symmetric and asymmetric. Symmetric cryptography is a concept in which sender and receiver share the same key for the encryption and decryption process. In contrast, asymmetric cryptography uses a pair of keys for the encryption and decryption transformations: the public key is used to encrypt data, and the private key is used to decrypt the message.
1) Symmetric Key Encryption Algorithms
A symmetric key, also known as a private or conventional key, is a single shared key used for transmitting data safely. Symmetric keys were the only way of enciphering before the 1970s. Symmetric key encryption can be performed using a block cipher or a stream cipher.
A stream cipher takes one bit or one byte as input, processes it, and produces one bit or one byte of ciphertext. RC4, for example, is a stream cipher used in almost every mobile phone.
A block cipher works on a whole block or chunk of data instead of a single stream, character, or byte: the encryption of any plaintext bit in a given block depends on every other plaintext bit in the same block. DES and 3DES have a block size of 64 bits (8 bytes), and AES has a block size of 128 bits (16 bytes).
2) Need for Cryptography
Cryptography provides a platform which can ensure not only confidentiality but also integrity, availability, and non-repudiation of messages and information. Symmetric key encryption algorithms focus on the privacy and confidentiality of data.
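The RC4 cipher named above is simple enough to sketch in full. The following is a minimal illustration of its key-scheduling and keystream-generation steps (shown only because the text mentions it; RC4 is no longer considered secure for new designs):

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Minimal RC4: XORs the data with the RC4 keystream."""
    # Key-scheduling algorithm (KSA): permute 0..255 using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): one keystream byte per data byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

ct = rc4(b"Key", b"Plaintext")
# Because encryption is an XOR with the keystream, applying the
# same function with the same key decrypts the ciphertext.
assert rc4(b"Key", ct) == b"Plaintext"
```

This round-trip property, one routine for both directions, is exactly what makes a stream cipher a symmetric-key scheme.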
3) Symmetric Key Block Cipher Algorithms
This paper focuses on symmetric key block ciphers. DES, 3DES, AES, IDEA and Blowfish are among the most used and popular block cipher algorithms.
a) DES- DES is based on a Feistel network. It takes 64-bit plaintext as input and produces 64-bit ciphertext as output. Initially a 64-bit key is supplied, which is later reduced to 56 bits (by removing every 8th bit). Encryption is then performed in 16 iterations using permutation, expansion, substitution, transposition and basic mathematical functions; decryption is the reverse process of encryption.
b) 3DES- Triple DES is an enhancement of the Data Encryption Standard. To make it more secure, the algorithm executes three times with three different keys, giving 16*3 = 48 rounds and a key length of 168 bits (56*3) [22]. The 3DES encryption algorithm works in the sequence Encrypt-Decrypt-Encrypt (EDE); the decryption process is just the reverse (Decrypt-Encrypt-Decrypt). 3DES is more complicated and is designed to protect data against different attacks: it has the advantage of reliability and a longer key length that eliminates many attacks, such as brute force, and its higher security was approved by the U.S. Government. Triple DES has one big limitation: it is much slower than other block encryption methods.
c) IDEA- The International Data Encryption Algorithm is another symmetric key block cipher, developed at ETH in Zurich, Switzerland. It is based on a substitution-permutation structure. It is a block cipher that uses a 64-bit plaintext divided equally into four 16-bit sub-blocks (16*4 = 64), with 8.5 rounds and a key length of 128 bits. Each round requires 6 subkeys, 4 before the round and 2 within the round (8*6 = 48 subkeys, plus 4 subkeys used after the last, eighth round, making 52 subkeys in total). IDEA does not use S-boxes, and it uses the same algorithm with the subkeys in reverse order for decryption [2] [21].
d) AES- AES is also a symmetric key algorithm, based on a substitution-permutation network [4][7][23]. AES uses a 128-bit block of plaintext, organized as a 4*4-byte array called the State, which is processed in several rounds. It has a variable key length of 128, 192 or 256 bits, with a correspondingly variable number of rounds, 10, 12 or 14, depending on the key length (number of rounds = key length/32 + 6): for a 128-bit key there are 10 rounds, for a 192-bit key 12 rounds, and for a 256-bit key 14 rounds. It contains a single S-box (which takes 8 bits of input and gives 8 bits of output), applied consecutively to all 16 bytes of the State. Originally the ciphertext block size was also variable, but it was later fixed at 128 bits. The encryption and decryption processes consist of 4 different transformations applied consecutively over the data block bits, in a fixed number of iterations called rounds. The decryption process is the direct inverse of the encryption process; hence the last-round values of both the data and the key are the first-round inputs for the decryption process, followed in decreasing order. AES is an extremely fast and compact cipher, and for implementers its symmetric and parallel structure provides great and effective resistance against cryptanalytic attacks. The larger block size prevents birthday attacks, and the large key size prevents brute-force attacks.
e) BlowFish- Blowfish is a symmetric block cipher that works on a 64-bit block size, with a key length variable from 32 bits to 448 bits. It has 16 rounds and is based on a Feistel network. It has a simple structure and is easy to implement. It encrypts data on 32-bit microprocessors at a rate of 18 clock cycles per byte, much faster than AES, DES, and IDEA. Since the key size is large, it is complex to break the code in the Blowfish algorithm. It is not vulnerable to known attacks apart from the weak-key class attack. It is unpatented and royalty-free, and requires less than 5K of memory to run [6] [18].
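DES, 3DES and Blowfish above all share the Feistel structure, in which decryption simply runs the same rounds with the subkeys in reverse order. A toy sketch (with a deliberately trivial round function standing in for the real expansion, S-box and permutation steps, and made-up subkeys) illustrates the round scheme:

```python
MASK = 0xFFFFFFFF  # two 32-bit halves form a 64-bit block

def F(half, subkey):
    # Toy round function, purely for illustration; a real cipher
    # would apply expansion, S-boxes and permutations here.
    return ((half * 0x9E3779B1) ^ subkey) & MASK

def feistel(block, subkeys):
    """Run Feistel rounds on a (left, right) pair of 32-bit halves.

    The same routine decrypts when called with the subkeys reversed:
    each round XORs F(right, k) into the other half, which cancels out.
    """
    left, right = block
    for k in subkeys:
        left, right = right, left ^ F(right, k)
    return right, left  # final swap of halves

subkeys = [0x1A2B3C4D, 0x5E6F7081, 0x92A3B4C5, 0xD6E7F809]  # made-up
pt = (0x01234567, 0x89ABCDEF)
ct = feistel(pt, subkeys)
assert feistel(ct, subkeys[::-1]) == pt  # decrypt = same rounds, reversed keys
```

Note that the round function F never has to be invertible; the structure itself guarantees that decryption works, which is one reason Feistel designs like DES and Blowfish are easy to implement.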
3. Forouzan, B.A., & Mukhopadhyay, D. (2010). Cryptography and Network Security. Tata McGraw-Hill, New Delhi, India.
4. Gatliff, B. (2003). Encrypting data with the Blowfish algorithm. Available at http://www.design-reuse.com/
articles/5922/ encrypting-data-with-the-blowfish-algorithm.
5. Kak, A. (2015). Computer and Network Security - AES: The Advanced Encryption Standard. Retrieved from https://engineering.purdue.edu/kak/compsec/NewLectures/Lecture8.pdf
6. Koukou, Y.M., Othman, S.H., Nkiama, M. M. S. H. (2016). Comparative Study of AES, Blowfish, CAST-
128 and DES Encryption Algorithm. IOSR Journal of Engineering, 06(06), pp. 1-7.
7. Kumar, A., Tiwari, N. (2012).Effective Implementation and Avalanche Effect of AES. International Journal of
Security, Privacy and Trust Management (IJSPTM).
8. Mahindrakar, M.S. (2014). Evaluation of Blowfish Algorithm based on Avalanche Effect. International Journal
of Innovations in Engineering and Technology, 1(4), pp. 99-103.
9. Menezes, A., van Oorschot, P. and Vanstone, S. (1996). Handbook of Applied Cryptography, CRC Press.
10. Mollin, R.A. (2006). An Introduction to Cryptography. Second Edition, CRC Press
11. National Bureau of Standards (1977). Data Encryption Standard. FIPS Publication 46.
12. Paar, C., Pelzl, J. (2010). Understanding Cryptography: A Textbook for Students and Practitioners’. Springer,
XVIII, 372.
13. Ramanujam, S., & Karuppiah, M. (2011). Designing an algorithm with high Avalanche Effect. International Journal of Computer Science and Network Security, 11(1).
14. Saeed, F., & Rashid, M. (2010). Integrating Classical Encryption with Modern Technique. International Journal
of Computer Science and Network Security, 10(5).
15. Schneier, B. (1994). Applied Cryptography. John Wiley & Sons Publication, New York.
16. Schneier, B. (1994).Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish), Fast Software
Encryption, Cambridge Security Workshop Proceedings, Springer-Verlag, 1994, Available at http://
www.schneier.com/paper-blowfish-fse.html
17. Shailaja, S. & Krishnamurthy, G.N. (2014). Comparison of Blowfish and Cast-128 Algorithms Using Encryption
Quality, Key Sensitivity and Correlation Coefficient Analysis. American Journal of Engineering Research, 7(3),
pp. 161-166.
18. Stallings, W. (2011). Cryptography and Network Security: Principles and Practice. Pearson Education, Prentice
Hall: USA
19. Thaduri, M., Yoo, S. and Gaede, R. (2004). An Efficient Implementation of IDEA encryption algorithm using
VHDL. Elsevier
20. Tropical Software, Triple DES Encryption, Available at http://www.tropsoft.com/strongenc/des3.htm,
21. Wagner, R. N. The Laws of Cryptography. Retrieved From http://www.cs.utsa.edu/~wagner/laws/
The DES algorithm was later replaced by the Rijndael algorithm, which was named the Advanced Encryption Standard, or AES [8], [9]. AES has a more flexible key strength, which may help in its future improvement.
RSA was named after its inventors, Ron Rivest, Adi Shamir and Len Adleman, in 1977 [10]. This algorithm is asymmetric and still in use. RSA algorithms have a dual benefit, as they are used for data encryption as well as digital signatures.
II. AES
Nowadays security is as essential as the speed of data communication, and the Advanced Encryption Standard is well suited to this, since it provides speed as well as increased security with hardware. Because of its dual base, consisting of hardware as well as software, this system is more advanced and secure than basic DES [8].
AES is also more advanced in the sense of its structure: it uses the key in bytes instead of bits, and the number of rounds is not fixed but depends on the key size. If the size of the text is 128 bits, it is treated as 16 bytes, and these 16 bytes are arranged in the form of a 4x4 matrix. In AES, 10 rounds of encryption are performed for a 128-bit key, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. The following algorithm encrypts the data [11]:
Step 1: Input a plaintext block of 128 bits, which is treated as 16 bytes.
Step 2: Add Round Key: each byte is combined with a block of the round key using bitwise XOR.
Step 3: Byte Substitution: the 16 input bytes are substituted by looking them up in the S-box. The result is a 4x4 matrix.
Step 4: Shift Rows: every row of the 4x4 matrix is shifted to the left; entries that fall off are placed on the right side of the row.
Step 5: Mix Columns: every column of four bytes is transformed by applying a distinctive mathematical function (over a Galois field).
Step 6: Add Round Key: the 16 bytes of the matrix are treated as 128 bits and XORed with 128 bits of the round key.
Step 7: These 128 bits are taken as 16 bytes and similar rounds are performed.
Step 8: At the 10th and last round, the ciphertext is produced.
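Two of the steps above are simple enough to sketch directly. The fragment below (an illustration, not a complete AES implementation) shows the rounds-from-key-length rule, Add Round Key as a bytewise XOR, and Shift Rows as a left rotation of each row of the 4x4 state:

```python
def rounds_for_key(bits):
    # Number of rounds = key length / 32 + 6, giving 10, 12 or 14.
    return bits // 32 + 6

def add_round_key(state, round_key):
    # Steps 2 and 6: combine each state byte with the round key via XOR.
    return [[b ^ k for b, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]

def shift_rows(state):
    # Step 4: rotate row r left by r positions; shifted-out bytes
    # wrap around to the right end of the row.
    return [row[r:] + row[:r] for r, row in enumerate(state)]

assert [rounds_for_key(b) for b in (128, 192, 256)] == [10, 12, 14]

state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
assert shift_rows(state)[1] == [5, 6, 7, 4]  # row 1 rotated left by 1

zero_key = [[0] * 4 for _ in range(4)]
assert add_round_key(state, zero_key) == state  # XOR with zero key is identity
```

Byte Substitution (the S-box lookup) and Mix Columns (multiplication in the Galois field GF(2^8)) are table- and field-arithmetic-driven and are omitted here for brevity.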
References
1. www.britannica.com/topic/cryptography.
2. ENISA’s Opinion Paper on Encryption December 2016.
3. https://www.tutorialspoint.com/cryptography/data_encryption_standard.htm.
4. https://www.tutorialspoint.com/cryptography/cryptosystems.htm.
5. http://www2.itif.org/2016-unlocking-encryption.pdf.
6. http://www.infoplease.com/encyclopedia/science/data-encryption.html.
7. http://www.infoplease.com/encyclopedia/society/cryptography.html.
8. http://www.ijarcce.com/upload/2016/march-16/IJARCCE%20227.pdf.
9. https://www.britannica.com/topic/AES#ref1095337.
10. http://www.di-mgt.com.au/rsa_alg.html.
11. https://www.irjet.net/archives/V3/i10/IRJET-V3I10126.pdf.
12. https://en.wikipedia.org/wiki/Advanced_Encryption_Standard.
13. https://www.tutorialspoint.com/cryptography/advanced_encryption_standard.html.
14. A Novel Approach to Enhance the Security Dimension of RSA Algorithm Using Bijective Function.
15. http://paper.ijcsns.org/07_book/201608/20160809.pdf.
16. Research and Implementation of RSA Algorithm for Encryption and Decryption.
17. https://globaljournals.org/GJCST_Volume13/4-A-Study-of-Encryption-Algorithms.pdf
IP addresses, which consumes some extra IP addresses, especially for always-on devices. IPv4 is not able to fulfil these IP address demands.
● Internet Routing Table Expansion:
A routing table is used by routers to determine the best path. As the number of networks and entities connected to the internet increases, so does the number of network routes. These IPv4 routes consume a great deal of memory and processor resources on internet routers, which increases the complexity of the network as well as taking a lot of space.
● Lack of end-to-end Connectivity:
To make better use of IP addresses, IANA introduced public and private addressing. By using private addresses, multiple devices are able to connect to the internet through a single public IP address, but this requires translation from public to private IP addresses as well as from private to public. Network Address Translation (NAT) is a technology commonly implemented within IPv4 networks; NAT provides a way for multiple devices to share a single public IP address. This is an overhead which increases the complexity of the network and increases the possibility of error [4].
III. Improvement that IPv6 Provides
In the early 1990s the Internet Engineering Task Force (IETF) grew concerned about the issues with IPv4 and began to look for a replacement; this activity led to the development of IP version 6. IPv6 overcomes the limitations of IPv4, some of which are listed below:
● Internet address space:
IPv6 increases the address space to 128 bits instead of the 32 bits of IPv4. Due to the increased size it has more addresses, sufficient for present as well as future scenarios: IPv6 can allot 340 undecillion addresses to unique devices, which is more than sufficient to handle present traffic.
● Improved Packet Handling:
The IPv6 packet eliminates fields of IPv4 that are not required and includes required fields that are not present in the IPv4 header. The IPv6 header is simplified, with fewer fields; this improves packet handling by intermediate routers and also provides support for extensions and options for increased scalability.
● Eliminates need of NAT:
As mentioned earlier, IP version 4 does not have sufficient IP addresses, a problem addressed by public and private addresses. But the use of private addresses requires NATing, which is an overhead. In IPv6 the NATing concept is eliminated because of the large number of IPv6 addresses.
● Integrated Security:
IPv4, the first widely deployed IP version, mostly focuses on how data can be transferred between two or more devices, and this requirement was successfully accomplished by IPv4. But as technology advances, the chance of theft also increases, and IPv4 does not provide any security fields. With this in mind, IPv6 has integrated security: it provides authentication and privacy capabilities.
IV. Internet Protocol Version 6 (IPv6)
On Monday, 31 January 2011, IANA allocated the last two /8 IPv4 address blocks to the Regional Internet Registries (RIRs), marking the move to IPv6. The packet format of IPv6 is kept simple by having fewer fields. All fields of IPv6 are described in the packet format in Figure 1.
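The jump from 32-bit to 128-bit addresses can be checked directly with Python's ipaddress module; the snippet below compares the two address-space sizes (2^128 is the "340 undecillion" figure quoted above):

```python
import ipaddress

ipv4_space = 2 ** 32    # 4,294,967,296 addresses
ipv6_space = 2 ** 128   # ~3.4 * 10**38, i.e. about 340 undecillion
print(ipv4_space)   # 4294967296
print(ipv6_space)   # 340282366920938463463374607431768211456

# Parsing works the same way for both versions.
a4 = ipaddress.ip_address("192.0.2.1")
a6 = ipaddress.ip_address("2001:db8::1")
assert a4.version == 4 and a6.version == 6

# An IPv6 address is simply a 128-bit integer underneath.
assert int(a6) < ipv6_space
```

The addresses used here are the documentation prefixes (192.0.2.0/24 and 2001:db8::/32), chosen so the example never refers to a real host.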
Version: The Version field is the same as in IPv4 and is used to identify the version of the packet. It is a 4-bit field, always set to 0110 for IPv6 and 0100 for IPv4.
Traffic Class: This field is equivalent to the Type of Service field in IPv4. It is an 8-bit field used to classify packets, for example for real-time applications. (The related 20-bit Flow Label field can be used to tell routers and switches to keep the packets of a flow on the same path, so that packets are not reordered.)
Payload Length: The Payload Length field is 16 bits long. It is equivalent to the Total Length field in IPv4, except that it gives the size of the payload and any optional extensions, excluding the base header [5].
Next Header: The Next Header field is 8 bits long and identifies the type of header that immediately follows the IPv6 header, similar to the Protocol field of IPv4.
Hop Limit: This 8-bit field is similar to the Time To Live field of IPv4. Its value is decremented by one by each router that forwards the packet; when the value reaches zero the packet is discarded and an ICMPv6 message is forwarded to the sending host to indicate that the packet did not reach its destination.
Source Address: This field is 128 bits long and specifies the address of the sender of the message.
Destination Address: This field is 128 bits long and specifies the address of the destination to which the sender wants to send the message.
An IPv6 packet might also contain extension headers (EH), which provide optional network-layer information. EHs are optional and are placed between the IPv6 header and the payload. They are used for fragmentation, for security, to support mobility, and more [6].
V. IPv4 and IPv6 Coexistence
There is not a single date for moving to IPv6: both IPv4 and IPv6 will coexist, and the transition is expected to take years. The IETF (Internet Engineering Task Force) has created various protocols and tools to help network administrators migrate their networks to IPv6. These migration techniques are divided into three categories:
Dual Stack: allows IPv4 and IPv6 to coexist on the same network. Dual-stack devices run both the IPv4 and IPv6 protocol stacks simultaneously.
Tunneling: a method of transporting IPv6 packets over an IPv4 network. The IPv6 packet is encapsulated inside an IPv4 packet, similarly to other types of data.
Translation: NAT64 allows IPv6-enabled devices to communicate with IPv4-enabled devices using a translation technique similar to NAT for IPv4.
VI. Comparison and Analysis
IPv6 provides 340 undecillion addresses, roughly one for every grain of sand on earth. Some fields of IPv4 are kept with the same name, some are not used, some have changed name and position, and new fields have been added in IPv6 that are not in IPv4 [7]. A detailed comparison between Internet Protocol version 4 and version 6 is shown in Table 1: the first column shows the characteristic factors on which the two differ, while the second column is for IPv4 and the third column for IPv6 [8].
VII. Conclusion
IPv6 and IPv4 are both Internet Protocols in current use. IPv6 is arguably the better of the two because it came after IPv4 and eliminates the drawbacks of IPv4; IPv4 remains the popular protocol that we have used for a long time, so both protocols keep their importance. In this paper we can clearly see that IPv6 is a better replacement for IPv4, though it will take time to displace IPv4.
References
1. W. Stallings, Data and Computer Communications, 5th Edition, Upper Saddle River, NJ: Prentice Hall, 2012.
2. M. Mackay and C. Edwards, “A Managed IPv6 Transitioning Architecture for Large Network Deployments,”
IEEE Internet Computing, vol. 13, no. 4, pp. 42 –51, july-aug. 2009.
3. S. Bradner and A. Mankin, IPng: Internet Protocol Next Generation, Reading, MA: Addison-Wesley, 2011.
4. R. Gilligan and R. Callon, “IPv6 Transition Mechanism Overview,” Connexions, Oct. 2002.
5. E. Britton, J. Tavs and R. Bournas, “TCP/IP: The next generation,” IBM Systems Journal, no. 3, 1995.
6. C. Huitema, IPv6 the new internet protocol, Upper saddle river, NJ. Prentice Hall, 1996
7. R. Hinden,”IP next generation overview” connexions, Mar 1995.
8. Fernandez, P. Lopez, M. A. Zamora, and A. F. Skarmeta, “Lightweight MIPv6 with IPSec support (Online
First, DOI: 10.3233/MIS-130171),” Mobile Information Systems, http://iospress.metapress.
9. G. Huston, “IPv4 Address Report,” Tech. Rep., Sep. 2010. [Online]. Available: http://www.potaroo.net/
tools/ipv4
10. S. Deering and R. Hinden, “Internet Protocol, Version 6 (IPv6) Speci- fication,” 1998, IETF RFC 2460.
11. S. Thomson, T. Narten, and T. Jinmei, “IPv6 Stateless Address Autoconfiguration,” 2007, IETF RFC 4862
12. R. Hinden and S. Deering, “IP Version 6 Addressing Architecture,” 2006, IETF RFC 4291
who, posing as a technology expert, offers free IT help or innovation enhancements in return for login credentials. [1,4] Another common case is an assailant who, posing as a specialist, requests access to the organization's system as part of an analysis or experiment in return for Rs.1000/-. If an offer seems too genuine to be true, then it is most likely a quid pro quo.
D. Pretexting
In pretexting, a preplanned situation (the pretext) is created to trap a targeted customer into revealing sensitive information. In these situations the customer performs the actions the hacker expects, falls into the trap, and reveals his/her sensitive information. [4] An elaborate lie, it most often involves some prior research or setup and the use of this information for impersonation (e.g., date of birth, Social Security number, last bill amount) to establish legitimacy in the mind of the target. [5]
E. Piggybacking
Another name for piggybacking is tailgating: an unauthorized person physically follows an authorized person into an organization's private area or system. For example, a person may ask another person to hold the gate, claiming to have forgotten his access card. Another example is borrowing someone's laptop or system for some time and installing malicious software, thereby entering his restricted information zone.
F. Hoaxing
Hoaxing is an attempt to trick people into believing that something false is genuine. It may also prompt sudden decisions taken out of fear of an unfortunate incident.
III. Preventions
By educating himself, a user can prevent social engineering to a large extent. An extremely common and easy measure is not to give the password to anyone and to take regular backups of the data. There has to be strict action. Application of an authentication system such as smart cards or biometrics is key; by doing this, you can prevent a high percentage of social engineering attempts. There have to be good policies for successful defense against social engineering, and all personnel should ensure that they follow them. Social engineering attacks are not about the typical software system but about people, who are in themselves quite fickle. There are certain countermeasures which can help in reducing these attacks.[18]
Below mentioned are the prevention techniques for individual defense.
A. We should always be vigilant of any email which asks for personal financial information or warns of instant termination of online accounts.
B. If an email is not digitally signed, you cannot be sure that it is not forged or spoofed. It is highly recommended to check the full headers, since anyone can send mail under any name.
C. Generally a fraudulent person will ask for information such as usernames, passwords, credit card numbers, social security numbers, etc. This kind of information is normally not asked for even by an authorized company representative, so one should be careful.
D. Phisher emails are generally not personalized; you may find something like “Dear Customer”. This is mainly because they are intended to trap innocent people through mass mailers, whereas authorized mails will have a personalized beginning. However, one should stay vigilant, as a phisher could send a specific email intended to trap an individual; it could well then be like our case study.
E. One should be very careful while contacting financial institutions. Critical information such as bank card details must be checked thoroughly before being entered, whether online, in hard-copy correspondence, or on a monthly account statement. Always keep in mind that e-mails/links could look very authentic and yet be spurious.
F. One should always ensure that one is using a secure website while submitting credit card or other sensitive information via the Web browser.
G. You should log on and change the password on a regular basis.[15]
H. Every bank, credit and debit card statement should be properly checked, and one should ensure that all transactions are legitimate.
I. You should not assume that a website is legitimate just by looking at its appearance.
J. One should avoid filling in forms in email messages or pop-up windows that ask for personal financial information. These are generally used by spammers as well as phishers for future attacks.[10]
IV. Conclusion
In today's world we may well have the most secure and sophisticated network and the clearest policies; humans, however, are highly unpredictable due to sheer curiosity and never-ending greed without concern for the consequences. We could very well face our own version of a Trojan tragedy [11]. The biggest irony of social engineering attacks is that humans are not only the biggest problem and security risk, but also the best tool to defend against these attacks. Organizations should fight social engineering attacks by forming policies and a framework with clear sets of roles and responsibilities for all users, not just security personnel. Organizations should also make sure that these policies and procedures are executed by users properly, and regular training undoubtedly needs to be imparted given the regular occurrence of such incidents.
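The header-checking advice in prevention technique B can be illustrated with a short sketch. The message, addresses, and domains below are hypothetical, and a mismatch between the visible From: domain and the Return-Path: domain is only a heuristic signal that full-header inspection can surface, not proof of forgery:

```python
# Heuristic illustration of tip B: parse an email's full headers and
# flag a mismatch between the visible From: domain and the
# Return-Path: domain. All addresses below are made up for the example.
from email import message_from_string
from email.utils import parseaddr

RAW_MESSAGE = """\
Return-Path: <bounce@mass-mailer.example>
From: "Your Bank" <support@trusted-bank.example>
To: customer@example.com
Subject: Dear Customer, verify your account

Please confirm your password at the link below.
"""

def header_domain(msg, header):
    """Extract the domain part of an address header, or None."""
    _, addr = parseaddr(msg.get(header, ""))
    return addr.rpartition("@")[2].lower() or None

def looks_spoofed(raw):
    """Return True when From: and Return-Path: name different domains."""
    msg = message_from_string(raw)
    from_dom = header_domain(msg, "From")
    return_dom = header_domain(msg, "Return-Path")
    # Flag only when both domains are present and disagree.
    return bool(from_dom and return_dom and from_dom != return_dom)

if __name__ == "__main__":
    print(looks_spoofed(RAW_MESSAGE))  # prints True: the domains disagree
```

A real mail client applies far stronger checks (digital signatures, SPF/DKIM), but even this simple comparison shows the kind of detail that reading full headers, rather than the displayed sender name, makes visible.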
References
1. “Ouch”, the monthly security newsletter for computer users, November 2014 issue.
2. Mosin Hasan, Nilesh Prajapati and Safvan Vohara, “Case Study on Social Engineering Techniques for Persuasion,” International Journal on Applications of Graph Theory in Wireless Ad hoc Networks and Sensor Networks (GRAPH-HOC), Vol. 2, No. 2, June 2010.
3. Christopher Hadnagy, Social Engineering: The Art of Human Hacking, Wiley Publishing, Inc., 2011.
4. Faraz Davani, “HP Pretexting Scandal by Faraz Davani,” Scribd, 14 August 2011 (retrieved 15 August 2011); includes a discussion of the HP pretexting scandal.
5. “Pretexting: Your Personal Information Revealed,” Federal Trade Commission.
6. Tim Thornburgh, “Social Engineering: The Dark Art,” Proceedings of the 1st Annual Conference on Information Security Curriculum Development (InfoSecCD ’04), ACM, New York, pp. 133-135.
7. Valerică Greavu-Șerban and Oana Șerban, “Social Engineering: A General Approach,” Informatica Economică, Vol. 18, No. 2/2014.
8. Mosin Hasan, “Malware: Threat to the Economy,” survey study, National Conference on IT and Business Intelligence (ITBI-08).
9. Mindi McDowell, “Avoiding Social Engineering and Phishing Attacks,” white paper, Cyber Security Tip ST04-014, Carnegie Mellon University, June 2007.
10. Harl, People Hacking.
11. “FCAC Cautions Consumers About New ‘Vishing’ Scam,” Financial Consumer Agency of Canada, July 25, 2006.
12. Jay Schulman, “Voice-over-IP Scams Set to Grow,” VoIP News, July 21, 2006.
13. Mosin Hasan, “Spying Linux: Consequences, Technique and Prevention,” IEEE International Advance Computing Conference (IACC ’09).
14. Jared Kee, “Social Engineering: Manipulating the Source,” SANS Institute.
15. “Management Update: How Businesses Can Defend Against Social Engineering Attacks,” white paper, Gartner, March 16, 2005.
16. Ashish Thapar, “Social Engineering: An Attack Vector Most Intricate to Tackle,” white paper.
17. Hiep Dang, “The Origin of Social Engineering,” McAfee Security Journal, Fall 2008.
18. Yves Lafrance, “Psychology: A Precious Security Tool,” SANS Institute, 2004.
19. Malcolm Allen, “Social Engineering: A Means to Violate a Computer System,” SANS Institute, 2007.
20. Mosin Hasan, “Inside Spyware: Techniques, Remedies and Cure,” National Conference on Emerging Trends in Computer Technology.