
IITM Journal of Management and IT

SOUVENIR
National Conference on Emerging Trends in Information Technology-
Advances in High Performance Computing, Data Sciences & Cyber Security
Volume 8 Issue 1 January-June, 2017

CONTENTS

Research Papers & Articles

Page No.
● Intelligent Cyber Security Solutions through High Performance 3-9
Computing and Data Sciences : An Integrated Approach
- Sandhya Maitra, Dr. Sushila Madan
● Applications of Machine Learning and Data Mining for Cyber Security 10-16
- Ruby Dahiya, Anamika
● Fingerprint Image Enhancement Using Different Enhancement Techniques 17-20
- Upender Kumar Agrawal, Pragati Patharia, Swati Kumari, Mini Priya
● Data Mining in Credit Card Frauds: An Overview 21-26
- Vidhi Khurana, Ramandeep Kaur
● Review of Text Mining Techniques 27-31
- Priya Bhardwaj, Priyanka Khosla
● Security Vulnerabilities of Websites and Challenges in Combating these Threats 32-36
- Dhananjay, Priya Khandelwal, Kavita Srivastava
● Security Analytics: Challenges and Future Directions 37-41
- Ganga Sharma, Bhawana Tyagi
● A Survey of Multicast Routing Protocols in MANET 42-50
- Ganesh Kumar Wadhwani, Neeraj Mishra
● Relevance of Cloud Computing in Academic Libraries 51-55
- Dr. Prerna Mahajan, Dr. Dipti Gulati
● A brief survey on metaheuristic based techniques for optimization problems 56-62
- Kumar Dilip, Suruchi Kaushik
● Cross-Language Information Retrieval on Indian Languages: A Review 63-66
- Nitin Verma, Suket Arora, Preeti Verma
● Enhancing the Efficiency of Web Data Mining using Cloud Computing 67-70
- Tripti Lamba, Leena Chopra
● Role of Cloud computing in the Era of cyber security 71-74


- Shilpa Taneja, Vivek Vikram Singh, Dr. Jyoti Arora
● Cryptography and its Desirable Properties in terms of different algorithm 75-81
- Mukta Sharma, Dr. Jyoti Batra Arora
● A Review: RSA and AES Algorithm 82-85
- Ashutosh Gupta, Sheetal Kaushik
● Evolution of new version of internet protocol (IPv6) : Replacement of IPv4 86-89
- Nargish Gupta, Sumit Gupta, Munna Pandey
● Social Engineering – Threats & Prevention 90-93
- Amanpreet Kaur Sara, Nidhi Srivastava



Intelligent Cyber Security Solutions through
High Performance Computing and Data Sciences :
An Integrated Approach
Sandhya Maitra*
Dr. Sushila Madan**
Abstract
The recent advances in Data Sciences and HPC, while enabling the ongoing digitization to have a positive impact on the social and economic aspects of our lives, have at the same time given rise to several security issues. Thus the face of cyber security has changed in recent times with the advent of new technologies such as the Cloud, the Internet of Things, mobile/wireless and wearable technology. The technological advances in data science which help develop contemporary cyber security solutions are storage, computing and behavior. On the other hand, high performance computing power facilitates the usage of sophisticated machine learning techniques to build innovative models for the identification of malware. Big data holds vital importance in building analytical models which identify cyber attacks. Besides, high performance computing is necessary for supporting all aspects of data-driven research. An integrated approach, combining the technological benefits of the predictive power of data sciences and the aggregated parallel processing power of high performance computing, would help devise intelligent and powerful cyber security solutions supporting a proactive and dynamic approach to threat management to counteract the multitude of potentially new emerging cyber attacks.
Keywords: High Performance computing, Data Sciences, Machine Learning, Cyber Security

I. Introduction

Researchers all over the world face challenges related to the upsurge of voluminous data in many areas such as Bioinformatics, Medicine, Engineering & Technology, GIS and Remote Sensing, Cognitive Science and statistical data. Advanced algorithms, visualization techniques, data streaming methodologies and analytics are the need of the hour. These have to be developed within the constraints of storage and computational power, algorithm design, visualization, scalability, distributed data architectures, data dimension reduction and implementation, to name a few. The other issues to be considered include optimization, uncertainty quantification, systems theory, statistics and types of model development methods. This requires contextual problem solving based on multidisciplinary approaches. The scale, diversity, and complexity of Big Data necessitate the advent of new architectures, techniques, algorithms, and analytics to manage it and extract value or hidden knowledge from it. Analytics research encompasses a large range of problems of data mining research[1].

Data is increasingly becoming cheap and ubiquitous. The rapid growth in computer science and information technology in recent times has led to the generation of massive amounts of data. This avalanche of data has made a strong impact on almost all aspects of human life and fundamentally changed every field in science and technology. A multitude of new types of data is collected from web logs, sensors, mobile devices, transactions and various instruments. Emerging technologies such as data mining and machine learning enable us to interpret this massive data. High Performance Computing (HPC) techniques are increasingly being used by organizations to efficiently and effectively deal with the processing and storage challenges thrown up by the explosive growth of such

* Sandhya Maitra, Research Scholar, Banasthali Vidyapith
** Dr. Sushila Madan, Professor, Lady Shri Ram College for Women
enormous data. Advances in networking, high end computers, distributed and grid computing, large-scale visualization and data management, systems reliability, high-performance software tools and techniques, and compilation techniques are ushering in a new era of high performance, parallel and distributed computing. Over the past few decades, security concerns have become increasingly important and extremely critical in the realm of communication and information systems as they become more indispensable to society. With the continuous growth of cyber connectivity and the ever increasing number of applications, remotely delivered services, and networked systems, digital security has become the need of the hour. Today government agencies, financial institutions, and business enterprises are experiencing security incidents and cyber-crimes, by which attackers could generate fraudulent financial transactions, commit crimes, perform industrial espionage, and disrupt business processes. The sophistication and borderless nature of the intrusion techniques used during a cyber security incident have generated the need for designing new active cyber defense solutions and developing efficient incident response plans. With the number of cyber threats escalating worldwide, there is a need for comprehensive security analysis, assessment and actions to protect our critical infrastructures and sensitive information[1].

II. Cyber Security

With the spectacular growth of cyber connectivity and the monumental increase in the number of networked systems, applications and remotely delivered services, cyber security has taken top precedence amongst other issues. Attackers are able to effect fraudulent financial transactions, perform industrial espionage, disrupt business processes and commit crimes with much ease. Additionally, government agencies are also experiencing security incidents and cyber-crimes of dangerous proportions which can compromise national security. The sophisticated intrusion techniques used in cyber security incidents and their borderless nature have provided the impetus to design new active cyber defense solutions and develop efficient and novel incident response plans. The number of cyber threats is escalating globally, necessitating comprehensive security analysis, assessment and action plans for protecting our critical infrastructures and sensitive information[1].

Cyber security in recent times demands secure systems which help in detection of intrusions, identification of attacks, confinement of sensitive information to security zones, data encryption, time stamping and validation of data and documents, and protection of intellectual property, besides others. Current security solutions require a mix of software and hardware to augment the power of security algorithms: real time analysis of voluminous data, rapid encryption and decryption of data, identification of abnormal patterns, checking identities, simulation of attacks, validation of software security proofs, patrol systems, analysing video material and innumerable other actions[2].

Analysis of new and diverse digital data streams can reveal potentially new sources of economic value and fresh insights into customer behavior and market trends. But this influx of new data creates challenges for the IT industry. We need Information Security measures to ensure a safe, secure and reliable cyber network for the transmission and flow of information[1].

III. High Performance Computing

The re-emergence of the need for supercomputers for cyber security stems from their computing capacity: the ability to perform a large number of checks in an extremely short time, particularly in the case of financial transactions, for the identification of cyber crimes using techniques featuring cross-analysis of data coming from several different sources[2]. The knowledge gained through HPC analysis and evaluation can be instrumental in providing comprehensive cyber security, as it helps interpret the multifaceted complexities involved in cyber space, comprising complex technical, organizational and human systems[3].

A combined system of distributed sensor networks and HPC cybersecurity systems, such as exascale computing, helps in real-time, fast-I/O, HPC-accelerated processing. This covers various issues such as data collection, analysis and response, and takes care of the


issues of data locality, transport, throughput, latency, processing time and return of information to defenders and defense devices.

An important set of HPC jobs has involved analytics, discovering patterns in the data itself, as in cryptography. The data explosion fueling the growth of high performance data analysis originates from the following factors:

1. The efficiency of HPC systems in running data-intensive modeling.
2. The advent of larger, more complex scientific instruments and sensor networks such as "smart" power grids.
3. The growth of stochastic modeling (financial services), parametric modeling (manufacturing) and iterative problem-solving methods, whose cumulative results are large volumes of data.
4. The availability of newer advanced analytics methods and tools (MapReduce/Hadoop, graph analytics, semantic analysis, knowledge discovery algorithms and others) and the escalating need for commercial applications to perform advanced analytics in near-real-time, such as in the cloud.

Data-driven research necessitates high performance computing. Big Data fuels the growth of HP data analysis[3]. Research on High Performance Computing mainly includes networks, parallel and high performance algorithms, programming paradigms and run-time systems for data science, apart from other areas. High-performance computing (HPC) refers to systems that can rapidly solve difficult computational problems across a diverse range of scientific, engineering, and business fields by virtue of their processing capability and storage capacity. HPC, being at the forefront of scientific discovery and commercial innovation, holds a leading competitive edge for nations and their enterprises[4]. India, in an endeavour to meet its stated research and education goals, is making every effort towards doubling its high performance computing capacity and is exploring opportunities to integrate with global research and education networks.

Cyber Security and Data Sciences

The challenge of protecting sensitive data has increased exponentially in recent times because of the non-existence of a secure perimeter as before, when data was confined to secure data centers; data now leaks out of massive data centers into the cloud, mobile devices and individual PCs. Most companies do not have policies prohibiting storage of data on mobiles, while people on the other hand prefer storing data on their mobiles, with their huge computing and storage power, for convenience and efficiency of operations.

Cloud-based data mostly exists in commercial data centers, on shared networks, on multiple disk devices in the data center, and in multiple data centers for the purpose of replication. The extremely difficult task of developing cloud security is now made possible with new technologies such as HPC and machine learning.

Data from data centers should be moved to the cloud only for business reasons, with benefits outweighing the costs of providing cloud security to protect it. Data inventories should be maintained in encrypted form, and data should be tracked and managed well on mobile devices to prevent its theft. Additionally, cloud networks should be subjected to thorough penetration testing[5].

The value of cyber security data plays a major role in constructing machine learning models. The value of data lies in the predictive power of a given data model as well as the type of hidden trends revealed as a result of meticulous data analysis. The value of cyber security data refers to the nature of the data, which can be positive or negative. Positive data, such as malicious network traffic data either from malware or a varied set of cyber attacks, holds higher value, as it can be used to build machine learning based network security models. From the cyber security viewpoint, the predictive power of effective data models lies in the ability to differentiate normal network traffic from abnormal malicious traffic indicating an active cyber attack. Machine learning builds classifiers to identify network traffic as good or bad based on this analysis. Spam filters are based on these techniques to distinguish normal emails from ads, phishing and other types of spam. Big Data helps build classifiers to train a machine learning algorithm and also helps evaluate the classifier's performance. The positive data that a spam classifier needs to detect is the behavior exhibited


by a spam email. Similarly, network traffic exhibiting the behavior of real cyber attacks is positive data for a network security model. Negative data refers to normal data, such as legitimate emails in the case of a spam classifier and normal traffic data for a network security model. In both cases the classifier should be able to detect bad behavior without incorrectly classifying genuine mails or network traffic as harmful. The various cyber security problems differ on the basis of the quick availability of positive data. In the case of spam emails, positive data is easily available in abundance for building a classifier. On the other hand, despite increased cyber attacks across various organizations, positive data from real cyber attacks and malware infections can seldom be accessed. This is especially true for targeted attacks. The pace at which hackers modify their techniques to create increasingly sophisticated attacks renders libraries of malware samples quickly obsolete. In targeted attacks, malware is custom built to steal or destroy data covertly. The predictive power of a machine learning model relies on the high value of positive samples, in terms of their general nature, for identifying potentially new cyber attacks. Additionally, the performance of these models is highly influenced by the choice of features used to build them. The prerequisites for interpreting huge amounts of positive samples are feature selection and appropriate training techniques. The highly unbalanced nature of training data for a machine learning model is owing to negative samples always being many orders of magnitude more abundant than positive samples. The application of proper evaluation metrics, sophisticated sampling methods and proper training data set balancing helps us find out whether we have an appropriate quantity of positive samples or not. The lengthy process of collecting positive samples is one of the first and most important tasks in building machine learning based cyber security models. This is how big data is relevant to cyber security[6].

Intelligent Cyber Security Solutions Powered by HPC and Data Sciences

The advances in Data Sciences and HPC have extended innumerable benefits and conveniences to our day to day activities and transformed the ongoing digitization to deeply impact the social and economic aspects of our lives. At the same time, these dependencies have also given rise to many security issues. Attackers in the cyber world are getting more creative and ambitious in their exploitation techniques, causing real-world damage of major dimensions by making proprietary as well as personally identifiable information equally vulnerable. The problem is further compounded as designing effective security measures in a globally expanding digital world is a demanding task. The issues to be addressed include defining the core elements of cyber security, virtual private network security solutions, security of wireless devices, protocols and networks, security of key internet protocols, protection of information infrastructure, and database security. The advent of the Internet of Things (IoT) has also increased the need to step up cyber security. The IoT is a network of physical objects with embedded technology to communicate, sense or interact with their internal states or the external environment, where a digitally represented object becomes something greater than the object by itself or possesses ambient intelligence. Despite its manifold advantages, the rapid adoption of IoT by various types of organizations has escalated the importance of security and vulnerability. The computing world underwent a major transformation in terms of increased reliability, scalability, quality of services and economy with the emergence of cloud computing. Nevertheless, remote storage of data in the cloud, away from the owner, can lead to loss of control over the data. The success and widespread usage of cloud computing in future depends on effective handling of data security issues such as accountability, data provenance, and identity and risk management. The face of cyber security has changed in recent times with the advent of new technologies such as the Cloud, the Internet of Things, mobile/wireless and wearable technology[1].

Static data once contained within systems has now become dynamic and travels through a number of routers, hosts and data centers. Hackers and cyber criminals have started using Man-in-the-Middle attacks to eavesdrop on entire data conversations, spying software and Google Glass to track fingerprint movements on touch screens, memory-scraping malware on point-of-sale systems, and bespoke attacks to steal specific data.
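The earlier point about unbalanced training data and evaluation metrics can be made concrete with a short sketch (pure Python, all numbers hypothetical): when negative samples outnumber positives by two orders of magnitude, a classifier that never flags anything still reaches 99% accuracy, which is why precision and recall, not raw accuracy, are the metrics of interest.

```python
# Toy illustration: 990 negative (normal) samples and 10 positive
# (malicious) samples. A classifier that labels everything "normal"
# scores 99% accuracy yet detects no attacks at all.

def metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

y_true = [1] * 10 + [0] * 990   # 10 attacks hidden among 1000 events
always_normal = [0] * 1000      # naive classifier: everything is benign

acc, prec, rec = metrics(y_true, always_normal)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
# accuracy is 99.00%, but recall is 0.00% -- no attack is ever caught
```

This is why the text stresses sampling methods and training set balancing: a model must be judged on how many true positives it recovers, not on how often it is right overall.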


Context-aware behavioral analytics treats unusual behavior as a symptom of ongoing nefarious activity in the computer system. Such cases can no longer be handled by tool based approaches such as firewalls or antivirus engines. The previous solutions no longer succeed in managing risk in recent technologies; there is an imperative need for brand new solutions. Analytics help in identifying unusual or abnormal behaviors. Behavior based analytics approaches include bioprinting, mobile location tracking, behavioral profiles, third party Big Data and external threat intelligence. Nowadays hackers carefully analyze a system's defenses and use Trojan horses, and owing to the velocity, volume and variety of big data, security breaches cannot be identified in time. Solutions based on new technologies combining machine learning and behavioral analytics help detect breaches and trace their source. User profiles are built and machine behavior patterns studied to detect new types of cyber attacks; the emphasis is on providing rich user interfaces which help in interactive exploration and investigation. These tools can detect strange behavior and changes in data.

This problem can be solved by virtual dispersive technologies, which split the message into several encrypted parts routed over different independent servers, computers and/or mobile phones depending on the protocol. The traditional bottlenecks are thus completely avoided. The data dynamically travels on optimum random paths, also taking network congestion and other issues into consideration. Hackers find it difficult to locate the data parts. Furthermore, in order to prevent cyber criminals exploiting the weak point of the technology, which is the place where two endpoints must connect to a switch to enable secure communication, hidden switches are used by VDN, making them hard to find.

Critical infrastructures can be protected by security measures and standards provided by Smart Grid technologies. Cloud based applications, which are beyond the realm of firewalls and traditional security measures, can be secured by using a combination of encryption and intrusion detection technologies to gain control of corporate traffic. Cloud data can be protected by Security Assertion Markup Language (SAML), an XML based open standard format, augmented with encryption and intrusion detection technologies. This also helps control corporate traffic.

Proxy based systems designed around SAML secure access and traffic, log activity, watermark files by embedding security tags into documents and other files for tracking their movement, and redirect traffic through service providers. Such solutions require neither software to be loaded on endpoints nor changes to end user configurations. Any kind of suspicious activity, such as failed or unexpected logins, is flagged by notifications. Security administrators can instantaneously erase corporate information without affecting users' personal data. Active defense measures such as counter-intelligence gathering, sinkholing, honeypots and retaliatory hacking can be adopted to track and attack hackers. Counter-intelligence gathering is a kind of reverse malware analysis in which a cyber expert secretly finds information about hackers and their techniques. Sinkholing servers hand out non-routable addresses for all domains within the sinkhole, so malicious traffic is intercepted and blocked for later analysis by experts. Isolated systems called honeypots, such as computer, data or network sites, are set up to attract hackers and help cyber security analysts catch spammers and prevent attacks. Retaliatory hacking is the most dangerous security measure and is usually considered illegal, as it may require infiltrating a hacker community and building a hacking reputation to prove your credentials to the hacking group. None of these things being legal raises debate over active defense measures. Early warning systems forecast sites and servers likely to be hacked using machine learning algorithms. These systems are created with the help of machine learning and data mining techniques. Most of the algorithms take into account a website's software, traffic statistics, file system structure or webpage structure, and use a variety of other signature features to determine the presence of known hacked and malicious websites.
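The virtual dispersive idea described above can be caricatured in a few lines of Python. This is only an illustrative sketch, not VDN itself: the round-robin split stands in for independent routing, and a one-time XOR pad stands in for a real cipher, which a production system would replace with vetted cryptography.

```python
import os

def disperse(message: bytes, parts: int):
    """Split a message round-robin into parts and encrypt each part separately."""
    chunks = [bytearray() for _ in range(parts)]
    for i, b in enumerate(message):
        chunks[i % parts].append(b)     # byte i travels in part i % parts
    encrypted = []
    for chunk in chunks:
        pad = os.urandom(len(chunk))    # toy one-time pad per part
        encrypted.append((bytes(x ^ p for x, p in zip(chunk, pad)), pad))
    return encrypted

def reassemble(encrypted, length: int) -> bytes:
    """Decrypt each part and invert the round-robin split."""
    chunks = [bytes(x ^ p for x, p in zip(ct, pad)) for ct, pad in encrypted]
    out = bytearray(length)
    for part, chunk in enumerate(chunks):
        for j, b in enumerate(chunk):
            out[part + j * len(chunks)] = b
    return bytes(out)

msg = b"transfer $100 to account 42"
parts = disperse(msg, 3)                # three independently encrypted fragments
print(reassemble(parts, len(msg)) == msg)  # True: the receiver restores the message
```

An eavesdropper who captures a single fragment holds only part of an encrypted stream; the message is recovered only where all routes converge, which is the property the text attributes to the technique.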


Notifications can be sent to website operators and search engines to exclude the results. Classifiers should be designed to adapt to emerging threats. Such security measures are growing in scope: the more data the system absorbs, the better its accuracy becomes[7].

The cyber threats of recent times necessitate a state of the art, dynamic approach to threat management. Cyber security threats change rapidly with technological advancements. An application free of vulnerabilities today may be exposed to a major unanticipated attack tomorrow. A few recent examples are the Adobe Flash vulnerability allowing remote code execution, the NTP (Network Time Protocol) issue allowing denial-of-service attacks, the Cisco ASA firewall exposure allowing denial-of-service attacks, and Apple, thought for a long time to be invulnerable, releasing iOS 9 quickly followed by additional releases to correct newly discovered exposures. These dynamic threats are the key challenges to information security and necessitate dynamic security approaches for their mitigation. They were neither a result of negligence on the part of the affected parties nor the result of a change effected by these parties in the products. Information security programs should be proactive, agile and adaptive. A few of the strategies for moving from a static to a dynamic posture are: making vulnerability checks a regular and frequent task, with monthly external scans and internal scans conducted on the same schedule or when software or configuration changes are made, whichever happens first; and paying attention to fundamentals such as checking logs and auditing access rights. Firmware updates should be a top priority, as many of the exposures we face today result from issues found in the firmware of devices attached to our networks: core devices such as routers and firewalls, or Internet of Things devices such as printers and copiers. Threat sources should be studied on a regular basis[8].

Data science techniques help in predicting types of security threats besides reacting to them. Data sciences and cyber security were highly isolated disciplines until recent times. Cyber security solutions are usually based on signatures, which use pattern matching against previously identified malware to capture cyber attacks. But these signature based solutions could not prevent zero day attacks by unidentified malware, as they lack the predictive power of data science. Data science effectively uses scientific techniques to draw knowledge from data. The ongoing security breaches accentuate the need for new approaches for the identification and prevention of malware. The technological advances in data science which help develop contemporary cyber security solutions are storage, computing and behavior. The storage aspect eases the process of collection and storage of the huge data on which analytic techniques are applicable. On the other hand, high performance computing power assists machine learning techniques to build novel models for identification of malware. The behavioral aspect has shifted from identification of malware by signatures to identifying the specific kinds of behaviors exhibited by an infected computer. Big data plays a key role in analytical models which identify cyber attacks. Any rule based model built on machine learning requires a large number of data samples to be analyzed in order to unearth the set of characteristics of a model. Subsequently, data is required to cross check and assess the performance of a model.

Application of machine learning tools to enterprise security gives rise to a new set of solutions. These tools can analyze networks, learn about them, detect anomalies and protect enterprises from threats[9].

Machine learning increased in popularity with the advent of high performance computing resources. This has resulted in the development of off-the-shelf machine learning packages which allow complex machine learning algorithms to be trained and tested on huge data samples. These characteristics render machine learning an indispensable tool for developing cyber security solutions. Machine learning is a broader data science solution for detecting cyber attacks. Minor changes in malware can leave Intrusion Prevention Systems and next-generation firewall perimeter security solutions that perform signature matching on network traffic ineffective. The rigorous analytical methods of data science differentiate the abnormal behavior defining an infected machine after identifying normal behavior through repetitive usage. Therefore contemporary cyber security solutions require big data samples and


advanced analytical methods to build data-driven solutions for malware identification and the detection of cyber attacks. This results in a spectacular improvement in cyber security efficacy[10].

Conclusions

● Cyber Security Solutions should be more proactive and dynamic.
● Effective Cyber Security Solutions for future threats can be achieved by exploiting the processing and storage power of High Performance Computing.
● Intelligent Cyber Security Solutions can be built by exploring the predictive power of machine learning and data mining approaches.
● Machine learning approaches require Big Data for training models.
● Big Data can be efficiently processed in real time using High Performance Computing.
● Cloud Computing and IoT can be highly risk prone in the absence of an effective security framework.
● The solution to future security needs lies in integrating the processing and storage power of High Performance Computing with the predictive power of machine learning and data mining techniques.
References
1. S. Maitra, "NCETIT'2017", iitmipu.ac.in, 2017. [Online]. Available: http://iitmipu.ac.in/wp-content/uploads/2017/02/NCETIT-2017-Brochure.pdf. [Accessed: 14-Feb-2017].
2. "HPC solutions for cyber security", Eurotech.com, 2017. [Online]. Available: https://www.eurotech.com/en/hpc/industry+solutions/cyber+security. [Accessed: 11-Feb-2017].
3. C. Keliiaa and J. Hamlet, "National Cyber Defense High Performance Computing and Analysis: Concepts, Planning and Roadmap", Sandia National Laboratories, New Mexico, 2010.
4. S. Tracy, "Big Data Meets HPC", Scientific Computing, 2014. [Online]. Available: http://www.scientificcomputing.com/article/2014/03/big-data-meets-hpc. [Accessed: 11-Feb-2017].
5. R. Covington, "Risk Awareness: The risk of data theft — here, there and everywhere", IDG Contributor Network, 2016.
6. D. Pegna, "Cybersecurity, data science and machine learning: Is all data equal?", Cybersecurity and Data Science, 2015.
7. "Hot technologies: cyber security", cyberdegrees.org, 2017. [Online]. Available: http://www.cyberdegrees.org/resources/hot-technologies-cyber-security/. [Accessed: 04-Feb-2017].
8. R. Covington, "Risk Awareness: Is your information security program giving you static?", IDG Contributor Network, 2015.
9. B. Violino, "Machine learning offers hope against cyber attacks", Network World, 2016.
10. D. Pegna, "Cybersecurity and Data Science: Creating cybersecurity that thinks", IDG Contributor Network, 2015.



Applications of Machine Learning and Data Mining
for Cyber Security
Ruby Dahiya*
Anamika**
Abstract
Security is an essential objective in any digital communication. Nowadays, there is enormous information,
lots of protocols, too many layers and applications, and massive use of these applications for various
tasks. With this wealth of information, there is also too little information about what is important for
detecting attacks. Methods of machine learning and data mining can help to build better detectors from
massive amounts of complex data. Such methods can also help to discover the information required to
build more secure systems, free of attacks. This paper will highlight the applications of machine learning
and data mining techniques for securing data in huge networks of computers. This paper will also
present a review of applications of data mining and machine learning in the field of computer security.
The papers reviewed here present the results of various techniques of data mining and
machine learning on different performance parameters.
Keywords: Data mining, Machine Learning, Artificial Neural Networks, Classification, Clustering,
Inductive Learning, Evolution Learning, Support Vector Machine.

I. Introduction

As technology moves forward, users become more technically aware than before. People communicate and cooperate efficiently through the internet using their PCs, PDAs or mobile phones. Through these digital devices linked by the internet, hackers also attack personal privacy using a variety of weapons such as viruses, worms, botnet attacks, spam and social engineering platforms. These forms of attack can be categorized into three groups: stealing confidential information, manipulating the components of cyber infrastructures, and denying the functions of infrastructure. There are three approaches to deal with these attacks: signature-based, anomaly-based and hybrid. Signature-based detection systems use the particular signature of an attack and hence are unable to detect unknown attacks. Anomaly-based systems detect anomalies as deviations from normal behavior, so they can detect unknown attacks as well. The main disadvantage of these systems is high false alarm rates (FAR). The hybrid approach uses a combination of both signature-based and anomaly-based techniques. These types of systems have a high detection rate for known attacks and low false positive rates for unknown attacks. The literature review shows that most of the techniques were actually hybrid. Security mechanisms are also categorized as network based and host based. A network-based system monitors the traffic through the network devices. A host-based system monitors the processes and the file-related activities associated with a specific host. However, building a defense system for discovered attacks is not easy because of constantly evolving cyber attacks. Figure 1 depicts the cyber security mechanism.

This paper is intended for readers who wish to begin research in the field of machine learning and data mining for cyber security. It highlights the ML and DM techniques used for cyber security, describing them in reference to anomaly-based and signature-based hybrid methods; an in-depth description of these methods is given in the paper of Bhuyan et al. [1]. This paper focuses on cyber intrusion detection for both wired and wireless networks. The paper by Zhang et al. [2] focuses more on dynamic networking.

Ruby Dahiya*
Associate Professor (IT)
Institute of Information Technology & Management
Anamika**
Assistant Professor (IT)
Institute of Information Technology & Management
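As an illustration of the three detection approaches just described, a minimal sketch in Python is given below. The signature set, event names and baseline statistics are invented for illustration and do not come from the paper:

```python
# Illustrative sketch of signature-based, anomaly-based and hybrid detection.
# The signature set, event names and baseline figures are made up.
KNOWN_SIGNATURES = {"sql_injection", "port_scan"}  # hypothetical known attacks

def signature_detect(event):
    """Flag an event only if it matches a known attack signature
    (cannot catch unknown attacks)."""
    return event in KNOWN_SIGNATURES

def anomaly_detect(value, baseline_mean, baseline_std, k=3.0):
    """Flag a measurement deviating more than k standard deviations from
    the learned normal baseline (catches unknown attacks, but with a
    higher false alarm rate)."""
    return abs(value - baseline_mean) > k * baseline_std

def hybrid_detect(event, value, baseline_mean, baseline_std):
    """Hybrid approach: alert if either detector fires."""
    return signature_detect(event) or anomaly_detect(value, baseline_mean, baseline_std)

# A brand-new attack evades the signature detector...
print(signature_detect("zero_day_worm"))   # False
# ...but its unusual traffic volume still trips the anomaly detector.
print(anomaly_detect(950.0, 100.0, 20.0))  # True
```

The hybrid detector inherits the strengths of both: known attacks are caught by exact signatures, while novel ones can still trip the statistical baseline.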
IITM Journal of Management and IT

Figure 1. Cyber Security System

The paper is organized as follows: Section II highlights the procedure of Machine Learning and Data Mining. Section III describes the techniques of ML and DM. Section IV presents and discusses the comparative analysis of individual techniques and related work. Section V presents the conclusion.

II. Machine Learning and Data Mining Procedure

ML and DM are two terms that are often confused because they generally share the same techniques. Machine Learning, a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn. Arthur Samuel in 1959 defined Machine Learning as a “field of study that gives computers the ability to learn without being explicitly programmed” [3]. An ML algorithm applies classification followed by prediction, based on known properties learned from the training data. ML algorithms need a well-defined problem from the domain, whereas DM focuses on properties of the data that were previously unknown. DM focuses on finding new and interesting knowledge. An ML approach consists of two phases: training and testing. These phases include classification of the training data, feature selection, training of the model and use of the model for testing unknown data.

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. The following are areas in which data mining technology may be applied or further developed for intrusion detection:

● Development of data mining algorithms for intrusion detection: Data mining algorithms can be used for misuse detection and anomaly detection. The techniques must be efficient and scalable, and capable of handling network data of high volume, dimensionality and heterogeneity.

● Association and correlation analysis and aggregation to help select and build discriminating attributes: Association and correlation mining can be applied to find relationships between the system attributes describing the network data. Such information can provide insight regarding the selection of useful attributes for intrusion detection.

● Analysis of stream data: Due to the transient and dynamic nature of intrusions and malicious attacks, it is crucial to perform intrusion detection in the data stream environment. It is necessary to study what sequences of events are frequently encountered together, find sequential patterns, and identify outliers.

● Distributed data mining: Intrusions can be launched from several different locations and targeted to many different destinations. Distributed data mining methods may be used to analyze network data from several network locations in order to detect these distributed attacks.

● Visualization and querying tools: Visualization tools should be available for viewing any anomalous patterns detected. Intrusion detection systems should also have a graphical user interface that allows security analysts to pose queries regarding the network data or intrusion detection results.

III. Techniques of ML and DM

This section focuses on the various ML/DM techniques for cyber security. Each technique is elaborated with references to the seminal work, along with a few papers on its applications to cyber security.

A. Artificial Neural Networks: Neural Networks follow a predictive model which is based on biological modeling capability and predicts data by a learning process. An Artificial Neural Network (ANN) is composed of connected artificial neurons capable of certain computations on their inputs [4]. When an ANN is used as a classifier, each layer passes its output as an input to the next layer, and the output of the last layer generates the final classification category.

ANNs are widely accepted classifiers based on the perceptron [5], but they suffer from local minima and a lengthy learning process. The ANN technique was used as a multi-category classifier for signature-based detection by Cannady [6]. He detected 3000 simulated attacks from a dataset of events. The findings of the paper reported almost 93% accuracy and a root mean square error of 0.070. This technique was also used by Lippmann and Cunningham [7] for anomaly detection. They used keyword selection based on statistics and fed it to an ANN which provides the posterior probability of attack as output. This approach showed an 80% detection rate and hardly one false alarm per day. Also, a five-stage approach for intrusion detection was proposed by Bivens et al. [8] that fully detected normal behavior, but the FAR is 76% for some attacks.

B. Association Rules and Fuzzy Association Rules: Association Rule Mining was introduced by Agrawal et al. [9] as a way to find interesting co-occurrences in supermarket data, that is, frequent sets of items which are bought together. Traditional association rule mining works only on binary data, i.e. an item is represented by 1 if it was present in the transaction and by 0 if not. But in real-world applications, data are either quantitative or categorical, for which Boolean rules are unsatisfactory. To overcome this limitation, Fuzzy Association Rule Mining was introduced [10], which can process numerical and categorical variables.

An algorithm based on the Signature Apriori method was proposed by Zhengbing et al. [12] that can be applied to any signature-based system for the inclusion of new signatures. The work of Brahmi [11] using multidimensional association rule mining is also very promising for creating signatures for attacks. It showed detection rates for the attack types DOS, Probe, U2R and R2L of 99%, 95%, 75% and 87% respectively. Association rule mining is used in NetMine [13] for anomaly detection. It applied generalized association rule extraction based on the Genio algorithm for the identification of recurring items. Fuzzy association rule mining was used by Tajbakhsh et al. [14] to find related patterns in the KDD 1999 dataset. The result showed good performance with 100 percent accuracy and a false positive rate of 13%; but the accuracy falls drastically as the FPR falls.

C. Bayesian Networks: A Bayesian network is a graphical model based on probabilities which represents variables and their relationships [15], [16]. The network is designed with nodes as the continuous or discrete variables, and the relationships between them are represented by the edges, establishing a directed acyclic graph. Each node holds the states of the random variable and the conditional probability form.

Livadas et al. [17] presented comparative results of various approaches to the DOS attack. The anomaly detection approach is mainly reactive whereas the signature-based one is proactive. They tried to detect botnets in Internet Relay Chat (IRC) traffic data. The analysis reported the performance of Bayesian networks as 93% precision and a very low FP rate of 1.39%. Another IDS based on Bayesian network classifiers was proposed by Jemili et al. [18] with performances of 89%, 99%, 21% and 7% for DOS, Probe, U2R and R2L respectively. Benferhat [19] also used this approach to build an IDS for the DOS attack.
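The support and confidence computations that underlie the association rule mining of Section B can be sketched in a few lines of Python. The audit-event “transactions” below are hypothetical, and the code is an illustrative fragment rather than any of the systems cited above:

```python
# Hypothetical audit-event "transactions"; support/confidence counting is
# the core computation behind Apriori-style association rule mining.
transactions = [
    {"login_fail", "port_scan", "root_shell"},
    {"login_fail", "port_scan"},
    {"login_fail", "root_shell"},
    {"port_scan", "root_shell"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears among transactions that already
    contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule: {login_fail, port_scan} -> {root_shell}
print(support({"login_fail", "port_scan"}, transactions))                     # 0.5
print(confidence({"login_fail", "port_scan"}, {"root_shell"}, transactions))  # 0.5
```

A rule is typically kept only if both its support and its confidence exceed user-chosen thresholds; Apriori makes the search efficient by pruning candidate itemsets whose subsets are already infrequent.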

12 National Conference on Emerging Trends in Information Technology


D. Clustering: Clustering is an unsupervised technique to find patterns in high-dimensional unlabeled data. It is used to group data items into clusters, which are not predefined, based on a similarity measure.

This technique was applied by Blowers and Williams [20] to detect anomalies in the KDD dataset at the packet level. They used the DBSCAN clustering technique. The study highlighted various machine learning techniques for cyber security. Sequeira and Zaki [21] performed detection over shell command data to identify whether a user is a legitimate one or an intruder. Out of various approaches for sequence matching, the longest common sequence was the most appropriate one. They stated the performance in terms of 80% accuracy and a 15% false alarm rate.

E. Decision Trees: A decision tree is a tree-like structure where a leaf node represents or predicts the decision and a non-leaf node represents the various possible conditions that can occur. The decision tree technique has simple implementation, high accuracy and intuitive knowledge expression. This expression is large for small trees and less for deeper and wider trees. The common algorithms for creating decision trees are ID3 [22] and C4.5 [23].

Kruegel and Toth [24] proposed clustering along with a decision tree approach to build a signature detection system and compared its performance to SNORT 2.0. The speedup varies from 105% to 5%, depending on the traffic. This paper showed that the combination of decision trees with the clustering technique can prove an efficient IDS approach. The decision tree approach using the WEKA J48 program was also used in EXPOSURE [25] to detect malicious domains like botnet command hosts, scam hosts, phishing sites etc. Its performance is satisfactory in terms of accuracy and FAR.

F. Ensemble Learning: It is a supervised machine learning paradigm where multiple learners are trained to solve the same problem. As compared with ordinary machine learning approaches, which try to learn one hypothesis from training data, ensemble methods try to construct a set of hypotheses and combine them for use.

An outlier detector was designed by Zhang et al. [26], with the use of Random Forests, to classify data as anomalous as well as to classify it to one of the attack labels of the KDD dataset. The random forest was used as the proximity measure. The accuracies for the DOS, Probe, U2R and R2L attacks were 95%, 93%, 90% and 87% respectively. The FAR is 1%.

G. Evolutionary Computation: It is the collective name for a range of problem-solving techniques, like genetic algorithms, genetic programming, particle swarm optimization, ant colony optimization and evolution strategies, based on principles of biological evolution.

A signature-based model was developed by Li [27] with genetic algorithms used for evolving rules. Abraham et al. [28] also used genetic programming techniques to classify attacks in the DARPA 1998 intrusion detection dataset.

H. Inductive Learning: It is a learning method where the learner starts with specific observations and measures, begins to detect patterns and regularities, formulates some tentative hypotheses to be explored, and ends up with the development of some general conclusions and theories. Inductive learning moves bottom-up, that is, from specific observations to broader generalizations and theories. Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [29] applies a separate-and-conquer approach to induce rules in two-class problems. Lee et al. [30] provided a framework for signature-based models using various machine learning and data mining techniques like inductive learning, association rules, sequential pattern mining etc.

I. Naïve Bayes: It is a simple probabilistic classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Panda and Patra [31] presented a comparison of Naïve Bayes with an NN classifier and stated that Naïve Bayes performed better in terms of accuracy but not in FAR. Amor et al. [32] used a Bayesian network as a naïve Bayes classifier. The paper stated an accuracy of 98% with less than a 3% false alarm rate.

J. Support Vector Machine: A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given
Table 1. Analysis of ML and DM techniques


ML/DM Technique Method Data Set Evaluation Metric Work
ANN Signature based Network Packet level Acc., RMS Cannady
ANN Anomaly DARPA 1998 DR, FAR Lippmann & Cunningham
ANN Anomaly DARPA 1999 DR, FAR Bivens et. al.
Association Rules Signature based DARPA 1998 DR Brahmi et. al.
Association Rules Signature based Signature attacks Runtime Zhengbing et. al.
Association Rules - Fuzzy Hybrid KDD 1999 (corrected) Acc., FAR Tajbakhsh et. al.
Bayesian Network Signature based Tcpdump- botnet traffic Precision, FAR Livadas et. al.
Bayesian Network Signature based KDD 1999 DR Jemili et. al.
Clustering- density based Anomaly KDD 1999 DR but no actual FAR Blowers and Williams
Clustering – Sequence Anomaly Shell Commands Acc., FAR Sequeira and Zaki
Decision Tree Signature based DARPA 1999 Speedup Kruegel and Toth
Ensemble – Random Forest Hybrid KDD 1999 Acc., FAR Zhang et. al.
Evolutionary Computing (GA) Signature based DARPA 2000 Acc. Li
Evolutionary Computing (GP) Signature based DARPA 1998 FAR Abraham et. al.
Inductive Learning Signature based DARPA 1998 Acc. Lee et. al.
Naïve Bayes Signature based KDD 1999 Acc., FAR Panda & Patra
Naïve Bayes Anomaly KDD 1999 Acc., FAR Amor et. al.
Support Vector Machine Signature based KDD 1999 Acc. Li et. al.
Support Vector Machine Anomaly DARPA 1998 Acc., FAR Hu et. al.

labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

An SVM classifier was built to classify the KDD 1999 dataset by Li et al. [33] using ant colony optimization for training. This study showed 98% accuracy; however, it does not perform well for U2R attacks. RSVM (Robust Support Vector Machine) was used as an anomaly classifier by Hu et al. [34], which showed better performance in the presence of noise, with 75% accuracy and no false alarms.

IV. Comparative Analysis and Discussion

The analysis of the work using ML and DM for cyber security highlights a few facts about this growing research area. From the comparative analysis presented in Table 1, it is obvious that DARPA 1998, DARPA 1999, DARPA 2000, KDD 1998 and KDD 1999 are the favorite dataset choices of most researchers for IDS work. Most of the researchers have used accuracy, detection rate and false alarm rate as the evaluation criteria. Multiple approaches have been applied for both anomaly and signature-based detection; several approaches are appropriate for signature-based detection, others for anomaly detection. But the answer to the question of determining the most appropriate approach depends on multiple factors, like the quality of the training data, the properties of that data, the working of the system (online or offline) etc.

V. Conclusions

In this paper, we survey a wide spectrum of existing studies on machine learning and data mining techniques applied to cyber security. Based on this analysis we then outline key factors that need to be considered while choosing the technique to develop an IDS. These are the quality and properties of the training data, the system type for which the IDS has to be devised, and the working nature and environment of the system. There is a strong need to develop strong representative datasets augmented by data at the network level. There is also a need for regular updating of the models for cyber detection using some fast incremental learning methods.
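The fast incremental learning mentioned in the conclusion can be illustrated with a count-based Naïve Bayes model (Section III-I) whose statistics are updated one labeled event at a time instead of retraining from scratch. This is a hedged sketch with invented event names, not the method of any surveyed paper:

```python
from collections import defaultdict

# A minimal sketch of incremental (online) learning: a count-based Naive
# Bayes model that folds in one labeled observation at a time. Event and
# label names are made up for illustration.
class IncrementalNB:
    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def update(self, features, label):
        """Fold a single new labeled observation into the model."""
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1

    def predict(self, features):
        """Pick the class with the highest (Laplace-smoothed) score."""
        def score(label):
            total = self.class_counts[label]
            s = total / sum(self.class_counts.values())  # class prior
            for f in features:
                s *= (self.feature_counts[label][f] + 1) / (total + 2)
            return s
        return max(self.class_counts, key=score)

model = IncrementalNB()
model.update({"port_scan"}, "attack")
model.update({"http_get"}, "normal")
model.update({"port_scan", "syn_flood"}, "attack")  # new evidence folded in
print(model.predict({"port_scan"}))  # attack
```

Because the model is just a set of counts, each update is constant-time per feature, which is what makes this style of learning attractive for constantly evolving attack traffic.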
References
1. M. Bhuyan, D. Bhattacharyya, and J. Kalita, “Network anomaly detection: Methods, systems and tools,”
IEEE Commun. Surv. Tuts., vol. 16, no. 1, pp. 303–336, First Quart. 2014.
2. Y. Zhang, L. Wenke, and Y.-A. Huang, “Intrusion detection techniques for mobile wireless networks,” Wireless
Netw., vol. 9, no. 5, pp. 545–556, 2003.
3. J. McCarthy, “Arthur Samuel: Pioneer in Machine Learning,” AI Magazine, vol. 11, no. 3, pp. 10-11, 1990.
4. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,”
Neural Netw., vol. 2, pp. 359–366, 1989.
5. F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,”
Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958.
6. J. Cannady, “Artificial neural networks for misuse detection,” in Proc 1998 Nat. Inf. Syst. Secur. Conf.,
Arlington, VA, USA, 1998, pp. 443–456.
7. R. P. Lippmann and R. K. Cunningham, “Improving intrusion detection performance using keyword selection
and neural networks,” Comput. Netw., vol. 34, pp. 597–603, 2000.
8. A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection
using neural networks,” Intell. Eng. Syst. Artif. Neural Netw., vol. 12, no. 1, pp. 579–584, 2002.
9. R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,”
in Proc. Int. Conf. Manage. Data Assoc. Comput. Mach. (ACM), 1993, pp. 207–216.
10. C. M. Kuok, A. Fu, and M. H. Wong, “Mining fuzzy association rules in databases,” ACM SIGMOD Rec.,
vol. 27, no. 1, pp. 41–46, 1998.
11. H. Brahmi, B. Imen, and B. Sadok, “OMC-IDS: At the cross-roads of OLAP mining and intrusion detection,”
in Advances in Knowledge Discovery and Data Mining. New York, NY, USA: Springer, 2012, pp. 13–24.
12. H. Zhengbing, L. Zhitang, and W. Junqi, “A novel network intrusion detection system (NIDS) based on
signatures search of data mining,” in Proc. 1st Int. Conf. Forensic Appl. Techn. Telecommun. Inf. Multimedia
Workshop (e-Forensics ‘08), 2008, pp. 10–16.
13. D. Apiletti, E. Baralis, T. Cerquitelli, and V. D’Elia, “Characterizing network traffic by means of the NetMine
framework,” Comput. Netw., vol. 53, no. 6, pp. 774–789, Apr. 2009.
14. A. Tajbakhsh, M. Rahmati, and A. Mirzaei, “Intrusion detection using fuzzy association rules,” Appl. Soft
Comput., vol. 9, pp. 462–469, 2009.
15. D. Heckerman, A Tutorial on Learning with Bayesian Networks. New York, NY, USA: Springer, 1998.
16. F. V. Jensen, Bayesian Networks and Decision Graphs. New York, NY, USA: Springer, 2001.
17. C. Livadas, R. Walsh, D. Lapsley, and W. Strayer, “Using machine learning techniques to identify botnet
traffic,” in Proc. 31st IEEE Conf. Local Comput. Netw., 2006, pp. 967–974.
18. F. Jemili, M. Zaghdoud, and A. Ben, “A framework for an adaptive intrusion detection system using Bayesian
network,” in Proc. IEEE Intell. Secur. Informat., 2007, pp. 66–70.

19. S. Benferhat, T. Kenaza, and A. Mokhtari, “A Naïve Bayes approach for detecting coordinated attacks,” in
Proc. 32nd Annu. IEEE Int. Comput. Software Appl. Conf., 2008, pp. 704–709.
20. M. Blowers and J. Williams, “Machine learning applied to cyber operations,” in Network Science and
Cybersecurity. New York, NY, USA: Springer, 2014, pp. 55–175.
21. K. Sequeira and M. Zaki, “ADMIT: Anomaly-based data mining for intrusions,” in Proc 8th ACM SIGKDD
Int. Conf. Knowl. Discov. Data Min., 2002, pp. 386–395.
22. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
23. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA, USA: Morgan Kaufmann, 1993.
24. C. Kruegel and T. Toth, “Using decision trees to improve signature based intrusion detection,” in Proc. 6th
Int. Workshop Recent Adv. Intrusion Detect., West Lafayette, IN, USA, 2003, pp. 173–191.
25. L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, “EXPOSURE: Finding malicious domains using passive
DNS analysis,” presented at the 18th Annu. Netw. Distrib. Syst. Secur. Conf., 2011.
26. J. Zhang, M. Zulkernine, and A. Haque, “Random-forests-based network intrusion detection systems,”
IEEE Trans. Syst. Man Cybern. C: Appl. Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.
27. W. Li, “Using genetic algorithms for network intrusion detection,” in Proc. U.S. Dept. Energy Cyber Secur.
Group 2004 Train. Conf., 2004, pp. 1–8.
28. A. Abraham, C. Grosan, and C. Martin-Vide, “Evolutionary design of intrusion detection programs,” Int. J.
Netw. Secur., vol. 4, no. 3, pp. 328–339, 2007.
29. W. W. Cohen, “Fast effective rule induction,” in Proc. 12th Int. Conf. Mach. Learn., Lake Tahoe, CA, USA,
1995, pp. 115–123.
30. W. Lee, S. Stolfo, and K. Mok, “A data mining framework for building intrusion detection models,” in Proc.
IEEE Symp. Secur. Privacy, 1999, pp. 120–132.
31. M. Panda and M. R. Patra, “Network intrusion detection using Naïve Bayes,” Int. J. Comput. Sci. Netw.
Secur., vol. 7, no. 12, pp. 258–263, 2007.
32. N. B. Amor, S. Benferhat, and Z. Elouedi, “Naïve Bayes vs. decision trees in intrusion detection systems,” in
Proc. ACM Symp. Appl. Comput., 2004, pp. 420–424.
33. Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, “An efficient intrusion detection system based on support
vector machines and gradually feature removal method,” Expert Syst. Appl., vol. 39, no. 1, pp. 424–430,
2012.
34. W. J. Hu, Y. H. Liao, and V. R. Vemuri, “Robust support vector machines for anomaly detection in computer
security,” in Proc. 20th Int. Conf. Mach. Learn., 2003, pp. 282–289.

Fingerprint Image Enhancement Using Different
Enhancement Techniques
Upender Kumar Agrawal*
Pragati Patharia**
Swati Kumari***
Mini Priya****
Abstract
Fingerprint identification is one of the most reliable biometrics technologies. It has applications in many
fields such as voting, e-commerce, banking, military, etc. for security purposes. In this paper, we have
applied Histogram Equalization and Adaptive Histogram Equalization. We have evaluated the
performance of the image enhancement method by testing it with fingerprint images.
Keywords: HE, AHE, DNA, CLAHE

Upender Kumar Agrawal*
upeagrawal@gmail.com
Pragati Patharia**
pathariapragati@gmail.com
Swati Kumari***
swati.kumari3661@gmail.com
Mini Priya****
minipriya9496@gmail.com
Guru Ghasidas Vishwavidyalaya, Bilaspur

1. Introduction

Image enhancement is one of the necessary steps for better analysis. There are various methods to improve the contrast of images [1-3]. Fingerprints are unique patterns, made by friction ridges (raised) and furrows (recessed), which appear on the pads of the fingers and thumbs. They form from pressure on a baby's tiny, developing fingers in the womb. Fingerprints are unique: no two persons have been found to have the same fingerprints. Fingerprints are even more unique than DNA, the genetic material in each of our cells. Although identical twins can share the same DNA, or at least most of it, they cannot have the same fingerprints. Friction ridge patterns are grouped into three distinct types, loops, whorls, and arches, each with unique variations, depending on the shape and relationship of the ridges:

Loops - prints that recurve back on themselves to form a loop shape. Divided into radial loops (pointing toward the radius bone, or thumb) and ulnar loops (pointing toward the ulna bone, or pinky), loops account for approximately 60 percent of pattern types.

Whorls - form circular or spiral patterns, like tiny whirlpools. There are four groups of whorls: plain (concentric circles), central pocket loop (a loop with a whorl at the end), double loop (two loops that create an S-like pattern) and accidental loop (irregularly shaped). Whorls make up about 35 percent of pattern types.

Arches - create a wave-like pattern and include plain arches and tented arches. Tented arches rise to a sharper point than plain arches. Arches make up about five percent of all pattern types.

2. Histogram Equalization

Histogram equalization (HE) is one of the popular techniques for contrast enhancement of images. It is one of the well-known methods for enhancing the
IITM Journal of Management and IT

contrast of a given image in accordance with the sample distribution. HE is a simple and effective contrast enhancement technique which distributes pixel values uniformly such that the enhanced image has a linear cumulative histogram. HE has been widely applied when an image needs enhancement, in areas such as medical image processing, radar image processing, texture synthesis and speech recognition.

It stretches the contrast of high-histogram regions and compresses the contrast of low-histogram regions. The goal of histogram equalization is to remap the image grey levels so as to obtain a uniform (flat) histogram, in other words, to enhance the image quality. HE-based methods are reviewed and compared with image quality measurement (IQM) tools such as Peak Signal to Noise Ratio (PSNR) to evaluate contrast enhancement.

Peak Signal to Noise Ratio (PSNR)

Let X(i,j) be a source image that contains M by N pixels and Y(i,j) a reconstructed image, where Y is reconstructed by decoding the encoded version of X(i,j). In this method, errors are computed only on the luminance signal; so, the pixel values X(i,j) range between black (0) and white (255) [6-7]. First, the mean squared error (MSE) of the reconstructed image is calculated. The root mean square error (RMSE) is computed as the square root of the MSE. Then the PSNR in decibels (dB) is computed as:

PSNR = 20 log10(Max(Y(i,j)) / RMSE)

The greater the value of PSNR, the better the contrast enhancement of the image.

3. Adaptive Histogram Equalization

Adaptive histogram equalization (AHE) is an image processing technique used to improve contrast in images [1-3]. It differs from ordinary histogram equalization in the respect that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast and enhancing the definitions of edges in each region of an image. However, AHE has a tendency to over-amplify noise in relatively homogeneous regions of an image. A variant of adaptive histogram equalization called contrast limited adaptive histogram equalization (CLAHE) prevents this by limiting the amplification. The size of the neighbourhood region is a parameter of the method. It constitutes a characteristic length scale: contrast at smaller scales is enhanced, while contrast at larger scales is reduced [4-5]. Due to the nature of histogram equalization, the result value of a pixel under AHE is proportional to its rank among the pixels in its neighbourhood. This allows an efficient implementation on specialist hardware that can compare the centre pixel with all other pixels in the neighbourhood.
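The grey-level remapping described in Section 2 can be sketched in pure Python for an 8-bit grayscale image. This is an illustrative fragment following the standard HE formulation, not the Matlab implementation used in this paper:

```python
# Global histogram equalization for an 8-bit grayscale image, given as a
# list of rows of pixel values in 0..255 (illustrative sketch only).
def equalize(image):
    """Remap grey levels so the cumulative histogram becomes (nearly) linear."""
    flat = [p for row in image for p in row]
    n = len(flat)
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function of the grey levels.
    cdf, running = [0] * 256, 0
    for level in range(256):
        running += hist[level]
        cdf[level] = running
    cdf_min = next(c for c in cdf if c > 0)
    # Standard HE remapping: rescale the CDF onto the full 0..255 range.
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * 255) if n > cdf_min else p
    return [[remap(p) for p in row] for row in image]

# A low-contrast 2x2 patch is stretched across the full grey range.
patch = [[100, 101], [102, 103]]
print(equalize(patch))  # → [[0, 85], [170, 255]]
```

The same routine applied per image tile, with the histogram clipped before the CDF is built, is essentially what the CLAHE variant of Section 3 does.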

4. Original Data of Fingerprint Thumb Impressions

Fig 1: Sample variations of an individual's left-hand thumb impression showing arches, loops and whorls.

5. Results and Comparison

The above-discussed methodologies have been implemented using Matlab. For testing purposes we have created two image databases. At first we captured fingerprint images using a mobile camera, then we enhanced the fingerprint images using histogram and adaptive histogram techniques. Results from the above implementation are described in the following section.

Fig 2: Original image and its histogram; histogram equalization and its histogram; adaptive histogram equalization and its histogram.

Comparison of PSNR (variation of histogram technique)
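The PSNR values compared in the graph can be computed as below. The two tiny images are made-up stand-ins, since the actual fingerprint data is not reproduced here:

```python
import math

# Peak Signal to Noise Ratio between a source image X and an enhanced
# image Y, both given as equal-size lists of rows of pixel values.
def psnr(x, y, max_value=255):
    """PSNR in dB; a higher value is read here as better enhancement."""
    pairs = [(a, b) for rx, ry in zip(x, y) for a, b in zip(rx, ry)]
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    if mse == 0:
        return float("inf")  # identical images
    rmse = math.sqrt(mse)
    return 20 * math.log10(max_value / rmse)

original = [[52, 55], [61, 59]]  # made-up 2x2 patches for illustration
enhanced = [[54, 57], [60, 58]]
print(round(psnr(original, enhanced), 2))  # ≈ 44.15
```

Computing this figure for the HE and AHE outputs against the same source image gives the per-technique comparison summarized in the graph above.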


6. Conclusion

Based on the results of the experiment phase in this research, we found the following. Firstly, the use of Histogram Equalization enables increasing fingerprint contrast while preserving brightness. Secondly, Adaptive Histogram Equalization (AHE) is an excellent contrast enhancement method for both natural images and medical and other initially non-visual images. In conclusion, the proposed technique produces fine fingerprint image quality. The graph shows the comparison of PSNR. The output shows that the PSNR of adaptive histogram equalization is higher than that of histogram equalization.

References
1. Z. M. Win and M. M. Sein, “Fingerprint recognition system for low quality images,” presented at the SICE
Annual Conference, Waseda University, Tokyo, Japan, Sep. 13-18, 2011.
2. Dr. Muna F. Al-Samaraie, “A New Enhancement Approach for Enhancing Image of Digital Cameras by
Changing the Contrast”, International Journal of Advanced Science and Technology Vol. 32, July, 2011.pp.-
13-22.
3. Mustafa Salah Khalefa, Zaid Amin Abduljabar and Huda Ameer Zeki, “Fingerprint Image Enhancement
by Develop Mehtre Technique”, Advanced Computing: An International Journal (ACIJ), Vol. 2, No. 6,
November 2011, pp. 171-182.
4. D. Ezhilmaran and M. Adhiyaman, “A Review Study on Fingerprint Image Enhancement Techniques”,
International Journal of Computer Science & Engineering Technology (IJCSET)Vol. 5 No. 06 Jun 2014,
ISSN : 2229-3345,625-631.
5. Darshan Charan Nayak, “Comparative Study of Various Enhancement Techniques for Finger Print Images”,
(IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (2) , 2015, ISSN
:0975-9646, 1900-1905.
6. C.Nandini and C.N.Ravikumar, “Improved fingerprint image representation for recognition,” International
journal of computer science and information technology, MIT Publication, Vol. 01-no.2, 2011,pp.59-64.
7. J.Choudhary, Dr.S.Sharma, J.S.Verma, “A new framework for improving low quality fingerprint images,”
international journal of computer technology and application. Vol.2, no.6, pp.1859 -1866,2011.

20 National Conference on Emerging Trends in Information Technology


Data Mining in Credit Card Frauds: An Overview
Vidhi Khurana*
Ramandeep Kaur**
Abstract
With the increasing awareness of customers about plastic money and internet banking, the number
of frauds in transactions has also grown. In order to detect these frauds, various data mining
techniques can be applied. Financial Fraud Detection (FFD) has been a major concern among
leading organizations and banks. Hence a framework has been proposed to detect fraud in its early
stages as well as to forecast which transactions are prone to fraudulent activity. This paper
reviews previous research conducted by leading researchers in their areas, with a focus on credit
card fraud detection and prevention using data mining approaches.
Keywords: Credit Card, Data mining, Financial Fraud Detection, Fraud Prevention

I. Introduction

Data mining has been a very vibrant and upcoming field in all the prevailing industries. From small, independent IT firms, banking organizations and convenience stores to leading industries, the implications of data mining can be felt. It may be defined as the logical process of extracting hidden and interesting information from huge databases [1]. It is a methodology for mining knowledge from given data sources and hence may aid in knowledge discovery.

Data mining can be categorized into three identifiable steps: (i) Exploration, (ii) Pattern Identification and (iii) Deployment. On the basis of the kind of data to be mined, there are two categories of functions involved in data mining, viz., Descriptive, and Classification and Prediction [27]. Mined knowledge can be used in various domains like fraud detection, production control, science exploration and market analysis. Financial Fraud Detection (FFD) is of high priority at present. Data mining helps in the detection of financial frauds by analysing patterns hidden in transaction data [8]. FFD is vital for the prevention of the often devastating consequences of financial fraud. According to the 2008 Javelin fraud survey report, victims who detected the fraud within 24 hours were defrauded for an average of $428; victims who did not discover the fraud until up to a month later suffered an average loss of $572 [6].

Financial fraud can be classified into various categories as depicted in Table 1.

Bank frauds are very devastating and have severe repercussions on organizations. They comprise all the fraudulent activities involved in the banking sector and are broadly classified into two categories: i) external, where the perpetrators are outside the bank, and ii) internal, where bank personnel commit the fraud. Card fraud, mortgage fraud and money laundering are a few instances of bank fraud. Insurance fraud is the activity of obtaining fraudulent outcomes from an insurance company [8]. It can be committed by consumers, brokers and agents, insurance company employees and others. Automobile fraud and healthcare fraud are the top categories of this classification [2, 13]. Securities and commodities fraud is a type of white collar crime that can be committed by individuals [investopedia]. The types of misrepresentation involved in this crime include providing false information, withholding key information, offering bad advice, and offering or acting on inside information. Other related financial frauds include corporate and mass marketing fraud. Mass communication media such as telephones and the internet are used in mass market fraud [14]. Mass-marketing fraud schemes generally fall into two broad

Vidhi Khurana*
Pursuing MCA from Institute of Information Technology & Management
Ramandeep Kaur**
Assistant Professor
Institute of Information Technology & Management

Table 1: Classification for Financial Fraud based on FBI, 2007

categories: (1) schemes that defraud numerous victims out of comparatively small amounts, such as several hundred dollars, per victim; and (2) schemes that defraud comparatively less numerous victims out of large amounts, such as thousands or millions of dollars per victim.

The objective of this paper is to describe a generalized architecture for financial fraud detection as well as techniques for preventing frauds. Special focus has been laid on credit card financial frauds. The remainder of the paper is divided into the following sections: Section II presents a detailed review of the literature, Section III a framework for financial fraud detection, and Section IV fraud detection in credit cards. Section V gives concluding remarks on the review carried out.

II. Literature Review
Vast research has been carried out in the field of data mining and fraud detection, but the challenge of dealing with the increasing number of frauds remains the same. Data mining enables a user to seek valuable information and interesting relationships [24]. A number of data mining techniques are available, such as decision trees, neural networks (NN), Bayesian belief networks, case based reasoning, fuzzy rule-based reasoning, hybrid methods, logistic regression, text mining, feature selection etc. Financial fraud is a serious problem worldwide and more so in fast growing countries like China [21]. According to Kirkos et al. [7], some estimates stated that fraud costs US business more than $400 billion annually. An innovative fraud detection mechanism was developed on the basis of

Fig 1: Methodological Framework for research[8]




Table 2: Research on data mining techniques in FFD[8]

Zipf's Law with the purpose of assisting auditors in reviewing large volumes of data while at the same time intending to identify any potential fraud records [26]. The study of Bolton and Hand [22] provides a very good summary of the literature on fraud detection problems. Some researchers used methods such as ID3 decision trees, Bayesian belief networks and back-propagation neural networks to detect and report financial frauds [7, 12]. Fuzzy logic based techniques built on soft computing were also incorporated to deal with frauds [15, 16]. Panigrahi et al. [25] suggested a four-component fraud detection solution with the idea of determining a set of suspicious transactions and then predicting frauds by running a Bayesian learning algorithm. Further, a set of fuzzy association rules was extracted from a data set containing genuine and fraudulent credit card transactions to analyze and compare the frauds. It was suggested that a novel combination of meta-heuristic approaches, namely genetic algorithms and scatter search, when applied to real-time data, may yield fraudulent transactions which are classified correctly [5]. Padhy et al. (2012) [18] provided a detailed survey of data mining applications and their feature scope. A number of researchers also discussed the application of data mining in anomaly detection [17, 19, 20, 23].

III. Framework of FFD
The methodological framework for the review is a three step process: i) research definition, ii) research methodology and iii) research analysis. In the research definition phase, the goal of the research is set: to create a classification framework for data mining techniques applicable to FFD. The research scope here is the literature comprising applications of data




Fig 2: Architecture for Credit Card Fraud Detection[10]

mining techniques on FFD published from 1997 to 2008. Phase two is the research methodology. In this phase the online academic databases are searched for FFD. In each iteration these databases are filtered to obtain the articles that were published in academic journals (1997-2008) and present data mining techniques along with their application to FFD. A detailed process for FFD is depicted in Fig 1. All the obtained articles are verified for consistency, and the final result of the classification is passed to the third phase of the framework. The research analysis phase includes the analysis of the selected articles, where the topic or area of research is identified for formulating the research goal and defining the scope of the performed research. The research area identified here is academic research on FFD that applies data mining techniques, formulating conclusions and results based on the analysis of the papers [8].

IV. Fraud Detection in Credit Cards
Credit card fraud is a sort of identity theft, where an unauthorized person makes fraudulent transactions. It can be classified into application fraud and behaviour fraud. Application fraud occurs when a fraudster gets a credit card issued from companies by providing false information [3]. It is very serious because the victim may learn about the fraud too late.

Various data mining techniques used in credit card fraud detection are logistic regression, support vector machines and random forests. A credit card fraud detection scheme scans all transactions, inclusive of fraudulent ones [10]. Data obtained from the data warehouse is divided into various datasets. A dataset comprises primary attributes (account number, sale, purchase, date, name and many others) and derived attributes (for instance, transactions grouped monthly). Derived attributes are not precise, which causes approximation of results and therefore inaccurate information; derived attributes are thus a limitation of the credit card fraud detection scheme. The implemented architecture [Fig 2] comprises a database interface subsystem and a credit card fraud (CCF) detection engine. The former enables the reading of transactions, i.e. it acts as an interface to the banking software.

In the CCF detection subsystem, the host server checks every transaction rendered to it using neural networks and transaction business rules.

V. Conclusion
Data mining has gained weightage in areas where finding patterns, forecasting, discovery of knowledge etc. is required, and has become obligatory in different industrial domains. Various techniques and algorithms such as feature selection, classification, memory based reasoning and clustering aid in fraud detection in arenas such as insurance and financial frauds. The financial sector has been majorly affected by fraudulent activities due to the increasing conversion rate of non-internet users to internet users. A detailed review was conducted to understand how these financial frauds can be detected and avoided using data mining techniques. A special reference to credit card frauds was made to understand the architecture of credit card fraud detection.
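The primary/derived attribute split and the business-rule screening described in Section IV can be sketched as follows. All field names, amounts and the 5x threshold are hypothetical, and a simple spending-deviation rule stands in for the detection engine's neural-network check:

```python
from collections import defaultdict
from datetime import date

# Primary attributes (hypothetical records): account number, date, amount.
transactions = [
    {"account": "A1", "date": date(2017, 1, 5),  "amount": 40.0},
    {"account": "A1", "date": date(2017, 1, 20), "amount": 55.0},
    {"account": "A2", "date": date(2017, 1, 9),  "amount": 15.0},
    {"account": "A1", "date": date(2017, 2, 3),  "amount": 900.0},
]

# Derived attribute: transactions grouped monthly per account.
monthly_spend = defaultdict(float)
for t in transactions:
    monthly_spend[(t["account"], t["date"].year, t["date"].month)] += t["amount"]

def flagged(txn, history, factor=5.0):
    """Business rule: flag an amount far above the account's other spending."""
    others = [t["amount"] for t in history
              if t["account"] == txn["account"] and t is not txn]
    if not others:
        return False  # no history to compare against
    return txn["amount"] > factor * (sum(others) / len(others))

suspicious = [t for t in transactions if flagged(t, transactions)]
```

In this toy run, only the 900.0 transaction is flagged; as the text notes, derived attributes such as the monthly grouping are approximate, so a real engine combines such rules with learned models rather than relying on them alone.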

References
1. Bose, R.K. Mahapatra, “Business data mining — a machine learning perspective”, Information Management,
vol.39, no.3, pp.211–225, 2001.
2. Coalition against Insurance Fraud, “Learn about fraud,” http://www.insurancefraud.org/
learn_about_fraud.htm, Last accessed 23 January 2017.
3. Credit Card Fraud: An Overview, Legal Information Institute, web: https://www.law.cornell.edu/wex/
credit_card_fraud, Last Accessed: 23 January 2017.
4. D. Sánchez, M. A. Vila, L. Cerda, and J. M. Serrano, “Association rules applied to credit card fraud detection,”
Expert Syst. Appl., vol. 36, no. 2 PART 2, pp. 3630–3640, 2009.
5. E. Duman and M. H. Ozcelik, “Detecting credit card fraud by genetic algorithm and scatter search,” Expert
Syst. Appl., vol. 38, no. 10, pp. 13057–13063, 2011.
6. E. Joyner, “Enterprisewide Fraud Management”, Banking, Financial Services and Insurance, Paper 029, 2011
7. E. Kirkos, C. Spathis and Y. Manolopoulos, “Data mining techniques for the detection of fraudulent financial
statement”, Expert Systems with Applications, vol.32, pp.995–1003, 2007.
8. E. W. T. Ngai, L. Xiu, and D. C. K. Chau, “Application of data mining techniques in customer relationship
management: A literature review and classification,” Expert Syst. Appl., vol. 36, no. 2 PART 2, pp. 2592–
2602, 2009.
9. FBI, Federal Bureau of Investigation, Financial Crimes Report to the Public Fiscal Year, Department of
Justice, United States, 2007, http://www.fbi.gov/publications/financial/fcs_report2007/
financial_crime_2007.htm.
10. F. N. Ogwueleka, “Data Mining Application In Credit Card Fraud Detection System”, Journal of Engineering
Science and Technology, vol. 6, no. 3, pp.311 – 322, 2011.
11. F.N. Ogwueleka, and H.C. Inyiama, "Credit card fraud detection using artificial neural networks with a
rule-based component", The IUP Journal of Science and Technology, vol.5, no.1, pp.40-47, 2009.
12. J.E. Sohl and A.R. Venkatachalam, “A neural network approach to forecasting model Selection”, Information
& Management, vol.29, no.6, pp. 297–303, 1995.
13. J.L. Kaminski, “Insurance Fraud”, OLR Research Report, http://www.cga.ct.gov/2005/rpt/2005-R-0025.htm.
2004
14. “Mass Marketing Fraud(MMF)”, Strategy, Policy & Training Unit, Department of Justice, http://
www.justice.gov/criminal-fraud/mass-marketing-fraud, Last Accessed: 23 January 2017.
15. M. Delgado, D. Sa´nchez, and M.A. Vila, “Fuzzy cardinality based evaluation of quantified sentences”,
International Journal of Approximate Reasoning, vol.23, pp.23–66, 2000.
16. M. Delgado, N. Marý´n, D. Sa´nchez, and M.A.Vila, “Fuzzy association rules: General model and
applications”, IEEE Transactions on Fuzzy Systems, vol.11, no.2, pp.214–225, 2003.




17. N. Kaur, “Survey paper on Data Mining techniques of Intrusion Detection”, International Journal of Science,
Engineering and Technology Research, vol. 2, no. 4, pp. 799-804, 2013.
18. N. Padhy, P. Mishra, and R. Panigrahi, “The Survey of Data Mining Applications and Feature Scope”,
International Journal of Computer Science, Engineering and Information Technology, vol. 2, no. 3,pp. 43-58,
2012.
19. P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava and P.N.Tan, “Data mining for network intrusion
detection”, Proceedings of NSF Workshop on Next Generation Data Mining, pp. 21-30, 2002.
20. P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández and E. Vázquez, “Anomaly-based network intrusion
detection: Techniques, systems and challenges”, Computers and security, vol.28, no. 1, pp. 18-28, 2009.
21. P. Ravisankar, V. Ravi, G. Raghava Rao, and I. Bose, "Detection of financial statement fraud and feature
selection using data mining techniques", Decision Support Systems, vol. 50, no. 2, pp. 491–500, 2011.
22. R. Bolton, and D. Hand, ‘Statistical fraud detection: A review”, Statistical Science, vol.17, pp.235–255,
2002.
23. S. Agrawal and J. Agrawal, “Survey on anomaly detection using data mining techniques,” Procedia Comput.
Sci., vol. 60, no. 1, pp. 708–713, 2015.
24. S. H. Weiss, and N. Indurkhya, "Predictive Data Mining: A Practical Guide", Morgan Kaufmann
Publishers, CA, 1998.
25. S. Panigrahi, A. Kundu, S. Sural, and A. Majumdar, “Credit card fraud detection a fusion approach using
Dempster–Shafer theory and bayesian learning”, Information Fusion, pp.354–363, 2009.
26. S.-M. Huang, D.C. Yen, L.-W. Yang and J.-S. Hua, “An investigation of Zipf ’s Law for fraud Detection”,
Decision Support Systems, vol.46, no. 1, pp. 70–83, 2008.
27. Tutorialspoint, “Data mining Tasks”, http://www.tutorialspoint.com/ data_mining/ dm_tasks.htm, Last
Accessed: 24 January 2017.



Review of Text Mining Techniques
Priya Bhardwaj*
Priyanka Khosla**
Abstract
Data mining is a process of discovering potential and practical, previously unknown patterns from large
pre-existing databases. Text mining is a realm of data mining in which large amounts of structured and
unstructured text data are analyzed to produce information of high commercial value. Analyzing textual
data requires context analysis. This paper presents the current research status of text mining. Association
rules, a novel technique in text mining that is gaining increasing currency among research scholars, are discussed.
Based on the studied attempts, potential future research activities are proposed.

I. Introduction

With the evolution of the internet and rapid developments in information technology, an enormous amount of textual data is generated in the form of blogs, tweets and discussion forums. This data potentially holds a lot of hidden information which can intuitively predict human behavior. The major challenge is to uncover relationships and associations in data which comes in various formats, i.e. unstructured data [1]. Text mining aims at revealing the concealed information by using various techniques that are capable of coping with large amounts of structured data on the one hand and handling the vagueness, fuzziness and uncertainty of unstructured data on the other. Text mining, or knowledge discovery from text (KDT), first mentioned in Feldman et al. [2], deals with the computational analysis of textual data. It is an interdisciplinary field involving techniques from information extraction, information retrieval and Natural Language Processing (NLP), and it integrates them with the algorithms and methods of data mining, statistics and machine learning.

The most convenient way of storing information is believed to be text. Recent surveys consider that 80% of a company's information is contained in text [4], and analysis of this information is required for making strategic decisions.

This paper introduces the current research status of text mining. Section III describes some general models used for mining text. The applications of text mining and the related techniques are discussed in Section IV, followed by a conclusion.

II. State of the Art
In 1958 Hans Peter Luhn [5] published an article in the IBM journal which discusses automatic extraction by data processing machines and classifies documents based on word frequency statistics. This is considered one of the earliest definitions of business intelligence.

Research in the field of text mining continued and many scholars have carried out prolific work in the field. At the 1st International Conference on Knowledge Discovery and Data Mining in 1995, Feldman et al. [2] proposed knowledge discovery in texts (KDT). Supervised [7] and unsupervised [8][9] learning algorithms are used to uncover hidden patterns in textual documents.

Subsequently, other outstanding work in the field includes dimensionality reduction on the basis of correlation in feature extraction [13]-[14]; a soft set approach using association rule mining [15], introducing SOFTAPRIORI, which discovers relationships more accurately; sentiment analysis for online forums hotspot detection and forecast [16];

Priya Bhardwaj*
Assistant Professor
Institute of Information Technology and Management, Delhi, India
Priyanka Khosla**
Assistant Professor
Institute of Information Technology and Management, Delhi, India

Figure 1: Knowledge Discovery Process

sentiment analysis using self-organizing maps and ant clustering [17]; and text mining in various other fields such as stock prediction [18], web mining [19], digital libraries [20] and so on.

III. Text Mining Models
Generally text mining is a four step process: text preprocessing, data selection, data mining and post-processing.

A. Data Cleaning
The textual data available for mining is generally collected over the web from tweets, discussion forums and blogs. The data set available from these sources is in various formats, i.e. "unstructured". We need to "clean" the data by parsing it, treating missing values and removing inconsistencies. After performing the desired operations the data set should be consistent with the system.

B. Data Selection and Transformation
In this step the cleaned data is selected and transformed into an intermediate form on which mining operations can be performed.

C. Data Mining
After the document has been converted into the intermediate form, data mining techniques can be applied to the different types of data (structured, semi-structured and unstructured) to recognize relationships and patterns. The various data mining techniques are discussed in detail in Section IV.

D. Data Post-processing
This includes the tasks of evaluation and visualization of the knowledge coming out of the text mining operations.

IV. Techniques Used in Data Mining
The progress of information technology has produced large amounts of data and data repositories in diverse areas. Research in databases has further given rise to techniques used to store and process the data for decision making. Thus, data mining is the process of finding useful patterns from large amounts of data; it is also termed knowledge discovery, i.e. the mining or extraction of knowledge from large amounts of data.

Machine Learning Algorithms
• Unsupervised Machine Learning: It is a type of machine learning algorithm that is used to draw



conclusions from datasets that consist of input data without labeled responses. The most familiar unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.
• Supervised Machine Learning: It is a type of machine learning algorithm that uses a labeled dataset (called the training dataset) in order to make predictions. The training dataset comprises input data and response values. From this dataset, the supervised learning algorithm searches for a model that can predict the response values for a new dataset. A test dataset is often used to validate the model. Using larger training datasets often yields models with higher predictive power that generalize well to new datasets.

A. Classification Technique:
Classification is a commonly used data mining technique that employs a training dataset of pre-classified data to generate a model that is used to classify records according to rules. This technique is used to find out to which group each data instance within a given dataset belongs, using the training dataset. It is used for classifying data into different classes according to some constraints. Credit risk analysis and fraud detection are applications of this technique. It employs decision tree or neural network-based classification algorithms. Classification is supervised learning that involves the following steps:

Step 1: Rules are extracted using the learning algorithm from (creating a model of) the training data. The training data are pre-classified examples (the class label is known for each example).

Step 2: Evaluation of the rules on test data. Usually the known data is split into a training sample (2/3) and a test sample (1/3).

Step 3: Apply the generated rules to new data.

Thus, the classifier-training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier. Rules are generated from it that further help in making decisions.

Types of classification models:
• Classification by decision tree induction
• Bayesian classification
• Neural networks
• Support Vector Machines (SVM)
• Classification based on associations

B. Clustering Technique:
Clustering is the task of grouping objects in such a way that objects in the same group or cluster are more similar, in one sense or another, to each other than to objects in other groups. Thus it is an identification of similar classes of objects. By using clustering techniques we can further identify dense and sparse regions in the object space and can discover the overall distribution pattern and correlations among data attributes. Types of clustering methods include:
• Partitioning methods
• Hierarchical agglomerative (divisive) methods
• Density based methods
• Grid-based methods
• Model-based methods

C. Association Rules Technique:
Association is a data mining technique that discovers the probability of the co-occurrence of items in a collection. The relationships between co-occurring items are expressed as association rules. These rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase milk." Therefore eggs and milk are associated with each other and are likely to be placed together to increase the sales of both products. Thus association rules help industries and businesses make certain decisions, such as cross marketing, customer shopping analysis and catalogue design. Association rule algorithms should be able to generate rules with confidence values less than one. Although the number of possible




Table I. Tasks With Algorithms

Examples of tasks | Algorithms to use
Predicting a discrete attribute: flag the customers in a prospective buyers list as good or poor prospects; calculate the probability that a server will fail within the next 6 months; categorize patient outcomes and explore related factors. | Decision Tree Algorithm; Clustering Algorithm; Neural Network Algorithm
Predicting a continuous attribute: forecast next year's sales; predict site visitors given past historical and seasonal trends; generate a risk score given demographics. | Decision Tree Algorithm
Predicting a sequence: perform click stream analysis of a company's Web site; analyze the factors leading to server failure; capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Clustering Algorithm
Finding groups of common items in transactions: use market basket analysis to determine product placement; suggest additional products to a customer for purchase; analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Association Algorithm; Decision Tree Algorithm
Finding groups of similar items: create patient risk profile groups based on attributes such as demographics and behaviors; analyze users by browsing and buying patterns; identify servers that have similar usage characteristics. | Clustering Algorithm

association rules for a given dataset is generally very large, a high proportion of the rules are usually of little value. Types of association rules are:
• Multilevel association rules
• Multidimensional association rules
• Quantitative association rules

V. Choosing an Algorithm by Task
To help select an algorithm for use with a specific task, Table I provides suggestions for the types of tasks for which each algorithm is traditionally used.

VI. Conclusion
This paper has provided a concise introduction to the state of the art of text mining. The steps required to extract valuable information from a data set were then described, and the subsequent section summarized various data mining techniques such as classification, clustering and association rules. Text mining gives direction to upcoming fields like artificial intelligence; it therefore needs continuous improvement in order to grow its application areas.
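The eggs-and-milk association rule discussed in Section IV can be made concrete with a minimal support/confidence computation over toy baskets (all data hypothetical; real input would come from transaction logs):

```python
# Hypothetical market baskets; each basket is the set of items bought together.
baskets = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"eggs"}, {"milk"})  # "if eggs, then milk"
```

Here the rule {eggs} -> {milk} has support 0.6 and confidence 0.75, a confidence value below one, as the text requires of generated rules.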




References
1. Ah Hwee Tan et al., "Text Mining: The state of the art and the challenges", Proceedings of the PAKDD Workshop on
Knowledge Discovery from Advanced Databases, pp. 65-70, 2000.
2. R. Feldman and I. Dagan. Kdt - knowledge discovery in texts. In Proc. of the First Int. Conf. on Knowledge
Discovery (KDD), pages 112–117, 1995.
3. Marti A. Hearst, "Untangling text data mining", pp. 3-10, 1999, University of Maryland.
4. S. Grimes, "Unstructured data and the 80 percent rule", Clarabridge Bridgepoints, 2008.
5. H. P. Luhn, “A Business Intelligence System”, Ibm Journal of Research & Development, vol. 2, no. 4, pp. 314-319,
1958.
6. M. E. Maron, J. L. Kuhns, "On Relevance, Probabilistic Indexing and Information Retrieval", Journal of the ACM,
vol. 7, no. 3, pp. 216-244, 1960.
7. Larsen, Bjornar, and Chinatsu Aone. “Fast and effective text mining using linear-time document clustering.”
Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
ACM, 1999.
8. Jiang, Chuntao, et al. “Text classification using graph mining-based feature extraction.” Knowledge-Based Systems
23.4 (2010): 302-308.
9. Liu, Wei, and Wilson Wong. “Web service clustering using text mining techniques.” International Journal of
Agent-Oriented Software Engineering 3.1 (2009): 6-26.
10. Ronen Feldman, I. Dagan, H. Hirsh, "Mining Text Using Keyword Distributions", Journal of Intelligent
Information Systems, vol. 10, no. 3, pp. 281-300, 1998.
11. J. Mothe, C. Chrisment, T. Dkaki, B. Dousset, D. Egret, “Information mining: use of the document dimensions
to analyse interactively a document set”, European Colloquium on IR Research: ECIR, pp. 66-77, 2001.
12. M. Ghanem, A. Chortaras, Y. Guo, A. Rowe, J. Ratcliffe, “A Grid Infrastructure For Mixed Bioinformatics Data
And Text Mining”, Computer Systems and Applications 2005. The 3rd ACS/IEEE International Conference,
vol. 29, pp. 41-1, 2005.
13. Haralampos Karanikas, C. Tjortjis, B. Theodoulidis, “An Approach to Text Mining using Information Extraction”,
Proc. Workshop Knowledge Management Theory Applications (KMTA 00, 2000.
14. Qinghua Hu et al., “A novel weighting formula and feature selection for text classification based on rough set
theory”, Natural Language Processing and Knowledge Engineering 2003. Proceedings. 2003 International Conference
on IEEE, pp. 638-645, 2003.
15. Nahm, Un Yong, and Raymond J. Mooney. “Mining soft-matching association rules.” Proceedings of the eleventh
international conference on Information and knowledge management. ACM, 2002.
16. Li, Nan, and Desheng Dash Wu. “Using text mining and sentiment analysis for online forums hotspot detection
and forecast.” Decision support systems 48.2 (2010): 354-368.
17. Chifu, Emil Şt., Tiberiu Şt. Leţia, and Viorica R. Chifu. "Unsupervised aspect level sentiment analysis using Ant
Clustering and Self-organizing Maps." Speech Technology and Human-Computer Dialogue (SpeD), 2015
International Conference on. IEEE, 2015.
18. Nikfarjam, Azadeh, Ehsan Emadzadeh, and Saravanan Muthaiyah. “Text mining approaches for stock market
prediction.” Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on.
Vol. 4. IEEE, 2010.
19. Kosala, Raymond, and Hendrik Blockeel. “Web mining research: A survey.” ACM Sigkdd Explorations Newsletter
2.1 (2000): 1-15.
20. Fuhr, Norbert, et al. “Digital libraries: A generic classification and evaluation scheme.” International Conference
on Theory and Practice of Digital Libraries. Springer Berlin Heidelberg, 2001.

Volume 8, Issue 1 • January-June, 2017 31


Security Vulnerabilities of Websites and Challenges
in Combating these Threats
Dhananjay*
Priya Khandelwal**
Kavita Srivastava***
Abstract
The public use of the Internet started in the 1990s. Since then, billions of websites have been developed, and
technology has made the development of websites easier and less costly. It has enabled people to establish
their online presence quickly and easily through websites. In recent years a number of Open Source CMSs
(Content Management Systems) have been developed which enable the creation of websites in minutes.
This widespread adoption of websites has also led to a growth in unskilled website administrators and
developers. As a result, almost 75% of websites are found to be infected with malware.
Google reported in March 2015 that around 17 million websites either had malicious software installed or
were trying to steal information. This number increased to 50 million by March 2016. Google blocks
nearly 20,000 websites per week for malware and phishing. Most of these blocked websites are found
to be implemented with WordPress, Joomla, and Magento.
This paper addresses various security vulnerabilities found in websites implemented with different
technologies, methods of combating these vulnerabilities, and research and development in this direction.
Keywords:

Introduction

Security is one of the critical aspects of the quality of any software or application. Security testing of web applications attempts to figure out the various vulnerabilities, attacks, threats, viruses, etc. related to the respective application. Security testing should attempt to consider as many potential attacks as possible.

The increase in usage of web applications has opened the doors for hackers around the world to penetrate these applications. Hackers and attackers try to find loopholes in the code of web applications in order to harm them in a number of ways, such as applying a Denial of Service (DoS) attack, spreading malware, illegally redirecting users to another website, or accessing and posting malicious content by gaining access to the application.

In order to prevent such attacks, the most effective method is to develop web applications by applying good and secure coding skills. Most of the web applications which suffer from security vulnerabilities have common coding problems such as improper input field validation, wrong or missing session management, and poor configuration settings in the web applications as well as in the web server which runs them.

We can organize the threats to web applications into a number of classes such as Inadequate Authentication, Cross-Site Scripting, SQL Injection, and so on. In the next sections all these web security vulnerability classes are elaborated.

Dhananjay*
BCA, IV, IITM
Priya Khandelwal**
BCA, IV, IITM
Kavita Srivastava***
Associate Professor, IITM

Security Issues in Websites
In this section we discuss the classification of website security vulnerabilities.
IITM Journal of Management and IT

(1) Poor Access Grant and Lack of Sufficient Authorization
Authorization is the process whereby a requester is allowed to perform an authorized action or to receive a service. Often a web application grants access to some of its features to specified users only. The web application verifies the credentials of users trying to access these features through a login page. This type of vulnerability exists in an application if users can access these features without verification through certain links or tabs and can access other users’ accounts as well.

(2) Poorly Implemented Functionality
This kind of vulnerability exists in a website due to its own code, and it results in harmful consequences such as password leaks, consumption of large amounts of resources, and access to administrative features. The resulting security breaches may lead to the disclosure of confidential or sensitive data from the web application.

(3) Inadequate Exception and Error Handling Mechanisms
Error messages and exception handling code should return only a limited amount of information, which prevents an attacker from identifying a place for SQL injection. For instance, consider the following code:

…catch(Exception e) { Console.WriteLine(e.Message); }

If an SQL exception occurs, this code can display information related to the database.

(4) Brute Force Attack
This is a process of trial and error in order to guess users’ credentials such as user name, password, and security questions for the purpose of hacking a user’s account.

(5) Data/Information Leak
This kind of security breach may lead to the disclosure of confidential or sensitive data from the web application. This vulnerability exists in web applications as a result of improper use of the technology used for developing the application. It can reveal developer comments, source code, etc., and can give an attacker enough information to exploit the system.

(6) Inadequate Authentication
Authentication involves confirming the identity of an entity/person claiming that it is a trusted one. Sometimes a developer does not provide a link for administrative access, yet administrative access is provided through another folder on the server. If a hacker identifies its path, it becomes very easy to exploit the application.

(7) Spoofing
This is an attack where an attacker tries to masquerade as another program or user by falsifying content/data. The hacker injects a malicious piece of code to replace the original content.

(8) Cross-Site Scripting
This type of attack is possible when a website containing input fields accepts scripts as input as well, and it leads to phishing attacks. The script gets stored in the database and is executed every time the page is accessed. For example, <script>alert(message)</script>, where the message could also be a cookie. When any user visits the page and the application searches for the username or password, the script will be executed.

(9) Denial of Service Attack
This kind of attack prevents normal users from accessing a website. The attacker attempts to access the database server and performs SQL injections on it so that the database becomes inaccessible. The attacker may also try to gain access as a normal user with a wrong password; after a few attempts the user is locked out. The attacker may also gain access to the web server and send specially crafted requests so that the web server crashes.

(10) SQL Injection
This is an attack where a malicious script/code is inserted into an instance of an SQL server/database for execution, which eventually tries to fetch database information.

(11) Poor Session Management
If an attacker can predict a unique value that identifies a particular user or session (session hijacking), he can use it to enter the system as a genuine user. This problem also occurs when the logout activity just redirects




the user to the home page without terminating the current session. The old session IDs can then be used for authorization.

(12) Application Configuration Settings
Certain configuration settings exist in a web application by default, such as debug settings, permissions, hardcoded user names, passwords, and admin account information. An attacker may use this information to obtain unauthorized access.

(13) Cross-site request forgery [6, 7]
This is a vulnerability which involves the exploitation of a website by transmitting unauthorized commands from a user that the website trusts. Thus it exploits the trust that a website has in its user’s browser.

(14) XML injection [1]
This is an attack where an attacker tries to inject XML code with the aim of modifying the XML structure, thus violating the integrity of the application.

(15) Malicious file execution [3]
Web applications are often vulnerable to malicious file execution, which usually occurs when code execution happens from a non-trusted source.

(16) Cookie cloning [11]
Here an attacker, after cloning the user/browser cookies, tries to change the user’s files or data, or may even cause harm through injected code.

(17) XPath injection [3]
This occurs whenever a website uses the information provided by the user to construct an XML query for XML data.

(18) Cookie sniffing [11]
This is a session hijacking vulnerability with the aim of intercepting the unencrypted cookies from web applications.

(19) Cookie manipulation [5]
Here an attacker tries to manipulate or change the content of the cookies and can thus harm or alter the data.

(20) Sidejacking [11]
This is a hacking vulnerability where an attacker tries to capture all the cookies and may even get access to the user’s mailboxes, etc.

(21) Social vulnerability (hacking), session hijacking [4, 5, 10, 11]
This is a popular hijacking mechanism where an attacker gains unauthorized access to information.

Mis-configuration [24]: Inappropriate or inadequate configuration of the web application may also lead to security breaches.

(22) Absence of secure network infrastructure [9]
Absence of any intrusion detection or protection system, failover systems, etc. may also lead to security breaches.

(23) Off-the-shelf components [9, 11]
These components are purchased from third-party vendors, so suspicion arises about their security aspects.

(24) Firewall intrusion detection system [8, 9, 10]
A firewall builds a secured wall between the outside/external network and the internal network, which is considered trusted.

(25) Path traversal [3]
This is a vulnerability where malicious untrusted input causes undesirable changes to the path.

(26) Command injection [3]
This is the injection of an input value which is embedded into the command to be executed.

(27) Parameter manipulation [5]
This is similar to XSS, where an invader inserts malicious code/script into the web application.

(28) LDAP injection [3]
This is similar to SQL and XPath injection, where the queries are targeted at an LDAP server.

(29) Bad code or fault in implementation [2]
Improper coding or a fault in the implementation of the web application may also lead to the violation of the security of the web application.
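Several of the injection classes above, notably SQL injection (10), XPath injection (17), command injection (26), and LDAP injection (28), share a single root cause: untrusted input concatenated directly into a query or command string. The following Python sketch is our own illustration (the table and function names are hypothetical, not from the paper); it uses the standard sqlite3 module to contrast the vulnerable string-building pattern with a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_vulnerable(name, password):
    # UNSAFE: input is concatenated into the SQL text, so a crafted value
    # like "' OR '1'='1" changes the structure of the query itself.
    query = ("SELECT COUNT(*) FROM users WHERE name = '%s' "
             "AND password = '%s'" % (name, password))
    return conn.execute(query).fetchone()[0] > 0

def login_safe(name, password):
    # SAFE: placeholders keep the input as data, never as SQL syntax.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0

payload = "' OR '1'='1"
print(login_vulnerable("alice", payload))  # True - attacker bypasses the check
print(login_safe("alice", payload))        # False - payload treated as literal text
```

The same principle (treat input strictly as data) applies to the other injection classes: use parameterized or escaped APIs rather than string concatenation.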

34 National Conference on Emerging Trends in Information Technology



(30) Clickjacking [6]
This is an attack where a user’s click may be hijacked so that the user is directed to some other link which may contain malicious code.

(31) Content injection [8, 6]
This is a vulnerability where an attacker loads static content, which may be false content, into the web page.

(32) File injection [8]
This refers to the inclusion of any unintended file and is a typical vulnerability often found in web applications. Example: remote file inclusion.

Challenges faced by security testing of web applications
One of the concerns of security testing of web applications is the development of automated tools for testing the security of web applications [3]. The increase in the usage of Rich Internet Applications (RIAs) also poses a challenge for the security testing of web applications. This is due to the fact that the crawling techniques which were used for the exploration of earlier web applications do not fulfil the requirements of RIAs [3], RIAs being more user-friendly and responsive due to the usage of AJAX technologies. Another challenge is the usage of unintended invalid inputs, which may result in security attacks [1], and these security breaches may lead to extensive damage to the integrity of the data. While working with mutants, one should be careful in incorporating them, as injecting && (and) instead of || (or), or any other such modification, may lead to fault injection which could result in a security vulnerability, since vulnerabilities do not take semantics into consideration [1]. This may also pose a challenge to the security testing of any such web application. Usage of insecure cryptographic storage may likewise pose a challenge to web application security testing [1]. Security testing of web applications may face repudiation attacks, where a receiver is not able to prove that the data received came from a specific sender rather than from some other unintended source [1]. Also, the web development languages which we use may be lacking in enforcing the security policy, which may violate the integrity and confidentiality of the web application [11]. This may also pose a security risk. At times it is also possible that an invader is able to launder more information than intended; in such a case, again, this may lead to a setback to the integrity of the data, which could be another challenge for a security tester.

Conclusion
In this paper we have described various kinds of security vulnerabilities that may exist in a website if proper consideration is not taken during development. A website developer must employ all possible measures to combat any known threats during the whole development cycle of a website, from its design and implementation to testing. If any security loophole remains undetected, hackers can use it to exploit the system.
References
1. An Approach Dedicated for Web Service Security Testing, Sébastien Salva, Patrice Laurencot and Issam Rabhi.
2010 Fifth International Conference on Software Engineering Advances.
2. Security Testing of Web Applications: a Search Based Approach for Cross-Site Scripting Vulnerabilities,
Andrea Avancini, Mariano Ceccato , 2011- 11th IEEE International Working Conference on Source Code
Analysis and Manipulation.
3. Supporting Security Testers in Discovering Injection Flaws. Sven Türpe, Andreas Poller, Jan Trukenmüller,
Jürgen Repp and Christian Bornmann, Fraunhofer Institute for Secure Information Technology SIT,
Rheinstrasse 75, 64295 Darmstadt, Germany, 2008 IEEE, Testing: Academic & Industrial Conference -
Practice and Research Techniques.
4. A Database Security Testing Scheme of Web Application, Yang Haixia, Business College of Shanxi University,
Nan Zhihong, School of Information Management, Shanxi University of Finance & Economics, China.
Proceedings of 2009 4th International Conference on Computer Science & Education.




5. Mapping software faults with web security vulnerabilities. Jose Fonseca and Marco Vieira. International
Conference on Dependable Systems & Networks: Anchorage, Alaska, June 2008, IEEE.
6. D-WAV: A Web Application Vulnerabilities Detection Tool Using Characteristics of Web Forms. Lijiu Zhang,
Qing Gu, Shushen Peng, Xiang Chen, Haigang Zhao, Daoxu Chen State Key Laboratory of Novel Software
Technology, Department of Computer Science and Technology, Nanjing University. 2010 Fifth International
Conference on Software Engineering Advances.
7. Enhancing web page security with security style sheets Terri Oda and Anil Somayaji (2011) IEEE.
8. Security Testing of Web Applications: a Search Based Approach for Cross-Site Scripting Vulnerabilities,
Andrea Avancini, Mariano Ceccato , 2011- 11th IEEE International Working Conference on Source Code
Analysis and Manipulation.
9. Assessing and Comparing Security of Web Servers. Naaliel Mendes, Afonso Araújo Neto, João Durães,
Marco Vieira, and Henrique Madeira, CISUC, University of Coimbra. 2008 14th IEEE Pacific Rim
International Symposium on Dependable Computing.
10. Firewall Security: Policies, Testing and Performance Evaluation. Michael R. Lyu and Lorrien K. Y. Lau.
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK.
2000 IEEE.
11. Top 10 Free Web-Mail Security Test Using Session Hijacking. Preecha Noiumkar, Thawatchai Chomsiri,
Mahasarakham University, Mahasarakham, Thailand. Third 2008 International Conference on Convergence
and Hybrid Information Technology.
12. Development of Security Engineering Curricula at US Universities. Mary Lynn Garcia, Sandia National
Laboratories. 1998 IEEE.



Security Analytics: Challenges and Future Directions
Ganga Sharma*
Bhawana Tyagi**
Abstract
The frequency and type of cyber attacks are increasing day by day. However, well-known cyber security
solutions are not able to cope with the increasing volume of data that is generated for providing security
solutions. Therefore, current trend in research on cyber security is to apply Big Data Analytics (BDA)
techniques to cyber security. This field, called security analytics (SA), can help network managers in the
monitoring and surveillance of real-time network streams and real-time detection of malicious and/or
suspicious patterns. Researchers believe that an SA system can assist in enhancing all traditional security
mechanisms. Nonetheless, there are certain issues related to incorporating big data analytics to cyber
security. This paper presents an analysis on the issues and challenges faced by Security Analytics, and
further provides future directions in the field.
Keywords: cyber-security, big data, security analytics, big data analytics

I. Introduction

Big data analytics (BDA) is the large-scale analysis and processing of information [1,14]. It uses advanced analytic and parallel techniques to process very large and diverse records that include different types of content. BDA tools allow getting enormous benefits and valuable insights by dealing with massive volumes of mixed unstructured, semi-structured and structured data that is fast changing and difficult to process using conventional database techniques.

In recent years, BDA has gained popularity in the security community as it promises efficient processing and analysis of security-related data at large scale [3]. Corporate research is now focusing on Security Analytics, i.e., the application of Big Data Analytics techniques to cyber-security. Analytics can assist network managers particularly in the monitoring and surveillance of real-time network streams and real-time detection of both malicious and suspicious (outlying) patterns. Over the past ten years, enterprise security has become incrementally more difficult as new and unanticipated threats/attacks surface. The existing security infrastructures collect, process and analyze terabytes of security data on a monthly basis. This data is too large to be handled efficiently by the existing data storage architectures, algorithms, and query mechanisms. Therefore the application of Big Data Analytics (BDA) to security is the need of the hour.

This paper provides an overview of how big data analytics can help in enhancing the traditional cyber security mechanisms and thus provide a means for better security analysis. The rest of the paper is organized as follows: section 2 gives a brief overview of the literature, section 3 describes the basic BDA process, sections 4 and 5 respectively provide the challenges and future directions in security analytics, while section 6 concludes the paper.

Ganga Sharma*
Assistant Professor (IT Dept)
IITM Janakpuri
Bhawana Tyagi**
Assistant Professor (IT Dept)
IITM

II. Literature Review
Security analytics is a new technology and concept, and therefore much research has not yet been conducted in this area. However, there are some significant contributions by several authors in this field. For example, Mahmood and Afzal [14] have presented a comprehensive survey on the state of the art of Security Analytics, i.e., its description, technology, trends, and tools. Gahi et al. [1] highlight the benefits of Big Data Analytics and then provide a brief overview of the challenges of security and privacy in big data environments themselves. Further, they present some available protection techniques and propose some

possible tracks that enable security and privacy in a malicious big data context. Cybenko and Landwehr [7] studied historical data from a variety of cyber- and national-security domains in the United States, such as computer vulnerability databases, offensive and defensive co-evolution of wormbots such as Conficker, etc. They claim that security analytics can provide the ultimate solution for cyber-security. Cardenas et al. [9] provide details of how the security analytics landscape is changing with the introduction and widespread use of new tools to leverage large quantities of structured and unstructured data, and also outline some of the fundamental differences between security analytics and traditional analytics. Camargo et al. [10] research the use of big data analytics for security and analyze people’s perception of security. They found that big data can indeed provide a long-term solution for citizens’ security, in particular cyber security.

III. Big Data and the Basic BDA Process
Big data is data whose complexity hinders it from being managed, queried and analyzed efficiently by the existing database architectures [4]. The “complexity” of big data is defined through the 4 V’s: 1) volume, referring to terabytes, petabytes, or even exabytes (10^18 bytes) of stored information; 2) variety, referring to the co-existence of unstructured, semi-structured and structured data; 3) velocity, referring to the rapid pace at which big data is being generated; and 4) veracity, to stress the importance of maintaining quality data within an organization.

The domain of Big Data Analytics (BDA) is concerned with the extraction of value from big data, i.e., insights which are nontrivial and previously unknown, implicit and potentially useful. These insights have a direct impact on deciding or manipulating the current business strategy [14]. The assumption is that patterns of usage, occurrences or behaviors exist in big data. BDA attempts to fit mathematical models to these patterns through different data mining techniques such as Predictive Analytics, Cluster Analysis, Association Rule Mining, and Prescriptive Analytics [13]. Insights from these techniques are typically represented on interactive dashboards and help corporations maintain their competitive edge, increase profits, and enhance their CRM.

Fig. 1 shows the basic stages of the BDA process [14]. Initially, data to be analyzed is selected from real-time streams of big data and is pre-processed (i.e. cleaned). This is called ETL (Extract, Transform, Load). It can take up to 60% of the effort of BDA, e.g., catering for inconsistent, incomplete and missing values; normalizing, discretizing and reducing data; ensuring statistical quality of the data through boxplots, cluster analysis, normality testing, etc.; and understanding data through descriptive statistics (correlations, hypothesis testing, histograms, etc.). Once data is cleaned, it is stored in BDA databases (cloud, mobile, network servers, etc.) and analyzed with analytics. The results are then shown in interactive dashboards using computer visualization.

IV. Challenges in Security Analytics
Big data is a recent technology and has been widely adopted to provide solutions for organisational decision making [11]. One of the most important areas to benefit from the advancements in big data analytics is cyber security. This area is now referred to as security analytics. An important goal for security analytics is to enable organisations to identify unknown indicators of attack, and uncover things like when compromised credentials are being used to bypass defenses [2]. However, handling unstructured data and combining it with structured data to arrive at an accurate assessment is one of the big challenges in security analytics.

In the past, information security was largely based on event correlation designed for monitoring and detecting known attack patterns [9]. This model alone is no longer adequate, as multidimensional cyber-attacks are dynamic and can use different tactics and techniques to find their way into and out of an organization. In addition, the traditional set of security devices is designed and optimized to look for particular aspects of attacks: a network perspective, an attack perspective, a malware perspective, a host perspective, a web traffic perspective, etc. [12]. These different technologies see isolated aspects of an attack and lack the bigger picture.

1. Cyber-attacks are extremely difficult to distinguish or investigate, because until all the event data is combined, it is extremely hard to determine what an attacker is trying to accomplish [6,8].




Fig. 1. Basic BDA process [14]
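The first challenge above, that individual events reveal little until they are combined, can be illustrated with a toy correlation step. The sketch below is our own illustration (the event fields, sources and threshold are hypothetical, not from the paper); it groups events from different log sources by host and flags hosts whose combined events span multiple attack stages:

```python
from collections import defaultdict

# Toy event records as (source, host, stage) triples; in a real deployment
# these would arrive from firewall, IDS, and authentication logs.
events = [
    ("firewall", "10.0.0.5", "scan"),
    ("auth",     "10.0.0.5", "failed_login"),
    ("ids",      "10.0.0.5", "exfiltration"),
    ("auth",     "10.0.0.9", "failed_login"),
]

def correlate(events, min_stages=2):
    """Group events by host and flag hosts seen in >= min_stages attack stages.

    No single log source sees the whole attack; only the combined,
    cross-source view reveals the multi-stage pattern.
    """
    stages_by_host = defaultdict(set)
    for _source, host, stage in events:
        stages_by_host[host].add(stage)
    return {h for h, s in stages_by_host.items() if len(s) >= min_stages}

print(correlate(events))  # only 10.0.0.5 spans several attack stages
```

Each source in isolation would report only one suspicious stage per host; the correlated view is what exposes 10.0.0.5 as the likely compromise.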


Addressing new types of cyber-threats requires a commitment to data collection and processing as well as much greater diligence on security data analytics.

2. The main idea behind big data is to extract useful insights by performing specific computations. However, it is important to secure and protect these computations to avoid any risk or attempt to change or skew the extracted results. It is also important to protect the systems from any attempt to spy on the nature or the number of the performed computations.

3. In an open context, the large volume of content collected through big data is not always a good metric for the quality of the extracted results. Therefore, it may not always be possible to achieve good threat detection and prevention.

4. Since cyber-attacks can be multidimensional and can happen over long periods of time, historical analysis must also be incorporated so that analysts can perform root cause analysis and attack scoping to determine the breadth of a compromise or data breach.

5. While original data formats should be preserved, security analysts must also have the ability to tag, index, enrich, and query any data element or group of data elements together to get a broader perspective for threat detection/response. Otherwise, security data will remain a black hole if it can’t be easily queried and understood by security professionals.

6. Systems must provide a simple interface and search-based access to broaden and simplify access to data. This will empower security analysts to investigate threats and gain valuable experience. Systems should also allow for straightforward ways to create dashboards and reports to streamline security operations.

V. Future Directions
It is no longer a matter of if, but when, attackers will break into your network. They’ll use zero-day attacks, stolen access credentials, infected mobile devices, a vulnerable business partner, or other tactics. Security success is not just about keeping threats out of your network. Instead it’s about quickly responding to and

thwarting an attack when it happens [4,5]. According to a highly reputed organization providing security solutions, “Organizations are failing at early breach detection, with more than 92 percent of breaches undetected by the breached organization.” It is clear that we need to play a far more active role in protecting our organizations [8]. We need to constantly monitor what is going on within our infrastructure and have an established, cyclical means of responding before attacks wreak havoc on our networks and reputations. Therefore, some of the primary requirements for a security analytics solution are:

1. Secure sensitive data entering big database systems, and then control access to the protected data by monitoring which applications and which users get access to which original data.

2. Protect sensitive data in a way that maintains usable, realistic values for accurate analytics and modeling on data in its encrypted form.

3. Assure global regulatory compliance. Securely capture, analyze and store data from global sources, and ensure compliance with international data security, residency and privacy regulations. Address compliance comprehensively, not system-by-system.

4. Optimize performance and scalability.

5. Integrate data security with quick implementation and an efficient, low-maintenance solution that should scale up. Leverage IT investments by integrating with the existing IT environment and extending current controls and processes into big databases.

6. As far as possible, provide block-layer encryption, which will improve security but also enable big data clusters to scale and perform [7,8].

7. Leverage security tools or third-party products. Tools may include SSL/TLS for secure communication, Kerberos for node authentication, and transparent encryption for data-at-rest [13].

VI. Conclusion
Security analytics is the new technical foundation of an informed, reliable detection and response strategy for cyber attacks. Mature security organizations recognize this and are leading the way by building their security analytics capabilities today. A security analytics system combines and integrates the traditional ways of cyber threat detection to provide security analysts a platform with both enterprise-scale detection and investigative capabilities. It will not only help identify events that are happening now, but will also assess the state of security within the enterprise in order to predict what may occur in the future and enable more proactive security decisions.
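Requirement 2 above, protecting sensitive values while keeping them usable for analytics, is commonly met with deterministic tokenization: the same input always maps to the same opaque token, so grouping, joining and counting still work on the protected data. The sketch below is our own illustration of this idea (the key, field names and records are hypothetical), using Python’s standard hmac module; it assumes the secret key is held outside the analytics cluster:

```python
import hmac
import hashlib

# Secret key; in practice this would live in a key management system,
# never inside the analytics cluster itself.
KEY = b"demo-secret-key"

def tokenize(value: str) -> str:
    """Deterministically pseudonymize a sensitive field with HMAC-SHA256.

    Identical inputs yield identical tokens, so analytics on the protected
    data still work without ever exposing the original values.
    """
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

records = [
    {"user": "alice@example.com", "action": "login"},
    {"user": "alice@example.com", "action": "download"},
    {"user": "bob@example.com",   "action": "login"},
]

protected = [{"user": tokenize(r["user"]), "action": r["action"]} for r in records]

# Analytics on protected data: events per (pseudonymous) user.
counts = {}
for r in protected:
    counts[r["user"]] = counts.get(r["user"], 0) + 1
print(sorted(counts.values()))  # [1, 2] - the two alice events still group together
```

Keyed hashing (rather than a plain hash) matters here: without the key, an attacker who knows the value space could rebuild the mapping by hashing candidate values.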

References
1. Gahi, Y., Guennoun, M., & Mouftah, H. T. (2016, June). Big Data Analytics: Security and privacy challenges.
In Computers and Communication (ISCC), 2016 IEEE Symposium on (pp. 952-957). IEEE.
2. Verma, R., Kantarcioglu, M., Marchette, D., Leiss, E., & Solorio, T. (2015). Security analytics: essential data
analytics knowledge for cybersecurity professionals and students. IEEE Security & Privacy, 13(6), 60-65.
3. Oltsik, J. (2013). The Big Data Security Analytics Era Is Here. White Paper. Retrieved from https://
www.emc.com/collateral/analyst-reports/security-analytics-esg-ar.pdf on 30th December, 2016.
4. Shackleford D. (2013). SANS Security Analytics Survey, WhitePaper, SANS Institute InfoSec Reading Room.
Downloaded on 30th December, 2016.
5. Gawron, M., Cheng, F., & Meinel, C. (2015, August). Automatic detection of vulnerabilities for advanced
security analytics. In Network Operations and Management Symposium (APNOMS), 2015 17th Asia-Pacific
(pp. 471-474). IEEE.
6. Gantsou, D. (2015, August). On the use of security analytics for attack detection in vehicular ad hoc networks.
In Cyber Security of Smart Cities, Industrial Control System and Communications (SSIC), 2015 International
Conference on (pp. 1-6). IEEE.




7. Cybenko, G., & Landwehr, C. E. (2012). Security analytics and measurements. IEEE Security & Privacy,
10(3), 5-8.
8. Cheng, F., Azodi, A., Jaeger, D., & Meinel, C. (2013, December). Multi-core Supported High Performance
Security Analytics. In Dependable, Autonomic and Secure Computing (DASC), 2013 IEEE 11th International
Conference on (pp. 621-626). IEEE.
9. Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE Security &
Privacy, 11(6), 74-76.
10. Camargo, J. E., Torres, C. A., Martínez, O. H., & Gómez, F. A. (2016, September). A big data analytics
system to analyze citizens’ perception of security. In Smart Cities Conference (ISC2), 2016 IEEE International
(pp. 1-5). IEEE.
11. Alsuhibany, S. A. (2016, November). A space-and-time efficient technique for big data security analytics. In
Information Technology (Big Data Analysis)(KACSTIT), Saudi International Conference on (pp. 1-6). IEEE.
12. Rao, S., Suma, S. N., & Sunitha, M. (2015, May). Security Solutions for Big Data Analytics in Healthcare.
In Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference
on (pp. 510-514). IEEE.
13. Marchetti, M., Pierazzi, F., Guido, A., & Colajanni, M. (2016, May). Countering Advanced Persistent
Threats through security intelligence and big data analytics. In Cyber Conflict (CyCon), 2016 8th International
Conference on (pp. 243-261). IEEE.
14. Mahmood, T., & Afzal, U. (2013). Security Analytics: Big Data Analytics for cyber-security: A review of
trends, techniques and tools. In 2nd National Conference on Information Assurance (NCIA), 2013. IEEE.

Volume 8, Issue 1 • January-June, 2017 41


A Survey of Multicast Routing Protocols in MANET
Ganesh Kumar Wadhwani*
Neeraj Mishra**
Abstract
Multicasting is a technique in which a sender’s message is forwarded to a group of receivers. Conventional
wired multicast routing protocols do not perform well in mobile ad hoc wireless networks (MANETs)
because of the dynamic nature of the network topology. Apart from the mobility aspect, there is also a
bandwidth restriction which must be addressed by a multicasting protocol for MANETs. In this paper, we
present a survey of the classification of multicast routing protocols and the associated protocols. In the
end, a comparison is also made among the different classes of multicast routing.
Keywords: Multicast routing, mobile ad hoc network, tree-based protocol, mesh-based protocol,
source-initiated multicast, receiver-initiated multicast, soft state, hard state

I. Introduction

MANET is a collection of autonomous mobile nodes communicating with each other without a fixed infrastructure. MANETs find applications in areas where setting up and maintaining a communication infrastructure is difficult or costly, such as emergency search and rescue operations, law enforcement, and warfare situations.

Multicasting is a technique for data routing in networks that allows the same message to be forwarded to a group of destinations simultaneously. It is intended for group-oriented computing such as audio/video conferencing and collaborative work, and is an essential technology for efficiently supporting one-to-many or many-to-many applications. Multicast routing has attracted a lot of attention in the past decade because it allows a source to send information to multiple destinations concurrently. Multicasting is the transmission of packets to a group of zero or more hosts, called a multicast group, that is identified by a single destination address. A multicast group is a set of network clients and servers interested in sharing a specific set of data; a typical example of a multicast group is a commander and his soldiers in a battlefield. There are other examples in which multicast groups need to be established. Typically, the membership of a host group is dynamic: hosts may join and leave groups at any time. There is no restriction on the location or number of members in a host group. A host may be a member of more than one group at a time, and a host does not have to be a member of a group to send packets to it. A multicast protocol has the objective of connecting members of the multicast group in an optimal way, by reducing the amount of bandwidth necessary while also considering other issues such as communication delays and reliability [1].

In a MANET, multicast routing plays an important role in providing communication among nodes that are highly dynamic in terms of their location. It is advantageous to use multicast rather than multiple unicasts, especially in the ad hoc environment where bandwidth is an issue. Conventional wired-network multicast routing protocols such as DVMRP, MOSPF, CBT and PIM do not perform well in MANETs because of the dynamic nature of the network topology. The dynamically changing topology, coupled with relatively low bandwidth and less reliable wireless links, causes long convergence times and may give rise to the formation of transient routing loops that rapidly consume the already limited bandwidth.

Ganesh Kumar Wadhwani*
Computer Science, IITM
Neeraj Mishra**
Computer Science, IITM

II. Multicast Routing Classification

Figure I: Classification of Multicast Routing Protocols

One of the most popular methods to classify multicast routing protocols for MANETs is based on how distribution paths among group members are
constructed (the underlying routing structure). According to this method, existing multicast routing approaches for MANETs can be divided into tree-based multicast protocols, mesh-based multicast protocols and hybrid multicast protocols.

In tree-based protocols, there is only one path between a source-receiver pair. This is efficient, but the main drawback of these protocols is that they are not robust enough to operate in highly mobile environments [2]. Depending on the number of trees per multicast group, tree-based multicast can be further classified into source-based multicast trees and group-shared multicast trees. In source-tree-based multicast protocols, the tree is rooted at the source, whereas in shared-tree-based multicast protocols a single tree is shared by all the sources within the multicast group and is rooted at a node referred to as the core node. Source-tree-based multicast performs better than shared-tree-based protocols at heavy load because of efficient traffic distribution, but the latter type of protocol is more scalable. The main problem of a shared-tree-based multicast protocol is that it depends heavily on the core node; hence, a single point of failure at the core node affects the performance of the multicast protocol.

Some of the tree-based multicast routing protocols are the bandwidth-efficient multicast routing protocol (BEMRP) [3], the multicast zone routing protocol (MZRP) [4], the multicast core extraction distributed ad hoc routing protocol (MCEDAR) [5], the differential destination based multicast protocol (DDM) [6], the ad hoc multicast routing protocol utilizing increasing id numbers (AMRIS) [7], and the ad hoc multicast routing protocol (AMRoute) [8].

Bandwidth-Efficient Multicast Routing Protocol (BEMRP)
BEMRP tries to find the nearest forwarding nodes, rather than the shortest path between source and receiver; hence, it reduces the number of data packet transmissions. To maintain the multicast tree, it uses the hard-state approach, in which control packets are transmitted (to maintain the routes) only when a link breaks, resulting in lower control overhead but at the cost of a low packet delivery ratio. In BEMRP, the receiver initiates the multicast tree construction. When a receiver wants to join the group, it initiates flooding of Join control packets; the existing members of the multicast tree, on receiving these packets, respond with Reply packets. When many such Reply packets reach


the requesting node, it chooses one of them and sends a Reserve packet on the path taken by the chosen Reply packet.

Multicast Operation of the Ad-hoc On-Demand Distance Vector Routing Protocol (MAODV)
MAODV [9] is a shared-tree-based protocol that is an extension of AODV [10] to support multicast routing. With the unicast route information of AODV, MAODV constructs the shared tree more efficiently and has low control overhead. In MAODV, the group leader is the first node joining the group, and it announces its existence by Group Hello message flooding. An interested node P sends a join message toward the group leader. Any tree node of the group sends a reply message back to P. P answers with an MACT message only to the reply message with minimum hop count to the originator. Then a new branch to the shared tree is set up.

Ad Hoc Multicast Routing Protocol Utilizing Increasing Id-numbers (AMRIS)
AMRIS [12] is an on-demand shared-tree-based protocol which dynamically assigns every node in a multicast session an id-number. The multicast tree is rooted at a special node called the Sid, and the id-numbers of surrounding nodes increase in numerical value as they radiate from the Sid. These id-numbers help nodes know which neighbours are closer to the Sid, and this reduces the cost of repairing link failures.
The Sid initially floods a NEW-SESSION message associated with its id-number through the network. Each node receiving the NEW-SESSION message generates its own id-number by computing a value that is larger than, and not consecutive to, the received one. The node then places its own id-number and routing metrics in the message before rebroadcasting it. Each node sends a periodic beacon to exchange information (such as its own id-number) with its neighbours. When a new node P wants to join the session, it sends a join message to one of its potential parent nodes Q (i.e., a neighbouring node with a smaller id-number). If Q is a tree node, it replies with a message to P; otherwise, Q forwards the join message to one of its own potential parent nodes. This process is repeated until a tree node is found (see Figure 2). If no reply message returns to P, a localized broadcast is used.

Adaptive Demand-Driven Multicast Routing Protocol (ADMR)
ADMR [13] is an on-demand sender-tree-based protocol which adapts its behaviour based on the application's data sending pattern. It does not require periodic floods of control packets, periodic neighbour sensing, or periodic routing table exchanges. The application layer behaviour allows efficient detection of link breaks and expiration of routing state. ADMR temporarily switches to flooding each data packet if high mobility is detected.
A multicast tree is created when a group sender originates a multicast packet for the first time. Interested nodes reply to the sender's packet to join the group. Each multicast packet includes an inter-packet time, which is the average packet arrival time from the sender's application layer. The inter-packet time lets tree nodes predict when the next multicast packet will arrive, so no periodic control messages are required for tree maintenance. If the application layer does not originate new packets as expected, the routing layer of the sender issues special keep-alive packets to maintain the multicast tree. The sender occasionally uses network floods of data packets to find new members.

The Differential Destination Multicast Protocol (DDM)
DDM [14] is a sender-tree-based protocol that is designed for small groups. DDM has no multicast routing structure. It encodes the addresses of group members in each packet header and transmits the packets using the underlying unicast routing protocol. If a node P is interested in a multicast session, it unicasts a join message to the sender of the session. The sender adds P into its member list (ML) and unicasts an ACK message back to P. DDM has two operation modes: stateless mode and soft-state mode. In stateless mode, the sender includes a list of all receivers' addresses in each multicast packet. According to the address list and the unicast routing table, each node receiving the packet determines the next hop for forwarding the


packet to some receivers, and partitions the address list into distinct parts for each chosen next hop.

In order to reduce the packet size, DDM can operate in soft-state mode. Each node in soft-state mode records the set of receivers for which it has been the forwarder. Each multicast packet then describes only the change in the address list since the last forwarding, in a special DDM block in the packet header. For instance, if R4 moves to another place and loses its connection to R3, the DDM block in the packet header describes that R4 is removed; then B knows that it only has to forward the packet to R3.

Multicast Core-Extraction Distributed Ad Hoc Routing (MCEDAR)
MCEDAR is a multicast extension of the CEDAR architecture which provides the robustness of mesh structures and the efficiency of tree structures. MCEDAR uses a mesh as the underlying infrastructure, but data forwarding occurs only on a sender-rooted tree. MCEDAR is particularly suitable for situations where multiple groups coexist in a MANET.
At first, MCEDAR partitions the network into disjoint clusters. Each node exchanges a special beacon with its one-hop neighbors to decide whether it becomes a dominator or chooses a neighbor as its dominator. A dominator and those neighbors that have chosen it as a dominator form a cluster. A dominator then becomes a core node and issues a message to nearby core nodes for building virtual links between them. All the core nodes form a core graph.
When a node intends to join a group, it delegates its dominating core node P to join the appropriate mgraph instead of itself. An mgraph is a subgraph of the core graph and is composed of those core nodes belonging to the same group. P joins the mgraph by broadcasting a join message which contains a joinID. Only those members with smaller joinIDs reply with an ACK message to P (see Figure 6). Other nodes receiving the join message forward it to their nearby core nodes. An intermediate node Q accepts at most R ACK messages, where R is a robustness factor. Q then puts the nodes from which it receives the ACK message into its parent set and the nodes to which it forwards the ACK message into its child set. When a node has fewer than R/2 parents, it periodically issues new join messages to get more parents. When a data packet arrives at an mgraph member, the member only forwards the packet to those nearby member core nodes that it knows.

Mesh-based protocols may have more than one path between a source-receiver pair, thereby providing redundant routes for maintaining connectivity to group members. Because of the availability of multiple paths between source and receiver, mesh-based protocols are more robust than tree-based ones [2].

On-Demand Multicast Routing Protocol (ODMRP)
ODMRP provides richer connectivity among group members and builds a mesh to provide a high data delivery ratio even at high mobility. It introduces a "forwarding group" concept to construct the mesh and a mobility prediction scheme to refresh the mesh only when necessary.
The first sender floods a join message with a data payload piggybacked. The join message is periodically flooded to the entire network to refresh the membership information and update the multicast paths. An interested node responds to the join message. Note that the multicast paths built by this sender are shared with other senders; in other words, a forwarding node will forward the multicast packets not only from this sender but also from other senders in the same group (see Figure 7).
Due to the high overhead incurred by flooding of join messages, a mobility prediction scheme is proposed to find the most stable path between a sender-receiver pair. The purpose is to flood join messages only when the paths indeed have to be refreshed. A formula based on information provided by GPS (Global Positioning System) is used to predict the link expiration time between two connected nodes. A receiver sends the reply message back to the sender via the path having the maximum link expiration time.

A Dynamic Core Based Multicast Routing Protocol (DCMP)
DCMP aims at mitigating the high control overhead problem of ODMRP. DCMP dynamically classifies


the senders into different categories, and only a portion of the senders need to issue control messages. In DCMP, senders are classified into three categories: active senders, core senders, and passive senders. Active senders flood join messages at regular intervals. Core senders are those active senders which also act as the core node for one or more passive senders. A passive sender does not flood join messages, but depends on a nearby core sender to forward its data packets. The mesh is created and refreshed by the join messages issued by active senders and core senders.
All senders are initially active senders. When a sender S has packets to send, it floods a join message. Upon receiving this message, an active sender P delegates S to be its core node if P is close to S and has a smaller ID than S. Afterwards, the multicast packets sent by S are forwarded to P first, and P relays them through the mesh.

Adaptive Core Multicast Routing Protocol (ACMRP)
ACMRP presents an adaptive core mechanism in which the core node adapts to the network and group status. In general mesh-based protocols, the mesh provides too rich connectivity and results in high delivery cost. Hence, ACMRP forces only one core node to take responsibility for mesh creation and maintenance in a group. The adaptive core mechanism also handles any core failure caused by link failures, node failures, or network partitions.
A new core node of a group emerges when the first sender has multicast packets to send. The core node floods join messages, and each node stores this message in its local cache. Interested members reply with a JREP message to the core node. Forwarding nodes are those nodes that have received a JREP message. If a sender only desires to send packets (i.e., it is not interested in packets from other senders), it sends an EJREP message back to the core node. Those nodes receiving this EJREP message only forward data packets from this sender. If a new sender wishes to send a packet but has not connected to the mesh, it encapsulates the packet toward the core node. The first forwarding node strips the encapsulated packet and sends the original packet through the mesh. ACMRP proposes a novel mechanism to regularly re-elect a new core node which is located near all members. The core node periodically floods a query message with a TTL set to acquire the group membership information and lifetime of its neighboring nodes. The core node selects, among its neighboring nodes, the node that has the minimum total hop count of routes toward group members as the new core node.

Multicast Protocol for Ad Hoc Networks with Swarm Intelligence (MANSI)
MANSI relies on only one core node to build and maintain the mesh and applies swarm intelligence to tackle metrics like load balancing and energy conservation. Swarm intelligence refers to complex behaviors that arise from very simple individual behaviors and interactions. Although each individual has little intelligence and simply follows basic rules using local information obtained from the environment, globally optimized behaviors emerge when they work collectively as a group. MANSI utilizes this characteristic to lower the total cost of the multicast session.
The sender that first starts sending data takes the role of the core node and informs all nodes in the network of its existence. Reply messages transmitted by interested nodes construct the mesh. Each forwarding node is associated with a height which is identical to the highest ID of the members that use it to connect to the core node. After the mesh creation, MANSI adopts the swarm intelligence metaphor to allow nodes to learn better connections that yield lower forwarding cost. Each member P except the core node periodically deploys a small packet, called a FORWARD ANT, which opportunistically explores better paths toward the core.
A FORWARD ANT stops and turns into a BACKWARD ANT when it encounters a forwarding node whose height is higher than the ID of P. A BACKWARD ANT travels back to P via the reverse path. When the BACKWARD ANT arrives at each intermediate node, it estimates the cost of having the current node join the forwarding set via the forwarding node it previously found. The estimated


cost, as well as a pheromone amount, is updated in the node's local data structure. The pheromone amounts are then used by subsequent FORWARD ANTs that arrive at this node to decide which node they will travel to next.
MANSI also incorporates a mobility-adaptive mechanism. Each node keeps track of the normalized link failure frequency (nlff), which reflects the dynamic condition of the surrounding area. If the nlff exceeds a threshold, the node adds another entry for the second-best next hop to its join messages. The additional path to the core node then increases the reliability of MANSI.

Neighbor Supporting Ad Hoc Multicast Routing Protocol (NSMP)
NSMP utilizes the node locality concept to lower the overhead of mesh maintenance. For initial path establishment or network partition repair, NSMP occasionally floods control messages through the network. For routine path maintenance, NSMP uses local path recovery, which is restricted to the mesh nodes and neighbor nodes of a group.
The initial mesh creation is the same as that in MANSI. Those nodes (except mesh nodes) that detect reply messages become neighbor nodes, and neighbor nodes do not forward multicast packets. After the mesh creation phase (see Figure 11), all senders transmit LOCAL_REQ messages at regular intervals to maintain the mesh. Only mesh nodes and neighbor nodes forward the LOCAL_REQ messages. In order to balance routing efficiency and path robustness, a receiver receiving several LOCAL_REQ messages replies with a message to the sender via the path with the largest weighted path length.
Since only mesh nodes and neighbor nodes accept LOCAL_REQ messages, a network partition may not be repaired. Hence, a group leader is elected among the senders, and it periodically floods request messages through the network. A network partition can be recovered by the flooding of request messages. When a node P wishes to join a group as a receiver, it waits for a LOCAL_REQ message. If no LOCAL_REQ message is received, P locally broadcasts a MEM_REQ message.

The Core-Assisted Mesh Protocol (CAMP)
CAMP is a receiver-initiated protocol. It assumes that an underlying unicast routing protocol provides correct distances to known destinations. CAMP establishes a mesh composed of shortest paths from senders to receivers. One or multiple core nodes can be defined for each mesh; core nodes need not be part of the mesh, and nodes can join a group even if all associated core nodes are unreachable.
It is assumed that each node can reach at least one core node of the multicast group which it wants to join. If a joining node P has any neighbor that is a mesh node, then P simply tells its neighbors that it is a new member of the group. Otherwise, P selects its next hop toward the nearest core node as the relay of the join message. Any mesh node receiving the join message transmits an ACK message back to P; then P connects to the mesh. If none of the core nodes of the group is reachable, P broadcasts the join message using an expanded ring search.
To ensure shortest paths, each node periodically looks up its routing table to check whether the neighbor that relays the packet is on the shortest path to the sender. The number of packets coming from the reverse path for a sender indicates whether the node is on the shortest path. A special message is issued to search for a mesh node so that the shortest path can be re-established. Finally, to ensure that two or more meshes eventually merge, all active core nodes periodically send messages to each other and force nodes along the path that are not members to join the mesh.

III. Present Status of Multicast Routing Protocols
Multicasting is a mechanism in which a source can send the same communication to multiple destinations. In multicast routing, a multicast tree to a group of destination nodes is to be found, along which the information is disseminated to different nodes in parallel. Multicast routing is more efficient than unicast because data is forwarded to many intended destinations in one go rather than being sent individually. At the same time, it is not as expensive as broadcasting, in which the data is flooded to all the nodes in the network. It is therefore extremely suitable for a bandwidth-constrained network like MANET.
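To make this efficiency argument concrete, the following small sketch (illustrative only; the topology numbers and function names are invented, not taken from the paper) counts link-level transmissions for the three delivery modes: separate unicasts cost one transmission per hop per receiver, tree-based multicast costs one transmission per tree edge, and broadcast-style flooding costs roughly one transmission per network link.

```python
# Illustrative sketch (hypothetical topology, not from the paper):
# compare link-level transmission counts for unicast, tree-based
# multicast, and network-wide flooding.

def unicast_cost(path_lengths):
    """Separate unicast to each receiver: sum of per-receiver hop counts."""
    return sum(path_lengths)

def multicast_cost(tree_edges):
    """Tree-based multicast: one transmission per multicast-tree edge."""
    return len(tree_edges)

def flood_cost(network_edges):
    """Flooding: roughly one transmission per link in the network."""
    return len(network_edges)

# Hypothetical example: three receivers at 3, 3 and 4 hops from the
# source; a shared multicast tree with 6 edges; a network with 20 links.
print(unicast_cost([3, 3, 4]))   # 10 transmissions
print(multicast_cost(range(6)))  # 6 transmissions
print(flood_cost(range(20)))     # 20 transmissions
```

The sketch shows why multicast sits between unicast and broadcast in cost: it avoids re-sending the payload over shared tree links, without paying the full price of flooding every link.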


Table I: Comparison of Multicast Routing Protocols

Protocol | Multicast Topology | Initialization | Independent of Routing Protocol | Depends on Specific Routing Protocol | Maintenance Approach | Loop Free | Flooding of Control Packets | Periodic Control Messaging
ABAM | Source-Tree | Source | Yes | No | Hard State | Yes | Yes | No
BEMRP | Source-Tree | Receiver | Yes | No | Hard State | Yes | Yes | No
DDM | Source-Tree | Receiver | No | No | Soft State | Yes | Yes | Yes
MCEDAR | Source-Tree + Mesh | Source or Receiver | No | Yes (CEDAR) | Hard State | Yes | Yes | No
MZRP | Source-Tree | Source | Yes | No | Hard State | Yes | Yes | Yes
WBM | Source-Tree | Receiver | Yes | No | Hard State | Yes | Yes | No
PLBM | Source-Tree | Receiver | Yes | No | Hard State | Yes | No | Yes
MAODV | Shared-Tree | Receiver | Yes | No | Hard State | Yes | Yes | Yes
ADAPTIVE SHARED | Combination of Shared and Source Tree | Receiver | Yes | No | Soft State | Yes | Yes | Yes
AMRIS | Shared-Tree | Source | Yes | No | Hard State | Yes | Yes | Yes
AMROUTE | Shared-Tree + Mesh | Source or Receiver | No | No | Hard State | No | Yes | Yes
ODMRP | Mesh | Source | Yes | No | Soft State | Yes | Yes | Yes
DCMP | Mesh | Source | Yes | No | Soft State | Yes | Yes | Yes
FGMP | Mesh | Receiver | Yes | No | Soft State | Yes | Yes | Yes
CAMP | Mesh | Source or Receiver | No | No | Hard State | Yes | No | No
NSMP | Mesh | Source | Yes | No | Soft State | Yes | Yes | Yes
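The Maintenance Approach column of Table I distinguishes soft-state from hard-state protocols. The following sketch (the class and method names are hypothetical, not an implementation of any protocol in the table) illustrates the soft-state idea used by protocols such as ODMRP: a routing entry stays valid only while it is periodically refreshed by control messages, so stale routes silently expire without any explicit repair traffic, whereas a hard-state protocol would keep the entry until a link-break event triggers repair.

```python
# Illustrative sketch of soft-state route maintenance (invented names,
# not from the paper): entries expire unless refreshed within a timeout.

class SoftStateTable:
    def __init__(self, timeout):
        self.timeout = timeout   # seconds an entry remains valid
        self.entries = {}        # group -> time of last refresh

    def refresh(self, group, now):
        """A periodic join/control message re-validates the entry."""
        self.entries[group] = now

    def is_active(self, group, now):
        """Entry silently times out if no refresh arrived in time."""
        last = self.entries.get(group)
        return last is not None and now - last <= self.timeout

table = SoftStateTable(timeout=3.0)
table.refresh("G1", now=0.0)
print(table.is_active("G1", now=2.0))   # True: refreshed recently
print(table.is_active("G1", now=10.0))  # False: expired without any repair message
```

This is why the soft-state rows in Table I also show periodic control messaging: the refresh traffic is the price paid for automatic cleanup of stale state.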

Traditional multicast routing protocols for wireless networks cannot be implemented as-is in a mobile ad hoc network, which poses new problems and challenges for the design of efficient algorithms for MANETs. The main characteristics of mobile Ad Hoc networks are the following:

Dynamic network topology structure: In a mobile Ad Hoc network, nodes move arbitrarily, so the network topology may change at any time, and the mode and speed of this change are difficult to predict.

Limited transmission bandwidth: A mobile Ad Hoc network uses wireless transmission technology as its communication means, so the channel capacity is lower than that of wired networks. Furthermore, affected by multiple factors such as noise jamming and signal interference, the actually available effective bandwidth for mobile terminals is much smaller than the theoretical maximum bandwidth.

The limitation of mobile terminals: Although the user terminals in a mobile Ad Hoc network are smart and portable, they run on limited energy sources such as batteries and have CPUs of lower performance and smaller memory; moreover, each host computer doubles as a router. Hence, there are quite high requirements on routing protocols.

Distributed control: There is no central control point in a mobile Ad Hoc network; all user terminals are equal,


and the network routing protocols always apply a distributed control mode, so the network has stronger robustness and survivability than a center-structured network.

Multihop communication: Owing to the restricted signal transmission range of wireless transceivers, the mobile Ad Hoc network is required to support multihop communication, which also brings problems of hidden terminals, exposed terminals, fairness, etc.

Security: Because of the use of wireless channels, battery power, distributed control, etc., the network is vulnerable to security threats such as eavesdropping, spoofing and denial-of-service attacks.

To date, many multicast routing protocols have been proposed, each with its own advantages and disadvantages for adapting to different environments. Therefore, the hope for a standard multicast routing protocol that will be suitable for all network scenarios is highly unrealistic.

At the same time, it is very difficult to identify multicast routing algorithms or protocols adapted to specific application fields for mobile Ad Hoc networks, because the application of Ad Hoc networks requires a combination and integration of the fixed network with the mobile environment. Deeper research on multicast applications in the mobile Ad Hoc network environment is therefore still needed.

IV. Comparison of Multicast Routing Protocols
The design goal of any multicast routing protocol is to transmit information to all intended nodes in an optimal way while incurring minimum redundancy in the process.

All the protocols try to deal with many problems, such as node mobility, looping, routing imperfections, on-demand construction, routing updates, and the control over packet transmission methods (network-wide flooding broadcast or broadcast restricted to member nodes).

In all tree-based multicast routing protocols, a unique path is obtained between any pair of nodes, which saves the bandwidth required for initializing the multicast tree compared to the bandwidth requirement of any other structure. The disadvantage of these protocols is the survivability of the communication system in case of link or node failure. For example, if any node moves out of transmission range, the tree is divided into two or more sub-trees, which makes communication among all the nodes in the tree difficult. In addition, the overhead involved in maintaining the multicast tree is relatively large compared to other protocols.

The resource requirement of mesh-based multicast routing protocols is much larger than that of tree-based protocols. They also suffer from routing loop problems, and special measures must be taken to avoid such problems, which incur extra overhead on the overall communication system. The biggest advantage of such protocols is their robustness: if one link fails, it does not affect the entire communication system. Therefore, such protocols are suitable for harsh environments where the topology of the network changes very rapidly.

A hybrid routing protocol is a combination of both tree and mesh and is suitable for an environment with moderate mobility. It is as efficient as tree-based protocols and at the same time survives the frequent breaks in the network due to the high mobility of nodes.

A comparison of all the multicast routing protocols discussed above is summarized in Table I.

V. Conclusion
Mobile Ad Hoc networks face a variety of challenges, such as dynamic network topology, limited transmission bandwidth, the limitations of mobile terminals, distributed control, multihop communication and security; therefore, routing is more difficult in such a challenging environment than in other networks.

Multicast routing is a mode of communication in which data is sent to a group of users by using a single address. On the one hand, the users of a mobile Ad Hoc network need to form collaborative working groups; on the other hand, multicast is also an important means of fully exploiting the broadcast nature of wireless communication and effectively using the limited wireless channel resources.

This paper summarizes and comparatively analyzes the routing mechanisms of various existing multicast routing protocols according to the characteristics of mobile Ad Hoc networks.


References
1. T. Nadeem and S. Parthasarathy, "Mobility Control for Throughput Maximization in Ad hoc Networks," Wireless Communication and Mobile Computing, Vol. 6, pp. 951-967, 2006.
2. C.-C. Huang and S.-C. Lo, "A Comprehensive Survey of Multicast Routing Protocols for Mobile Ad Hoc Networks".
3. T. Ozaki, J. B. Kim, and T. Suda, "Bandwidth-efficient Multicast Routing for Multihop Ad hoc Networks," in Proceedings of IEEE INFOCOM, Vol. 2, pp. 1182-1191, 2001.
4. X. Zhang and L. Jacob, "MZRP: An Extension of the Zone Routing Protocol for Multicasting in MANETs," Journal of Information Science and Engineering, Vol. 20, pp. 535-551, 2004.
5. P. Sinha, R. Sivakumar, and V. Bharghavan, "MCEDAR: Multicast Core Extraction Distributed Ad hoc Routing," in Proc. IEEE Wireless Communications and Networking Conference (WCNC), pp. 1313-1317, 1999.
6. L. S. Ji and M. S. Corson, "Differential Destination Multicast: A MANET Multicast Routing Protocol for Multihop Ad hoc Networks," in Proceedings of IEEE INFOCOM, Vol. 2, pp. 1192-1201, 2001.
7. C. W. Wu, Y. C. Tay, and C. K. Toh, "Ad hoc Multicast Routing Protocol Utilizing Increasing Id-numbers (AMRIS) Functional Specification," Internet-Draft, draft-ietf-manet-amris-spec-00.txt, 1998.
8. J. Xie, R. Talpade, T. McAuley, and M. Liu, "AMRoute: Ad hoc Multicast Routing Protocol," ACM Mobile Networks and Applications (MONET) Journal, Vol. 7, No. 6, pp. 429-439, 2002.
9. E. M. Royer and C. E. Perkins, "Multicast Operation of the Ad-hoc On-demand Distance Vector Routing Protocol," in Proc. ACM MOBICOM, pp. 207-218, Aug. 1999.
10. C. E. Perkins and E. M. Royer, "Ad-hoc On-demand Distance Vector Routing," in Proc. IEEE WMCSA, pp. 90-100, Feb. 1999.
11. L.-S. Ji and M. S. Corson, "Explicit Multicasting for Ad Hoc Networks," Mobile Networks and Applications, Vol. 8, No. 5, pp. 535-549, Oct. 2003.
12. C. W. Wu and Y. C. Tay, "AMRIS: A Multicast Protocol for Ad Hoc Networks," in Proc. IEEE MILCOM, Vol. 1, pp. 25-29, Nov. 1999.
13. J. G. Jetcheva and D. B. Johnson, "Adaptive Demand-driven Multicast Routing in Multi-hop Wireless Ad Hoc Networks," in Proc. ACM MOBIHOC, pp. 33-44, Oct. 2001.
14. P. Sinha, R. Sivakumar, and V. Bharghavan, "CEDAR: A Core Extraction Distributed Ad Hoc Routing Algorithm," IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp. 1454-1466, Aug. 1999.

50 National Conference on Emerging Trends in Information Technology


Relevance of Cloud Computing in Academic Libraries
Dr. Prerna Mahajan*
Dr. Dipti Gulati**
Abstract
Cloud computing is one of the most recent technology models for IT services and is being adopted by several organizations and individuals. Cloud computing allows them to avoid locally hosting and operating multiple servers over an organization's network and constantly dealing with hardware failure, software installation, upgrades, backup and various compatibility issues, which also enables them to save costs. Cloud computing has emerged as a significant advantage to libraries and offers various opportunities for libraries to connect their services with the cloud. This paper presents an overview of cloud computing and its possible applications that can be clubbed with library services in a web-based environment.
Keywords: Cloud Computing, Academic Libraries

Introduction

Cloud computing is the latest technology model for IT services, which a large number of organizations and individuals are adopting. Cloud computing transforms the way systems are built and services delivered, providing libraries with an opportunity to extend their impact. Cloud computing is internet-based computing, in which virtual shared servers provide software, infrastructure, platform, devices and other resources and hosting to customers on a pay-as-you-use basis. Presently, most organizations and individuals use computers to work alone, inside a business or at home, by investing in hardware, software and maintenance. This scenario is slowly altering due to the emergence of a new breed of Internet services, popularly known as Web 2.0, through which any individual can use the power of computers at a completely different location, popularly called 'in the cloud' or 'Cloud Computing'.

There are various synonyms for Cloud Computing, such as 'On-Demand Computing', 'Software as a Service', 'Information Utilities' and 'The Internet as a Platform', besides numerous others.

According to the US National Institute of Standards and Technology (NIST), "Cloud Computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management efforts or service provider interaction".1

Cloud computing, often referred to as simply "the cloud," is the delivery of on-demand computing resources, everything from applications to data centers, over the internet on a pay-for-use basis:
• Elastic resources: scale up or down quickly and easily to meet demand
• Metered service, so you only pay for what you use
• Self service: all the IT resources you need, with self-service access.2

Cloud computing refers to the use of the web for computing needs, which could include using software applications, storing data, accessing computing power, or using a platform to build applications. There is a vast array of utilities, ranging from e-mail to word processing, photo sharing or video sharing, where a person can use

Dr. Prerna Mahajan*
Head of the Department
Institute of Information Technology and Management
Dr. Dipti Gulati**
Librarian
Institute of Information Technology and Management

[Figure source: http://convergenceservices.in/blog]

products that live in the cloud, which are secure, backed-up and accessible from any Internet connection. The best live example of this is Gmail, which is increasingly being used by organizations and individuals to run their e-mail services. Google Apps, being free for educational institutions, is widely used for running a variety of applications, especially the email services, which were earlier being run on their own computer servers. This has proved to be cost effective for organizations, since they pay per use for applications and services, and it saves precious time for the computer staff, which they can invest in running other services without worrying about upgrading, backup, compatibility and maintenance of servers, which is taken care of by Google. Libraries use computers for running services such as Integrated Library Management Software (ILMS), a website or portal, and a digital library or institutional repository. These are either maintained by the parent organization's computer staff or by library staff, which involves huge investments in hardware and software, and requires staff to maintain the

[Figure: Types of cloud computing (source: http://www.globaldots.com/cloud-computing-types-of-cloud/)]




services and undertake the backups and upgrades when a new version of the software gets released.

Library professionals in most cases are not adequately trained in maintaining servers and often find it difficult to undertake some of these activities without the support of IT staff from within the organization or through external sources. In the present day, Cloud Computing has become the latest buzzword in the field of libraries, and is a blessing in disguise for operating various ICT services without any problem, since third-party services will manage the servers, undertake upgrades and take back-ups of data. Currently, some libraries have adopted cloud computing services as an emerging technology to operate their services, despite the fact that there are certain areas of concern in using cloud services, such as privacy, security, etc.

Types of Cloud Computing
There are four types of Cloud Computing:
1. Private/Internal Cloud: a cloud operated internally for a single enterprise.
2. Public/External Cloud: applications, storage and other resources that are made available to the general public by the service providers.
3. Community Cloud: a public cloud tailored to a particular community.
4. Hybrid Cloud: a combination of the internal and external cloud. The terms Community Cloud and Hybrid Cloud are sometimes used interchangeably.

Cloud Computing Models
Cloud computing providers offer their services, which can be grouped into three categories:
1. Software as a Service (SaaS): In this model, a complete application is offered to the customer as a service on demand. A single instance of the service runs on the cloud and multiple end users are serviced. Today SaaS is offered by companies such as Google, Salesforce, Microsoft and Zoho.
2. Platform as a Service (PaaS): In this model, a layer of software or development environment is condensed and offered as a service, upon which other, higher levels of service can be built. The customer has the freedom to build his own applications, which run on the provider's infrastructure. To meet manageability and scalability requirements of the applications, PaaS providers offer a predefined combination of OS and application servers, such as the LAMP platform (Linux, Apache, MySQL and PHP), restricted J2EE and Ruby; Google's App Engine and Force.com are some of the popular PaaS examples.
3. Infrastructure as a Service (IaaS): IaaS provides basic storage and computing capabilities as standardized services over the network. Servers, storage systems, networking equipment and data center space are pooled and made available to manage workloads. The customer would typically deploy his own software on the infrastructure. Some of the common examples are Amazon, GoGrid, 3Tera, et al.

Application of Cloud Computing in Libraries
Libraries are shifting their services to the cloud and networking, with the facility to access these services anywhere and anytime. In libraries, the following possible areas were identified where cloud computing services and applications may be applied:
1. Building Digital Library/Repositories: In the present situation, every library requires a digital library to offer its resources, information and services at an efficient level and to ensure access via the network. Therefore, every library has a digital library that is developed through the use of digital library software.
2. Searching Library Data: OCLC is one of the best examples of utilizing cloud computing for sharing library data for years together. The OCLC WorldCat service is one of the well-accepted services for searching library data that is now available on the cloud. OCLC is offering various services pertaining to circulation, cataloguing, acquisition and other library related services on the cloud platform through the Web share management system. A Web share management




system facilitates the development of an open and collaborative platform in which each library can share its resources, services, ideas and problems with the library community on the cloud. On the other hand, the main objective of web-scale services is to provide cloud-based platforms, resources and services with cost-benefit and effectiveness, to share data and to build broader collaboration in the community.

3. Website Hosting: Website hosting was one of the earliest adoptions of cloud computing, as numerous organizations, including libraries, prefer to host their websites with third-party service providers rather than hosting and maintaining their own servers; Google Sites serves as an example of a service for hosting websites externally to the library's servers while allowing multiple editors to access the site from varied locations.

4. Building Community Power: Cloud computing technology offers tremendous opportunities for libraries to build networks among library and information science professionals as well as other interested people, including information seekers, by using social networking tools. Some of the most well-known social networking services, such as Twitter and Facebook, play a dominating role in building community power. This cooperative effort of libraries will create time-saving efficiencies, wider recognition and cooperative intelligence for better decision-making, and provides the platform for innovation and for sharing intellectual conversations, ideas and knowledge.

5. Library Automation: For library automation purposes, Polaris offers various cloud-based services, such as acquisitions, cataloguing, process system and digital contents, with provision for the inclusion of cutting-edge technologies used in libraries; it also supports various standards directly related to the library and information science area, such as MARC21, XML, Z39.50 and Unicode. Apart from this, nowadays a majority of software vendors, such as Ex-Libris and OSS Labs, are also offering this service on the cloud, with third-party providers hosting the service (SaaS approach) on the cloud to save libraries from investing in hardware for this purpose. Besides the cost-benefit, libraries are freed from maintenance tasks such as software updates, backups and other facilities.

Advantages and Disadvantages of Cloud Computing in Libraries




In the present situation of Indian libraries, cloud computing is in the development phase. Libraries are attempting to offer their users cloud-based services; however, in reality they are not fully successful, mainly due to the lack of good service providers and the limited technical skills of LIS professionals in the field of library management using advanced technology. Yet some of the services, such as digital libraries, web documentation and the use of Web 2.0 technologies, are operating successfully. Some of the excellent examples of successful cloud computing in libraries include DuraCloud, OCLC services and Google-based cloud services. In the current state, countless commercial as well as open source vendors (i.e. OSS) are clubbing cloud computing technology into their services and products. However, cloud computing technology is not totally accepted in Indian libraries, although they are trying to develop themselves in this area.

Conclusion

Cloud computing represents an exciting opportunity to bring on-demand applications to the digital library in an environment of reduced risk and enhanced reliability. However, it is important to understand that existing applications cannot just be unleashed on the cloud as they exist. Careful attention to design detail will help in ensuring a successful deployment. Certainly, cloud computing can bring about strategic, transformational and even revolutionary benefits fundamental to digital libraries. For organizations providing digital libraries with significant investment in traditional software and hardware infrastructure, migration to the cloud will involve a considerable technology transition; for less-constrained organizations, or those with infrastructure nearing end-of-life, adoption of cloud computing technology may be more immediate.

No doubt, libraries are shifting towards cloud computing technology in the present times and taking advantage of these services, especially in building digital libraries, social networking and information communication with manifold flexibilities; yet some issues related to security, privacy, trustworthiness and legal matters are still not completely resolved. Therefore, it is high time for libraries to think seriously before clubbing library services with cloud-based technologies, and to provide reliable and rapid services to their users. Another responsibility of LIS professionals in this virtual era is to make cloud-based services a reliable medium to disseminate library services to their target users with ease of use and trustworthiness.

References
1. Aravind Doss, and Rajeev Nanda. (2015). “Cloud Computing: A Practitioner’s Guide.” TMH. New Delhi.
P-265.
2. https://www.ibm.com/cloud-computing
3. Anna Kaushik and Ashok Kumar. (2013). “Application of Cloud Computing in Libraries.” International Journal
of Information Dissemination and Technology. 3 (4): 270-273.
4. Judith Mavodza. "Impact of Cloud Computing on the Future of Academic Libraries and Services." Proceedings of the 34th Annual Conference of the International Association of Scientific and Technological University Libraries (IATUL), Cape Town, South Africa.
5. Anthony T Velte. and Others. (2015). “Cloud Computing: A Practical Approach”. TMH: New Delhi.
P- 1-23.
6. Aravind Doss, and Rajeev Nanda. (2015). “Cloud Computing: A Practitioner’s Guide.” TMH. New Delhi.
P-265-268.



A brief survey on metaheuristic based techniques for optimization problems
Kumar Dilip*
Suruchi Kaushik**
Abstract
This paper aims to provide a brief review of a few popular metaheuristic techniques for solving different optimization problems. In many non-trivial real-life optimization problems, finding an optimal solution is a very complex and computationally expensive task. Application of the classical optimization techniques is not suitable for such problems, due to their inherently complex and large search spaces. In order to solve such optimization problems, metaheuristic based techniques have been applied and popularized in recent years. These techniques are increasingly getting recognition as effective tools for solving various complex optimization problems in a reasonable amount of computation time. In this brief survey of metaheuristic techniques we discuss a few existing as well as ongoing developments in this area.
Keywords: Optimization problems; metaheuristics; Genetic algorithm; Ant Colony Optimization

I. Introduction

Application of metaheuristic based techniques for solving real-life complex decision making problems is gaining popularity, as the underlying search space of such problems is complex and huge in size [2, 22]. Heuristic based methods have been considered a viable option for solving complex optimization problems, as they are likely to provide good solutions in a reasonable amount of time. However, the limitation of heuristic based techniques is their focus on specific features of the underlying problem, which makes the design of the approach very difficult. In order to address this issue, the application of metaheuristic based methods is considered a feasible option. They are not problem specific and can be effectively adapted for different types of optimization problems. Put alternatively, metaheuristic techniques provide a generic algorithmic approach to solve various optimization problems by making comparatively few adjustments according to the problem specification. In general, three common features can be identified in most metaheuristic techniques, among others. First, the majority of them are inspired by several working mechanisms of nature, which include biology and physics. Second, they consider many random variables to perform a flexible stochastic search of the large search space. And third, they also involve various parameters, and proper tuning of them can greatly affect the overall performance of the technique for the considered problem. The effectiveness of a metaheuristic technique for the problem at hand significantly lies in two major concepts, known as intensification or exploitation, and diversification or exploration. Exploration tries to identify the potential search areas containing good solutions, while exploitation aims to intensify the search in some promising area of the search space. The optimal balance between these two mechanisms during the search process may lead towards comparatively better solutions [2, 22].

The application of metaheuristic techniques is considered well suited for those optimization problems where no acceptable problem-specific algorithms are available for solving them. The application areas of metaheuristic techniques include finance, marketing, services, industries, engineering and multi-criteria decision making, among others. These techniques may provide good or acceptable solutions to various complex optimization problems in these areas with effective computation time.

Kumar Dilip*
Department of IT, IITM
Suruchi Kaushik**
Department of IT, IITM
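The exploration/exploitation balance described above can be made concrete with a minimal sketch (illustrative only; the toy objective, the neighbourhood and the value of p_explore are assumptions, not taken from the survey): with probability p_explore the search takes a random neighbour (diversification), otherwise it moves greedily to the best neighbour (intensification).

```python
import random

def stochastic_local_search(f, x0, neighbours, p_explore=0.2, iters=1000, seed=1):
    """Minimal sketch of the intensification/diversification trade-off:
    explore with probability p_explore, otherwise exploit greedily."""
    random.seed(seed)
    best = current = x0
    for _ in range(iters):
        options = neighbours(current)
        if random.random() < p_explore:
            current = random.choice(options)   # explore: random move
        else:
            current = min(options, key=f)      # exploit: best neighbour
        if f(current) < f(best):
            best = current
    return best

# Toy problem: minimise f(x) = (x - 7)^2 over the integers, starting far away.
f = lambda x: (x - 7) ** 2
neighbours = lambda x: [x - 1, x + 1]
print(stochastic_local_search(f, x0=50, neighbours=neighbours))
```

Tuning p_explore shifts the balance: values near 0 intensify the search around the current solution, while values near 1 approach a pure random walk.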

In recent years, popular metaheuristic techniques such as Evolutionary algorithms, Genetic algorithms, Ant Colony Optimization, Particle Swarm Optimization, Bee Colony Optimization, Simulated Annealing, Tabu Search, etc. have been widely used for different optimization problems [11, 12, 13, 16, 17, 21, 24, 25, 26]. All of the above techniques have certain underlying working principles and various strategic constructs that may enable them to solve problems efficiently. However, in recent years a new kind of metaheuristic has emerged which, unlike the above approaches, does not belong to a specific metaheuristic category but combines approaches from different areas like computer science, biology, artificial intelligence and operations research. This new class of metaheuristic techniques is normally referred to as Hybrid metaheuristics. In order to improve performance, the concept of quantum computing has also been applied to solve optimization problems; with the intent of further improving the performance of the approaches, various quantum inspired metaheuristic techniques have been proposed in the literature [14].

The list of metaheuristic techniques is extensive and it is difficult to summarize them in a brief survey; nor is this paper intended to do so. Rather, this paper attempts to give a brief introductory overview of a few popular metaheuristic techniques. In the next section, a classification of metaheuristic based techniques is described.

II. Classification of metaheuristic techniques
Many criteria can be found for the classification of the various metaheuristic techniques. However, the more common classification of metaheuristic techniques, based on the use of a single solution versus a population of solutions, can be found in the literature. The popular single solution based techniques, also known as trajectory methods, include Simulated Annealing, Tabu Search, Variable Neighborhood Search, Guided Local Search and Iterated Local Search [27, 28]. The single solution based approaches start with a single initial solution and gradually move off from this solution, depicting a trajectory movement in the large search space [27, 28].

Unlike single solution based metaheuristic techniques, the population based metaheuristic techniques begin with a population of solutions and in every algorithmic iteration attempt to move towards better solutions. In recent years the population based metaheuristic techniques have been gaining comparatively more popularity, and more new population based techniques are getting reported in the literature [21, 22, 23]. Keeping this in mind, this paper majorly focuses on the population based techniques; the details of the single solution based, or trajectory based, metaheuristic techniques can be found in the literature [21, 22, 23]. In the next section we describe two popular population based metaheuristic techniques.

III. Population based metaheuristic techniques
The majority of population based methods belong either to the class of Evolutionary algorithms or to the Swarm Intelligence based methods. The inherent mechanism of evolutionary algorithms is mainly based on Darwin's theory of the survival of the fittest. The population of solutions improves iteratively, generation after generation: fitter solutions are selected to reproduce the better solutions for the next generation. In Swarm Intelligence based techniques, however, instead of a single agent, the collective intelligence of the group is exploited to find better solutions iteratively.

Evolutionary algorithms refer to a class of metaheuristic techniques whose underlying working mechanism is based on Darwin's theory of evolution. According to this theory, the fitter living beings which can better adapt to the changing environment survive and can be selected to reproduce better offspring. This generic class of techniques includes evolutionary programming, Genetic algorithms, Genetic programming, evolution strategies, etc. [15, 18, 19, 20, 29]. Though these techniques differ in their algorithmic approach, their core underlying working is similar. The evolutionary algorithms are mainly characterized by three important aspects: first, the solution or individual representation; second, the evaluation function; and third, the population dynamics throughout the algorithmic runs. All of the evolutionary techniques in every generation, or algorithmic iteration, attempt to select the better solutions in terms of their objective function values. These solutions further apply the mechanisms of the recombination and mutation operators to produce the




Procedure Evolutionary Algorithm
Begin Procedure
    Initialize the population of individuals or solutions
    Evaluate the fitness of each individual
    While stopping criteria not met, do
        Select the fitter individuals as parents
        Recombine pairs of fitter solutions to produce offspring
        Perform mutation on the offspring solutions
        Evaluate the new individuals or solutions
        Select the fitter solutions for the next generation
    End While
    Return solution
End Procedure
Figure 1: A generic view of an Evolutionary Algorithm
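The generic loop of Figure 1 can be sketched in Python, here specialized to a bit-string representation and the toy OneMax problem of maximising the number of ones (an illustrative sketch; the problem, the operators chosen and all parameter values are assumptions, not taken from the paper):

```python
import random

def evolutionary_algorithm(length=20, pop_size=30, generations=60,
                           p_mut=0.05, seed=3):
    """Bit-string EA following the loop of Figure 1, applied to OneMax."""
    random.seed(seed)
    fitness = lambda ind: sum(ind)                        # evaluation function
    # Initialize the population of individuals or solutions.
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):                          # stopping criterion
        def tournament():
            # Select a fitter individual as parent (3-way tournament).
            return max(random.sample(pop, 3), key=fitness)
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = random.randrange(1, length)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutate: flip each bit with probability p_mut.
            child = [bit ^ (random.random() < p_mut) for bit in child]
            offspring.append(child)
        # Select the fitter solutions for the next generation (elitist).
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

best = evolutionary_algorithm()
print(sum(best))  # close to the optimum of 20
```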
better solutions in the next generations. Next, a generic evolutionary approach is described in order to depict the common algorithmic steps of the above evolutionary algorithms.

In the above procedure, each iteration indicates a generation in which the population of individuals, or candidate solutions, is evaluated to check its fitness according to the given objective function of the problem at hand. Among those individuals, a set of fitter individuals is selected by applying some suitable selection mechanism. Pairs of fitter solutions are selected for recombination to produce better offspring solutions. Further, mutation is performed on the offspring with the intent of promoting diversity in the solutions. These newly created solutions are evaluated against the given objective function to check their suitability for use in the next generation. The above procedure continues iteratively till the termination condition is satisfied. The possible termination condition can be a predetermined number of generations, or the condition that there is no further improvement in the solutions. There may also be other possible criteria for the termination of the algorithmic runs.

Genetic Algorithm (GA)
The idea of the Genetic algorithm was first introduced by John Holland in the 1970s. This evolutionary search technique has been widely applied to different types of real-world optimization problems. As an evolutionary technique, the concepts of Genetic algorithms are based on Darwin's evolutionary theory, in which fitter individuals are likely to survive and have a higher probability of producing offspring for the next generation. This very idea has been adapted in the algorithmic framework of Genetic algorithms: the candidate solutions, or population of individuals, iteratively evolve towards the search space of fitter or better solutions in each algorithmic iteration. In order to apply the GA for problem solving, the first algorithmic requirement is to decide the representation of the solution, or chromosome. A binary or alphabetic string of fixed length is a common representation of a candidate solution in GA implementations. The next requirement is to choose among the various selection strategies used to select the fitter solutions, and among the various possible crossover and mutation operators. A candidate solution is represented by a chromosome, and a number of chromosomes constitute the entire population of the current generation. A population in the current generation evolves to the next generation through the above mentioned three main operators, i.e. selection, crossover and mutation. All these operators play a crucial part in the performance of the Genetic algorithm for the considered problem, and their proper tuning is an essential aspect of the GA implementation. In most cases the focus is on crossover as a variation operator. The crossover operator is usually applied on pairs of chromosomes selected by the selection strategy. Various crossover operators can be found in the literature, and their application may




depend upon the considered problem and/or on the solution representation. With the help of the crossover operator, two or more solutions may exchange their genetic material, or some parts of the solutions, to create new individuals. The crossover rate of the population indicates the total number of chromosomes or solutions that would undergo crossover or recombination. Each chromosome in the population has a fitness value determined by the objective function. This fitness value is used by the selection operator to evaluate the desirability of the chromosome for the next generation. Generally, fitter solutions are preferred by the selection operator, but some less fit chromosomes can also be considered in order to maintain the population diversity. The crossover operator is applied on the selected chromosomes to recombine them and generate new chromosomes which might have better fitness. The mutation operator is applied to maintain the population diversity throughout the optimization process by introducing random modifications in the population. The Evolutionary algorithms have been applied to optimization problems in diverse areas. They have been successfully applied to different combinatorial optimization problems and constrained optimization problems [7]. In recent years, they have also been gaining popularity in the area of multi-criteria optimization. Finding the trade-off solutions for a multi-objective optimization problem is a complex task; Evolutionary algorithm based techniques like NSGA-II have been successfully applied to several multi-objective optimization problems [1, 3, 8, 9, 10].

In recent years the quantum inspired Genetic algorithm has also been getting a lot of attention. It applies the principles of quantum computing combined with the evolutionary algorithm [14]. Instead of a binary, numeric or symbolic representation, the quantum inspired algorithm applies a Q-bit representation, and a Q-gate operator is used as a variation operator.

Next we describe the swarm intelligence based technique, Ant Colony Optimization, or ACO.

Ant Colony Optimization (ACO)
Ant colony optimization is a metaheuristic which is inspired by the behaviour of real ants. This approach was first applied for solving the Travelling Salesman Problem [5]. In the majority of cases where ACO is applied, the problem is represented with a graph. ACO is a population based metaheuristic. The ants of the real world, in search of their food, work in a group and find the shortest path from the nest to the food source. This very behaviour of real ants has inspired ant colony optimization, in which a group of simple agents works in cooperation in order to achieve a complex task. Real-world ants attempt to find the quality food sources nearest to their colony. In this pursuit they deposit certain chemicals, known as pheromones, on the search path. The paths with good food sources and a lesser distance from the nest are likely to get a larger amount of pheromone, and paths with a higher pheromone density are more likely to be selected by following ants. Such behaviour of the ants gradually leads towards the emergence of the shortest path from the nest to a good food source. Alternatively, it can be observed that through indirect communication, i.e. communication through the environment by using pheromone trails, and without any central control, the ants are likely to find the shortest path from their colony to the food source. In addition, the artificial ants of Ant Colony Optimization have some extra characteristics which real ants do not have. These characteristics include the presence of memory in the artificial ants of ACO, which helps in constructing feasible candidate solutions, and awareness of the environment for better decision making during solution construction. In ACO, ants probabilistically construct solutions using two important pieces of information, known as pheromone information and heuristic information. The pheromone information τ(ij) represents the amount of pheromone on the edge or solution component (i, j), and η(ij) represents the preference for selecting node j from node i during solution construction. Both of these values are represented using numeric values, and both influence the search process towards higher pheromone values and heuristic information values. In addition, the pheromone information, or density, on the paths is updated at every algorithmic iteration. The pheromone information represents the past search experience, while the heuristic information is problem specific and remains unchanged throughout the algorithmic run of ACO. The solution in each iteration is probabilistically constructed using the following formula:

Volume 8, Issue 1 • January-June, 2017 59


IITM Journal of Management and IT

P(ij) represents the probability of selecting node j after node i in the partially constructed solution, and l ranges over the nodes still available for solution construction, i.e. the nodes which are not already part of the partially constructed solution. Here α and β indicate the relative importance of the pheromone information and the heuristic information respectively.

After the completion of solution construction, a mechanism of evaporation is applied with the intent of forgetting the unattractive choices, so that no path becomes too dominating, as that may lead towards premature convergence. The pheromone update at every iteration is performed using the following formula:

τ(ij) = (1 − ρ) · τ(ij) + ρ · τ(0)

In the above formula, ρ indicates the pheromone decay coefficient and τ(0) indicates some initial pheromone value deposited on the edge (i,j).

In addition, daemon actions such as local search can be applied as an optional action to further improve the quality of the solution. The first ant colony based optimization technique was proposed in [6] to solve single objective optimization problems. After the initial work of the ant system, many variants of ant based optimization techniques have been proposed in the literature for solving various combinatorial optimization problems, such as the Travelling salesman problem, vehicle routing problem, production scheduling and quadratic assignment problems, among others [4,5,6]. An abstract view of the ACO is as follows:
Procedure ACO
    Initialize pheromone matrix τ
    Initialize heuristic factor η
    While stopping criteria not met do
        Perform ProbabilisticSolutionConstruction()
        Perform LocalSearchProcess()    // optional action
        Perform PheromoneUpdateProcess()
    End While
    Return best solution
End Procedure

Figure 2. An ACO procedure [4,5,6]
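The selection and update rules described above, within the loop of Figure 2, can be sketched in Python. This is a minimal illustration, not code from the surveyed papers; the function names, the uniform initial τ and η matrices, and the parameter values (α, β, ρ, τ0) are assumptions chosen only for the example.

```python
import random

def select_next(i, unvisited, tau, eta, alpha=1.0, beta=2.0):
    # Roulette-wheel selection: P(ij) proportional to tau[i][j]^alpha * eta[i][j]^beta
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in unvisited]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if acc >= r:
            return j
    return unvisited[-1]

def construct_tour(n, tau, eta):
    # Each ant incrementally builds a feasible solution (here, a tour over all nodes)
    tour = [0]
    unvisited = list(range(1, n))
    while unvisited:
        j = select_next(tour[-1], unvisited, tau, eta)
        unvisited.remove(j)
        tour.append(j)
    return tour

def update_pheromone(tau, rho=0.1, tau0=0.01):
    # Evaporation step: tau(ij) <- (1 - rho) * tau(ij) + rho * tau(0)
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] = (1.0 - rho) * tau[i][j] + rho * tau0
```

In a full implementation, η(ij) would typically be derived from problem data (e.g. 1/d(ij) for a distance matrix d), and the best tour found so far would additionally deposit pheromone on its edges before the next iteration.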
An ant based system consists of multiple stages, as shown in figure 2. In the first step, the evaluation function and the values of the pheromone information (τ) are initialized. In the next step, at each algorithmic iteration, each ant in a colony of ants incrementally constructs a solution by probabilistically selecting the feasible components or nodes from the available nodes. As an optional action, local search can be performed for further improvement of the quality of the solution. Once each ant completes the process of solution construction, the process of pheromone update using the evaporation mechanism is performed. The best solution or solutions in terms of the value of the given objective function are chosen to update the pheromone

60 National Conference on Emerging Trends in Information Technology



information. The algorithmic iterations of solution construction and pheromone update end when some predefined condition is met, and the best solution is returned. This could be some predefined number of generations, or the condition of stagnation, when no further improvement in the solution is found.

The ACO has been widely and successfully applied to various problems which include the Travelling Salesman problem, vehicle routing, Sequential ordering, Quadratic Assignment, Graph coloring, Course timetabling, Project scheduling, Total weighted tardiness, Open shop, Set covering, Multiple knapsack, Maximum clique, Constraint satisfaction, Classification rules, Bayesian networks and Protein folding, among others [4]. In recent years it has also been gaining popularity for solving various multi-objective optimization problems.

Conclusion

In this survey we have briefly described metaheuristic based techniques for solving various optimization problems. Considering the distinction between single solution based and population based metaheuristic techniques, we described the introductory ideas of two popular and widely used population based approaches: the Genetic algorithm and Ant colony optimization.
References
1. Asllani, A., & Lari, A. (2007). ‘Using genetic algorithm for dynamic and multiple criteria web-site
optimizations’, European journal of operational research, Vol. 176, No. 3, pp. 1767-1777
2. Basseur, M., Talbi, E., Nebro, A. & Alba, E. (2006). ‘Metaheuristics for Multiobjective Combinatorial
Optimization Problems: Review and recent issues’, INRIA Report, September 2006, pp. 1-39
3. Coello-Coello, C. A., Lamont, G. B. & van Veldhuizen, D. A. (2007). ‘Evolutionary Algorithm for solving
multi-objective problems, Genetic and Evolutionary Computation Series’, Second Edition, Springer.
4. Dorigo, M. & Stützle, T. (2004). Ant colony optimization, Cambridge: MIT Press, 2004
5. Dorigo, M. & Gambardella, L.M.,(1997) ‘Ant colonies for the traveling salesman problem’, BioSystems, vol.
43, no. 2, pp. 73–81, 1997.
6. Dorigo, M., Maniezzo,V. & Colorni, A., (1996) ‘Ant System: Optimization by a colony of cooperating
agents,’ IEEE Transactions on Systems, Man, and Cybernetics—Part B, vol. 26, no. 1, pp. 29–41, 1996.
7. Kazarlis, S.A., Bakirtzis, A.G. & Petridis, V (1996). ‘A genetic algorithm solution to the unit commitment
problem’, IEEE Transactions on Power System, Volume 11, Number 1, pp. 82-92
8. Deb, K., Pratap, A., Agarwal, S & Meyarivan, T. (2002). ‘A fast and elitist multiobjective Genetic Algorithm:
NSGA-II’, IEEE Transaction on Evolutionary Computation, Vol. 6, No. 2. pp. 182-197
9. Deb, K. (2010). Multi-objective optimization using Evolutionary algorithms. Wiley India.
10. Doerner, K. F., Gutjahr, W. J., Hartl, R. F., Strauss, C. and Stummer, C (2004). “Pareto ant colony optimization:
A metaheuristic approach to multiobjective portfolio selection,” Annals of Operations Research, vol. 131,
pp. 79–99,2004.
11. T’kindt, V., Monmarché, N., Tercinet, F. & Laügt, D. (2002). “An ant colony optimization algorithm to
solve a 2-machine bicriteria flowshop scheduling problem,” European Journal of Operational Research, vol.
142, no. 2, pp. 250–257, 2002
12. Wang L., Niu, Q. & Fei, M.(2007) ‘A Novel Ant Colony Optimization Algorithm’, Springer Verlag Berlin
Heidelberg. LNCS 4688, pp. 277– 286, 2007
13. Goldberg, D. E. (1989). Genetic Algorithm in Search, Optimization and Machine Learning, Pearson
Education, India


14. Han, K.–H. & Kim, J.–H., (2000)‘Genetic quantum algorithm and its application to combinatorial
optimization problem,’ in Proc. Congress on Evolutionary Computation, vol. 2, pp. 1354-1360, La Jolla,
CA,2000.
15. X. Yao, Y. Liu, Fast evolutionary programming, in: Evolutionary Programming, 1996, pp. 451–460.
16. F. Vandenbergh, A. Engelbrecht, A study of particle swarm optimization particle trajectories, Information
Sciences 176 (2006) 937–971.
17. S. Kirkpatrick, C. Gelatt, M. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.
18. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection
(Complex Adaptive Systems), first ed., The MIT Press, 1992.
19. T. Bäck, H.P. Schwefel, An overview of evolutionary algorithms for parameter optimization, Evolutionary
Computation 1 (1993) 1–23.
20. S. Baluja, Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function
Optimization and Competitive Learning, Technical Report, Carnegie Mellon University, Pittsburgh, PA,
USA, 1994.
21. F. Glover, Tabu search for nonlinear and parametric optimization (with links to genetic algorithms), Discrete
Applied Mathematics 49 (1994) 231– 255.
22. M. Birattari, L. Paquete, T. Stützle, K. Varrentrapp, Classification of Metaheuristics and Design of Experiments
for the Analysis of Components, Technical Report AIDA-01-05, FG Intellektik, FB Informatik, Technische
Universität Darmstadt, Darmstadt, Germany, 2001.
23. E.G. Talbi, Metaheuristics: From Design to Implementation, first ed., Wiley-Blackwell, 2009.
24. S. Jung, Queen-bee evolution for genetic algorithms, Electronics Letters 39 (2003) 575–576.
25. D. Karaboga, An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report TR06,
Erciyes University, 2005.
26. D. Karaboga, B. Akay, A survey: algorithms simulating bee swarm intelligence, Artificial Intelligence Review
31 (2009) 61–85.
27. N. Mladenovic, A variable neighborhood algorithm – a new metaheuristic for combinatorial optimization,
in: Abstracts of Papers Presented at Optimization Days, Montréal, Canada, 1995, p. 112.
28. N. Mladenovic, P. Hansen, Variable neighborhood search, Computers and Operations Research 24 (1997)
1097–1100.
29. X. Yao, Y. Liu, G. Lin, Evolutionary programming made faster, IEEE Transactions on Evolutionary
Computation 3 (1999) 82–102.



Cross-Language Information Retrieval on Indian
Languages: A Review
Nitin Verma*
Suket Arora**
Preeti Verma***
Abstract
Cross Language Information Retrieval on Indian Languages (CLIROIL) can be used to improve the
ability of users to search and retrieve documents in different languages. The aim of CLIR is to provide
the benefit to the user in finding and assessing information without being limited by language barriers.
We can use simple measures to achieve high accuracy in cross-language retrieval, translation being one of them. Translation is a technique that makes use of software to translate text from one language to another. Different types of translation techniques (dictionary based translation, machine translation, transitive translation, dual translation) can be used to achieve Cross Language Information Retrieval. IR deals with the presentation, storage, retrieval and access of a multiple document collection. This paper describes the work done in CLIR and the translation techniques used for it.
Keywords: CLIROIL, Translation, Dictionary-based, Machine translation, Transitive translation.

I. Introduction

Cross Language Information Retrieval on Hindi Language allows the users to read and search pages in a language different from the language being searched. Cross language information retrieval is a kind of information retrieval in which the language of the query is different from the language of the documents retrieved in the search result. In a Cross Language Information Retrieval system a user is not limited to his own native language; different sets of languages are available, so the user can make his query in his native language but the system returns a set of documents in other languages. Different foreign languages have been used, like English, French, Spanish and Chinese, but Indian languages have always had limited support. A CLIR system simplifies the search process for multiple users and enables those who know only one language to provide queries in their language and then get help

Nitin Verma*
Assistant Professor, Computer Science Dept., Hindu College, Amritsar
Suket Arora**
Assistant Professor, Dept. of Computer Applications, Amritsar College of Engineering & Technology, Amritsar
Preeti Verma***
Assistant Professor, Dept. of Computer Applications, Amritsar College of Engineering & Technology, Amritsar

from translators for using documents in other languages.

Due to the “standardization” of terms, stemming sometimes contributes to increasing the retrieval effectiveness. This is, however, not always the case. Current search engines usually do not use aggressive stemming, while in the area of research, stemming is still generally used as a standard pre-processing step.

II. Translation

A full document translation can also be applied offline to create a translation of an entire document. The translations provide the basis for constructing an index for information retrieval and also offer the user the possibility to access the content in his native language. Multilingual information search becomes important due to the large amount of online information available in different languages. We can also use online translation through sources like Google and Wikipedia, which confirms the accuracy of the search. Usually a machine translation system supports the translation. Searching strategies are continuously improving their techniques to provide more relevant, accurate and proper information for a given query. A common problem with translation is word accuracy. This problem can be solved by using different techniques, and various techniques are used to reduce grammatical mistakes. The search can also be filtered by providing unrestricted domains. Machine Translation is not always available as a realistic option for every pair of languages. Typically a translation system supports translation between language pairs which involve languages such as English, German or Spanish, and Chinese. In translating the documents, firstly we select a single query language, then translate every single document into that language, and then a single retrieval is carried out. This technique provides more context, though current systems do not exploit it widely. But one must determine in which language each document should be translated, and the translated documents in all the languages should be stored.

III. Translation Techniques

Translation techniques in CLIR are categorized into two types:
• Direct translation
• Indirect translation

A. Direct Translation

Direct translation is of three types. Now we will explain them:
• Corpus Based Translation
• Dictionary Based Translation
• Machine Based Translation

1) Corpus Based Translation
Parallel corpora are commonly used in cross-language information retrieval to translate queries. The basic technique involves a side-by-side analysis of the corpus, producing a set of translation probabilities for each term in a given query [1]. Large collections of parallel texts are referred to as parallel corpora. Parallel corpora can be acquired from a variety of sources.

2) Dictionary Based Translation
A dictionary-based approach to translation is very easy, but it has two limitations: ambiguity and lack of coverage [1].

3) Machine Translation
Machine Translation does not only perform the substitution of words from one language to another; it also involves finding phrases and their counterparts in the target language to produce a good quality translation.

B. Indirect Translation

Indirect translation relies upon the use of an intermediary which is placed between the source query and the target document collection. In the case of transitive translation, the query will be translated into an intermediate form to enable comparison with the target document collection. Indirect translation is of two types:
• Transitive translation
• Dual translation

1) Transitive Translation
Transitive translation relies upon the use of a pivot language which acts as an intermediary between the source query and the target document collection [1].

2) Dual Translation
Dual translation systems attempt to solve the query-document mismatch problem by translating the query representation and the document representations into some “third space” prior to comparison. This “third space” can be another human language, an abstract


language or a conceptual inter-lingua. This general category also includes translation techniques that induce a semantic correspondence between the query and the documents in a cross-language dual space defined by the documents.

IV. Approaches of CLIR

There are different approaches for CLIR. The following are the main approaches:

A. Query Translation
Multilingual information search becomes important due to the increasing amount of online information available in non-English languages and in multiple language document collections. This can be achieved by query translation. Query translation became the widely used technique to access documents in languages different from the language of the query. For translating the query, we can use an online translation service such as Google Translate, train a Statistical Machine Translation system using parallel corpora, employ Machine Readable Dictionaries to translate query terms, or use large scale multilingual information sources like Wikipedia. Translation can be applied to the query terms online; online query translation can be achieved by using the Google Translate API, which will convert the query into the other languages. Online query translation will help the user to translate his query into the other languages [3].

B. Interlingual Translation
The inter-lingual technique is useful if there is no resource for a direct translation, but it has lower performance than direct translation [4].

C. Document Translation
In document translation we select a single query language and then translate every document into that language, then perform monolingual retrieval. Typically machine translation systems support translation between language pairs which involve languages such as English, German or Spanish.

D. Some Advanced Approaches

1) Universal words
They form the vocabulary of the language. To be able to express any concept occurring in a natural language, the UNL proposes the use of English words modified by a series of semantic restrictions that eliminate the innate ambiguity of the vocabulary in natural languages. If there is no English word suitable to express the concept, the UNL allows the use of words from other languages. In this way, the language gets an expressive richness from the natural languages but without their ambiguity.

2) Relations
These are a group of 41 relations that define the semantic relations among concepts. They include argumentative (agent, object, goal), circumstantial (purpose, time, place) and logic (conjunction and disjunction) relations, etc.

V. Knowledge Representation

By knowledge bases in our context we understand the set of concepts belonging to a specific domain and the relations between these concepts that also belong to this domain. But when we turn to ontologies, the richness of a domain becomes relegated to a mere enumeration of concepts and a taxonomic organization of them. That is, there is a danger of identifying ontologies as mere thesauri [8].

VI. Challenges in CLIR

• Dictionaries only include the most commonly used proper nouns and technical terms, such as major cities and countries. Their translation is crucial for a good cross-language IR system. A common method used to handle untranslatable keywords is to include the non-translated word in the target language query. A phrase cannot be translated by translating each of the words in the phrase.
• Named entity extraction and translation are vital in the field of natural language processing for research on machine translation, cross language IR, bilingual lexicon construction, and so on. There are three types of named entities: entity names such as organizations, persons and


locations; temporal expressions such as dates and times; and number expressions such as monetary values and percentages.
• Using dictionary-based translation is a traditional approach in cross-lingual IR systems, but significant performance degradation is observed when queries contain words or phrases that do not appear in the dictionary. This is called the Out-of-Vocabulary problem, and it is to be expected even in the best of dictionaries. Translation disambiguation is rooted in homonymy and polysemy [6]. Homonymy refers to a word which has at least two entirely different meanings; for example, the word “left” can either mean the opposite of right or the past tense of leave. Input queries by users are usually short, and even query expansion cannot help to recover the missing words because of the lacking information [7].
• A common problem with query translation is word inflection used in the query. This problem can be solved by stemming and lemmatization. Lemmatization is where every word is simplified to its uninflected form or lemma, while stemming is where different grammatical forms of a word are reduced to a common shortest form, called a stem, by removing the word ending. For example, for the word “see”, stemming might return just “s”, while lemmatization returns “see” or “saw” [4].

VII. Applications of CLIR

• A CLIR system can be helpful for the immigration department. For example, the immigration department interacts with thousands of native Indian language speakers who are not able to understand the English language.
• The system can be used in multilingual population regions, so that people having different native languages can retrieve documents in their native languages.
• The system can also be used by intelligence departments.
• CLIR will be beneficial for students in their research work regarding historical places.

VIII. Conclusion

CLIROIL provides us a new technique for searching documents through different kinds of languages across the whole world. By using the different types of translation techniques, CLIROIL makes it possible to provide better search results in a language other than the language of the query, so it will be beneficial for wide population regions. The survey suggests that query translation is much better than document translation: it is more convenient to translate the query than the whole documents. Document translation, which uses machine translation, is computationally quite expensive, and the size of the document collection is large. However, it might be practical in the future when computer technology has improved much further.
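The dictionary-based approach and the Out-of-Vocabulary fallback discussed in the challenges above can be sketched as follows. This is a minimal illustration only: the toy bilingual dictionary entries and the function name are invented for the example, and a real system would add translation disambiguation rather than keeping every candidate translation.

```python
def translate_query(query, dictionary):
    """Dictionary-based query translation: each source term is replaced by its
    candidate translations; untranslatable (OOV) terms are kept as-is in the
    target-language query, the common fallback described above."""
    target_terms = []
    for term in query.lower().split():
        candidates = dictionary.get(term)
        if candidates:
            target_terms.extend(candidates)  # ambiguity: all candidates are kept
        else:
            target_terms.append(term)        # OOV fallback: pass through untranslated
    return target_terms

# Toy English-to-Hindi dictionary (romanized), invented for illustration.
en_hi = {
    "river": ["nadi"],
    "bank": ["kinara", "bank"],  # ambiguous: riverbank vs. financial bank
}
```

For instance, `translate_query("river bank Ganga", en_hi)` returns `["nadi", "kinara", "bank", "ganga"]`; the proper noun "Ganga" is out of vocabulary and passes through untranslated, illustrating both limitations (ambiguity and lack of coverage) of the dictionary-based approach.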

References
1. Dong Zhou, Mark Truran, Tim Brailsford, Vincent Wade, Helen Ashman, “Translation Techniques in Cross-Language Information Retrieval”.
2. J. Cardeñosa, C. Gallardo, Adriana Toni, “Multilingual Cross Language Information Retrieval: A new approach”.
3. UNL Center. UNL specifications v 2005. http://www.undl.org/unlsys/unl/unl2005-e2006/
4. Manning, C. D., P. Raghavan, and H. Schütze, “An Introduction to Information Retrieval”, 2009.
5. Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, “Cross-lingual Information Retrieval”, Electronic Journal of Computer Science and Information Technology, Vol. 2, No. 1.
6. Abusalah, M., J. Tait, M. Oakes, “Literature Review of Cross Language Information Retrieval”, 2005.
7. Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, “Cross-lingual Information Retrieval”, Electronic Journal of Computer Science and Information Technology, Vol. 2, No. 1.
8. Bateman, J.A., Henschel, R. and Rinaldi, F. “The Generalized Upper Model 2.0.” 1995. http://www.fb10.uni-bremen.de/anglistik/langpro/webspace/jb/gum/index.htm



Enhancing the Efficiency of Web Data Mining using
Cloud Computing
Tripti Lamba*
Leena Chopra**
Abstract
Data Mining is the process of discovering actionable information from raw data, which helps to enhance the capability of existing business processes. Due to the unrestricted, ubiquitous use of the Internet by individuals, limitless data has to be stored and maintained on servers. The World Wide Web is a massive group of information resources and interconnected files on the Internet, and mining valuable information from this huge source is the main area of concern. In cloud computing, web mining techniques and applications are major areas to focus on. Cloud computing is also known as distributed computing over the network; it does not require deploying an application on a local computer, as the hosted services are delivered directly over the Internet. The objective of this paper is to study the Map-Reduce programming model and the Hadoop development platform of cloud computing, and to ensure the efficiency of Web mining using these parallel mining algorithms.
Keywords: Data Mining, Web mining, Cloud Computing, map-reduce

I. Introduction

A) Web Mining

An extensive version of data mining can be termed web mining. On the web, data is stored in a heterogeneous manner in a semi-structured or unstructured form, due to which mining on the web is difficult compared to traditional data mining. Web data mining is used to extract useful information or facts from Web usage logs [2], Web hyperlinks and Web page contents. The different types of web mining are:
• Web Structure Mining
• Web Content Mining
• Web Usage Mining [4]

The process of extracting information on the Web is called Web content mining. In Web Mining, data collection is a substantial task, especially for Web structure and Web content mining, and involves crawling a large number of Web pages [3]. The Internet has today changed computing into distributed computing or cloud computing. All the major social media sites (Twitter, Facebook, LinkedIn, and Google+) contain an abundance of information and are today on cloud platforms. For instance, Tweets happen every millisecond on Twitter; they happen at the “speed of thought”. This data is available for consumption all the time. The data on Twitter ranges from small tweets to long conversational dialogues to interest graphs, etc. Now, which data mining technique to apply, and how to find associations or correlations or how to cluster the data based on similarity so as to gain efficiency on the platform of cloud computing, is the research area.

Problems associated with Web Mining
1. Scalability: The database is huge and contains large datasets, so mining interesting rules also generates a huge number of uninteresting rules. There is no efficient algorithm for extracting useful patterns from such a huge database.
2. Type of Data: The data on the Web is heterogeneous [5]. Web cleaning is the most important process and is very difficult for semi-structured and unstructured data. According to researchers, 70% of the time is spent on data pre-processing.

Tripti Lamba*
Research Scholar, Jagan Nath University, Jaipur, India
Leena Chopra**
Research Scholar, Amity University, Noida, India

3. Efficiency: Mining rules from semi-structured and unstructured data, as in the semantic web, is a great challenge. A lot of time and memory consumption leads to decreased efficiency.
4. Security: The data on the web is accessed publicly. There is no data that is hidden, so this is another challenge in Web Mining.

B) Cloud Computing

Computer resources these days are consumed as a utility by various companies, in the same manner one consumes electricity or a rented house. There is no need to fabricate and retain computing infrastructure in-house. There are three types of cloud: private, public and hybrid. Cloud services are mainly categorized into three types: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) [8]. There are various benefits of the cloud, some of which are mentioned below:

• Self-service provisioning: It all depends on the end users which type of services they yearn for. Users can call upon multiple computing assets for almost any type of workload on demand.
• Elasticity: Companies can scale up as computing needs increase and then scale down again as demands decrease.
• Pay per use: There is flexibility in using the services and computing resources as per the demand of the user. This facility permits users to pay only for the resources and workloads they utilize.

Cloud computing is a most impressive technology because it is cost efficient and flexible. Cloud mining's Software as a Service (SaaS) is used for implementing Web Mining, as it reduces the cost and increases the security. Compared to all the other web mining techniques, Web usage mining is immeasurably used and has known productive outcomes [7].

C) Web Mining and Cloud Computing

One of the most used technologies in Web Mining is Web Usage Mining [1]. Web usage mining using cloud computing is majorly adopted these days due to its reduced cost and flexibility [6]. However, in spite of improved movement and attention, there are considerable, continual concerns about cloud computing that ultimately compromise the vision of cloud computing as a new IT procurement model. Fundamentally, cloud mining is a novel approach to a faceted search interface for your data. The major challenge of web mining, security, is addressed by SaaS (Software-as-a-Service), which is also used for reducing cost; this is termed the cloud mining technique. It is targeted at changing the existing framework of web mining to generate an influential framework, using the Hadoop and MapReduce communities, for projecting analytics [9].

In the next section we discuss how to use the Map/Reduce model in cloud computing and the various benefits of using this model.

II. Cloud Computing and Map/Reduce Model

The term cloud is a representation designed for the Internet, an abstraction of the Internet's fundamental infrastructure that helps to spot the point at which accountability moves from the user to an external provider. Cloud computing is one of the most captivating areas where lots of services are being utilized. The main objective of cloud computing is to fully utilize the resources dispersed at various places [10]. The Map/Reduce model, a programming model proposed by Google, is used for processing voluminous data sets; it processes around 20 petabytes of data in a single day. This model is gaining more popularity in cloud computing these days [11][12]. The Map/Reduce model is used for parallel and distributed processing of huge data sets on clusters [13]. Some of the applications of Map/Reduce are:

At Google:
• Index building for Google Search
• Article clustering for Google News
• Statistical machine translation

At Yahoo!:
• Index building for Yahoo! Search
• Spam detection for Yahoo! Mail

At Facebook:
• Ad optimization
• Spam detection
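The Map/Reduce programming model described in this section can be illustrated with a minimal in-memory word-count sketch. This is a toy illustration of the model only, not Hadoop code; the function names and the explicit grouping step (which stands in for the framework's shuffle/sort phase) are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit an intermediate (key, value) pair for every word
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key
    # (in Hadoop this grouping is done by the framework itself)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: summarize all values emitted for one key
    return key, sum(values)

def word_count(documents):
    # Drive the three stages over a small document collection
    intermediate = []
    for doc in documents:
        intermediate.extend(map_phase(doc))
    groups = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in groups.items())
```

For example, `word_count(["cloud mining", "cloud computing"])` returns `{"cloud": 2, "mining": 1, "computing": 1}`. In a real cluster the map calls run in parallel across splits of the input, and each reducer receives only the keys assigned to it by the partition function.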


Fig. 1 Map/ Reduce System Framework[14]

A) Advantages of Map/Reduce Framework:

The main advantage of the MapReduce framework is its fault tolerance: periodic reports are expected from each node in the cluster when work is completed, and a task can be transferred from one node to another. If the master node notices that a node has been silent for longer than expected, it reassigns the frozen or delayed task. Some of the advantages [15] of the Map/Reduce framework are mentioned below:

Scalability and Distributed Processing: The Hadoop platform, which utilizes the Map/Reduce framework, is extremely scalable. It can accumulate and distribute large data sets across ample servers operating in parallel, which leads to reduced cost.

Flexibility: It operates on structured and unstructured data from a variety of sources like email, e-commerce, social media, etc.

Fast: The framework works on a distributed architecture, so it can process huge amounts of data ranging from terabytes to petabytes. It takes minutes to process terabytes of data, and hours for petabytes of data.

Security and Authentication: Security is a major area of concern in almost every field. MapReduce works with HDFS and HBase security, which allows access only to authenticated users.

B) Map/Reduce System Framework

The basic architecture of Map/Reduce is shown in Fig. 1[14]. Map/Reduce involves two basic steps:

● Map: performs filtering and sorting, and
● Reduce: performs a summary operation.

The input and output are in the form of key-value pairs. After the input data is partitioned into splits of appropriate size, the map procedure takes a series of key-value pairs and generates processed key-value pairs, which are passed to a particular reducer by a partition function; later, after the data sorting and shuffling, the reduce procedure integrates the results. The scalability achieved by using MapReduce to implement data processing across a large number of CPUs at low implementation cost, whether on a single server or multiple machines, is a smart proposition.

III. Conclusion

Cloud Computing is definitely one of the most widely used technologies, as it is cost efficient and flexible. Web Usage Mining uses the Cloud Computing service SaaS (Software as a Service) to increase security and reduce cost. In this paper we have discussed the basic Map/Reduce model and its advantages. Future work will focus on new ways to improve the current model, aiming at a more accurate and faster approach for Web Usage Mining based on Cloud Computing.

Volume 8, Issue 1 • January-June, 2017 69



References
1. M. U. Ahmed and A. Mahmood, “Web usage mining,” International Journal of Technology Diffusion, vol. 3, no. 3, pp. 1–12, Jul. 2012.
2. S. K. Pani et al., “Web Usage Mining: A Survey On Pattern Extraction From Web Logs”, International Journal of Instrumentation, Control & Automation (IJICA), Volume 1, Issue 1, 2011.
3. Singh, Brijendra, and Hemant Kumar Singh. “Web data mining research: a survey.” In Computational
Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on, pp. 1-10. IEEE,
2010.
4. J. Vellingiri, S. Chenthur Pandian, “A Survey on Web Usage Mining”, Global Journal of Computer Science and Technology, Volume 11, Issue 4, Version 1.0, March 2011.
5. Li, J., Xu, C., Tan, S.-B, “A Web data mining system design and research”. Computer Technology and
Development 19: pp. 55-58, 2009
6. Robert Grossman , Yunhong Gu, “Data mining using high performance data clouds: experimental studies
using sector and sphere”, Proceedings of the 14th ACM SIGKDD international conference on Knowledge
discovery and data mining, August 24-27, 2008
7. J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, “Web usage mining,” ACM SIGKDD Explorations
Newsletter, vol. 1, no. 2, p. 12, Jan. 2000.
8. Khanna, Leena, and Anant Jaiswal. “Cloud Computing: Security Issues And Description Of Encryption
Based Algorithms To Overcome Them.” International Journal of Advanced Research in Computer Science
and Software Engineering 3 (2013): 279-283.
9. V. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White, “Visualization of navigation patterns on a web site using model-based clustering,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 280–284, Boston, Massachusetts, 2000.
10. Zhu, W., & Lee, C. (2014). A new approach to web data mining based on cloud computing. Journal of
Computing Science and Engineering, 8(4), 181–186. doi:10.5626/jcse.2014.8.4.181
11. “MapReduce.” Wikipedia. N.p.: Wikimedia Foundation, 11 Jan. 2017. Web. 2 Jan. 2017.
12. Divestopedia, and Securities Institute. What is MapReduce? - definition from Techopedia. Techopedia.com,
2017. Web. 2 Jan. 2017.
13. M. Rouse, “What is MapReduce? - definition from WhatIs.com,” SearchCloudComputing, 25 June 2014. Web. 2 Jan. 2017.
14. Hornung, T., Przyjaciel-Zablocki, M., & Schätzle, A. (2017). Giant data: MapReduce and Hadoop » ADMIN
magazine. Retrieved January 10, 2017, from http://www.admin-magazine.com/HPC/Articles/MapReduce-
and-Hadoop
15. Lee, K.-H., Lee, Y.-J., Choi, H., Chung, Y. D., & Moon, B. (2012). Parallel data processing with MapReduce.
ACM SIGMOD Record, 40(4), 11. doi:10.1145/2094114.2094118



Role of Cloud computing in the Era of cyber security
Shilpa Taneja*
Vivek Vikram Singh**
Dr. Jyoti Arora***
Introduction

Cloud computing is taking the IT landscape further away from the organization. There are numerous benefits of a cloud based system, where software is managed and upgraded centrally. The cost of hardware is very low, as only an internet connection and a browser are required, so other hardware devices become unnecessary. Cloud computing, put simply, can be considered a form of outsourcing. With this, the major issue lies with the most important asset of any organization, i.e. information. Many IT organizations are losing control of their technology. As cloud computing is emerging, so the cyber security trends of today are evolving at a high pace. Prediction and detection of attacks in cyber security is shifting incident response into a continuous process. This generates the requirement of a security architecture that integrates prediction, prevention, detection and response. Cloud computing in cyber security provides the advantages of a public utility system in terms of economy, flexibility and convenience, but simultaneously raises issues of security and loss of control. This paper presents user centric measures of cyber security and provides a comparative study of different methodologies used for cyber security.

Cloud computing in cyber security

Cloud computing provides a higher level of security and uptime than a typical network. It is the simplest form of outsourcing. There are numerous benefits of a cloud based system. The cost of hardware is lowered while, on the other side, software is managed and upgraded. It saves cost and time, as it controls the buying and upgrading of servers and other hardware. It diminishes the requirement of a large IT staff. It provides faster time to market and increased employee productivity. Cloud computing provides the next generation of IT resources through a platform which is scalable and easy to manage. The legal system is running behind in adapting to cloud computing. As most cloud vendors do not take responsibility for data loss, downtime or loss of revenue caused by cyber-attacks, there is a need for preventive as well as corrective measures to solve the problem. According to Forrester, the cloud computing market will grow tremendously to $191 billion by 2020, up from $91 billion in 2015.

Risks to cloud computing

A study has revealed 9 cloud risks. It follows high profile breaches of the cloud platforms Evernote, Adobe Creative Cloud, Slack and LastPass. The LastPass breach is especially problematic, as LastPass stores all of a user's website and cloud service passwords. If such passwords are compromised, especially those belonging to administrators with extensive permissions over a company's critical infrastructure, a criminal could launch a devastating attack.

1. Loss of intellectual property

Cyber criminals benefit by gaining access to sensitive data. Skyhigh says in its report that 21% of the files uploaded to file sharing services contain sensitive data. A few services can even pose a risk if their terms and conditions claim ownership of the data uploaded to them.

2. Compliance violations and regulatory actions

Most companies these days operate under some regulatory control of their information, be it health information or student records. It is a requirement for companies to know the location of their data and how it is protected. They are also required to know who will access it.

Shilpa Taneja*
Assistant Professor, IITM
Vivek Vikram Singh**
Assistant Professor, IITM
Dr. Jyoti Arora***
Assistant Professor, IITM

3. Loss of control over end user actions

An employee can harm the company by downloading a report of all customer contacts, uploading the data to a personal cloud storage service, and then accessing that information after leaving the company to join a competitor. Data can be misused when companies are in the dark about the working conduct of their employees. This is one of the more common insider threats today.

4. Malware infections that unleash a targeted attack

Cloud services are a vector of data exfiltration. A study reveals a novel data exfiltration technique in which attackers encoded sensitive data into video files and uploaded them to social media. There is malware that exfiltrates sensitive data via a private social media account. In the case of the Dyre malware variant, cyber criminals used file sharing services to deliver the malware to targets via phishing attacks.

5. Contractual breaches with stakeholders

Contracts among business parties often restrict how data is used and who is authorized to access it. When employees move restricted data into the cloud without authorization, the business contracts may be violated and legal action could ensue. A cloud service that maintains the right, in its terms and conditions, to share all data uploaded to the service with third parties may thereby breach a confidentiality agreement the company made with a business partner.

6. Diminished trust of customers

Data breaches result in diminished trust of customers. One of the biggest breaches reported was that in which cyber criminals stole over 40 million customer credit and debit card numbers from Target. The breach led customers to stay away from Target stores and led to a loss of business for the company, which ultimately impacted the company's revenue.

7. Data breach requiring disclosure and notification to victims

If sensitive or regulated data is put in the cloud and a breach occurs, the company may be required to disclose the breach and send notifications to potential victims. Certain regulations like the EU Data Protection Directive require these disclosures. Following legally-mandated breach disclosures, regulators can levy fines against a company, and it's not uncommon for consumers whose data was compromised to file lawsuits.

8. Increased customer churn

If customers even suspect that their data is not fully protected by enterprise-grade security controls, they may take their business elsewhere to a company they can trust. A growing chorus of critics is instructing consumers to avoid cloud companies who do not protect customer privacy.

9. Revenue losses

According to the Ponemon BYOC study, 64% of respondents confirmed that their companies can't confirm whether their employees are using their own cloud services in the workplace. To reduce the risks of unmanaged cloud usage, companies first need visibility into the cloud services in use by their employees. They need to understand what data is being uploaded to which cloud services and by whom. With this information, IT teams can begin to enforce corporate data security, compliance, and governance policies to protect corporate data in the cloud. The cloud is here to stay, and companies must balance the risks of cloud services with the clear benefits they bring.

In this era of digitization, data security is paramount to every business. In the past, on-premise servers were the business technology model, but now there are more choices. For the last several years, a debate has flowed through businesses: How will cloud computing affect them? Should they adopt a public cloud approach, opt for a private cloud, or stick with their on-premise servers? The use of cloud computing is steadily rising. In fact, a recent study has shown that cloud services are set to reach over $130 billion by 2017. Before making any decisions, it's important to think about how this shift towards cloud computing will affect cyber security for your business.

Measures or models of cloud computing in cyber security

Boehme et al. posited that all dilemmas that arise in software engineering are of an economic nature rather




than a technical nature, and that all decisions ought to be modeled in economic terms: maximizing benefit; minimizing cost and risk. Their work is perfectly compatible with the philosophy of value-based software engineering, as it models system security not by an arbitrary abstract scale but rather by an economic function (MFC), quantified in monetary terms (dollars per hour), in such a way as to enable rational decision making.

Brunette and Mogull (2009) discuss the promise and perils of cloud computing, and single out security as one of the main concerns of this new computing paradigm. They have cataloged and classified the types of security threat that arise in cloud computing; their work provides a comprehensive catalog of security threats, classified according to their type.

Black et al. (2009) discussed the categorization of metrics and measures and the differences among types of metrics. These metrics can be used as a standard by an organization to compare the current situation against the expected one. This gives the organization the ability to raise its level in order to meet its goals.

Jonsson and Pirzadeh (2011) proposed a framework to measure security by regrouping the security and dependability attributes on the basis of an already existing conceptual model, applicable to areas varying from small to large scale organizations. They discussed how different metrics are related to each other, and they categorize security metrics into protective and behavioral metrics. The choice of measures affects the results and accuracy of a metric.

Carlin and Curran (2011) found that by using cloud computing companies can decrease their budget by 18%. Their findings cover mainly three services, Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS), and three kinds of deployment model: public, private and hybrid. Encryption alone is not a way to fully protect the data.

Chow et al. (2009) discuss the three types of security concern raised in cloud computing: provider-related vulnerabilities, which represent traditional security concerns; availability, which arises in any shared system, and most especially in cloud computing; and third party data control, which arises in cloud computing because user data is managed by the cloud provider and may potentially be exposed to malicious third parties. They also discuss strategies that may be used to mitigate these security concerns.

The Center for Internet Security (2009) used mean time to incident discovery, incident rate, mean time between security incidents, mean time to incident recovery, vulnerability scan coverage, percentage of systems without known severe vulnerabilities, mean time to mitigate vulnerabilities, number of known vulnerability instances, patch policy compliance, and mean time to patch, and proposed a set of MTTF-like metrics to capture the concept of cyber security.

Benefits of Cyber security in Cloud Computing

Cyber security has numerous benefits in cloud based applications, like improvement in threat gathering and modeling, enhanced collaboration, and reduction of the lag time between detection and remediation. With the increase in cyber-attacks in the era of cloud computing, organizations need to take precautions and adequate measures to deal with threats. The four pillars of cloud based cyber security comprise updated technologies, extremely protected platforms, skilled manpower and high bandwidth connectivity. Learning collection can support real time integrated security information. Usage of cyber security ensures security while maintaining sensitive data. The concept of out-of-band channels can be used to deal with cyber-attacks. 41% of businesses employ infrastructure-as-a-service (IaaS) for mission-critical workloads. A cloud-based cyber security solution developed by PwC and Google can provide advanced detection, analysis, collective learning, high performance, and scalability in analytic processes to enable an advanced security operations capability (ASOC). This will create honeypots and dummies maintaining connections to endpoints for analysis and learning.

Conclusion

This paper discusses the numerous benefits of cloud based systems and the various risks related to them. We also discussed various models which address how to maximize benefits while minimizing cost and risks. On the basis of the classification of metrics and measures




of cloud computing, we can help organizations to raise their efficiency and meet their goals. Various strategies may be used to mitigate these security concerns. At last, we can say that the usage of cyber security ensures security while maintaining sensitive data as well.
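The MTTF-like metrics surveyed above (mean time to incident recovery, mean time between security incidents, etc.) can be computed directly from an incident log. A minimal sketch follows; the timestamps and record layout are invented for illustration and are not taken from the CIS specification:

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected, recovered) timestamp pairs.
incidents = [
    (datetime(2017, 1, 1, 9, 0), datetime(2017, 1, 1, 15, 0)),   # 6 h to recover
    (datetime(2017, 1, 11, 9, 0), datetime(2017, 1, 11, 12, 0)), # 3 h to recover
    (datetime(2017, 1, 21, 9, 0), datetime(2017, 1, 21, 18, 0)), # 9 h to recover
]

def mean_time_to_recovery(log):
    # MTTR: average of (recovered - detected) over all incidents.
    total = sum(((rec - det) for det, rec in log), timedelta())
    return total / len(log)

def mean_time_between_incidents(log):
    # MTBSI: average gap between consecutive detection times.
    detections = sorted(det for det, _ in log)
    gaps = [b - a for a, b in zip(detections, detections[1:])]
    return sum(gaps, timedelta()) / len(gaps)

print(mean_time_to_recovery(incidents))       # 6:00:00
print(mean_time_between_incidents(incidents)) # 10 days, 0:00:00
```

Tracking such averages over time is what lets an organization compare its current situation against an expected baseline, as the metric-based models above suggest.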

References
1. Rabia, L., Jouini, M., Aissa, A., Mili, A., 2013. A cybersecurity model in cloud computing environments. Journal of King Saud University – Computer and Information Sciences.
2. Boehme, R., Nowey, T., 2008. Economic security metrics. In: Irene, E., Felix, F., Ralf, R. (Eds.), Dependability Metrics, 4909, pp. 176–187.
3. Brunette, G., Mogull, R., 2009. Security guidance for critical areas of focus in cloud computing V 1.2. Cloud Security Alliance.
4. Black, P.E., Scarfone, K., Souppaya, M., 2009. Cyber Security Metrics and Measures. Wiley Handbook of Science and Technology for Homeland Security.
5. Jonsson, E., Pirzadeh, L., 2011. A framework for security metrics based on operational system attributes. In: International Workshop on Security Measurements and Metrics – MetriSec2011, Banff, Alberta, Canada.
6. Carlin, S., Curran, K., 2011. Cloud computing security. International Journal of Ambient Computing and Intelligence.
7. Chow, R., Golle, P., Jakobsson, M., Shi, E., Staddon, J., Masuok, R., Molina, J., 2009. Controlling data in the cloud: outsourcing computation without outsourcing control. In: ACM Workshop on Cloud Computing Security (CCSW).
8. The Center for Internet Security, The CIS Security Metrics v1.0.0, 2009. <https://www.cisecurity.org/tools2/metrics/CIS_Security_Metrics_v1.0.0.pdf>.



Cryptography and its Desirable Properties in
terms of different algorithm
Mukta Sharma*
Dr. Jyoti Batra Arora**
Abstract
The proliferation of the Internet has revolutionized the world; the world has become a smaller place to communicate. Especially in India, after demonetization the Indian government is encouraging both customer and buyer to transact online (go cashless). Electronic payment is a new trend for transacting online, as any e-commerce environment needs a payment system. A payment system requires an intricate design which ensures payment security, transaction privacy, system integrity, customer authentication, the purchaser's promise to pay and the supplier's promise to sell a high-quality product. There are several e-payment systems, like paying via plastic money (credit/debit/smart card), e-wallet, e-cash, UPI, net banking, Aadhaar Card, etc. Electronic payment is made online without face-to-face interaction, which leads to electronic frauds. Therefore, emphasis is given to the security methods adopted by banks, especially cryptography.
This paper begins with the primary security threats, followed by the prevention plan. It highlights cryptography and discusses the desirable properties used to check the strength of an encryption algorithm.
Keywords: Avalanche, Cryptography, Decryption, Encryption, Cipher Text, DES, Plain Text, Symmetric Cryptography

I. Introduction

With technological advancement, everyone is using the Internet on their smart phones, laptops, desktops, iPads, etc., and users are transacting funds online. E-banking is growing phenomenally well. There are numerous advantages of online banking from both the customers' and the bankers' perspective, such as cost-effectiveness, paperless operation, immediate transfer of funds, geographical convenience, 24*7 availability, etc. Several issues in internet banking are security, trust, authentication, non-repudiation, privacy and availability. Since the inception of e-banking, security is and always will remain a matter of great concern. With the development of e-banking, the bank needs to ensure payment security, transaction privacy, system integrity and customer authentication, as it is an online payment system.

Every coin has two facets: while the internet has numerous advantages, it also has significant security threats. Customers are reluctant to share their demographic and especially financial details online because of security concerns, hence the need for safety measures to prevent unwanted access to confidential information. Cybercriminals steal sensitive data and misuse it for their benefit.

Mukta Sharma*
Research Scholar, TMU
Dr. Jyoti Batra Arora**
Assistant Professor, IITM

II. Security Threats

Electronic transactions have been facing various obstacles with regard to security. Crimes like hacking, cracking, phishing, DoS, etc. are among the attacks or threats to safety. The following attacks breach security:

a) Cracking / Hacking- It is defined as unauthorized access to someone else's information.

b) Denial of Service attack- DoS floods the computer with more requests than it can handle, causing the web server to crash and denying authorized users the service offered by the resource. In a Distributed Denial of Service (DDoS) attack the perpetrators are many and geographically widespread; controlling such attacks is tough. The attack is initiated by sending excessive demands to the

victim's computer(s), exceeding the limit that the victim's servers can support and making the servers crash.

c) E-mail spoofing- A spoofed e-mail is one which misrepresents its origin, showing its origin to be different from where it actually originates.

d) Phishing- It is another criminally fraudulent process, in which a fake website resembling the original site is designed. Phishing is an attempt to acquire sensitive information such as usernames, passwords and credit card details by masquerading as a trustworthy entity in an electronic communication.

e) Salami Attack- It is an attack which is difficult to detect and trace, also known as penny shaving: the fraudulent practice of stealing money repeatedly in small quantities, usually by taking advantage of rounding to the nearest cent (or other monetary unit) in financial transactions.

f) Virus / Worm Attacks- Malicious programs are dangerous, be they viruses, worms, logic bombs, trap doors, Trojan horses, etc., as they are programs written to infect and harm data by altering or deleting information, or by making a backdoor entry for an unauthorized person.

g) Forgery- Counterfeit currency notes, postage and revenue stamps, mark sheets, etc. can be forged using sophisticated computers, printers and scanners.

III. Security Measures

Security has become a necessity, and many techniques are available to keep data safe. By using these techniques, one can ensure the confidentiality, authentication, privacy and integrity of information. Information can be of any type, may it be in the form of text, image, audio or video. The need for security means preventing unwanted access to confidential information; this can be attained in the following ways:

a) SSL- Secure Socket Layer is a protocol developed by Netscape. It was designed so that sensitive data can be transmitted safely via the Internet. SSL creates a secure connection between a client and a server, over which any amount of data can be sent securely. All browsers support SSL, and many Web sites use the protocol to obtain confidential user information, such as credit card numbers.

b) HTTPS- Hyper Text Transfer Protocol combined with SSL to ensure security. S-HTTP is designed to transmit individual messages securely. SSL and S-HTTP can be seen as complementary rather than competing technologies. Both protocols have been approved by the Internet Engineering Task Force (IETF) as a standard.

c) Firewall- Firewalls can be implemented in hardware, software, or a combination of both to prevent unauthorized access. Firewalls are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which examines each message and blocks those that do not meet the specified security criteria.

d) SET- Secure Electronic Transaction is a standard developed jointly by Visa International, MasterCard, and other companies. The SET protocol uses digital certificates to protect credit card transactions that are conducted over the Internet. The SET standard is a significant step towards securing Internet transactions, paving the way for more merchants, financial institutions and consumers to participate in electronic commerce.

e) PGP- Pretty Good Privacy provides confidentiality by encrypting messages to be transmitted, or data files to be stored, using an encryption algorithm. PGP uses the "public key" encryption approach: messages are encrypted using the publicly available key, but can only be deciphered by the intended recipient via the private key.

f) Anti-Virus- To secure a PC, laptop or smartphone from malicious attack, the user must install a good anti-virus and always update the anti-virus software fortnightly for better security.

g) Steganography- It is the process of hiding a secret message within an ordinary message. The original
creates a secure connection between a client and a message with an ordinary message. The original




Figure 1: Symmetric Key Encryption Algorithm

user will view the standard message and will fail to identify that the message contains a hidden or encrypted message. The secret message can be extracted only by the authentic users who are aware of the hidden message beneath the ordinary file. Steganography is now gaining popularity among the masses because of its ease of use and the abundant tools available.

h) Cryptography- It is the "scrambling" of data using mathematical calculations, such that only an authentic user with a key and algorithm can "unscramble" it. It allows secure transmission of private information over insecure channels.

IV. Cryptography

Cryptology is the study of reading, writing and breaking of codes. It comprises cryptography (secret writing) and cryptanalysis (code breaking). Cryptography is an art of mangling information into apparent incomprehensibility in a way permitting a secret method of unscrambling [11]. Humans have a requirement to share private information with only the intended recipients, and cryptography gives a solution to this need.

Cryptographic algorithms play a significant role in the field of network security. To perform cryptography, one requires a secure algorithm which performs the conversion efficiently and securely when carried out with a key. Encryption is the way to transform a message so that only the sender and recipient can read, see or understand it. The mechanism is based on the use of mathematical procedures to scramble data so that it is tough for anyone else to recover the original message.

There are two basic types of cryptosystems: symmetric cryptosystems and asymmetric cryptosystems. Symmetric cryptography is a concept in which both sender and receiver share the same key for the encryption and decryption process. In contrast to symmetric cryptography, asymmetric cryptography uses a pair of keys for the encryption and decryption transformations: the public key is used to encrypt data, and the private key is used to decrypt the message.

1) Symmetric Key Encryption Algorithms

The symmetric key, also known as a private key or conventional key, is a unique shared key for transmitting data safely. The symmetric key was the only way of enciphering before the 1970s. Symmetric key encryption can be performed using a block cipher or a stream cipher.

A stream cipher takes one bit or one byte as input, processes it, and converts it into a 1-bit or 1-byte ciphertext. For example, RC4 is a stream cipher used in mobile phones.

A block cipher works with a single block or chunk of data or message instead of a single stream, character or byte. In a block cipher, the encryption of any plaintext bit in a given block depends on every other plaintext bit in the same block. For example, DES and 3DES have a block size of 64 bits (8 bytes), and AES has a block size of 128 bits (16 bytes).

2) Need for Cryptography

It has given a platform which can ensure not only confidentiality but also integrity, availability and non-repudiation of messages/information. The symmetric key
confidentiality but also integrity, availability, and non-
tough for anyone else to recover the original message.
repudiation of messages/ information. Symmetric Key




encryption algorithm focuses on privacy & reverse order for decryption [2] [21].
confidentiality of data.
3) Symmetric Key Block Cipher Algorithm

The paper focuses on symmetric key block ciphers. DES, 3DES, AES, IDEA and Blowfish are among the most used and popular block cipher algorithms.

a) DES - DES is based on the Feistel network. It takes 64-bit plain text as input and produces 64-bit cipher text as output. Initially a 64-bit key is supplied, which is later reduced to 56 bits (by removing every 8th bit). Encryption is then performed in 16 iterations using permutation, expansion, substitution, transposition and basic mathematical functions; decryption is the reverse process of encryption.

b) 3DES - Triple DES is an enhancement of the Data Encryption Standard. To make it more secure, the algorithm executes three times with three different keys, giving 16*3 = 48 rounds and a key length of 168 bits (56*3) [22]. The 3DES encryption algorithm works in the sequence Encrypt-Decrypt-Encrypt (EDE); the decryption process is simply the reverse (Decrypt-Encrypt-Decrypt). 3DES is more complicated and is designed to protect data against different attacks. It has the advantage of reliability and a longer key length that eliminates many attacks, such as brute force, and its higher security was approved by the U.S. Government. Triple DES has one big limitation: it is much slower than other block encryption methods.

c) IDEA - The International Data Encryption Algorithm is another symmetric key block cipher, developed at ETH in Zurich, Switzerland. It is based on a substitution-permutation structure. It is a block cipher that uses a 64-bit plain text block, divided equally into four 16-bit sub-blocks (16*4 = 64), with 8 and a half rounds and a key length of 128 bits. Each round requires 6 sub-keys, 4 before the round and 2 within the round (8*6 = 48 sub-keys, plus 4 sub-keys used after the last, eighth round, making 52 sub-keys in total). IDEA does not use S-boxes, and it uses the same algorithm for both encryption and decryption (decryption uses inverted sub-keys).

d) AES - AES is also a symmetric key algorithm, based on a substitution-permutation network [4][7][23]. AES uses a 128-bit block of plain text, organized as a 4*4 byte array called the State, which is processed in several rounds. It has a variable key length of 128, 192 or 256 bits, and the number of rounds is 10, 12 or 14 depending on the key length (number of rounds = key length/32 + 6): 10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key. It contains only a single S-box (which takes 8 bits of input and gives 8 bits of output), applied 16 times in succession. Originally the cipher text block size was also variable, but it was later fixed to 128 bits. The encryption and decryption processes consist of 4 different transformations applied consecutively over the data block bits, in a fixed number of iterations called rounds. The decryption process is the direct inverse of the encryption process; hence the last-round values of both the data and the key are the first-round inputs for decryption, and the process follows in decreasing order. AES is an extremely fast and compact cipher. For implementers, its symmetric and parallel structure provides effective resistance against cryptanalytic attacks; the larger block size prevents birthday attacks and the large key size prevents brute-force attacks.

e) Blowfish - Blowfish is a symmetric block cipher that works on a 64-bit block size. Its key length is variable, from 32 bits to 448 bits. It has 16 rounds and is based on the Feistel network. It has a simple structure and is easy to implement. It encrypts data on 32-bit microprocessors at a rate of 18 clock cycles per byte, so it is much faster than AES, DES and IDEA. Since the key size is large, it is complex to break the code in the Blowfish algorithm; it is resistant to all known attacks except the weak-key class attack. It is unpatented and royalty-free, and requires less than 5K of memory to run [6] [18].
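The AES round-count rule quoted above (number of rounds = key length/32 + 6) is easy to check in code; a minimal sketch in Python:

```python
def aes_num_rounds(key_bits: int) -> int:
    # Number of AES rounds per the rule stated above:
    # key length in 32-bit words, plus 6.
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits")
    return key_bits // 32 + 6

# 128-bit key -> 10 rounds, 192 -> 12, 256 -> 14
print([aes_num_rounds(k) for k in (128, 192, 256)])  # [10, 12, 14]
```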

78 National Conference on Emerging Trends in Information Technology
IITM Journal of Management and IT
Volume 8, Issue 1 • January-June, 2017 79
V. Algorithm Security

The two essential properties for checking the complexity of any algorithm are time and space. According to Kerckhoff, the cryptanalyst knows the complete process of encryption and decryption except for the value of the secret key; it follows that the security of a secret-key cipher system rests entirely on the secret key [17]. Therefore, for better security in symmetric encryption one should keep the following criteria in mind:

• The key should be exchanged very safely, because if the key is known the entire algorithm is compromised.

• A secure encryption algorithm is robust and resilient against a potential breach using combinations of cipher texts and keys [14] [20].

1) Desirable Properties of Block Cipher

The strength of a block cipher can be tested through properties such as Avalanche, Completeness and Statistical Independence.

• Avalanche Effect - This is an excellent property of a cryptographic algorithm, also stated as the butterfly effect. It means that changing only one bit (a small change) of the plain text or the key should produce a radical shift in the final output. It is measured as: Avalanche Effect = Number of flipped bits in the ciphered text / Number of bits in the ciphered text. If about 50% of the bits of the final output are flipped, this is said to be the strict avalanche effect; under SAC it is harder to perform an analysis on cipher text when trying to come up with an attack [5] [8] [17]. It is easy to impose conditions on Boolean functions so that they satisfy certain avalanche criteria, but constructing them is a harder task. Avalanche can be categorized as follows:
  - The strict avalanche criterion (SAC) guarantees that exactly half of the output bits change when one input bit changes [17].
  - The bit independence criterion (BIC) states that output bits j and k should change independently when any single input bit i is inverted, for all i, j and k [17].

• Completeness - For encryption, this is a necessary property. Completeness means that each bit of the cipher text (output block) needs to depend on each bit of the plaintext [15]: a change in one bit of the input (plaintext) can bring a change in every bit of the output (ciphertext), each with an average 50% probability of changing. Imagine an eight-byte plain text in which only the last byte changes; without completeness, this would affect only the 8th byte of the ciphertext, and an attacker could very easily exploit 256 different plaintext-ciphertext pairs. Finding 256 plaintext-ciphertext pairs is not hard at all in the internet world, since standard protocols have standard headers and commands (e.g. "get," "put," "mail from:," etc.) which the attacker can safely guess. If the cipher has the completeness property, the attacker instead needs to collect 2^64 (~2*10^19) plaintext-ciphertext pairs to crack the cipher in this way.

• Statistical Independence - The input and output should appear to be statistically independent.

VI. Conclusion

Cryptography is a good way to protect data from getting breached. Symmetric cryptography ensures confidentiality of data; asymmetric cryptography takes care of authenticity, integrity and non-repudiation of data. As can be seen in the table of comparative analysis above, all the algorithms are built on these three desired properties. The percentages may vary, but they all fulfil the basic criteria of an encryption algorithm, and while building an understanding of encryption algorithms and designing a new algorithm anybody can establish the significant role of these building blocks. These three important properties decide the strength and resistance of the algorithm.
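The avalanche ratio defined in Section V (flipped bits in the ciphered text divided by total bits) can be computed directly. A small illustrative sketch, with the two ciphertexts supplied as byte strings; a strong cipher should score close to 0.5 when a single input bit is changed:

```python
def avalanche_ratio(ct1: bytes, ct2: bytes) -> float:
    # Fraction of differing bits between two equal-length ciphertexts:
    # number of flipped bits / number of bits, as defined above.
    assert len(ct1) == len(ct2), "ciphertexts must be the same length"
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(ct1, ct2))
    return flipped / (8 * len(ct1))

# Identical blocks share every bit; complementary blocks differ in every bit.
print(avalanche_ratio(b"\x00" * 8, b"\x00" * 8))  # 0.0
print(avalanche_ratio(b"\x00" * 8, b"\xff" * 8))  # 1.0
```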
References
1. Daemen, J., Govaerts, R. and Vandewalle, J. (1998). Weak Keys for IDEA. Springer-Verlag.
2. Engelfriet, A. (2012). The DES encryption algorithm. Available at www.iusmentis.com/technology/encryption/des.
3. Forouzan, B.A., & Mukhopadhyay, D. (2010). Cryptography and Network Security. Tata McGraw-Hill, New Delhi, India.
4. Gatliff, B. (2003). Encrypting data with the Blowfish algorithm. Available at http://www.design-reuse.com/articles/5922/encrypting-data-with-the-blowfish-algorithm.
5. Kak, A. (2015). Computer and Network Security - AES: The Advanced Encryption Standard. Retrieved from https://engineering.purdue.edu/kak/compsec/NewLectures/Lecture8.pdf
6. Koukou, Y.M., Othman, S.H., Nkiama, M.M.S.H. (2016). Comparative Study of AES, Blowfish, CAST-128 and DES Encryption Algorithm. IOSR Journal of Engineering, 06(06), pp. 1-7.
7. Kumar, A., Tiwari, N. (2012). Effective Implementation and Avalanche Effect of AES. International Journal of Security, Privacy and Trust Management (IJSPTM).
8. Mahindrakar, M.S. (2014). Evaluation of Blowfish Algorithm based on Avalanche Effect. International Journal of Innovations in Engineering and Technology, 1(4), pp. 99-103.
9. Menezes, A., van Oorschot, P. and Vanstone, S. (1996). Handbook of Applied Cryptography. CRC Press.
10. Mollin, R.A. (2006). An Introduction to Cryptography. Second Edition, CRC Press.
11. National Bureau of Standards (1977). Data Encryption Standard. FIPS Publication 46.
12. Paar, C., Pelzl, J. (2010). Understanding Cryptography: A Textbook for Students and Practitioners. Springer, XVIII, 372.
13. Ramanujam, S., & Karuppiah, M. (2011). Designing an algorithm with high Avalanche Effect. International Journal of Computer Science and Network Security, 11(1).
14. Saeed, F., & Rashid, M. (2010). Integrating Classical Encryption with Modern Technique. International Journal of Computer Science and Network Security, 10(5).
15. Schneier, B. (1994). Applied Cryptography. John Wiley & Sons Publication, New York.
16. Schneier, B. (1994). Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish). Fast Software Encryption, Cambridge Security Workshop Proceedings, Springer-Verlag. Available at http://www.schneier.com/paper-blowfish-fse.html
17. Shailaja, S. & Krishnamurthy, G.N. (2014). Comparison of Blowfish and Cast-128 Algorithms Using Encryption Quality, Key Sensitivity and Correlation Coefficient Analysis. American Journal of Engineering Research, 7(3), pp. 161-166.
18. Stallings, W. (2011). Cryptography and Network Security: Principles and Practice. Pearson Education, Prentice Hall: USA.
19. Thaduri, M., Yoo, S. and Gaede, R. (2004). An Efficient Implementation of IDEA encryption algorithm using VHDL. Elsevier.
20. Tropical Software. Triple DES Encryption. Available at http://www.tropsoft.com/strongenc/des3.htm
21. Wagner, R.N. The Laws of Cryptography. Retrieved from http://www.cs.utsa.edu/~wagner/laws/

A Review: RSA and AES Algorithm
Ashutosh Gupta*
Sheetal Kaushik**
Abstract
From ARPANET to today's Internet, the amount of data and information has increased several thousand times, and the number of security problems has grown with this development. In this paper we aim to review the working of two algorithms, RSA and AES, used to secure our data over the internet and communication channels. One of them, RSA, is an asymmetric algorithm developed in the early days of modern cryptography and still trustworthy; the other, AES, is a more recent symmetric standard.
Keywords: Asymmetric, symmetric, RSA, AES, Cryptography, Encryption.

I. Introduction

Cryptography is the practice of the enciphering and deciphering of messages in secret code in order to render them unintelligible to all but the intended receiver. Cryptography may also refer to the art of cryptanalysis, by which cryptographic codes are broken [1]. Information is the most important asset for a company or a nation to secure, after human resources. While most information nowadays is in digital form, it sits in an equally unsecured environment [2]. So techniques like cryptography help in making the environment, and the path along which information travels, more secure and trustworthy. A good encryption algorithm must provide confidentiality, integrity, non-repudiation and authentication [3].

Cryptography can be further divided into two major types: secret-key cryptography and public-key cryptography. Secret-key encryption uses the same key for encryption and decryption; this type of encryption is easier and faster but also less secure. Public-key encryption, on the other hand, is more secure and more preferred nowadays: the keys for encryption and decryption are different, but logically and mathematically they are linked [1][4][5].

A. Data Encryption

This is the process of scrambling stored or transmitted information so that it is meaningless until it is unscrambled by the intended recipient, also known as ciphering of data. With increasing data and technology advancement, the significance of data encryption is also increasing, not only for highly diplomatic and military uses but also for the lives of ordinary people and the high-value money and information transfers of big multinationals [6].

The history of cryptography can be traced back to the hieroglyphs of early Egyptian civilization (c. 1900 B.C.). Ciphering has always been considered the essence of diplomatic and military secrecy, and there are several other early examples of cryptography, even in the era of the Holy Bible, which is replete with examples of ciphering [7].

Nowadays encryption standards have become so strong that several governments are even talking about banning strong encryption beyond a certain level. The reason behind this is the time and work involved even in simple day-to-day federal cases. For example, the United Kingdom could pass a law that bans encryption stronger than 64-bit keys, knowing its intelligence agency has the resources to crack any form of legal encryption in the country [5].

Early cryptography was done with a standard 64-bit-key algorithm known as DES, the Data Encryption Standard, given by FIPS (Federal Information Processing Standard) [3], [8].

Ashutosh Gupta*
BCA-II Year
Institute of Information Technology and Management
Sheetal Kaushik**
IT Department
Institute of Information Technology and Management
IITM Journal of Management and IT

The DES algorithm was later replaced by the Rijndael algorithm, standardized as the Advanced Encryption Standard or AES [8], [9]. AES has a more flexible key strength that may help in its future manipulation and betterment.

RSA was named after its inventors in 1977: Ron Rivest, Adi Shamir and Len Adleman [10]. This algorithm is asymmetric and still in use. RSA has a dual benefit, as it is used for data encryption as well as digital signatures.

II. AES

Nowadays security is equally as essential as speed of data communication, and the Advanced Encryption Standard is well suited for this, as it provides speed as well as increased security with hardware. Because of its dual base, consisting of hardware as well as software, this system is more advanced and secure than basic DES [8].

AES is also advanced in the sense of its structure, as it uses keys in bytes instead of bits, and unlike DES the number of rounds for encryption is not fixed: it depends on the length of the key. If the text is 128 bits it is treated as 16 bytes, and these 16 bytes are arranged in the form of a 4x4 matrix. In AES, 10 rounds of encryption are performed for a 128-bit key, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. The following algorithm encrypts the data [11]:

Step 1: Input a plaintext block of 128 bits, which will be treated as 16 bytes.
Step 2: Add Round Key: each byte is combined with a block of the round key using bitwise XOR.
Step 3: Byte Substitution: the 16 input bytes are substituted by looking them up in the S-box. The result is a 4x4 matrix.
Step 4: Shift Rows: every row of the 4x4 matrix is shifted to the left; entries that fall off are placed on the right side of the row.
Step 5: Mix Columns: every column of four bytes is transformed by applying a distinctive mathematical function (Galois Field arithmetic).
Step 6: Add Round Key: the 16 bytes of the matrix are treated as 128 bits and XORed with 128 bits of the round key.
Step 7: These 128 bits are taken as 16 bytes and similar rounds are performed.
Step 8: At the 10th and last round a ciphered text is produced.

Fig. 1 Flow Chart of AES Encryption.
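Two of the round transformations above are simple enough to sketch directly on a 16-byte state. This is an illustration of Steps 2 and 4 only, not a working AES: SubBytes (the S-box lookup) and MixColumns (the Galois Field arithmetic) are omitted.

```python
def add_round_key(state, round_key):
    # Steps 2 and 6: XOR each state byte with the matching round-key byte.
    return [s ^ k for s, k in zip(state, round_key)]

def shift_rows(state):
    # Step 4: row r of the 4x4 matrix is rotated left by r positions.
    # The state is stored column-major (byte i sits at row i % 4,
    # column i // 4), as is conventional for AES.
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            out[row + 4 * col] = state[row + 4 * ((col + row) % 4)]
    return out

# XORing a state with itself zeroes it; the first output column after
# shift_rows pulls bytes 0, 5, 10, 15 (the rotated diagonal).
assert add_round_key([1, 2, 3], [1, 2, 3]) == [0, 0, 0]
assert shift_rows(list(range(16)))[:4] == [0, 5, 10, 15]
```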

III. RSA

RSA is a public key algorithm, meaning it uses two different keys, one of which must be kept private (the private key) while the other is a public key that does not need to be secret. Of these two keys, the public key is usually used for encryption and the private key for decryption [14]. The RSA encryption method is explained below:

Step 1: Select two large prime numbers p and q (such that the numbers do not exceed printable ASCII characters).
Step 2: Generate the RSA modulus n = p * q (the size of the product is the key length; n is part of the public key).
Step 3: Compute Euler's totient function (p - 1) * (q - 1) and choose a public exponent e coprime to it.
Step 4: Form the public key: (n, e) forms the RSA public key.
Step 5: Generate the private key: the number d is the inverse of e modulo (p - 1)(q - 1). This means that d is the number less than (p - 1)(q - 1) such that when multiplied by e it is equal to 1 modulo (p - 1)(q - 1):

ed = 1 mod (p - 1)(q - 1)

RSA's security thus rests on two different functions. RSA is one of the most secure cryptography algorithms, whose strength is based on the practical difficulty of factoring the product of two very large prime numbers [15][16].

IV. Comparison

In the table below, RSA and AES are compared on the basis of key size, block size, speed, keys used in encryption and decryption, type of algorithm, and rounds of encryption and decryption [17].

FACTOR                    | AES               | RSA
DEVELOPED                 | 2000              | 1978
KEY SIZE                  | 128/192/256 bits  | >1024 bits
BLOCK SIZE                | 128 bits          | Minimum 512 bits
ENCRYPTION/DECRYPTION KEY | Same              | Different
ALGORITHM TYPE            | Symmetric         | Asymmetric
SPEED                     | Faster            | Slower
ROUNDS                    | 10/12/14          | 1
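The five key-generation steps of Section III can be walked through end to end with deliberately tiny primes. Real RSA uses primes hundreds of digits long; p = 61 and q = 53 here are purely illustrative:

```python
# Toy RSA following Steps 1-5 above; the numbers are far too small
# for real use and are chosen only to make the arithmetic visible.
p, q = 61, 53                # Step 1: two primes
n = p * q                    # Step 2: modulus, n = 3233
phi = (p - 1) * (q - 1)      # Step 3: Euler totient, phi = 3120
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # Step 5: e*d = 1 mod phi, so d = 2753

public_key, private_key = (n, e), (n, d)   # Step 4: the key pair

m = 65                       # a message smaller than n
c = pow(m, e, n)             # encryption: c = m^e mod n = 2790
assert pow(c, d, n) == m     # decryption recovers the message
```

Note that `pow(e, -1, phi)` (the modular inverse) requires Python 3.8 or later.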

V. Conclusion

Encryption of data plays a very vital role in today's time. Our research work surveyed the famous AES and RSA algorithms. Based on the research work used in this survey, we can conclude that RSA takes more time for encryption compared to AES. We also conclude that RSA is more secure than AES because of its longer key size and its use of different keys for encryption and decryption.

Our future work will focus on the study of other algorithms, including the Hyper Image Encryption Algorithm. Our focus will also be on the problem of transferring the private key in asymmetric encryption.

References
1. www.britannica.com/topic/cryptography.
2. ENISA’s Opinion Paper on Encryption December 2016.
3. https://www.tutorialspoint.com/cryptography/data_encryption_standard.htm.
4. https://www.tutorialspoint.com/cryptography/cryptosystems.htm.

5. http://www2.itif.org/2016-unlocking-encryption.pdf.
6. http://www.infoplease.com/encyclopedia/science/data-encryption.html.
7. http://www.infoplease.com/encyclopedia/society/cryptography.html.
8. http://www.ijarcce.com/upload/2016/march-16/IJARCCE%20227.pdf.
9. https://www.britannica.com/topic/AES#ref1095337.
10. http://www.di-mgt.com.au/rsa_alg.html.
11. https://www.irjet.net/archives/V3/i10/IRJET-V3I10126.pdf.
12. https://en.wikipedia.org/wiki/Advanced_Encryption_Standard.
13. https://www.tutorialspoint.com/cryptography/advanced_encryption_standard.html.
14. A Novel Approach to Enhance the Security Dimension of RSA Algorithm Using Bijective Function.
15. http://paper.ijcsns.org/07_book/201608/20160809.pdf.
16. Research and Implementation of RSA Algorithm for Encryption and Decryption.
17. https://globaljournals.org/GJCST_Volume13/4-A-Study-of-Encryption-Algorithms.pdf

Evolution of new version of internet protocol (IPv6) :
Replacement of IPv4
Nargish Gupta*
Sumit Gupta**
Munna Pandey***
Abstract
Taking today's scenario into consideration, the internet is becoming a vital part of modern life. The basic functioning of the Internet is based on the Internet Protocol (IP). IPv4 has served well but has run into a growth problem: the rapid consumption of network addresses, which also leads to decreased routing performance. The load on the internet will not ease in the coming years, and with so much advancement in technology the growth can hardly be imagined. So to achieve this evolution of the Internet there is a need for a transition from IPv4 to IPv6. The IPv4 address space has finally drained, and IANA (Internet Assigned Numbers Authority) is left with no choice but to move towards the transition from IPv4 to IPv6. This paper re-evaluates the main issues and complications in the IPv4-IPv6 transition and presents the principles of tunneling and translation techniques. It surveys the mainstream tunneling and translation mechanisms, their techniques, pros and cons, and appropriateness.
Keywords: Internet Protocol, IPv4, IPv6, Routing.

I. Introduction

Since the very early stages of the Internet, IPv4 [1] has been used as the network layer protocol. No one thought at the time the protocol was designed that the span of the IPv4 Internet could grow so big [2]; it was actually unexpected. The set of obstacles currently facing the IPv4 Internet comprises address exhaustion, routing scalability, and the broken end-to-end property. IANA (Internet Assigned Numbers Authority) depleted its IPv4 address pool in Feb 2011, so as per the current status the registries will soon exhaust their IPv4 address space as well [3]. On the other hand, technology is growing as fast as possible, especially the number of mobile users, and this will continue, resulting in excessive demand for new IP address allocations which is difficult to gratify with IPv4. ChinaTelecom is among the biggest telecom ISPs (Internet Service Providers); as per them, by the end of 2012 they would use up all their IPv4 addresses. Besides, the prefix de-aggregation caused by address block subdivision, multihoming and traffic engineering has caused a burst in the global IPv4 RIB (Routing Information Base) and FIB (Forwarding Information Base). The scalability problem is the biggest issue from which the Internet is suffering, and the basic end-to-end property all over the Internet has been broken down by the ample use of NAT.

Nargish Gupta*
IITM Janakpuri, New Delhi
Sumit Gupta**
LNCT Bhopal
Munna Pandey***
IITM Janakpuri, New Delhi

II. Challenges of IPv4

With the advancement of technology our lifestyle has become easier, but there are various things under consideration. The new technology that has emerged is the Internet of Things, meaning things will communicate with each other. Because of this, every device needs a unique address to identify it uniquely, which leads to various challenges for the existing IP protocol, i.e. IPv4, listed below:

• IP Address Depletion:
In IPv4 a limited number of unique public addresses are available (about 4 billion), and IP-enabled devices increase day by day. So every device needs a unique
IP address, which uses up some extra IP addresses, especially for always-on devices. IPv4 is not able to fulfill the IP demands.

• Internet Routing Table Expansion:
Routing tables are used by routers to determine the best path. As the networks and entities connected to the internet increase, so does the number of network routes. These IPv4 routes consume a great deal of memory and processor resources on internet routers, which increases the complexity of the network as well as taking lots of space.

• Lack of End-to-End Connectivity:
To make better use of IP addresses, IANA introduced public and private addressing. By using private addresses, multiple devices are able to connect to the internet through a single IP address, but this needs translation from public to private IP addresses as well as from private to public. Network Address Translation (NAT) is a technology commonly implemented within IPv4 networks; NAT provides a way for multiple devices to share a single public IP address. This is an overhead which increases the complexity of the network and increases the possibility of error [4].

III. Improvements that IPv6 Provides

In the early 1990s the Internet Engineering Task Force (IETF) grew concerned about the issues with IPv4 and began to look for a replacement. This activity led to the development of IP version 6. IPv6 overcomes the limitations of IPv4; some improvements are listed below:

• Internet address space:
IPv6 increases the address space to 128 bits instead of the 32 bits in IPv4. Due to the increased size it has more addresses, sufficient for present as well as future scenarios. IPv6 can allot 340 undecillion addresses to unique devices, which is sufficient to handle present traffic.

• Improved Packet Handling:
The IPv6 packet has eliminated fields from IPv4 that are not required and includes required fields not present in the IPv4 header. IPv6 is simplified with fewer fields; this improves packet handling by intermediate routers and also provides support for extensions and options for increased scalability.

• Eliminates need of NAT:
As mentioned earlier, IP version 4 does not have sufficient IP addresses, so the problem was solved by public and private addresses. But use of private addresses requires NATing, which is an overhead. In IPv6 the NATing concept is eliminated because of the large number of IPv6 addresses.

• Integrated Security:
IPv4, the first IP version, mostly focused on how to transfer data between two or more devices. This requirement was successfully accomplished by IPv4, but as technology advances the chance of theft also increases, and IPv4 does not provide any security fields. Keeping this in mind, IPv6 has integrated security: it provides authentication and privacy capabilities.

IV. Internet Protocol Version 6 (IPv6)

On Monday, Jan 31, 2011, IANA allocated the last two /8 IPv4 address blocks to the Regional Internet Registries (RIRs), so IANA moved to implement IPv6. The packet format of IPv6 is kept simple, with fewer fields. All fields of IPv6 are described in the packet format in Figure 1.

Version (4 bits) | Traffic Class (8 bits) | Flow Label (20 bits)
Payload Length (16 bits) | Next Header (8 bits) | Hop Limit (8 bits)
Source IP Address (128 bits)
Destination IP Address (128 bits)

Figure 1: Packet format of IPv6
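The fixed 40-byte layout of Figure 1 can be packed byte for byte. A sketch using only the standard struct and ipaddress modules; the addresses are illustrative values from the 2001:db8::/32 documentation prefix:

```python
import ipaddress
import struct

def build_ipv6_header(payload_len, next_header, hop_limit, src, dst,
                      traffic_class=0, flow_label=0):
    # First 32-bit word: version (4 bits) | traffic class (8) | flow label (20)
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    header = struct.pack("!IHBB", first_word, payload_len,
                         next_header, hop_limit)
    # Followed by the two 128-bit addresses
    header += ipaddress.IPv6Address(src).packed
    header += ipaddress.IPv6Address(dst).packed
    return header

hdr = build_ipv6_header(payload_len=20, next_header=6, hop_limit=64,
                        src="2001:db8::1", dst="2001:db8::2")
assert len(hdr) == 40     # the base IPv6 header is always 40 octets
assert hdr[0] >> 4 == 6   # the version field reads back as 6
```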

Table 1: Comparison of Internet Protocol version 4 and Internet Protocol version 6

Characteristic Factor | IPv4 | IPv6
Address Length | 32 bits long | 128 bits long
IP Security | Does not have any built-in security | Provides integrated authentication and privacy capabilities
Address Resolution and Address Auto-Configuration | ICMPv4 does not include address resolution and address auto-configuration | ICMPv6 includes address resolution and address auto-configuration
NATing | Needs Network Address Translation (NAT) | No need of NATing, due to the large address space
Header | 12 basic header fields, not including the option and padding fields | Simplified to 8 fields; this improves packet handling
Octets | 20 (up to 60 bytes if the option field is used) | 40 (large because of the length of the source and destination addresses)

Version: The version field is the same as in IPv4 and is used to identify the version of the packet. It is a 4-bit field; for IPv6 the version field is always set to 0110, and to 0100 for IPv4.

Traffic Class: This field is the same as the Type of Service field in IPv4. It is an 8-bit field used for real-time applications; it can be used to inform routers and switches to maintain the same path for a packet flow so that packets are not reordered.

Payload Length: The payload length field is 16 bits long. It is equivalent to the Total Length field in IPv4, but gives the size of the payload, including any extension headers, rather than of the entire packet [5].

Next Header: The next header field is an 8-bit field that identifies the type of header immediately following the basic header (the role played by the Protocol field in IPv4).

Hop Limit: The hop limit field is an 8-bit field similar to the Time To Live field of IPv4. Its value is decremented by one by each router that forwards the packet; when the value reaches zero the packet is discarded and an ICMPv6 message is forwarded to the sending host to indicate that the packet did not reach its destination.

Source Address: This field is 128 bits long. It specifies the address of the sender of the message.

Destination Address: This field is 128 bits long. It specifies the destination address to which the sender wants to send the message.

An IPv6 packet might also contain extension headers (EH), which provide optional network layer information. EHs are optional and are placed between the IPv6 header and the payload. EHs are used for fragmentation, for security, to support mobility, and more [6].

V. IPv4 and IPv6 Coexistence

There is not a single date to move to IPv6; both IPv4 and IPv6 will coexist, and the transition is expected to take years. The IETF (Internet Engineering Task Force) has created various protocols and tools to help network administrators migrate their networks to IPv6. These migration techniques are divided into three categories:

Dual Stack: Allows IPv4 and IPv6 to coexist on the same network. Dual-stack devices run both the IPv4 and IPv6 protocol stacks simultaneously.

Tunneling: A method of transporting IPv6 packets over an IPv4 network. The IPv6 packet is encapsulated inside an IPv4 packet, similar to other types of data.

Translation: NAT64 allows IPv6-enabled devices to communicate with IPv4-enabled devices using a translation technique similar to NAT for IPv4.

VI. Comparison and Analysis

IPv6 provides 340 undecillion addresses, roughly equal to the number of grains of sand on earth. Some fields keep the same name, some fields from IPv4 are not used, some fields have changed name and position, and new fields have been added to IPv6 that are not in IPv4 [7]. The detailed comparison between Internet Protocol version 4 and

Version 6 is shown in Table 1. In Table 1, the first column shows the various characteristic factors on which the two differ, while the second column is for IPv4 and the third column for IPv6 [8].

VII. Conclusion

IPv6 and IPv4 are both Internet Protocols currently in use. Definitely IPv6 is the better of the two, because it comes after IPv4 and so eliminates the drawbacks of IPv4. IPv4 is the popular protocol which we have used for a long time, so both protocols keep their importance. In this paper we can clearly see that IPv6 is the better replacement for IPv4, though it will take time to overtake IPv4.
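The address-space figures quoted in this paper (about 4 billion for IPv4, 340 undecillion for IPv6) follow directly from the 32-bit and 128-bit address lengths; a quick arithmetic check:

```python
ipv4_space = 2 ** 32    # 4,294,967,296 addresses: "about 4 billion"
ipv6_space = 2 ** 128   # roughly 3.4 * 10^38 addresses

# 340 undecillion (short scale) = 340 * 10^36; 2^128 sits just above it.
assert ipv4_space == 4_294_967_296
assert 340 * 10 ** 36 < ipv6_space < 341 * 10 ** 36
```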

References
1. W. Stallings, Data and Computer Communication, 5th Edition, Upper Saddle River, NJ: Prentice Hall, 2012.
2. M. Mackay and C. Edwards, "A Managed IPv6 Transitioning Architecture for Large Network Deployments," IEEE Internet Computing, vol. 13, no. 4, pp. 42-51, July-Aug. 2009.
3. S. Bradner and A. Mankin, IPng: Internet Protocol Next Generation, Reading, MA: Addison-Wesley, 2011.
4. R. Gillign and R. allon, "IPv6 Transition mechanism overview," Connexions, Oct 2002.
5. E. Britton, J. Tavs and R. Bournas, "TCP/IP: The next generation," IBM Systems Journal, no. 3, 1995.
6. C. Huitema, IPv6: The New Internet Protocol, Upper Saddle River, NJ: Prentice Hall, 1996.
7. R. Hinden, "IP next generation overview," Connexions, Mar 1995.
8. Fernandez, P. Lopez, M. A. Zamora, and A. F. Skarmeta, "Lightweight MIPv6 with IPSec support (Online First, DOI: 10.3233/MIS-130171)," Mobile Information Systems, http://iospress.metapress.
9. G. Huston, "IPv4 Address Report," Tech. Rep., Sep. 2010. [Online]. Available: http://www.potaroo.net/tools/ipv4
10. S. Deering and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," 1998, IETF RFC 2460.
11. S. Thomson, T. Narten, and T. Jinmei, "IPv6 Stateless Address Autoconfiguration," 2007, IETF RFC 4862.
12. R. Hinden and S. Deering, "IP Version 6 Addressing Architecture," 2006, IETF RFC 4291.

Social Engineering – Threats & Prevention
Amanpreet Kaur Sara*
Nidhi Srivastava**
Abstract
The term "social engineering" (SE) has gained wide acceptance in the Information Technology (IT) and Information Systems (IS) communities as a social/psychological process by which an individual (called the attacker) can gain information from an individual (called the victim) about a sensitive subject. This information can be used immediately to bypass the existing Identification-Authentication-Authorization (IAA) process or as part of a further SE event. Social engineering methods are numerous, and the people using them are extremely ingenious and adaptable. Nonetheless, although the field is new, the tactics of the attackers remain the same. Therefore, this paper provides an overview of the current scenario in social engineering and the security issues associated with it.
Keywords: Cyber security; risks; hacking; social engineering

I. Introduction

A typical misunderstanding regarding cyber-attacks and hacks is that very high-end tools and technologies are needed to retrieve sensitive information from someone's account, machine or mobile phone. This is essentially false. Hackers have long relied on a very old and simple method to steal your data: simply conversing with you and misleading you.[1] In this paper we examine how these human-directed attacks (called social engineering attacks) work and what you can do to protect yourself.

II. Types of Social Engineering Attacks

Here are some of the techniques that are commonly used to retrieve sensitive information.

A. Phishing

Phishing is the main type of social engineering attack. It is commonly delivered as a chat message, email, web advertisement or website designed to imitate a real system or organisation. Phishing messages are crafted to convey a sense of urgency or fear, with the objective of capturing an end user's sensitive information. A phishing message may appear to originate from a bank, the government or a well-known organization. The content varies: some messages ask the customer to verify login details and include a mocked-up login page, complete with logos and branding, to look legitimate; some claim the customer has won a great prize or draw and demand access to a bank account into which the winnings can be sent; some request charitable donations after a natural calamity or disaster.[2]

B. Baiting

Baiting, like phishing, involves offering a customer something very attractive at the cost of their login details or private information. The "bait" comes in both digital and physical forms. A digital example is a music or movie file download: while downloading, the victim receives infected files and falls into the trap. A physical example is a flash drive labelled "Annual Appraisal Report" intentionally left on someone's desk; the name is so attractive that whoever finds it will almost certainly insert the drive into a system and be trapped. [2, 3]
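The lookalike domains and spoofed login pages described above can often be flagged mechanically before a user ever clicks. The following Python sketch is purely illustrative and is not part of the techniques surveyed in this paper: the `TRUSTED_DOMAINS` list and the 0.8 similarity threshold are invented for demonstration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical allow-list for demonstration only; a real deployment
# would use the organization's own list of legitimate domains.
TRUSTED_DOMAINS = ["example-bank.com", "example.gov"]

def phishing_signals(url):
    """Return a list of heuristic warning signs for a URL."""
    signals = []
    host = (urlparse(url).hostname or "").lower()
    if host.replace(".", "").isdigit():   # raw IP address used as host
        signals.append("ip-address host")
    if "xn--" in host:                    # punycode: possible homograph attack
        signals.append("punycode host")
    if "@" in url:                        # userinfo trick: http://bank.com@evil.com/
        signals.append("'@' in URL")
    for good in TRUSTED_DOMAINS:
        ratio = SequenceMatcher(None, host, good).ratio()
        if host != good and ratio > 0.8:  # near-miss of a trusted name
            signals.append(f"lookalike of {good}")
    return signals

# '1' substituted for 'l' is a classic phishing lure
print(phishing_signals("http://examp1e-bank.com/login"))
# → ['lookalike of example-bank.com']
```

Such heuristics can only supplement user vigilance; determined attackers register domains that defeat simple similarity checks.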
C. Quid Pro Quo

This type of attack happens when an attacker asks somebody for private or sensitive data in return for something attractive or some kind of payment. For example, a customer may get a telephone call from an attacker

*Amanpreet Kaur Sara, IT Department, Institute of Information Technology and Management
**Nidhi Srivastava, IT Department, Institute of Information Technology and Management
IITM Journal of Management and IT

who, posing as a technology expert, offers free IT help or technology upgrades in return for login credentials. [1,4] Another common case is an attacker who, posing as a researcher, requests access to the organization's network as part of an analysis or experiment in return for Rs.1000/-. If an offer seems too good to be true, it is almost certainly a quid pro quo.

D. Pretexting

In pretexting, a preplanned situation (the pretext) is created to trap a targeted customer into revealing sensitive information. In such situations the customer performs actions that the attacker expects, falls into the trap and reveals his or her sensitive information. [4] An elaborate lie, it most often involves some prior research or setup and the use of this information for impersonation (e.g., date of birth, Social Security number, last bill amount) to establish legitimacy in the mind of the target. [5]

E. Piggybacking

Piggybacking, also known as tailgating, occurs when an unauthorized person physically follows an authorized person into an organization's private area or system. For example, a person may ask another person to hold the gate, claiming to have forgotten his access card. Another example is borrowing someone's laptop or system for a while and installing malicious software after entering his restricted information zone.

F. Hoaxing

Hoaxing is an attempt to trick people into believing that something false is genuine. It may also prompt rash decisions taken out of fear of an unfortunate incident.

III. Preventions

By educating themselves, users can prevent the problem of social engineering to a large extent. An extremely common and easy measure is not to give the password to anyone and to take regular backups of the data. Strict enforcement is necessary, and the application of authentication systems such as smart cards or biometrics is key; by doing this, you can prevent a high percentage of social engineering attempts. There have to be good policies for a successful defense against social engineering, and all personnel should ensure that they follow them. Defending against social engineering attacks is not about a typical software system but about people, who are in themselves quite fickle. There are certain countermeasures which can help reduce these attacks.[18]

Below are the prevention techniques for individual defense.

A. We should always be vigilant of any email which asks for personal financial information or warns of immediate termination of online accounts.

B. If an email is not digitally signed, you cannot be sure that it is not forged or spoofed. It is highly recommended to check the full headers, as anyone can send mail under any name.

C. A fraudulent person would generally ask for information such as usernames, passwords, credit card numbers, social security numbers, etc. This kind of information is normally not asked for even by an authorized company representative, hence one should be careful.

D. Phisher emails are generally not personalized; you may find something like "Dear Customer". This is mainly because they are intended to trap innocent people through mass mailers. Authorized mails will have a personalized beginning. However, one should remain vigilant, as a phisher could also send a specific email intended to trap a particular individual. It could well then be like our case study.

E. One should be very careful while contacting financial institutions. Critical information such as bank card details should be checked thoroughly against hard-copy correspondence or the monthly account statement. Always keep in mind that e-mails and links can look very authentic and still be spurious.

F. One should always ensure that one is using a secure website while submitting credit card or other sensitive information via a Web browser.

G. You should log on and change the password on a regular basis.[15]


H. Every bank, credit and debit card statement should be checked properly, and one should ensure that all transactions are legitimate.

I. You should not assume that a website is legitimate just by looking at its appearance.

J. One should avoid filling in forms in email messages or pop-up windows that ask for personal financial information. These are generally used by spammers as well as phishers for future attacks.[10]

IV. Conclusion

In today's world we may well have the most secure and sophisticated networks and clear policies; however, we humans are highly unpredictable, driven by sheer curiosity and never-ending greed without concern for the consequences. We could very well face our own version of a Trojan tragedy [11]. The biggest irony of social engineering attacks is that humans are not only the biggest problem and security risk, but also the best tool for defending against these attacks. Organizations should fight social engineering attacks by forming policies and frameworks that set out clear roles and responsibilities for all users, not just security personnel. Organizations should also make sure that these policies and procedures are executed properly by users, and regular training undoubtedly needs to be imparted given the regular occurrence of such incidents.
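As a closing illustration of prevention technique B (Section III), which advises checking the full mail headers rather than trusting the visible From: line, the receiving server's Authentication-Results header can be compared against the claimed sender domain. The sketch below is purely illustrative: the raw message and all header values are fabricated for demonstration.

```python
from email import message_from_string
from email.utils import parseaddr

# A fabricated raw message; real messages come from your mail
# client's "show original" / "view full headers" option.
RAW = """\
From: "Example Bank" <support@example-bank.com>
Authentication-Results: mx.example.net; spf=fail smtp.mailfrom=bulk-mailer.example; dkim=none
Subject: Verify your account immediately

Dear Customer, click here to verify your login details.
"""

def header_warnings(raw_message):
    """Flag simple mismatches between the visible From: header and
    the receiving server's Authentication-Results header."""
    msg = message_from_string(raw_message)
    warnings = []
    from_domain = parseaddr(msg.get("From", ""))[1].rpartition("@")[2]
    auth = msg.get("Authentication-Results", "")
    if "spf=fail" in auth or "dkim=fail" in auth:
        warnings.append("SPF/DKIM failure reported by receiving server")
    if from_domain and from_domain not in auth:
        warnings.append(f"From: domain '{from_domain}' not in authenticated domain")
    return warnings

print(header_warnings(RAW))
```

Note that SPF and DKIM results are recorded by the receiving server, not the sender, which is why the Authentication-Results header is worth inspecting even when the From: line looks legitimate.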

References
1. "Ouch!", the monthly security newsletter for computer users, November 2014 issue.
2. Mosin Hasan, Nilesh Prajapati and Safvan Vohara, "Case Study on Social Engineering Techniques for Persuasion", International Journal on Applications of Graph Theory in Wireless Ad hoc Networks and Sensor Networks (GRAPH-HOC), Vol. 2, No. 2, June 2010.
3. Christopher Hadnagy, Social Engineering: The Art of Human Hacking, Wiley Publishing, Inc., 2011.
4. Davani, Faraz, "HP Pretexting Scandal by Faraz Davani", Scribd, 14 August 2011. Retrieved 15 August 2011.
5. "Pretexting: Your Personal Information Revealed", Federal Trade Commission.
6. Tim Thornburgh, "Social Engineering: The Dark Art", InfoSecCD '04: Proceedings of the 1st Annual Conference on Information Security Curriculum Development, ACM, New York, pp. 133-135.
7. Valerică Greavu-Șerban and Oana Șerban, "Social Engineering: A General Approach", Informatica Economică, Vol. 18, No. 2, 2014.
8. Mosin Hasan, "Malware: Threat to the Economy", survey study, National Conference on IT and Business Intelligence (ITBI-08).
9. Mindi McDowell, "Avoiding Social Engineering and Phishing Attacks", white paper, Cyber Security Tip ST04-014, Carnegie Mellon University, June 2007.
10. Harl, People Hacking.
11. "FCAC Cautions Consumers About New 'Vishing' Scam", Financial Consumer Agency of Canada, July 25, 2006.
12. Jay Schulman, "Voice-over-IP Scams Set to Grow", VoIP News, July 21, 2006.
13. Mosin Hasan, "Spying Linux: Consequences, Technique and Prevention", IEEE International Advance Computing Conference (IACC '09).
14. Jared Kee, "Social Engineering: Manipulating the Source", SANS Institute.
15. "Management Update: How Businesses Can Defend Against Social Engineering Attacks", white paper, Gartner, March 16, 2005.
16. Ashish Thapar, "Social Engineering: An Attack Vector Most Intricate to Tackle", white paper.
17. Hiep Dang, "The Origin of Social Engineering", McAfee Security Journal, Fall 2008.
18. Yves Lafrance, "Psychology: A Precious Security Tool", SANS Institute, 2004.
19. Malcolm Allen, "Social Engineering: A Means to Violate a Computer System", SANS Institute, 2007.
20. Mosin Hasan, "Inside Spyware: Techniques, Remedies and Cure", National Conference on Emerging Trends in Computer Technology.

