Skin Disease Detection Using Machine Learning
A PROJECT REPORT
ON
“SKIN DISEASE DETECTION USING MACHINE LEARNING”
Submitted in partial fulfillment of the requirements for the award of degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
By
ARATHI K 1EP16CS013
B DEEPIKA 1EP16CS018
BHAVANI S N 1EP16CS021
HAMSA K V 1EP16CS041
CERTIFICATE
This is to certify that the project work entitled “SKIN DISEASE DETECTION USING MACHINE
LEARNING” is a bonafide work carried out by ARATHI K [1EP16CS013], B DEEPIKA
[1EP16CS018], BHAVANI S N [1EP16CS021] and HAMSA K V [1EP16CS041] in partial
fulfillment of the requirements for the VIII semester for the award of the degree of Bachelor of Engineering
in Computer Science and Engineering of Visvesvaraya Technological University, Belagavi,
during the academic year 2019-2020. It is certified that all the corrections/suggestions indicated for
the project have been incorporated in this report. The report has been approved as it satisfies the
academic requirements prescribed by the University.
ACKNOWLEDGEMENT
Firstly, we thank the Management and Principal of East Point College of Engineering and
Technology, Bangalore for providing us an opportunity to work on this project. It gives us
immense pleasure to express our deep sense of gratitude to all those whose words of advice have always
been a constant source of inspiration for us.
We would like to thank the late Dr. S M Venkatapathi, Chairman, East Point
Group of Institutions, Bengaluru, for providing the necessary infrastructure and creating a good
environment.
We express our gratitude to Dr. Prakash K, Principal, EPCET, who has always been a great
source of inspiration.
We would like to express our heartfelt thanks to Mr. Nityananda C R, Professor and Head of
the Department of Computer Science and Engineering, EPCET, for his valuable advice and
encouragement in completing this project work.
We are obliged to Mrs. Vishnupriya K, Assistant Professor, Dept. of CSE, who
rendered her valuable assistance as the project guide.
We would like to thank our parents and friends for their support and encouragement during
the course of our project. Finally, we offer our regards to all the faculty members of the CSE
department and all those who supported us in any respect during the project.
ARATHI K [1EP16CS013]
B DEEPIKA [1EP16CS018]
BHAVANI S N [1EP16CS021]
SHAMBHAVI S D [1EP16CS041]
ABSTRACT
Skin diseases such as melanoma, eczema and impetigo are hazardous, and some of them are
contagious. These skin diseases can be cured if detected early. The fundamental problem is that only
an expert dermatologist is able to detect and classify such diseases, and sometimes even doctors fail to
classify the disease correctly and hence prescribe inappropriate medication to the patient. Our paper
proposes a skin disease detection method based on image processing and deep learning techniques.
Because our system is mobile-based, it can be used even in remote areas. The patient provides an
image of the infected area, which is given as input to the application. Image processing and deep
learning techniques process the image and produce the classification result. In this paper, we present a
comparison of two different approaches for real-time skin disease detection based on
accuracy: we compare a Naïve Bayes classifier with a Support Vector Machine (SVM). The results of real-time testing are presented.
TABLE OF CONTENTS
1 INTRODUCTION
1.2 Contribution
2 LITERATURE SURVEY
3 REQUIREMENT SPECIFICATION
4 SYSTEM ANALYSIS
4.2.2 SVM
5 SYSTEM DESIGN
6 IMPLEMENTATION
6.1 Methodology
REFERENCES
LIST OF FIGURES
6.1 Methodology
CHAPTER 1
INTRODUCTION
The skin is the largest organ of the human body. It weighs between six and nine pounds and has a
surface area of about two square yards. It separates the inside of the body from the outer
environment, protects against fungal infections, bacteria, allergens and viruses, and helps control body
temperature. Conditions that irritate the skin, change its texture, or damage it can produce
symptoms such as swelling, burning, redness and itching. Allergies, irritants, genetic makeup,
certain diseases and immune-system problems can cause dermatitis, hives and other skin problems.
Many skin diseases, such as acne, alopecia, ringworm and eczema, also affect a person's appearance.
Several types of cancer can also develop in the skin. Image processing can be used to detect these
diseases through methods such as filtering, segmentation and feature extraction.
To obtain an enhanced image or to extract meaningful information from it, an image must first be converted
into digital form, after which operations can be performed on it. Image processing is a branch of signal
processing: the input is an image, such as a photograph or a frame of video, and the output is either another
image or a set of characteristics related to the input image. Most image processing systems treat images as
two-dimensional signals and apply standard signal processing methods to them. It is a widely used technology
today, with many business applications, and it is an active research area within engineering and
computer science. The range of skin diseases is very wide.
Approximately eight million people in the UK currently suffer from skin disease.
Skin disease does not just damage the skin. It can have a large impact on a person's daily life [8],
undermine their confidence, restrict their movement, and lead to depression. In the worst cases,
it can even be fatal. It is therefore a serious issue that needs to be controlled: skin diseases must be
taken seriously, identified at an early stage, and prevented from spreading. Detection of a
disease depends on many factors, such as which parameters are considered for detection.
The usual procedure is as follows: first, an image is acquired; filters are applied to remove noise; the image is
segmented to extract meaningful regions; features are extracted on the basis of the input parameters; and
finally the disease is classified using an appropriate classifier.
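The four steps above (noise filtering, segmentation, feature extraction, classification) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the system's actual implementation: the 3x3 median filter, the rule that a lesion is darker than the mean intensity, the three features and the fixed area threshold in `classify` are all hypothetical stand-ins chosen for clarity. In the proposed system the last step would be a trained classifier such as SVM.

```python
import numpy as np

def median_filter3(img):
    """Remove isolated noise with a 3x3 median filter (borders padded by edge replication)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted copies of the image and take the per-pixel median.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

def segment_lesion(img):
    """Hypothetical segmentation rule: lesion pixels are darker than the mean intensity."""
    return img < img.mean()

def extract_features(img, mask):
    """Illustrative features of the segmented region."""
    lesion = img[mask]
    return np.array([mask.mean(),    # fraction of the image covered by the lesion
                     lesion.mean(),  # mean lesion intensity
                     lesion.std()])  # intensity spread inside the lesion

def classify(features, area_threshold=0.2):
    """Hypothetical decision rule standing in for a trained classifier such as SVM."""
    return "suspicious" if features[0] > area_threshold else "normal"

# Synthetic 10x10 example: bright skin, a dark 6x6 lesion, and one noisy pixel.
img = np.full((10, 10), 200.0)
img[2:8, 2:8] = 50.0
img[0, 9] = 0.0
clean = median_filter3(img)
mask = segment_lesion(clean)
print(classify(extract_features(clean, mask)))  # suspicious
```

The median filter removes the isolated noisy pixel but also rounds off the four corners of the square lesion, which is why real systems tune the filter size to the lesion scale.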
Skin diseases also have a serious impact on the psychological health of the patient. They can cause a loss
of confidence and can even lead to depression, and some skin diseases can be fatal. This is a
serious issue that cannot be neglected and must be controlled, so it is necessary to identify skin
diseases at an early stage and prevent them from spreading. Human skin is a difficult surface to analyze
because of its irregular texture, lesion structures, moles, varying tone, dense hair and other
confounding features. Early detection of skin diseases can prove
cost-effective and can be made accessible in remote areas. Identifying the infected area of skin and
detecting the type of disease is useful for early awareness. In this paper, a detection system is
proposed that enables users to detect and recognize skin diseases.
In this system, the user provides an image of the affected area. The input image then undergoes
preprocessing, which involves filtering to remove noise; segmentation then extracts the lesion,
feature extraction obtains the features of the image, and finally a classifier identifies the affected
area. For classification, a Support Vector Machine (SVM) is used. Deep learning algorithms can handle large
datasets and complex computation; in this work, however, a Naïve Bayes classifier is implemented alongside
the SVM to detect the affected area of skin, and the two classifiers are compared in terms of
accuracy and confusion matrix. This paper proposes a solution for detecting the skin diseases
Melanoma, Nevus and Basal Cell Carcinoma.
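A minimal NumPy sketch of one side of this comparison: a Gaussian Naïve Bayes classifier (assuming real-valued features that are independent and normally distributed within each class), together with the confusion matrix, accuracy and, for a two-class melanoma/non-melanoma case, sensitivity and specificity. The function names are illustrative, and the SVM side is not shown here.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (features assumed independent)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small constant keeps variances positive when a feature is constant within a class.
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.log_priors = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Gaussian log-likelihood of each sample under each class, summed over features.
        diff = X[:, None, :] - self.means[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None] + diff ** 2 / self.vars[None]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_priors, axis=1)]

def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true label, columns the predicted label."""
    pos = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[pos[t], pos[p]] += 1
    return cm

def accuracy(cm):
    return np.trace(cm) / cm.sum()

def sensitivity_specificity(cm):
    """For a 2x2 matrix with labels [negative, positive], e.g. [non-melanoma, melanoma]."""
    tn, fp, fn, tp = cm.ravel()
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], labels=[0, 1])` gives `[[1, 1], [0, 2]]`, an accuracy of 0.75, a sensitivity of 1.0 and a specificity of 0.5.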
Nowadays, more than 125 million people suffer from skin diseases, and the rate of skin disease has
increased rapidly over the last few decades; Melanoma in particular is a highly varied skin disease.
The rate of Nevus is especially high in rural areas. If skin diseases are not
treated at an early stage, they may lead to complications in the body, including the spread of the
infection from one individual to another. Skin diseases can be prevented by examining the
infected region at an early stage. Because the characteristics of skin images vary widely, it is
challenging to devise an efficient and robust algorithm for automatically detecting a skin disease
and its severity. Skin tone and skin color play an important role in skin disease detection. The color and
coarseness of skin differ visibly, so automatic processing of such images for skin analysis
requires quantitative discriminators to differentiate the diseases.
The proposed system is a combined model for the prevention and early detection of the skin diseases
Melanoma and Nevus. Skin disease diagnosis depends on characteristics such as
color, shape and texture. There is no universally accepted treatment for skin diseases, and different physicians may
treat the same symptoms differently. Early detection is therefore the key factor in skin disease treatment, since further
treatment relies on it.
In this paper, the proposed system diagnoses multiple skin diseases using statistical
parameter analysis. Statistical analysis is concerned with the analysis of random data; here, the random data are the
patterns of skin diseases in images taken from a standard database. These data cannot be described by a
mathematical expression, but they have statistical properties, so to analyze them we must analyze those
statistical properties.
1.2 Contribution
In this paper, we present an image analysis system to diagnose multiple skin diseases using statistical parameter
analysis. Statistical analysis is concerned with the analysis of random data. This system is a combined
model intended to diagnose multiple skin diseases at a time. The target skin diseases are
Melanoma and Nevus. Disease diagnosis and classification are based on statistical parameter analysis.
The statistical parameters include entropy, texture index, standard deviation and correlation factor.
The skin disease is diagnosed and classified according to the standard range of these parameters.
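The statistical parameters can be computed directly from a grayscale lesion image. The report does not define them precisely, so the following is a sketch under assumptions: entropy is taken as the Shannon entropy of the gray-level histogram, standard deviation as that of the pixel intensities, and the correlation factor as the Pearson correlation between horizontally adjacent pixels. The texture index is omitted because its definition is not given.

```python
import numpy as np

def entropy(gray, levels=256):
    """Shannon entropy (in bits) of the gray-level histogram of an 8-bit image."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=levels).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def standard_deviation(gray):
    """Spread of pixel intensities around their mean."""
    return float(np.asarray(gray, dtype=float).std())

def neighbor_correlation(gray):
    """Assumed correlation factor: Pearson correlation of each pixel with its right-hand neighbor.

    Undefined (NaN) for a constant image, where the intensities have zero variance."""
    g = np.asarray(gray, dtype=float)
    left, right = g[:, :-1].ravel(), g[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])
```

A smooth lesion gives a correlation close to 1 and a low entropy, while a mottled, irregular lesion gives a lower correlation and a higher entropy, which is what makes these values usable as discriminators.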
Doctors typically form a presumptive diagnosis and then search for further evidence to validate it;
where the assumption is not validated, they may already have missed other potential diagnoses. Bias
essentially influences the analysis made by medical practitioners, just as it does any human search that
begins with keywords chosen by the user. Additionally, if a doctor begins by searching for symptoms, then even
when the search is accurate, the order or weight given to each symptom is likely to bias it towards related
diagnoses, when in fact there may be a symptom that is given no credit and is therefore neither included in
the search nor considered in a timely fashion.
Heavy dependence on medical experts for the analysis of diagnostic medical images is a serious
challenge for regions (especially low- and middle-income countries) where experts may not be
readily available, may be too few, or may be unable to respond to an urgent medical need (such as a dermatological
one). These problems suggest that a better and more manageable solution is urgently needed
to minimize these dependencies and human bias, which leads to our research
question.
In the existing system, the Gray-Level Co-occurrence Matrix (GLCM) was used to segment images
of skin disease, and the output of a Convolutional Neural Network (CNN) was used to obtain an accurate
result.
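A GLCM counts how often a pixel with gray level i occurs next to a pixel with gray level j at a fixed offset. The sketch below builds a symmetric, normalized GLCM with NumPy and derives two common texture measures from it; the quantization to 8 levels, the horizontal offset and the choice of contrast and homogeneity as measures are assumptions for illustration, not settings taken from the existing system.

```python
import numpy as np

def glcm(gray, dx=1, dy=0, levels=8, symmetric=True, normed=True):
    """Gray-level co-occurrence matrix for a non-negative pixel offset (dy, dx)."""
    # Quantize 0..255 into a few levels to keep the matrix small.
    q = (np.asarray(gray).astype(int) * levels) // 256
    h, w = q.shape
    ref = q[:h - dy, :w - dx]      # reference pixels
    nbr = q[dy:, dx:]              # neighbors at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    if symmetric:
        m = m + m.T                # count each pair in both directions
    if normed:
        m = m / m.sum()            # turn counts into joint probabilities
    return m

def contrast(p):
    """Large when neighboring pixels tend to have very different gray levels."""
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

def homogeneity(p):
    """Large when most co-occurrences lie near the diagonal (similar neighbors)."""
    i, j = np.indices(p.shape)
    return float((p / (1.0 + np.abs(i - j))).sum())
```

For the one-row image `[0, 0, 255, 255]` quantized to 2 levels, the horizontal pairs are (0,0), (0,1) and (1,1); after symmetrizing and normalizing, the GLCM is `[[1/3, 1/6], [1/6, 1/3]]`, with contrast 1/3 and homogeneity 5/6.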
The algorithms used, SVM and CNN, fail to provide accurate results when the dataset
is very large or contains a large amount of noise. The main drawback lies in their
structural simplicity, especially for complex skin diseases such as psoriasis or skin cancers, whose
pathogenesis results from complicated interactions between cellular or molecular
components.
In this paper, we propose an image analysis system to detect skin diseases. Our system takes
images from a standard database and processes them to inform the user and help prevent the threats
linked to skin diseases. More specifically, we present an image analysis system to detect different skin
diseases, in which the user is able to take images of different moles or skin patches. Our system
analyzes and processes each image and classifies it as normal, Melanoma, Nevus or Basal Cell
Carcinoma based on the extracted image features. If the mole belongs to the Melanoma category,
an alert advises the user to seek medical help.
Advantages
Simple to implement
CHAPTER 2
LITERATURE SURVEY
Image Analysis Model for Skin Disease Detection, Alaa Haddad; Shihab A. Hameed, IEEE 2018
Skin diseases are among the most common diseases in the world. Diagnosing skin disease requires a
high level of expertise and accuracy from a dermatologist, so a computer-aided skin disease diagnosis
model is proposed to provide a more objective and reliable solution. Much research has been done to
help detect skin diseases such as skin cancer and skin tumors. However, accurate recognition of the disease
is extremely challenging for several reasons, including low contrast between lesions and skin and visual
similarity between diseased and non-diseased areas. This paper aims to detect skin disease from a
skin image by applying a filter to remove noise and unwanted artifacts and converting
the image to grayscale to aid processing and extract useful information. This helps to provide evidence
for any type of skin disease and to indicate when emergency attention is needed. The analysis results of this study can
support doctors in initial diagnosis and in identifying the type of disease, so that a treatment compatible with the
skin can be chosen and side effects avoided.
Skin diseases such as Melanoma and Carcinoma are often quite hard to detect at an early stage, and it
is even harder to classify them separately. It is well known that melanoma is the most dangerous form
of skin cancer, because it is much more likely than other types to
spread to other parts of the body if not diagnosed and treated early. To classify these skin
diseases, the Support Vector Machine (SVM), a machine learning algorithm, can be used. In this
paper, we propose a method to identify whether or not a given sample is affected by Melanoma.
The steps involved are: collecting labelled images and pre-processing them;
flattening those images and storing their pixel intensities in arrays; appending all such
arrays into a database; training the SVM on the labelled data using a suitable kernel; and using the
trained model to classify the samples. The results show that the classification accuracy achieved
is about 90%.
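The flatten-and-train steps described above can be sketched in NumPy. As an assumption for illustration, the SVM here uses a linear kernel and is trained with the Pegasos stochastic sub-gradient method on the primal objective (hinge loss plus L2 regularization); the surveyed paper does not state which solver it used, and `lam` and `epochs` are hypothetical settings.

```python
import numpy as np

def flatten_images(images):
    """Stack equally sized grayscale images into an (n_samples, n_pixels) array scaled to [0, 1]."""
    return np.stack([np.asarray(img).ravel() for img in images]).astype(float) / 255.0

def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels y must be -1 or +1. A constant 1 feature is appended to act as the bias
    (so the bias is regularized too, a simplification)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            margin = y[i] * (Xb[i] @ w)        # margin under the current weights
            w *= (1.0 - eta * lam)             # shrink: gradient of the regularizer
            if margin < 1:                     # hinge loss active: move towards the sample
                w += eta * y[i] * Xb[i]
    return w

def predict_svm(w, X):
    """Sign of the linear decision function, returned as -1 or +1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

Labels are +1 for the positive class (e.g. Melanoma) and -1 otherwise. A linear kernel keeps the sketch short; for real lesion images a library SVM with a non-linear kernel such as RBF would be the more usual choice.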
Automatic Classification of Clinical Skin Disease Images with Additional High-Level Position
Information, Jingyi Lin; Zijian Guo; Dong Li; Xiaorui Hu; Yun Zhang, IEEE 2019
Since skin disease is one of the most common human diseases, intelligent systems for classification
of skin diseases have become a new line of research in deep learning, which is of great significance
for both doctors and patients. Some skin-disease datasets have already been published, such as the
SD-198 dataset, which contains 6584 clinical skin-disease images of 198 categories. However,
because of the diversity of clinical dermatology, previous works have shown that the performance of
deep visual features is not as good as, or is even worse than, that of hand-crafted features for skin disease
classification. In this paper, we propose the SD-198-P dataset, which adds high-level
position information to the SD-198 dataset to guide the generation of better deep visual features. Our
experiment shows that, after adding the position information, the performance of deep visual features
is better than that of hand-crafted features. To the best of our knowledge, our method outperforms the
current state-of-the-art clinical skin disease classification methods.
Skin Disease detection based on different Segmentation Techniques, Kyamelia Roy; Sheli Sinha
Chaudhuri; Sanjana Ghosh; Swarna Kamal Dutta; Proggya Chakrabor, IEEE 2019
The outer integument of the human body is skin. The skin pigmentation of human beings varies from
person to person and human skin type can be dry, oily, or combination. Such a variety in the human
skin provides a diversified habitat for bacteria and other microorganisms. Melanocytes in the human
skin produce melanin, which absorbs harmful ultraviolet radiation from sunlight that can
damage the skin and result in skin cancer. The tools needed for early detection of these
diseases are still not a reality in most third-world communities. If the symptoms of skin diseases such
as acne, dermatomyositis, candidiasis, cellulitis, scleroderma, chicken pox, ringworm, eczema,
psoriasis, etc. are left untreated in their early stages, they can result in numerous health
complications and even death. Image segmentation is a technique that aids the detection of
these skin diseases. In this paper, image processing techniques such as adaptive thresholding, edge
detection, K-means clustering and morphology-based image segmentation have been used to identify
the skin diseases from the given image set. The acquired image set was pre-processed by deblurring and
noise reduction and then processed. Depending on the definite pattern (pertaining to a distinct
disease) present in the processed image, the disease is detected at the output for the corresponding input
image.
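Of the techniques listed, K-means clustering is the easiest to show compactly. This sketch clusters the pixel intensities of a grayscale image into k groups with plain NumPy; the deterministic initialization, the use of intensity alone as the feature and the convention that label 0 is the darkest cluster (assumed to be the lesion) are illustrative choices, not the paper's settings.

```python
import numpy as np

def kmeans_segment(gray, k=2, iters=20):
    """Cluster pixel intensities into k groups; label 0 is the darkest cluster."""
    values = np.asarray(gray, dtype=float).ravel()
    # Deterministic initialization: centers spread evenly between min and max intensity.
    centers = np.linspace(values.min(), values.max(), k)

    def assign(c):
        # Index of the nearest center for every pixel.
        return np.argmin(np.abs(values[:, None] - c[None, :]), axis=1)

    for _ in range(iters):
        labels = assign(centers)
        # Move each center to the mean of its pixels (keep it if the cluster is empty).
        new_centers = np.array([values[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = assign(centers)
    # Renumber clusters so that label 0 is the darkest (assumed lesion) cluster.
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels].reshape(np.shape(gray)), centers[order]
```

On `[[10, 12, 200], [11, 205, 210]]` with k = 2, the centers settle at 11 and 205 and the dark pixels receive label 0. Morphological opening or closing is often applied to the resulting mask afterwards to remove small speckles.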
Dermatological diseases are one of the biggest medical issues of the 21st century because their diagnosis is
highly complex and expensive and is subject to the difficulties and subjectivity of human
interpretation. For fatal diseases like Melanoma, diagnosis in the early stages plays a vital role in
determining the probability of a cure. We believe that the application of automated methods
will help in early diagnosis, especially with sets of images covering a variety of diagnoses. Hence, in this
article we present a completely automated system for dermatological disease recognition from
lesion images, in which machine intervention replaces conventional detection by medical personnel.
Our model is designed in three phases, comprising data collection and augmentation,
model design and, finally, prediction. We have used multiple AI algorithms, such as the Convolutional
Neural Network and the Support Vector Machine, and combined them with image processing tools to
form a better structure, leading to higher accuracy.
Skin diseases are common to everyone, and various types of infection are becoming very
frequent. All of these diseases can be very harmful, especially if not controlled at an early
stage. Skin diseases do not only damage the skin; they can have a large effect on a person's daily life,
undermine their confidence, restrict their movement, and lead to depression. Many
people try to treat these conditions with their own remedies; however, if these methods are not
appropriate for that type of skin disease, they can make it more harmful. Skin diseases can easily
spread from person to person, so there is a need to control them at their initial stage to prevent them from
spreading. This paper presents an implementation of a skin disease diagnosis system that helps
users to detect human skin diseases and provides medical treatment suggestions in a timely manner. For this purpose, the user
uploads an image of the affected skin to the system and answers questions that
are asked according to the symptoms of the skin. These symptoms are used to identify the
disease and suggest a medical treatment. The system uses image processing
and data mining techniques for skin disease detection, and the project is divided into the following major parts.
The image of the skin disease is taken, and various preprocessing techniques are applied to it
for noise removal and image enhancement. The image is then segmented using
thresholding segmentation. Finally, data mining techniques are used to identify the skin
disease and to provide recommendations to users. This expert system achieves a disease recognition
accuracy of 85% for Eczema, 95% for Impetigo and 85% for Melanoma. Using both the image-based technique
and the questionnaire technique increases the reliability and performance of the system.
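The paper does not say how its threshold is chosen. One common automatic choice is Otsu's method, which picks the threshold that maximizes the between-class variance of the gray-level histogram; the sketch below is a NumPy implementation of that method, offered as an assumption rather than as the surveyed system's procedure.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: the threshold t maximizing between-class variance (class 0 is gray <= t)."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # weight of class 0 for each candidate t
    mu = np.cumsum(p * np.arange(256))      # cumulative first moment
    mu_total = mu[-1]                       # mean intensity of the whole image
    sigma_b = np.zeros(256)
    valid = (omega > 1e-12) & (omega < 1.0 - 1e-12)   # both classes non-empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / (omega[valid] * (1.0 - omega[valid]))
    return int(np.argmax(sigma_b))
```

The segmentation mask is then simply `gray > otsu_threshold(gray)` (or `<=` when the lesion is the darker class).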
Limitations
This application is implemented for only three skin diseases (Eczema, Impetigo and
Melanoma).
It is implemented only as a Windows application and has not yet been developed for smartphones
(Android, iOS, etc.).
During image acquisition, the distance between the camera lens and the affected skin must be 5 cm.
The image must be captured without any lighting
effects.
It supports only English, not other languages such as Sinhala or Tamil.
This model uses a rule-based, forward-chaining inference engine to identify the skin disease. Using this
system, users can identify children's skin diseases online and receive useful medical suggestions or advice
in a timely manner. The system consists of a diagnosis module, login module, info module, report module
and management module. The two main modules are the diagnosis module and the management module. In the
diagnosis module, questions are asked to the user, and the child's symptoms and condition are identified on
the basis of the answers given. This system may be an alternative for parents to identify their children's
skin diseases by answering questions about the symptoms and the condition of the
child's skin.
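Forward chaining starts from the known facts (here, the answers to the questionnaire) and repeatedly fires any rule whose conditions are all satisfied, adding its conclusion as a new fact, until nothing new can be derived. The rules below are hypothetical placeholders for illustration only and are not medical guidance.

```python
def forward_chain(facts, rules):
    """Fire rules until no new fact can be derived.

    rules: list of (set_of_conditions, conclusion) pairs."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules for illustration only.
RULES = [
    ({"itching", "ring_shaped_rash"}, "suspect_ringworm"),
    ({"fluid_filled_blisters", "fever"}, "suspect_chicken_pox"),
    ({"suspect_ringworm"}, "recommend_dermatologist_visit"),
]

print(sorted(forward_chain({"itching", "ring_shaped_rash"}, RULES)))
```

The third rule shows chaining: it fires only because the first rule has already added `suspect_ringworm` to the fact set.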
In this model, the condition of the skin disease is identified by evaluating skin disease images
using normalized symmetrical Gray-Level Co-occurrence Matrices (GLCM). The proposed
system provides an efficient and economical means of automatically recognizing skin diseases. It
can help reduce errors in medical diagnosis, and it can also serve as a first test for
patients in rural areas, where good doctors are lacking. The system works with relational
databases for storing the skin images. It can also work on
images of the same type directly through their feature vectors.
Medicine is a recent research area in artificial intelligence (AI). This paper
implements a mobile-based medical assistant that diagnoses skin diseases using
case-based reasoning (CBR) and image processing. This model was developed to help users pre-examine their skin
condition to see whether they have a disease. It also aims to increase awareness of what skin diseases
may do to our bodies, such as causing death or infecting other people, so that they can be cured before they
get worse. The proposed system is successfully implemented to detect 6 different skin diseases with
an accuracy of 90%. Of the symptom data, 15% is used for testing, 10% for validation and 75% for
training. The supervised system identifies diseases at a rate of 90%, whereas the
unsupervised system detects diseases at a rate of 80%. The detection rate of each sample disease against
the other related diseases is as follows: Eczema 88%; Psoriasis 61%; Acne 75%; Skin Cancer
51%; Scabies 43%; and Seborrheic Dermatitis 34%.
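Assuming CBR here means case-based reasoning, its central step is retrieval: find the stored case most similar to the new one and reuse its diagnosis. The sketch below uses Euclidean distance over binary symptom vectors; the symptom encoding and the small case base are hypothetical and are not medical guidance.

```python
import numpy as np

# Hypothetical case base: (symptom vector, diagnosis). The symptom order is assumed to be
# [itching, scaling, pustules, dark irregular lesion]; 1 means present, 0 absent.
CASE_BASE = [
    ([1, 1, 0, 0], "eczema"),
    ([0, 1, 0, 0], "psoriasis"),
    ([0, 0, 1, 0], "acne"),
    ([0, 0, 0, 1], "skin cancer"),
]

def retrieve(case_base, query):
    """Retrieve step of CBR: diagnosis of the nearest stored case and its distance."""
    features = np.array([symptoms for symptoms, _ in case_base], dtype=float)
    distances = np.linalg.norm(features - np.asarray(query, dtype=float), axis=1)
    best = int(np.argmin(distances))
    return case_base[best][1], float(distances[best])
```

A full CBR cycle would go on to revise the retrieved solution and retain the confirmed new case in the case base; only retrieval is shown here.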
Skin detection in an image is the classification of the
pixels in that image into two classes, skin and non-skin. Many methods use different color
spaces to extract features for pixel classification, but most of them do not detect
different types of skin with high accuracy. The method presented in this paper is implemented using the
"Color based image retrieval" (CBIR) technique. In this method, a set of feature vectors is first prepared using
the CBIR method, image tiling, and the relationship between each pixel and its
neighbors; at the test stage, the trained model is used for skin
detection. Experimental results show that the proposed model identifies different types of skin with
high accuracy and is not sensitive to illumination intensity or to movement of the face. The
method has two steps, training and testing: in the training step, the model is trained on pure skin images,
and in the testing step, skin areas are separated from non-skin areas.
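The CBIR-based method itself is not described in enough detail to reproduce. As a swapped-in illustration of pixel-level skin/non-skin classification with separate training and testing steps, the sketch below uses a color-histogram (Bayesian) skin classifier: it learns RGB histograms from skin and non-skin training pixels and labels a test pixel as skin when its color is more likely under the skin histogram. The bin count and the decision ratio are assumptions.

```python
import numpy as np

BINS = 8  # quantization levels per RGB channel (an assumed setting)

def bin_index(pixels):
    """Map (n, 3) uint8 RGB pixels to one histogram-bin index each."""
    q = np.asarray(pixels, dtype=int) * BINS // 256
    return (q[:, 0] * BINS + q[:, 1]) * BINS + q[:, 2]

def train_skin_model(skin_pixels, nonskin_pixels):
    """Training step: normalized color histograms of skin and non-skin pixels."""
    skin = np.bincount(bin_index(skin_pixels), minlength=BINS ** 3).astype(float)
    nonskin = np.bincount(bin_index(nonskin_pixels), minlength=BINS ** 3).astype(float)
    return skin / skin.sum(), nonskin / nonskin.sum()

def classify_pixels(model, pixels, ratio=1.0):
    """Testing step: skin where P(color | skin) > ratio * P(color | non-skin)."""
    skin_hist, nonskin_hist = model
    idx = bin_index(pixels)
    return skin_hist[idx] > ratio * nonskin_hist[idx]
```

Raising `ratio` makes the detector more conservative (fewer false skin pixels, more missed ones). Real training sets contain many thousands of labelled pixels rather than a handful.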
CHAPTER 3
REQUIREMENT SPECIFICATION
3.2 Functional Requirements
A functional requirement defines a function of a software system, and the behavior of the
system is evaluated when it is presented with specific inputs or conditions, which may include
calculations, data manipulation and processing, and other specific functionality.
The functional requirements of the project are among the most important aspects of
the entire mechanism of modules. After validation, our model should be able to classify
the skin disease present in an input image.
Nonfunctional requirements describe how a system must behave and establish constraints on
its functionality. This type of requirement is also known as the system's quality attributes.
Attributes such as performance, security, usability and compatibility are not features of the
system but required characteristics. They are emergent properties that arise from
the whole arrangement, and hence they cannot be implemented by writing a particular line of code.
Any attributes required by the customer are described by the specification. We must include
only those requirements that are appropriate for our project.
Reliability
The system must be reliable and robust in delivering its functionality. Changes must
be made visible by the system when a customer has reported some
enhancements. The changes made by the programmer must be visible to the project leader as well as
the test engineer.
Maintainability
System monitoring and maintenance should be simple and focused in approach. There
should not be too many jobs running on different machines, such that it becomes difficult to
monitor whether the jobs are running without errors.
Performance
The system will be used by many users simultaneously. Since the system
will be hosted on a single web server with a single database server behind the scenes,
performance becomes a significant concern. The system should not
fail when many users use it at the same time, and it should allow fast
access to all of its users. For instance, if two test engineers are
simultaneously trying to report the presence of a bug, there should not be any
inconsistency.
Portability
The system should be easily portable to another system. This is required when
the web server hosting the system gets stuck because of some problem,
which requires the system to be moved to another system.
Scalability
The system should be flexible enough to add new functionalities at a later stage.
There should be a common channel that can accommodate the new functionalities.
Flexibility
Hardware Requirements
The most common set of requirements defined by any operating system or software application is the
physical computer resources, also known as hardware. A hardware requirements list is often
accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. An
HCL lists tested, compatible, and sometimes incompatible hardware devices for a particular operating
system or application. The CPU is a fundamental system requirement for any software; most
software running on different kinds of architecture defines processing power as the model and the
clock speed of the CPU. Memory requirements are defined after considering the demands of the
application, operating system, supporting software and files, and other running processes. Hardware
requirements specifications list the hardware necessary for the proper functioning of the project.
Software Requirements
Software requirements deal with the software resources and prerequisites that need to be
installed on the computer to provide optimal functioning of an application. These requirements or
prerequisites are generally not included in the software installation package and need to be installed
separately before the software is installed. A software requirements specification is a description of a
software system to be developed, laying out functional and non-functional requirements, and may
include a set of use cases that describe the interactions users will have with the software.
Framework : Anaconda
CHAPTER 4
SYSTEM ANALYSIS
1. Preprocessing
2. Feature extraction
3. Classification
The overall flow of the proposed method is represented in Figure. The performance of the Naïve
Bayes classifier is analyzed using the feature matrix. Further, the performance of the HoG features is studied in terms of
accuracy, sensitivity and specificity. The process of diagnosing the skin diseases is illustrated
in the upcoming sections.
Image preprocessing is the initial step in identifying the affected area. Multiple steps are performed in
the preprocessing phase to make the image suitable for the feature extraction process, so that
abnormalities in the input image can be detected more reliably.
In this research work, the techniques used for the preprocessing phase are:
Image resizing.
Histogram equalization.
2 Image Resizing
An image can be resized in several ways. One of the simplest ways of increasing image size is
nearest-neighbor interpolation, which sets every pixel in the output to the nearest pixel of the input; for
upscaling, this means that multiple pixels of the same color will be present. Image resizing is necessary
when the total number of pixels must be increased or decreased, whereas remapping is needed
when correcting for lens distortion or rotating an image. Zooming refers to increasing the
number of pixels so that the image appears larger; interpolation, however, cannot add detail that was not captured in the original image.
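Nearest-neighbor interpolation can be written in a few lines of NumPy: each output pixel center is mapped back into input coordinates and copies the input pixel that contains it. This is a sketch; the function name is illustrative.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize for grayscale (h, w) or color (h, w, c) images."""
    h, w = img.shape[:2]
    # Map output pixel centers back to input coordinates and take the containing pixel.
    rows = np.minimum(((np.arange(new_h) + 0.5) * h / new_h).astype(int), h - 1)
    cols = np.minimum(((np.arange(new_w) + 0.5) * w / new_w).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]
```

Upscaling `[[1, 2], [3, 4]]` to 4x4 turns each pixel into a 2x2 block of the same value, which is exactly the repeated-color effect described above; no new detail appears.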
3 Color Transformation
The input skin images are captured in the form of RGB (Red, Green and Blue).
Grayscale is a range of shades of gray without apparent color. The darkest possible shade is black,
which is the total absence of transmitted or reflected light. The lightest possible shade is white, the
total transmission or reflection of light at all visible wavelengths. Intermediate shades of gray are
represented by equal brightness levels of the three primary colors (red, green and blue) for
transmitted light, or equal amounts of the three primary pigments (cyan, magenta and yellow) for reflected light. In the case of transmitted light (for example, the image on a
computer display), the brightness levels of the red (R), green (G) and blue (B) components are each
represented as a number from decimal 0 to 255, or binary 00000000 to 11111111. For every pixel in
a red-green-blue (RGB) grayscale image, R = G = B. The lightness of the gray is directly
proportional to the number representing the brightness levels of the primary colors. Black is
represented by R = G = B = 0 or R = G = B = 00000000, and white is represented by R = G = B =
255 or R = G = B = 11111111. Because there are 8 bits in the binary representation of the gray level,
this imaging method is called 8-bit grayscale.
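The conversion from an RGB image to 8-bit grayscale can be sketched as a weighted sum of the three channels. The weights used here are the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114), a common convention that the report does not specify; the second function shows the R = G = B representation described above.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an (h, w, 3) uint8 RGB image to 8-bit grayscale using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = np.asarray(rgb, dtype=float) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

def gray_to_rgb(gray):
    """Represent a grayscale image in RGB form: every pixel gets R = G = B."""
    return np.repeat(gray[..., None], 3, axis=2)
```

White (255, 255, 255) maps to 255, black to 0, and pure red (255, 0, 0) to 76, because green contributes most to perceived brightness in this weighting.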
In some cases, rather than using the RGB or CMY color models to define grayscale, three other
parameters are defined. These are hue, saturation and brightness. In a grayscale image, the hue
(apparent color shade) and saturation (apparent color intensity) of each pixel are equal to 0. The
lightness (apparent brightness) is the only parameter of a pixel that can vary. Lightness can range
from a minimum of 0 (black) to 100 (white).
4. Histogram Equalization
Using a fundus camera to capture the retinal image results in uneven illumination. The portions near the center are well illuminated and therefore look very bright, while the portions at the sides are less illuminated and look very dark. Histogram equalization is used to address this issue. Because the exudate and optic disc regions are much higher in intensity than the neighboring regions of the image, histogram equalization is used to spread the intensity levels so that the neighboring regions are assigned greater intensity.
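Global histogram equalization can be sketched as below for an 8-bit grayscale image held as a list of rows; `equalize_histogram` is a hypothetical helper name, not the report's code. Each intensity level is mapped through the image's cumulative distribution function (CDF), rescaled so that the darkest level present maps to 0 and the brightest to 255.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of an 8-bit grayscale image (list of rows).

    Builds the intensity histogram, accumulates it into a cumulative
    distribution (CDF) and maps each level through the rescaled CDF so the
    output intensities spread over the full 0..levels-1 range.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of the intensity levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest level present
    if total == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in image]
    lut = [round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c > 0 else 0
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Three gray levels are spread over the full 0..255 range.
print(equalize_histogram([[50, 50], [100, 200]]))
```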
Adaptive Histogram Equalization differs from ordinary histogram equalization in the respect that the
adaptive method computes several histograms, each corresponding to a distinct section of the image,
and uses them to redistribute the lightness values of the image. It is therefore suitable for improving
the local contrast and enhancing the definitions of edges in each region of an image.
Contrast Limited AHE (CLAHE) differs from adaptive histogram equalization in its contrast limiting. In CLAHE, the contrast limiting procedure is applied to each neighborhood from which a transformation function is derived: the histogram is clipped at a predefined value before its cumulative distribution function is computed. CLAHE was developed to prevent the over-amplification of noise that adaptive histogram equalization can give rise to.
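The contrast-limiting step that distinguishes CLAHE can be sketched for a single tile's histogram as follows. This is an illustrative Python sketch with hypothetical names (`clip_histogram`, `tile_mapping`); a full CLAHE implementation also splits the image into tiles and bilinearly interpolates between the mappings of neighbouring tiles, which is omitted here. Clipping caps the slope of the mapping function, which is what limits noise amplification.

```python
def clip_histogram(hist, clip_limit):
    """Contrast-limiting step of CLAHE for one tile's histogram.

    Counts above clip_limit are cut off and the excess is spread evenly over
    all bins, so the total pixel count is preserved.
    """
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    # Redistribute the excess uniformly; leftover units go to the first bins.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


def tile_mapping(hist, clip_limit, levels=256):
    """Equalization lookup table for one tile after contrast limiting."""
    clipped = clip_histogram(hist, clip_limit)
    total, running, lut = sum(clipped), 0, []
    for count in clipped:
        running += count
        lut.append(round(running / total * (levels - 1)))
    return lut


print(clip_histogram([10, 0, 0, 2], 4))
```

Redistributing the excess can push a bin slightly back above the limit; practical implementations either repeat the clipping or accept the small overshoot.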
The HoG features are extracted from the localized ROI. HoG features are largely invariant to local geometric and photometric transformations, and are therefore used to describe the shape and edges of the structures present in the image. Because HoG features are related to edge information, they can capture the deformation of the optic disc caused by eye disease; deformation of the optic disc is one of the key parameters in the detection of eye disease. To compute the HoG features, the image is divided into small cells, and the shape of the objects is described by accumulating the strength and orientation of the spatial gradients in each cell.
A histogram of oriented gradients (HOG) is a feature descriptor used in image processing applications for detecting objects in a video or image [2]. It was proposed by Dalal and Triggs, who used the method for pedestrian detection. The figure shows the block diagram and block normalization scheme of HOG feature extraction.
HoG extracts features from a grid of overlapping rectangular blocks in the search window. The histogram of each block describes the frequency of the gradient directions inside that block, so the image is described by a set of local histograms that count the occurrences of gradient orientations in local parts of the image. The steps involved in calculating the histogram are:
Gradient Computation
The gradient of the image is obtained by filtering it with two one-dimensional derivative masks, [-1, 0, 1] in the horizontal direction and its transpose in the vertical direction. The gradient orientation can be either signed (0 to 360 degrees) or unsigned (0 to 180 degrees).
The next step is orientation binning. The image is split into cells, each covering a spatial region of a predetermined size in pixels, and a histogram with a chosen number of orientation bins is calculated for each cell. Every pixel in the cell casts a vote (feature value) into the bin that matches its gradient orientation, and the vote is weighted by the gradient magnitude, so edges, where gradients are strong, contribute more than uniform regions. Increasing the number of bins increases the detail of the histogram.
Several feature descriptors are based on the distribution of gradients within pixel patches. HoG is a feature descriptor for object detection that counts occurrences of gradient orientations in localized portions of the image. The process is similar to edge orientation histograms, shape contexts and scale-invariant feature transform (SIFT) descriptors. Because color and gamma normalization during preprocessing has only a slight impact on performance, HoG can compute the gradient values directly without it.
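The gradient computation and orientation binning steps can be sketched for one cell as follows. This is an illustrative Python sketch, not the project's MATLAB code, and the function names are hypothetical. It uses the [-1, 0, 1] masks, 9 unsigned orientation bins over 0-180 degrees (a common default) and a plain L2 normalization of a block's concatenated cell histograms. Each vote goes to a single bin, whereas Dalal and Triggs interpolate votes between neighbouring bins.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Orientation histogram for one HoG cell (a 2-D list of gray values).

    Gradients use the centered 1-D masks [-1, 0, 1] horizontally and
    vertically; orientations are unsigned (0-180 degrees) and each pixel
    votes into its orientation bin with a weight equal to its gradient
    magnitude, so strong edges dominate uniform regions.
    """
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(height):
        for x in range(width):
            # Clamp at the borders so every pixel has two neighbours.
            gx = cell[y][min(x + 1, width - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, height - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(block_hist, eps=1e-6):
    """L2 normalization of the concatenated cell histograms of one block."""
    norm = math.sqrt(sum(v * v for v in block_hist) + eps * eps)
    return [v / norm for v in block_hist]


# A vertical edge: all gradient energy falls into the 0-degree bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(cell_orientation_histogram(vertical_edge))
```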
4.2.2 SVM
SVM is primarily a classification algorithm, although in some cases it can also be used for regression and, rarely, for unsupervised learning. SVM-based clustering, which works on unlabelled data, is still an active research topic, so it is only mentioned here. The regression variant is called support vector regression (SVR); it is quite similar to SVM, with only a few changes, but it is more complicated.
SVM is a strong data classifier. It is trained on two or more classes of labelled data and separates two classes with a hyperplane; each data point is assigned to a class according to which side of the hyperplane it lies on. SVM is easiest to understand by plotting the data points in feature space.
To understand SVM mathematically, a few important terms must be kept in mind, since they come up whenever the SVM algorithm is used. They are described one by one below.
1. Support Vectors
Support vectors are special data points in the dataset. They are the points closest to the hyperplane and are responsible for its construction: if these points were removed, the position of the hyperplane would change. The hyperplane has decision boundaries on either side of it, and the support vectors determine how far apart these boundaries are. They are the main components of an SVM, as shown in the picture.
The yellow and green points here are the support vectors. Red and blue dots are separate classes. The
middle dark line is the hyperplane in 2-D and the two lines alongside the hyperplane are the decision
boundaries. They collectively form the decision surface.
2. Decision Boundaries
Decision boundaries in SVM are the two lines on either side of the hyperplane. The distance between these two light-toned lines is called the margin. The optimal (best) hyperplane is the one for which the margin is maximum. The SVM algorithm adjusts the hyperplane and its margins according to the support vectors.
3. Hyperplane
The hyperplane is the central line in the diagram above. In this case, the hyperplane is a line because the feature space is 2-D; in a 3-D feature space, the hyperplane would be a 2-D plane. To understand a hyperplane, imagine a feature space as a blank piece of paper and a line cutting through it from the center: that line is the hyperplane. Mathematically, the hyperplane is described by a linear equation. Denoting its left-hand side by E,
$$E = a_0 + a_1 X_1 + a_2 X_2$$
the hyperplane itself is the set of points where E = 0. Here a0 is the intercept of the hyperplane, a1 and a2 are the coefficients of the first and second axes respectively, and X1 and X2 are the two dimensions. If a data point lies beneath the hyperplane, then E < 0; if it lies on or above it, then E >= 0. This is how data is classified using a hyperplane.
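The classification rule above can be written directly in code. This sketch uses hypothetical function names and follows the text's convention that E >= 0 is class +1 and E < 0 is class -1. The margin formula assumes the usual SVM scaling in which the two decision boundaries are the lines E = +1 and E = -1; under that scaling, the margin width is 2 / sqrt(a1² + a2²).

```python
import math


def hyperplane_value(a0, a1, a2, x1, x2):
    """E = a0 + a1*x1 + a2*x2 for a 2-D linear hyperplane."""
    return a0 + a1 * x1 + a2 * x2


def classify(a0, a1, a2, x1, x2):
    """Class +1 if E >= 0 (on or above the hyperplane), otherwise class -1."""
    return 1 if hyperplane_value(a0, a1, a2, x1, x2) >= 0 else -1


def margin_width(a1, a2):
    """Distance between the decision boundaries E = +1 and E = -1."""
    return 2.0 / math.hypot(a1, a2)


# Hyperplane x2 = x1, i.e. a0 = 0, a1 = -1, a2 = 1.
print(classify(0, -1, 1, 1, 3), classify(0, -1, 1, 3, 1))
```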
As in any ML method, there are training and testing data. The data form an n × p matrix with n observations and p dimensions. A variable Y decides which class each point belongs to and can take only the two values 1 and -1: if Y is 1, the data point is in class 1; if Y is -1, it is in class -1.
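The report does not give its SVM training routine. As a minimal sketch under stated assumptions, a linear soft-margin SVM can be trained on the n × p data matrix and ±1 labels described above by batch subgradient descent on the hinge-loss objective. This optimizer is a swapped-in, simpler technique (libraries such as LIBSVM solve the dual problem with SMO-type solvers), the function names are hypothetical, and `lam`, `lr` and `epochs` are illustrative hyperparameters, not values from the project.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Train a soft-margin linear SVM by batch subgradient descent.

    X is an n x p list of feature rows and y holds labels +1/-1. Minimizes
    (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b))), the hinge-loss
    form of the SVM objective; the bias b is not regularized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin or misclassified
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of E = w.x + b, with E >= 0 mapped to class +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable example (hypothetical data).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
```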
The Naive Bayes classifier is an efficient and simple probabilistic classifier based on Bayes' theorem that predicts class membership probabilities. “Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence”. Because of this assumption, NB classifiers are computationally simpler than many other classifiers. The classifier is a simple model that assigns class labels from a finite set to a vector of feature values, assuming that the value of a particular feature is independent of any other feature. An advantage of naïve Bayes is that it requires only a small amount of training data to estimate the parameters needed for classification.
It classifies the data in two phases, namely a training phase and a prediction phase. In the training phase, the parameters of the probability distributions are estimated from the training data; in the prediction phase, for any unseen test sample, the method computes the posterior probability of that sample belonging to each class and assigns the sample to the class with the largest posterior probability. Once the features of the training set are fed into the classifier, the probability of each individual feature given the outcome (i.e., the class, ‘Normal’ or ‘Eye disease’), as well as the prior probability of each of the two classes, is calculated.
The Naive Bayes method is suitable for discrete-valued attributes as well as for large datasets, but for continuous-valued attributes it does not capture attribute interactions. On the other hand, a decision tree does not perform well when the data size is very large. These limitations are overcome by the NB Tree algorithm, a hybrid approach proposed by Kohavi that is appropriate in learning environments where many attributes are likely to be relevant to a classification task. NB Tree relaxes the attribute independence assumption of the Naïve Bayes algorithm. “NB Tree is a hybrid classification technique which combines Decision Tree and Naive Bayes classification algorithms. The algorithm is similar to the classical recursive partitioning schemes except that the leaf nodes created are Naive-Bayes categorizers instead of nodes predicting a single class and the learned knowledge is represented in the form of a tree. It combines the advantage of both Decision Tree and Naïve Bayes Classification.” NB Tree induces highly accurate classifiers in practice, and it has been shown to scale up well in terms of accuracy on real-world datasets.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, the classifier treats each of these properties as contributing independently to the probability that the fruit is an apple, which is why it is called ‘Naive’.
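The training and prediction phases described above can be sketched as a Gaussian Naive Bayes classifier. Modelling each continuous feature (for example, a HoG value) with a per-class normal distribution is an assumption; the report does not say which likelihood model it used. The function names are hypothetical, and log-probabilities are summed instead of multiplying probabilities so that many small factors do not underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Training phase: estimate class priors and per-feature mean/variance."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor on the variance avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Prediction phase: pick the class with the largest posterior (log space)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # log P(c) + sum_i log P(x_i | c), treating the features as independent.
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-feature training data for the two classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["Normal"] * 3 + ["Eye disease"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
```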
The Naive Bayes model is easy to build and particularly useful for very large datasets. Despite its simplicity, Naive Bayes has been reported to perform competitively with, and sometimes outperform, far more sophisticated classification methods.
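Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
where:
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, i.e., the probability of the predictor given the class.
P(x) is the prior probability of the predictor.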
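A worked numeric example of Bayes' theorem, using made-up probabilities for a two-class problem (the numbers are hypothetical, not results from this project). P(x) is obtained by the law of total probability, summing P(x|c)·P(c) over the classes, so the posteriors add up to 1.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood * prior / evidence


# Hypothetical two-class example.
priors = {"Normal": 0.7, "Disease": 0.3}        # P(c)
likelihoods = {"Normal": 0.2, "Disease": 0.6}   # P(x | c)
# Law of total probability: P(x) = 0.7*0.2 + 0.3*0.6 = 0.32
evidence = sum(priors[c] * likelihoods[c] for c in priors)
posteriors = {c: posterior(priors[c], likelihoods[c], evidence) for c in priors}
print(posteriors)  # approximately {'Normal': 0.4375, 'Disease': 0.5625}
```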
CHAPTER 5
SYSTEM DESIGN
System design is the process of planning a new system to complement or altogether replace the old system. The design phase is the first step in moving from the problem domain to the solution domain. The design of the system is a critical activity that affects the quality of the result, since it translates the logical aspects of the system into physical aspects. It is the process of defining the architecture, modules, interfaces and data for a system to satisfy specified requirements. System design can be seen as the application of systems theory to product development, and it overlaps with the disciplines of systems analysis, systems architecture and systems engineering.
A data flow diagram (DFD) is a graphical representation of the flow of data through an information system, modeling its process aspects. A DFD is often used as a preliminary step to create an overview of the system without going into great detail, which can later be elaborated. DFDs can also be used to visualize data processing. A DFD shows what kind of information will be input to and output from the system, how the data will advance through the system, and where the data will be stored. It does not show information about process timing or whether processes will operate in sequence or in parallel, unlike a traditional structured flowchart, which focuses on control flow, or a UML activity diagram, which presents both control and data flows as a unified model.
A data flow diagram is also known as a bubble chart. The DFD is a design tool used in the top-down approach to systems design. DFDs can give the end user a physical idea of where the input data ultimately has an effect on the structure of the whole system, from order to dispatch to report.
Fig 5.1: Data flow diagram (Enrollment and Verification stages: Image, Pre-Processing, Feature Extraction, Feature Selection)
As Fig 5.1 shows, there are two main stages, Enrollment and Verification. In both stages, the image is preprocessed to improve low contrast; preprocessing also includes image enhancement and resizing. After preprocessing, the image features are extracted: HoG features are extracted from the localized ROI. Feature extraction is followed by feature selection, in which the ROI is located using a rectangular mask selected from the feature matrix. Finally, the selected image is classified using the Naïve Bayes and SVM classifiers to detect the disease.
An activity diagram is a UML diagram that focuses on the execution and flow of the behavior of a system rather than its implementation. Activity diagrams consist of activities made up of actions, and are used for behavioral modeling. An activity is a behavior that is divided into one or more actions. Activities are networks of nodes connected by edges. There can be action nodes, control nodes or object nodes: action nodes represent some action, control nodes represent the control flow of an activity, and object nodes describe objects used inside an activity. Edges show a path or flow of execution. Activities start at an initial node and terminate at a final node.
In Fig 5.2, the image is preprocessed and its features are extracted using the Histogram of Oriented Gradients (HOG) feature extraction method. The extracted features are verified using the INST rule, which requires that the eye asymmetry be less than 0.2. After verification, the Naïve Bayes classifier detects whether the image shows a disease-affected eye or a normal eye: if the image matrix is less than or equal to 0, it is considered an affected eye; otherwise it is a normal eye.
Activity diagram: Pre-Processing, then Feature Extraction, then classification of the output Y (Y == 1: Melanoma; Y == 2: Nevus; otherwise: Basal Cell Carcinoma)
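The flowchart routes the classifier output Y to a diagnosis: Y == 1 is reported as melanoma, Y == 2 as nevus, and any other value as basal cell carcinoma. A minimal sketch of that decision rule, with a hypothetical function name and assuming integer class labels:

```python
def label_to_disease(y):
    """Map the classifier's numeric output Y to a diagnosis, following the flowchart."""
    if y == 1:
        return "Melanoma"
    if y == 2:
        return "Nevus"
    return "Basal Cell Carcinoma"


print(label_to_disease(2))  # Nevus
```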
CHAPTER 6
IMPLEMENTATION
6.1 METHODOLOGY
The image is initially pre-processed by resizing it and applying Histogram Equalization (HE) during image acquisition. HoG (Histogram of Oriented Gradients) features are then extracted from the collective competitive ratio, and a number of statistical properties are derived. The derived properties constitute the HoG features that are fed to the Naïve Bayes and SVM classifiers to identify the diseases. The classifiers are trained and tested with a disease image dataset. The proposed methodology is shown in Fig. 6.1.
Fig 6.1: Proposed methodology (Image Acquisition, Noise Removal, Feature Extraction using HOG, Classification using SVM and Naive Bayes)
Image Acquisition
The first stage of our automated image analysis system is image acquisition. This stage is essential for the rest of the system: if the image is not acquired satisfactorily, the remaining components of the system may not work, or the results will not be reasonable. In this stage, the image is first resized for better results. The input image given to the system is in RGB form, but the proposed system requires grayscale images, so the RGB images are converted to grayscale using RGB-to-gray conversion in MATLAB.
Noise Removal
Quality images without noise are necessary to get accurate results, since noisy images may lead the algorithm to incorrect results. Hence, it is necessary to denoise the image. Image denoising is an important image processing task, and there are many ways to denoise an image. An important property of a good denoising model is that it removes noise while preserving edges. Traditionally, linear models have been used. Here, a median filter is used to denoise the image; it smooths the image and, being a nonlinear filter, tends to preserve edges better than linear smoothing.
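A 3 × 3 median filter can be sketched in Python as below. In MATLAB, 2-D median filtering is provided by the Image Processing Toolbox function `medfilt2`, but which call the report used is not stated. The helper name and the clamped-border handling are assumptions of this sketch, and the window size should be odd.

```python
def median_filter(image, size=3):
    """Median filter for a grayscale image given as a list of rows.

    Each pixel is replaced by the median of its size x size neighbourhood
    (borders are handled by clamping coordinates), which removes impulse
    ("salt-and-pepper") noise while keeping step edges sharp.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# The isolated bright pixel is removed; a step edge would be left intact.
print(median_filter([[10, 10, 10], [10, 255, 10], [10, 10, 10]]))
```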
Feature Extraction
To get accurate results in biomedical image processing, the biomedical image must be of very good quality. In practice this is not easy, and for various reasons low- or medium-quality images are often obtained. Hence, it becomes necessary to improve their quality, which is done using an image enhancement algorithm. This algorithm enhances the image by adjusting parameters such as contrast and brightness.
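Two common contrast and brightness adjustments are sketched below: a linear gain/bias point operation and min-max contrast stretching. Both helpers are hypothetical illustrations on 8-bit grayscale images stored as lists of rows; the report does not specify its enhancement algorithm, and `gain` and `bias` are free parameters.

```python
def adjust_contrast_brightness(image, gain=1.0, bias=0.0):
    """Point operation out = gain * p + bias, rounded and clipped to 0..255.

    gain > 1 increases contrast; a positive bias increases brightness.
    """
    return [[max(0, min(255, round(gain * p + bias))) for p in row] for row in image]


def stretch_contrast(image):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


print(adjust_contrast_brightness([[100, 200]], gain=1.5, bias=-20))  # [[130, 255]]
```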
Classification
The overall flow of the proposed method is represented in the figure. The performance of the Naïve Bayes classifier is analyzed using the feature matrix. Further, the performance of the HoG features is studied in terms of accuracy, sensitivity and specificity. The process of diagnosing the eye diseases is illustrated in the upcoming sections.
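Accuracy, sensitivity and specificity are computed from the confusion-matrix counts: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), treating the diseased class as positive. The function name is hypothetical, and the counts in the usage line are made up, not this project's results.

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


accuracy, sensitivity, specificity = evaluate(tp=40, tn=45, fp=5, fn=10)  # 0.85, 0.8, 0.9
```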
Naive Bayes is a classifier that uses Bayes' theorem. It predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class. This is also known as Maximum A Posteriori (MAP) estimation.
$$\mathrm{MAP}(H) = \arg\max_H P(H \mid E) = \arg\max_H \frac{P(E \mid H)\,P(H)}{P(E)} = \arg\max_H P(E \mid H)\,P(H)$$
P(E) is the evidence probability, which is used to normalize the result. Because it is the same for every hypothesis H, removing it does not change which hypothesis is selected.
The Naive Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of a feature does not influence the presence or absence of any other feature. The following example from Wikipedia explains the logic:
A fruit may be considered to be an apple if it is red, round, and about 4″ in diameter. Even if these
features depend on each other or upon the existence of the other features, a naive Bayes classifier
considers all of these properties to independently contribute to the probability that this fruit is an
apple.
In real datasets, a hypothesis is tested given multiple pieces of evidence (features), so the calculations become complicated. To simplify the work, the feature independence assumption is used to ‘uncouple’ the multiple pieces of evidence and treat each one as independent.
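The MAP rule with the independence assumption can be sketched as follows: the score of each hypothesis is log P(H) plus the sum of log P(e_i | H) over the observed features, and P(E) is dropped because it is the same for every hypothesis. Working in log space avoids underflow when many probabilities are multiplied. The fruit probabilities are invented purely to illustrate the Wikipedia example, and the function name is hypothetical.

```python
import math


def naive_bayes_map(priors, likelihoods, evidence):
    """Return the MAP hypothesis under the naive independence assumption.

    priors:      {hypothesis: P(H)}
    likelihoods: {hypothesis: {feature_value: P(feature_value | H)}}
    evidence:    observed feature values, each treated as independent.
    P(E) is omitted because it is the same for every hypothesis.
    """
    def log_score(h):
        return math.log(priors[h]) + sum(math.log(likelihoods[h][e]) for e in evidence)

    return max(priors, key=log_score)


# Invented probabilities for the fruit example.
priors = {"apple": 0.5, "cherry": 0.5}
likelihoods = {
    "apple": {"red": 0.7, "round": 0.9, "about 4 in": 0.8},
    "cherry": {"red": 0.9, "round": 0.9, "about 4 in": 0.01},
}
print(naive_bayes_map(priors, likelihoods, ["red", "round", "about 4 in"]))  # apple
```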
How to run:
Run manage.py
Go to the login page.
In this research, the proposed method is used to diagnose Diabetic Retinopathy (DR). Initially, the dataset images are resized and histogram equalization is applied. The key features of the preprocessed images are then extracted using the Histogram of Oriented Gradients (HoG), and a model is constructed from the HoG features using the Naive Bayes algorithm.
The dataset used here is binrushed, which consists of 4 classes of diseases with a total of 1285 images. When tested on the test set for the 4 classes, the model achieved an accuracy of 90.02%. To cover more disease classes, the images were divided into 8 classes; the Naive Bayes algorithm showed an overall accuracy of 77.23%, whereas other algorithms for multiclass classification failed to exceed 50%. We also tested various scenarios for the login pages and different types of images, and our algorithm gave better results in most cases.
Input:
Training dataset T,
F = (f1, f2, f3, ..., fn), the values of the predictor variables in the testing dataset.
Output:
Step:
Require: X and y loaded with labelled training data, α ← 0 or α ← a partially trained SVM
User Acceptance Testing is a critical phase of any project and requires significant participation by the
end user. It also ensures the system meets the functional requirements.
Test Case Description: The user enters a valid email ID and password.
Password: *******
Test Case Description: The user selects the original retina image.
Melanoma, the most serious type of skin cancer, develops in the cells (melanocytes) that produce
melanin — the pigment that gives your skin its color. Melanoma can also form in your eyes and,
rarely, inside your body, such as in your nose or throat.
The exact cause of all melanomas isn't clear, but exposure to ultraviolet (UV) radiation from sunlight
or tanning lamps and beds increases your risk of developing melanoma. Limiting your exposure
to UV radiation can help reduce your risk of melanoma.
The risk of melanoma seems to be increasing in people under 40, especially women. Knowing the
warning signs of skin cancer can help ensure that cancerous changes are detected and treated before
the cancer has spread. Melanoma can be treated successfully if it is detected early.
An epidermal nevus (plural: nevi) is an abnormal, noncancerous (benign) patch of skin caused by an
overgrowth of cells in the outermost layer of skin (epidermis). Epidermal nevi are typically seen at
birth or develop in early childhood. Affected individuals have one or more nevi that vary in size.
There are several types of epidermal nevus that are defined in part by the type of epidermal cell
involved. The epidermis is composed primarily of a specific cell type called a keratinocyte. One
group of epidermal nevi, called keratinocytic or nonorganoid epidermal nevi, includes nevi that
involve only keratinocytes. Keratinocytic epidermal nevi are typically found on the torso or limbs.
They can be flat, tan or brown patches of skin or raised, velvety patches. As affected individuals age,
the nevi can become thicker and darker and develop a wart-like (verrucous) appearance. Often,
keratinocytic epidermal nevi follow a pattern on the skin known as the lines of Blaschko. The lines of
Blaschko, which are normally invisible on skin, are thought to follow the paths along which cells
migrate as the skin develops before birth. Keratinocytic epidermal nevi are also known as linear
epidermal nevi or verrucous epidermal nevi, based on characteristics of their appearance.
Basal cell carcinoma is a type of skin cancer. Basal cell carcinoma begins in the basal cells — a type
of cell within the skin that produces new skin cells as old ones die off.
Basal cell carcinoma often appears as a slightly transparent bump on the skin, though it can take
other forms. Basal cell carcinoma occurs most often on areas of the skin that are exposed to the sun,
such as your head and neck.
Most basal cell carcinomas are thought to be caused by long-term exposure to ultraviolet (UV)
radiation from sunlight. Avoiding the sun and using sunscreen may help protect against basal cell
carcinoma.
Detection of skin diseases is a very important step in reducing death rates, disease transmission and the development of skin disease. Clinical procedures to detect skin diseases are very expensive and time-consuming. Image processing techniques help to build automated screening systems for dermatology at an initial stage, and feature extraction plays a key role in helping to classify skin diseases.
In this research, the detection method was designed using pre-trained SVM and Naive Bayes classifiers. In conclusion, this research can play an effective role in detecting skin diseases in Saudi Arabia, where the very hot weather indicates that skin diseases are widespread. The research supports medical efficiency in Saudi Arabia.
Future enhancement
1. A common model should be adopted for the identification of all types of skin diseases.
2. Support for multilingualism to improve user-friendliness.
3. Expanding multi-platform capability through the introduction of iOS compatibility.
REFERENCES
[2] Ken Pernezny, Monica Elliott, Aaron Palmateer and Nikol Havranek, "Guidelines for Identification and Management of Plant Disease Problems: Part II. Diagnosing Plant Diseases Caused by Fungi, Bacteria and Viruses," UF/IFAS Extension.
[3] Anand H. Kulkarni and Ashwin Patil R. K., "Applying image processing technique to detect plant diseases," International Journal of Modern Engineering Research (IJMER), Vol. 2, Issue 5, pp. 3661-3664, Sep-Oct 2012, ISSN: 2249-6645.
[4] Shivkumar Bagde, Swaranjali Patil, Snehal Patil and Poonam Patil, "Artificial Neural Network Based Plant Leaf Disease Detection," International Journal of Computer Science and Mobile Computing, Vol. 4, Issue 4, pp. 900-905, April 2015.
[5] Jagadeesh Devdas Pujari, Rajesh Yakkundimath and Abdulmunaf Syedhusain Byadgi, "Grading and Classification of Anthracnose Fungal Disease of Fruits based on Statistical Texture Features," International Journal of Advanced Science and Technology, Vol. 52, March 2013.
[6] Hiteshwari Sabrol and Satish Kumar, "Recent Studies of Image and Soft Computing Techniques for Plant Disease Recognition and Classification," International Journal of Computer Applications (0975-8887), Vol. 126, No. 1, September 2015.
[9] Munirah M. Yusof, Ruhaya A. Aziz and Chew S. Fei, "The Development of Online Children Skin Diseases Diagnosis System," International Journal of Information and Education Technology, Vol. 3, No. 2, April 2013.
[10] Anal Kumar Mittra and Ranjan Parekh, "Automated Detection of Skin Diseases Using Texture Features."
[11] Carl Louie Aruta, Colinn Razen Calaguas, Jan Kryss Gameng, Marc Venneson Prudentino and August Anthony Chestel J. Lubaton, "Mobile-based Medical Assistance for Diagnosing Different Types of Skin Diseases Using Case-based Reasoning with Image Processing," International Journal of Conceptions on Computing and Information Technology, Vol. 3, Issue 3, October 2015, ISSN: 2345-9808.
[14] Damanpreet Kaur and Prabhneet Sandhu, "Human Skin Texture Analysis using Image Processing Techniques," International Journal of Science and Research (IJSR), India.
[15] D. Okuboyejo, O. Olugbara and S. Odunaike, "Automating Skin Disease Diagnosis Using Image Classification," 2013.
[18] O. M. Opstad Kruse, J. M. Prats-Montalbán, U. G. Indahl, K. Kvaal, A. Ferrer and C. M. Futsaether, "Pixel classification methods for identifying and quantifying leaf surface injury from digital images," Computers and Electronics in Agriculture, Vol. 108, pp. 155-165, 2014, DOI: http://dx.DOI.org/10.1016/j.compag.2014.07.010.
[22] Monika Jhuria, Ashwani Kumar and Rushikesh Borse, "Image processing for smart farming: detection of disease and fruit grading," IEEE 2nd International Conference on Image Information Processing (ICIIP), Shimla, pp. 521-526, 2013.
[23] P. Ganesan, Priya Chakravarty and Shweta Verma, "Segmentation of Natural Color Images in HSI Color Space Based on FCM Clustering," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Vol. 3, Issue 3, March 2014.
[24] Shiv Ram Dubey, Pushkar Dixit, Nishant Singh and Jay Prakash Gupta, "Infected Fruit Part Detection Using K-Means Clustering Segmentation Technique," International Journal of Artificial Intelligence and Interactive Multimedia, Vol. 2, No. 2.