IJERT Developing A Web Based System For
Abstract- In today's world, cancer is one of the most common diseases and a leading cause of death. Cancer is not one disease; it is a group of more than 100 different and distinctive diseases. Cancer can involve any tissue of the body and takes many different forms in each body part. Breast cancer is a grim disease, and it is the most common type of cancer among women worldwide. Because manual diagnosis of this disease takes long hours and diagnostic systems are scarce, there is a need for an automatic diagnosis system for the early detection of cancer. In this project we are developing a web-based diagnosis system, for which we carried out a comparative study of supervised machine learning classifiers to find out which classifier gives the best accuracy. For that, we have taken the dataset from the Wisconsin Breast Cancer Database (WBCD), a benchmark database for comparing the results of different algorithms. We use the following machine learning classification techniques: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Adaboost Classifier and XGboost Classifier, to classify tumors as benign or malignant. The machine learns from past data and can then predict the category of new input.

Keywords- WBCD, Support Vector Machine, K-Nearest Neighbor, Random Forest, Adaboost Classifier, XGboost Classifier.

1. INTRODUCTION
Breast cancer has become one of the most common diseases among women that lead to death. Breast cancer can be diagnosed by classifying tumors. There are two different types of tumors, i.e. malignant and benign tumors. Doctors need a reliable diagnostic procedure to distinguish between these tumors, but it is generally very difficult to distinguish them, even for experts. Automation of the diagnostic process is therefore needed. As the most prevalent cancer in women, breast cancer has always had a high incidence rate and mortality rate. According to the latest cancer statistics, breast cancer alone is expected to account for 25% of all new cancer diagnoses and 15% of all cancer deaths among women worldwide. In case of any sign or symptom, people usually visit a doctor immediately, who may refer them to an oncologist if required. The oncologist can diagnose breast cancer by taking a thorough medical history of the patient, examining both breasts, and checking for swelling or hardening of any lymph nodes in the armpit. In this project, we have used the Wisconsin Breast Cancer Dataset (WBCD), obtained with the fine needle aspiration biopsy method, and applied machine learning algorithms to it to predict whether or not the patient has breast cancer. This paper compares the performance of five classification algorithms and their combination using an ensemble approach, all of which are suitable for direct interpretation of their results. We use an XGboost classifier approach for comparison with the other four classification algorithms, and analyze each classifier's accuracy to find the best fit for the prediction of breast cancer.

2. PROBLEM STATEMENT
To identify which machine learning classifier gives the best accuracy, to count the number of patients with benign and malignant tumors, and to identify the type of tumor.

3. PROPOSED METHODOLOGY
We acquired the Wisconsin Breast Cancer diagnosis dataset and used Jupyter Notebook and Anaconda Spyder as the coding platforms, and we serve the prediction UI (user interface) with Flask on a local server. Our methodology uses supervised learning algorithms and classification techniques such as Support Vector Classifier, KNN, Random Forest, Adaboost and Xgboost Classifier, together with a dimensionality reduction technique.

3.1 Data Manipulation
The data is in dictionary format, which sklearn calls a 'Bunch'. The keys of the dataset are 'data', 'target', 'target_names', 'DESCR', 'feature_names' and 'filename', and the values of 'data' and 'target' are numeric arrays ('data' is a 2D array). The 'target' records, for each patient, whether the tumor is benign or malignant. Here malignant means the patient has cancer and benign means the patient does not have cancer.

In this dataset we have 569 instances with 30 features or attributes, and all 30 features hold numeric values for each instance.

3.2 DataFrame
We combine the 'data' and 'target' values into a DataFrame, because this tabular form is convenient for inspecting the data and applying the machine learning algorithms, and we use 'feature_names' and 'target' as the column names. We then store the DataFrame in a file so that it can be reused later. We have checked the dataset's information: there are no null values, all the
So here the malignant class has 212 instances, while the benign class is larger, with 357 instances.

3.4.1 Split DataFrame in Train and Test
In our project, 75% of the data is used for training and 25% for testing.
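The data-handling steps described in Sections 3.1, 3.2 and 3.4.1 can be sketched with scikit-learn and pandas. This is a minimal sketch, assuming the data is loaded with sklearn's `load_breast_cancer()`, which returns the 'Bunch' object described in Section 3.1; the `random_state` value and the use of `stratify` are our assumptions, not settings reported in this paper, and the CSV file name is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Wisconsin diagnostic data; sklearn returns a dictionary-like Bunch.
data = load_breast_cancer()
print(sorted(data.keys()))

# Combine 'data' and 'target' into one DataFrame, using feature_names as column names.
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # sklearn encoding: 0 = malignant, 1 = benign

print(df.shape)                 # 569 rows, 30 feature columns + 1 target column
print(df.isnull().sum().sum())  # total number of missing values

# Count instances per class after mapping the numeric codes to class names.
counts = df["target"].map(dict(enumerate(data.target_names))).value_counts()
print(counts)

# Storing the DataFrame for later use (hypothetical file name):
# df.to_csv("breast_cancer.csv", index=False)

# 75% training data, 25% test data; stratify keeps the class ratio similar in both parts.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

With a fractional `test_size`, scikit-learn rounds the test set size up, so the 569 instances split into 426 training and 143 test instances.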
features to the same level of magnitude. This can be achieved by scaling.

3.5 Model Selection
This is the most important phase, in which the machine learning algorithm for the system is selected. Machine learning algorithms can be classified as supervised or unsupervised learning. For this breast cancer prediction system we only need supervised learning, because every instance in the dataset is already labeled as benign or malignant.

3.5.1 Supervised Learning
A supervised learning algorithm learns from labeled training data, which helps it predict the outcomes for unseen data. It optimizes performance criteria using experience and can solve various types of real-world computation problems. The classifiers used most often are briefly explained below.

3.5.1 (III) Random Forest Classifier
The Random Forest classifier is an ensemble learning method that constructs multiple decision trees; the final decision is made by the majority of the trees. Each decision tree is a tree-shaped diagram used to determine a course of action, in which each branch represents a possible decision, instance, or reaction. One of the main advantages of the Random Forest algorithm is that it reduces the risk of overfitting and the required training time. It also offers a high level of accuracy. It runs efficiently on large databases and produces fairly accurate predictions by estimating missing data.
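A possible comparison of the classifiers, including the scaling step and the Random Forest's tree ensemble, is sketched below. It is a sketch, not the authors' code: the `StandardScaler` choice, the hyperparameters (`n_neighbors=5`, `n_estimators=100`, the default SVC kernel) and `random_state=42` are assumptions. XGBoost is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale-sensitive models (SVM, KNN) get a StandardScaler in front, so all 30 features
# have zero mean and unit variance; inside the pipeline the scaler is fit on training data only.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Many decision trees, each trained on a bootstrap sample; sklearn combines them
    # by averaging the trees' predicted class probabilities (a soft majority vote).
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.4f}")

# Pick the classifier with the highest test accuracy.
best = max(scores, key=scores.get)
print("Best:", best)
```

The third-party `xgboost` package provides `XGBClassifier`, which follows the same `fit`/`predict` interface and could be added to the `models` dictionary. Exact scores depend on the random split, so they will not necessarily match the accuracy reported in Section 4.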
used a more regularized model formalization to control overfitting, which gives it better performance.

4. CONFUSION MATRIX
A confusion matrix is a summary of the prediction results on a classification problem: the numbers of correct and incorrect predictions are summarized with count values and broken down by each class. This breakdown is the key to the confusion matrix. It shows the ways in which a classification model gets confused when it makes predictions, giving insight not only into the errors a classifier makes but, more importantly, into the types of errors being made.

Accuracy is computed from the four cells of the matrix, where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 = \frac{46 + 66}{46 + 66 + 0 + 2} \times 100 = \frac{112}{114} \times 100 \approx 98.25\%$$
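The accuracy arithmetic and scikit-learn's confusion-matrix layout can be checked with a short script. The TP, TN, FP and FN counts are the ones reported in this section; the label vectors are hypothetical, constructed only to reproduce those counts, and treating label 1 as the positive class is an assumption.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy in percent: correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


# Counts reported in the paper: TP = 46, TN = 66, FP = 0, FN = 2.
print(round(accuracy_from_counts(46, 66, 0, 2), 2))

# Hypothetical labels reproducing those counts (1 = positive class).
y_true = [1] * 46 + [0] * 66 + [1] * 2  # the last two positives are missed
y_pred = [1] * 46 + [0] * 66 + [0] * 2

# For binary labels [0, 1], sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)
print(accuracy_score(y_true, y_pred))  # same ratio as a fraction, 112 / 114
```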