Report on Brain Tumor Detection from Magnetic Resonance Imaging using YOLOv3
Bachelor of Technology
in
Computer Science and Engineering
by
Manika Debnath, Bhagyasri Bora, Tilok Pegu
26 November, 2019
DECLARATION
We certify that
(a) The work contained in this report has been done by us under the guidance
of our supervisor.
(b) The work has not been submitted to any other Institute for any degree or
diploma.
(c) We have conformed to the norms and guidelines given in the Ethical Code
of Conduct of the Institute.
(d) Whenever we have used materials (data, theoretical analysis, figures, and text)
from other sources, we have given due credit to them by citing them in the text of
the thesis and giving their details in the references. Further, we have taken
permission from the copyright owners of the sources, whenever necessary.
CERTIFICATE
This is to certify that the project report entitled “Brain Tumor Detection from
Magnetic Resonance Imaging using YOLOV3" submitted by Manika Debnath,
Bhagyasri Bora, Tilok Pegu (Roll No. GAU-C-17/079, GAU-C-17/074, GAU-C-
16/051) to the Central Institute of Technology, Kokrajhar towards the partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering is a record of bona fide work carried out by
them under my supervision and guidance during the 5th Semester, 2019.
BONAFIDE CERTIFICATE
This is to certify that the project titled “Brain Tumor Detection from Magnetic
Resonance Imaging using YOLO V3" is a bona fide record of the work done by
Manika Debnath (GAU-C-17/079), Bhagyasri Bora (GAU-C-17/074) and Tilok
Pegu (GAU-C-16/051) towards the partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology in Computer Science and Engineering of the
CENTRAL INSTITUTE OF TECHNOLOGY, KOKRAJHAR, during the year
2019.
Examiner
Abstract
A brain tumor is a disease of the brain that causes pain and disability, particularly
in older people, and can be characterized by progressive, abnormal tissue growth.
A tumor can prevent the brain from functioning as it should.
Detection and progress monitoring of brain tumors can be done by
measuring pre-structural and structural changes in the associated
tissues. This project provides a step towards a solution by fully automating
the detection of tumors in brain MRIs using YOLO to reduce the data
complexity. The results show a tumor-detection accuracy of 98% when
compared with the work of previous studies. The project concludes
with a proof of concept estimating the tumor of individuals using the
generated annotations; weights trained for longer produced higher
accuracy.
Acknowledgements
We would like to express our gratitude and appreciation to all those who gave us the opportunity
to complete this report. A special thanks to our project coordinator, Mr. Ranjan Maity, Asst.
Professor, CSE Department, CIT Kokrajhar, whose help, stimulating suggestions and
encouragement helped us to coordinate our project, especially in writing this report.
We would also like to acknowledge with much appreciation the crucial role of the staff of
the Computer Science and Engineering Lab, who gave us permission to use all the required
machinery and the necessary material to complete the project.
Last but not least, many thanks go to the head of the project, Mr. Mithun Karmakar, Asst.
Professor, CSE Department, CIT Kokrajhar, who gave his full effort in guiding the team in
achieving the goal, as well as his encouragement to keep our progress on track. We would
also like to appreciate the guidance given by the other supervisors as well as the panels,
especially during our project presentation, whose comments and tips improved our presentation skills.
Manika Debnath
Bhagyasri Bora
Tilok Pegu
Contents
1 Introduction
  1.1 Field and Context
  1.2 Research Problem
  1.3 Focus of this Project
2 Background
  2.1 Medical Knowledge
    2.1.1 Magnetic Resonance Imaging
    2.1.2 Brain Tumor
  2.2 Computer Science
    2.2.1 Artificial Intelligence
    2.2.2 Machine Learning
    2.2.3 Artificial Neural Networks
    2.2.4 Tiny YOLO V3
3 Methods
  3.1 Setup
  3.2 Data Set
  3.3 Preprocessing
    3.3.1 Training, Validation, and Testing Sets
    3.3.3 2D and 3D Data
    3.3.4 Annotation
  3.4 Architecture
    3.4.1 Channels, Growth Rate, and Depth
    3.4.2 Activation Functions
  3.5 Training
    3.5.1 Metrics and Loss Functions
    3.5.3 Batch Size
    3.5.4 Learning Rate Policy
    3.5.5 Early Stopping
  3.6 Summary
4 Results
  4.2 Visual Evaluation
5 Discussion
6 Conclusion
1 Introduction
Neural networks are known to be feature selectors, meaning that they will
learn to extract the information that is relevant to the task [9]. This assumes that
the size of the data set and the complexity of the problem enable the network
to find correlating features. Based on a series of tests, it was not possible to
create a stable algorithm that would assess an individual's condition from the
full MRI alone. A brain tumor detection step is therefore required to reduce the
MR images to only the relevant information in the frame. Since methods like
region growing lead to insufficient results due to the similar intensities of other
tissue, time-consuming manual modifications are needed.
1.3 Focus of this Project
The goal of this project is to develop a fully automated application for the
detection of brain tumors in non-contrast MRI. The proposed approach is based on
Tiny YOLO V3 to achieve a robust and accurate detection of even highly
pathological brain structures. In this project we used a deep learning approach
based on the Darknet-53 architecture, implemented with the Keras and
TensorFlow frameworks (on Google Colab) in the Python language. YOLO is the
base tool for this study because it is one of the fastest object detection algorithms available.
2 Background
This chapter provides an overview of prerequisites that led to the main task
of this project. It features a section for the necessary medical knowledge as
well as a hierarchical derivation to the appropriate branch of computer
science.
2.1 Medical Knowledge
This section will briefly explain how the imaging technology used to create the
data set works, as well as provide basic information about brain tumors.
2.1.1 Magnetic Resonance Imaging
Magnetic Resonance Imaging (MRI) uses magnetic fields and radio frequencies
to probe tissue structure. In contrast to X-ray and CT scans, which require
exposure to ionizing radiation, MRI is a noninvasive imaging method. As
such, it has become an essential diagnostic imaging modality in the medical
field [13].
About 96% of the human body is made up of hydrogen, oxygen, carbon, and nitrogen, all of
which can possess a nuclear spin. Due to the laws of electromagnetic induction, the motion
of an unbalanced charge produces a magnetic field around itself. Hydrogen is the element
used in MRI because the solitary proton in its nucleus gives it a relatively large magnetic
moment [13].
The positively charged hydrogen nuclei (protons) in water produce a signal when exposed to a
strong external magnetic field. This field is supplied by a magnet in the MRI machine,
which aligns the magnetic moments of the hydrogen atoms with its own field. Gradient coils
are used to cause a linear change in the strength of this field. By alternating the current of
these coils on the x, y, and z axes, it is possible to calculate a three-dimensional image of the
tissue [13].
2.2 Computer Science
The primary task of this project revolves around a particular branch of computer
science problems. This section provides a path to where most of the work will
take place. Figure 2.2.1 visualizes the hierarchical relationship of the following
subsections.
Figure 2.2.1: Hierarchical relationship of the relevant topics

2.2.1 Artificial Intelligence
Early artificial intelligence worked by taking a complex problem, like playing chess,
and continuing to break it into smaller problems until they could be solved with
known logic. While this was effective for certain tasks, fuzzy problems like image
classification, speech recognition or language translation were difficult to tackle.
Over the years a new approach was found, which today is referred to as machine
learning (ML).
2.2.2 Machine Learning
Machine learning works by feeding input and output data into a pipeline, which learns to
transform one into the other. With the advantage that no explicit programming is needed
to generate the rules comes the disadvantage that prior input and output data are required
for the initial learning process.
2.2.3 Artificial Neural Networks
The origin of the term deep learning is twofold. On the one hand, it refers to
the fact that ANNs can learn "deep" hierarchical information from data. On
the other, it describes that they show multiple layers of depth within their
architecture. ANNs are as old as machine learning itself, but only gained much
of their popularity in recent years [17].
Figure 2.2.2: An ANN with 3 input nodes, 2 hidden layers with 4 nodes each
and 2 output nodes
These dense layers inside artificial neural networks contain entities called
nodes that are jointly linked with one another. Every node in a dense layer is
connected to each node in the previous and following layer. This structure
gives ANNs the capacity to approximate complex mathematical functions and
learn global representations in the data. The number of these nodes and their
connections, also called parameters, can range to hundreds of millions depending on the task [19].
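As an illustration, here is a minimal Keras sketch of the small network from Figure 2.2.2. Only the layer sizes come from the figure; the activations and optimizer are illustrative assumptions, not the project's documented settings:

```python
# Minimal sketch of the ANN from Figure 2.2.2: 3 input nodes,
# two hidden layers with 4 nodes each, and 2 output nodes.
# Activations and optimizer are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(3,)),                # 3 input nodes
    layers.Dense(4, activation="relu"),     # hidden layer 1 (4 nodes)
    layers.Dense(4, activation="relu"),     # hidden layer 2 (4 nodes)
    layers.Dense(2, activation="softmax"),  # 2 output nodes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()  # prints the layer structure and parameter count
```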
2.2.4 You Only Look Once (YOLO)
YOLO (You Only Look Once) is a single-stage object detector that predicts bounding
boxes and class probabilities in one forward pass; its working principle is described in
detail in Section 3.4.

3 Methods
The purpose of this chapter is twofold. It will discuss the design and processing
decisions that were made for the development of this project, while also
giving critical insight into how these applied technologies work. The sections
for setup and data set describe the working environment for this study.
Preprocessing, architecture, training and post-processing show in chronological
order how the development was addressed.
3.1 Setup
The processing of medical image data needed a library that could handle
these formats. PyDicom and scikit-learn are Python libraries (the latter with
performance-critical parts written in C and Cython). They include many tools for
image handling and processing and are especially popular in the medical field.
Other libraries were also used for smaller tasks.
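As a brief sketch, loading a medical image with PyDicom and obtaining its pixel data might look like this; the file name is a placeholder, not a file from the project:

```python
# Minimal sketch: parse a DICOM file and obtain its pixel data as a
# NumPy array. The file name is a placeholder for illustration.
import numpy as np
import pydicom

ds = pydicom.dcmread("scan_0001.dcm")         # parse the DICOM file
pixels = ds.pixel_array.astype(np.float32)    # image data as a NumPy array
print(pixels.shape, ds.get("Modality"))       # image size and header meta-info
```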
3.2 Data Set
The data set is a collection of MRIs of the human brain. The number of available
samples grew during the project. For most of the development time, it
included 253 samples that came from multiple MRI sources. All images were
provided in the JPEG file format, together with technical meta-information
about each image. The data featured a sagittal perspective.
3.3 Preprocessing
3.3.1 Training, Validation, and Testing Sets
The data was split into two subsets, each used for a different purpose
(training and testing). The training set is commonly the larger of the
two and contains the data that is applied to the actual learning process. It is
the only portion of the data the network will draw direct conclusions from.
The second subset is referred to as the testing data; it is only used once, at
the very end, to give a final score. The idea is that by tuning a model based
on intermediate validation results, a certain amount of information bleed occurs,
where the network implicitly learns from the validation data. In order to prevent
biased results, the testing data is used only as a last performance reference.
• Training set: 90% of the data
• Testing set: 10% of the data
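As a sketch, such a 90/10 split could be produced with scikit-learn (mentioned in the setup section); the placeholder arrays and the fixed seed are illustrative assumptions:

```python
# Minimal sketch of the 90/10 training/testing split described above.
# The placeholder arrays stand in for the 253 loaded MRI samples and
# their labels; the random seed is an arbitrary illustrative choice.
import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.rand(253, 416, 416, 3)     # stand-in for the MRI slices
labels = np.random.randint(0, 2, size=253)    # stand-in for the annotations

X_train, X_test, y_train, y_test = train_test_split(
    images, labels,
    test_size=0.10,      # 10% held out as the final testing set
    random_state=42,     # fixed seed for reproducibility
)
print(len(X_train), len(X_test))  # 227 training samples, 26 testing samples
```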
3.3.4 Annotation
To create ground-truth bounding boxes, we performed annotation over our
data set.
Figure 3.3.2: Examples of slices with their ground truth bounding box
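Darknet-style YOLO training expects one text file per image, with one line per object: a class index followed by the box center, width, and height, all normalized to the image size. A small sketch of converting a pixel-space box into this format (the image size and box coordinates are made-up values):

```python
# Convert a pixel-space bounding box to the Darknet/YOLO label format:
# "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1].
def to_yolo(img_w, img_h, x_min, y_min, x_max, y_max, cls=0):
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Example: a tumor box on a 416x416 slice (illustrative values).
print(to_yolo(416, 416, 120, 140, 220, 260))
# -> "0 0.408654 0.480769 0.240385 0.288462"
```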
3.4 Architecture
The YOLO design enables end-to-end training and real-time speeds while
maintaining high average precision. Our system divides the input image into
an S × S grid. If the center of an object falls into a grid cell, that grid cell is
responsible for detecting that object. Each grid cell predicts B bounding
boxes and confidence scores for those boxes. These confidence scores
reflect how confident the model is that the box contains an object and also
how accurate it thinks the box is that it predicts. Formally, we define
confidence as $\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$. If no object exists in that cell,
the confidence scores should be zero. Otherwise we want the confidence
score to equal the intersection over union (IOU) between the predicted box
and the ground truth. Each bounding box consists of 5 predictions: x, y, w, h,
and confidence. The (x, y) coordinates represent the center of the box
relative to the bounds of the grid cell. The width and height are predicted
relative to the whole image. Finally the confidence prediction represents the
IOU between the predicted box and any ground-truth box. Each grid cell also
predicts C conditional class probabilities, $\Pr(\text{Class}_i \mid \text{Object})$. These
probabilities are conditioned on the grid cell containing an object. We only
predict one set of class probabilities per grid cell, regardless of the number of
boxes B. At test time we multiply the conditional class probabilities and the
individual box confidence predictions,

$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \qquad (1)$$

which gives us class-specific confidence scores for each box. These scores encode
both the probability of that class appearing in the box and how well the predicted
box fits the object. For evaluating YOLO on PASCAL VOC, we use S = 7 and
B = 2. PASCAL VOC has 20 labelled classes, so C = 20. Our final prediction is
a 7 × 7 × 30 tensor.
Figure 3.4.1: The architecture. Our detection network has 24 convolutional
layers followed by 2 fully connected layers. Alternating 1 × 1 convolutional
layers reduce the feature space from preceding layers. We pre-train the
convolutional layers on the ImageNet classification task at half the resolution
(224 × 224 input image) and then double the resolution for detection.
Fast YOLO uses fewer convolutional layers (9 instead of 24) and fewer filters
in those layers. The network pipeline is summarized below: the input image
goes through the network once, after which the objects can be detected. This
gives us end-to-end learning.
The number of parameters in a neural network has a high correlation with its
learning capacity. By adding more nodes that can be adjusted during training,
the model can approximate a more complex function that transforms the
input into the output. The downside is that a larger parameter count will also
increase the possibility of overfitting the data. A convention in the field of
CNNs is to gradually increase the number of channels, while the spatial
resolution is reduced due to the use of MaxPooling [21][25][26][27]. U-Net
also shows this behavior on the left side of its architecture.
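As a brief illustration of this convention, here is a Keras sketch in which the channel count doubles after each MaxPooling step while the spatial resolution halves; all sizes are illustrative, not the project's actual settings:

```python
# Sketch of the common CNN convention: channels grow while the spatial
# resolution shrinks through MaxPooling. All sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(416, 416, 3))
x = inputs
for channels in (16, 32, 64):              # channel count doubles each block
    x = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)          # spatial resolution halves
model = keras.Model(inputs, x)
model.summary()  # 416 -> 208 -> 104 -> 52 spatial; 16 -> 32 -> 64 channels
```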
A test was set up that compared the original U-Net against five smaller
variants. They varied in the total number of parameters and how the number
of channels changed from one layer to the next.
Training U-Net was very slow in two regards. Each training step lasted six
times longer compared to the smallest model, while it also took 25 times
more training steps to reach the same results. Since it was also overfitting to
a significant degree, it was not trained until the end. The second-largest
model, E, also overfitted on the training data and never achieved results
comparable to the other four models.
Model D started with eight channels which were then increased by a factor of 2,
whereas B started out larger but only increased the channels by a rate of 1.5. Despite
being the smaller model, B showed higher accuracy in the end. It was concluded that
the initial number of output channels has a larger effect on the result than the growth
rate.
This theory was supported by the results from A and C, both of which kept
their initial number of channels throughout the network. These two showed
the highest scores in comparison to the other models while maintaining their
relatively small size. Since their results were identical, model A was kept
because of its faster training speed. It was interesting to see that the smallest
model performed best across the candidates.
3.5 Training
The training is where the actual learning takes place, and its process is
inspired by how humans improve their performance on tasks. When faced
with a difficult question, the first step would be a mere guess of what the answer
might be. In a second phase, this information is compared to the right answer
and processed by the brain accordingly.
Similarly, the training of a neural network consists of two parts. In the first
step, the input is fed forward through the network, and an answer prediction
is made. In the very beginning, this is just a random guess because ANNs
are initialized with random values.
The prediction is then compared to the right answer, and the error is calculated.
This metric can be fed back through the network to optimize the parameters in
the model. These two steps are called forward and backward propagation.
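As a minimal NumPy sketch of these two steps for a single linear neuron (all values are illustrative; this is not the project's actual model):

```python
# One forward and one backward pass for a single linear neuron,
# trained with plain gradient descent. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x, target = rng.random(3), 1.0     # one input sample and its right answer
w, lr = rng.random(3), 0.1         # randomly initialized weights

for step in range(50):
    pred = w @ x                   # forward propagation: make a prediction
    error = pred - target          # compare with the right answer
    grad = 2 * error * x           # gradient of the squared error w.r.t. w
    w -= lr * grad                 # backward step: adjust the parameters
print(round(w @ x, 4))             # prediction is now close to 1.0
```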
3.5.1 Metrics and Loss Functions

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

Examining this equation, you can see that Intersection over Union is simply a
ratio. In the numerator we compute the area of overlap between the predicted
bounding box and the ground-truth bounding box. The denominator is the area
of union, or more simply, the area encompassed by both the predicted bounding
box and the ground-truth bounding box. Dividing the area of overlap by the
area of union yields our final score, the Intersection over Union.
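As a small illustration, here is a Python sketch of this computation for axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples; the example boxes are made-up values:

```python
# Intersection over Union for two axis-aligned boxes,
# each given as (x_min, y_min, x_max, y_max).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])    # overlap corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # area of overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                # area of union
    return inter / union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))         # 25 / 175 ≈ 0.1429
```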
The loss function used in our model is the binary cross-entropy loss, applied here to a
multi-label classification problem. It is also called sigmoid cross-entropy loss: a sigmoid
activation followed by a cross-entropy loss. It is independent for each vector component
(class), meaning that the loss computed for every CNN output vector component is not
affected by the other component values. That is why it is used for multi-label
classification, where the insight of an element belonging to a certain class should not
influence the decision for another class. It is called binary cross-entropy loss because it
sets up a binary classification problem between C' = 2 classes for every class in C, as
explained above. So when using this loss, the formulation of cross-entropy loss for
binary problems is often used:
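That formulation, with $s_1$ denoting the sigmoid output for a class and $t_1 \in \{0, 1\}$ its ground-truth label, is the standard binary cross-entropy:

$$\mathrm{CE} = -t_1 \log(s_1) - (1 - t_1)\log(1 - s_1)$$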
3.5.3 Batch Size
The number of random samples per training step is referred to as the batch
size. In the past, it was believed that larger batches led to something called
the generalization gap [39], where the accuracy of a model would drop if it
was trained on unusually large batches. Recent work [40] suggests other
reasons for this decrease in accuracy. While common batch sizes range from
32 to 256, Goyal et al. showed accurate results using 8192 images per batch
when training a model on ImageNet [41].
3.5.4 Learning Rate Policy
One full iteration over the training samples is referred to as an epoch, and the
learning rate policy describes how the learning rate is changed from one
epoch to another. With the introduction of adaptive optimizers like Adam,
there has been less emphasis on this topic because the learning rate is
modified during training. Even though this reduces the number of possible
defects, training time can often be saved with the right initial learning rate.
Ten epochs were run at different learning rates to compare initial results and
to examine the point at which the model wouldn’t converge at all. 0.002 was
the highest rate at which the model started training, but 0.001 resulted in the
best score. Adding a decay that reduces the learning rate manually over time
did not improve the results with the use of Adam.
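A minimal sketch of this configuration in Keras, assuming the project's TensorFlow/Keras setup; the initial rate of 0.001 is the value found best above, and the commented compile call is illustrative:

```python
# Adam optimizer with the initial learning rate that scored best (0.001).
# No decay schedule is attached, matching the observation above.
from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=0.001)
# model.compile(optimizer=optimizer, loss="binary_crossentropy")
```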
3.5.5 Early Stopping
Neural networks will continuously minimize the loss on the training set. This
result needs to be validated on data the network has not seen before. At a
certain point during training, the performance on the validation set will start to
decrease because the model is overfitting on the training data. The number
of iterations needed to reach this point depends on many factors, including the
random values the network has been initialized with. As such, it is difficult to
predict how long the training will need to reach its peak.
Early stopping is a simple technique that ends the training process as
soon as the model stops improving on the validation data. To do this, a
patience value is defined that determines how long the network should continue
training after the score has stopped increasing. This is important because not
every epoch will lead to a new best score on the validation data.
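Keras provides this technique as a built-in callback; here is a minimal sketch, assuming a patience of 10 epochs (an illustrative value, not the project's documented setting):

```python
# Early stopping: end training once the validation score stops improving.
# The patience value (10 epochs here) is an illustrative choice.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=10,                 # allow 10 epochs without a new best score
    restore_best_weights=True,   # roll back to the best model seen
)
# model.fit(X_train, y_train, validation_split=0.1, callbacks=[early_stop])
```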
4 Results
The results in this chapter are based on data the network has not been
trained with. It was also made sure that there was no overlap in images of
people who were recorded multiple times. The network was developed using
the YOLO V3 and Darknet-53 architectures.
Data          Epoch 1   Epoch 50   Epoch 100   Epoch 150   Epoch 250
Training Set  0.001     0.493      0.766       0.923       0.981
Test Set      -         -          -           -           0.980
The 98% mark is reached on the training data, and on the test set in its single
final evaluation. Towards the end, the performance on the training data pulls
slightly ahead of the test result.
4.2 Visual Evaluation
The visual evaluation focuses on the detection results of the trained network
on the unseen test images.
The network was tested after a set number of iterations: training was first
stopped at 1,000 iterations and then continued until 3,000 iterations.
Figure 4.1.0: Input and prediction after 1,000 iterations
Figure 4.1.1: Input and prediction after the 3,000th iteration of the network
At the 1,000th iteration we were already getting a good result; we continued until
3,000 iterations, and while no major difference was visible, the confidence level
increased by a significant amount.
5 Discussion
The greatest limitation of this study was time. Three weeks were spent on the
programming aspect and trying out different sets of parameters. Each new
idea came with several hours of training to verify the performance. As such,
only popular techniques in the deep learning field were applied. The training
of the model takes at least 8 hours to reach a good prediction, which was a
limitation, as the weights we trained sometimes predicted wrong results. Another three
weeks were invested in writing this project to complete everything by the
deadline. Spending more time on data analysis, fine tuning parameters and
trying entirely new architectures might open up new possibilities.
6 Conclusion
References
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving Deep into
Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
February 2015.
[2] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi,
Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey,
Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser,
Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens,
George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason
Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and
Jeffrey Dean. Google's Neural Machine Translation System: Bridging the Gap
between Human and Machine Translation. September 2016.
[6] World Health Organization. Ionizing radiation, health effects and protective
measures, 2016.
[7] Dennis Säring, Markus Mauer, and Eilin Jopp. Klassifikation des
Verschlussgrades der Epiphyse der proximalen Tibia zur Altersbestimmung.
Pages 60-65. Springer, Berlin, Heidelberg, 2014.
[8] Eilin Jopp. Methoden zur Alters- und Geschlechtsbestimmung auf dem
Pruefstand - eine rechtsmedizinische empirische Studie. Kovac, 2007.
[11] Erik B. Dam, Martin Lillholm, Joselene Marques, and Mads Nielsen.
Automatic segmentation of high- and low-field knee MRIs using knee image
quantification with data from the osteoarthritis initiative.
[14] Gerhard Aumüller, Gabriela Aust, Andreas Doll, Jürgen Engele, Joachim
Kirsch, Siegfried Mense, and Laurenz Wurzinger. Anatomie. Number 2. 2010.
[15] David E. Attarian. Your Bones: What are growth plates?, 2013.
[16] Markus Auf der Mauer. Automated Quantification of the Growth Plate of
the Proximal Tibia for the Age Assessment in 3D MR Images Using a
Fuzzy-Logic Classification Approach. PhD thesis, 2015.
[25] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual
Learning for Image Recognition. December 2015.
[27] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional
Networks for Biomedical Image Segmentation. MICCAI, pages 234-241, May 2015.
[29] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia.
Pyramid Scene Parsing Network. December 2016.
[30] Guosheng Lin, Anton Milan, Chunhua Shen, and Ian Reid. RefineNet:
Multi-Path Refinement Networks for High-Resolution Semantic Segmentation.
November 2016.
[31] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy,
and Alan L. Yuille. DeepLab: Semantic Image Segmentation with Deep
Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. June 2016.
[32] Vladimir Nekrasov, Janghoon Ju, and Jaesik Choi. Global Deconvolutional
Networks for Semantic Segmentation. February 2016.
[33] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully
convolutional neural networks for volumetric medical image segmentation. In
Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016,
pages 565-571, June 2016.
[35] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and
Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. Journal of Machine Learning Research, 15:1929-1958, 2014.
[36] Vinod Nair and Geoffrey E. Hinton. Rectified Linear Units Improve
Restricted Boltzmann Machines.
[37] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and
Accurate Deep Network Learning by Exponential Linear Units (ELUs).
November 2015.
[40] Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better:
closing the generalization gap in large batch training of neural networks. May 2017.
[41] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz
Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. June 2017.