Dissertation Thesis

Uploaded by

Umesh Pathak

Carcinomas Cancerous Cells

Identification in Chest region

A dissertation submitted
In Partial Fulfillment of the Requirement for the Degree of

MASTER OF TECHNOLOGY
In
COMPUTER SCIENCE & ENGINEERING
By

Urvashi Shri
(Roll No. 2019023120)

Under the supervision of


Mr. Rohit Kumar Tiwari
Assistant Professor

Department of Computer Science and Engineering


Madan Mohan Malaviya University of Technology, Gorakhpur
273016(U.P.)-India
June, 2021
© M. M. M. University of Technology, Gorakhpur, (U.P.) – 273010, INDIA
ALL RIGHTS RESERVED

Certificate

Certified that Ms. Urvashi Shri has carried out the research work presented in this thesis
entitled “Carcinomas Cancerous Cells Identification in Chest region” for the award of
Master of Technology from Madan Mohan Malaviya University of Technology, Gorakhpur
under my supervision. The dissertation embodies the results of original work and studies carried
out by the student herself, and the contents of the dissertation do not form the basis for the award
of any other degree to the candidate or to anybody else.

Mr. Rohit Kumar Tiwari

(Assistant Professor)

Department of Computer Science & Engineering

M. M. M. University of Technology, Gorakhpur



Candidate’s Declaration

I declare that this written submission is based on my own work and where others' ideas or
words have been included, I have adequately cited and referenced the original sources. I have
read and followed the guidelines provided by the university in writing the report. The work
has not been submitted to any other institution for any other degree, diploma, or certificate in this
university or any other university in India or abroad.

Urvashi Shri
Roll No: 2019023120
Department of Computer Sc. and Engineering
Date:
Approval Sheet

This dissertation entitled “Carcinomas Cancerous Cells Identification in Chest region”
by “Urvashi Shri” is approved for the degree of Master of Technology.

Examiner

Supervisor
Mr. Rohit Kumar Tiwari

Head of Department

Prof. Rakesh Kumar

Dean, Research & Development

Prof.
Acknowledgement

During my years as a student at the Department of Computer Science & Engineering, I have
had plenty of time and opportunities to become indebted to many people. Even so, perhaps
the greatest gratitude of all is of a more abstract nature: the atmosphere. Being able to go to
work with an easy mind, knowing that positive people and encouraging minds await you,
has been invaluable to me. For this, I am indebted to everyone at the department.

I express my deep sense of gratitude and indebtedness to Mr. Rohit Kumar Tiwari, supervisor
for his valuable advice, constant encouragement and constructive criticism during the study
and during the preparation of this report.

I wish to thank my parents, who have always been a source of inspiration, for their never-ending
support and love throughout the completion of the report. I am also thankful to my
friends who have helped in the successful completion of the report.

Urvashi Shri

List of Figures

Fig 3.1 KNN Algorithm
Fig 3.2 Steps of K-nearest neighbor classification
Fig 3.3 Image Classification
Fig 3.4 Proposed Modules
Fig 3.5 Example of abnormal CXRs
Fig 4.1 Stage 1 of Lung Cancer image
Fig 4.2 Stage 2 of Lung Cancer image
Fig 4.3 Stage 3 of Lung Cancer image
Fig 4.4 Log Gabor Transfer Function


List of Tables

Table 3.6 Performance analysis of EKNN method
Table 4.1 Tools and system configuration
Table 4.3 Comparison of classifier accuracy
Table 4.4 Threshold value of lung cancer nodules based on size in TNM staging standard
List of Abbreviations

Abbreviation    Description

DIP             Digital image processing
HDRI            High dynamic range imaging
SVD             Singular value decomposition
ALCL            Anaplastic large-cell lymphoma
BX              Biopsy
Ca              Cancer; Carcinoma
CXR             Chest X-ray
DLCL            Diffuse large-cell lymphoma
LN              Lymph node
α               Scaling factor
LL              Approximation sub-band
HOG             Histogram of Oriented Gradients
HIST            Histogram
X-rays          X-radiation
S               Diagonal matrix
U               Left singular matrix
V               Right singular matrix
T               Transpose
FIG             Figure
PSNR            Peak signal-to-noise ratio
CT              Computed tomography
MSE             Mean square error
RMSE            Root mean square error


Abstract

In this modern era of digitization, advanced internet technology, and high-speed
networks, content protection is the need of the hour. Digital image processing
is a highly visible field of research because its techniques are used in
a wide range of applications such as human-computer interfaces, medical visualization,
image enhancement, law enforcement, and digital watermarking for security purposes.
Lung cancer is a serious disease in which cells in the lungs divide
uncontrollably. This overgrowth of cells leads to tumors, which reduce a person's ability to
breathe and cause the harmful effects of cancer.
Doctors usually diagnose two types of lung cancer, small cell and non-small cell, depending on
how the cells appear under the microscope. A person is more likely to have non-small cell lung
cancer than small cell lung cancer. While anyone can develop lung cancer, cigarette smoking and
exposure to smoke increase the likelihood that a person will develop
the condition. The challenges of prediction and segmentation create a need for learning
techniques. Current models first perform image segmentation on all images. This
consumes more time because segmentation is applied to both normal and abnormal images.

Table of Contents

Certificate
Candidate’s Declaration
Approval Sheet
Acknowledgement
List of Figures
List of Tables
List of Abbreviations
Abstract
Table of Contents

CHAPTER 1
INTRODUCTION
1.1 Overview
1.2 Motivation
1.3 Objectives of Dissertation Work
1.4 Organization of Dissertation

CHAPTER 2
LITERATURE REVIEW
2.1 Machine learning algorithm

CHAPTER 3
PROPOSED WORK

CHAPTER 4
RESULTS AND DISCUSSION
4.1 Results

CHAPTER 5
CONCLUSION & FUTURE SCOPE
5.1 Conclusion
5.2 Future Scope

CURRICULUM-VITAE
CHAPTER 1
INTRODUCTION

1.1 Overview

Digital image processing plays a major role in lung cancer detection. Lung diseases mainly
affect breathing. Common lung diseases include acute bronchitis, asthma, chronic obstructive
pulmonary disease (COPD), chronic bronchitis, emphysema, acute respiratory distress
syndrome (ARDS), and lung cancer. Lung cancer is the most common
cause of cancer-related death in men and women and is responsible for millions of deaths. The
major causes of lung disease are smoking and inhaling drugs, smoke, and allergenic
materials. Computed tomography (CT) images assist in assessing the severity of
lung disease. CT images are sufficient for the analysis of the proposed method, and they also
show soft tissue more clearly. There are two types of lung cancer: small cell lung cancer and
non-small cell lung cancer, which has three subtypes: adenocarcinoma, squamous cell
carcinoma, and large cell carcinoma.

Cancer cells can be carried away from the lungs in blood, or lymph fluid that surrounds
lung tissue. Lymph flows through lymphatic vessels, which drain into lymph nodes located
in the lungs and in the centre of the chest. Lung cancer often spreads toward the centre of the
chest because the natural flow of lymph out of the lungs is toward the centre of the
chest. Metastasis occurs when a cancer cell leaves the site where it began and moves into a
lymph node or to another part of the body through the bloodstream. Cancer that starts in the
lung is called primary lung cancer.

Correct diagnosis and early detection of lung cancer can increase the survival rate. Image
enhancement techniques are being developed for earlier disease detection and treatment;
the time factor is taken into account when discovering abnormalities in target images.
Image quality and accuracy are the important factors.

Techniques presently used include study of X-ray, CT scan, PET, MRI images. Treatments
include surgery, chemotherapy, radiation therapy and targeted therapy.

Lung cancer is the leading cause of cancer death among both men and women. Early
detection significantly improves the chances of survival and the prognosis. Detection of lung
nodules is time-consuming because of the volume of data involved, and it also suffers from
inter-radiologist variance.
The proposed system performs a classification task to identify the infection from scan images.
It is difficult to achieve high performance metrics using only a single feature class. Choosing
the right set of features can improve the overall accuracy of the system.

Background and Approach

• Computed Tomography (CT) is a widely used method for lung cancer screening.

• The goal of screening is to detect disease at the earliest stage possible.

• Accurate detection and interpretation of lung nodules is fundamental to the diagnosis
of lung cancer.

• A CT scan can contain millions of pulmonary voxels, and lung nodules are relatively
small.

• Automated methods with recent advances now match or exceed manual
interpretation. Automated methods reduce subjectivity and inter-radiologist
variance.

• Deep neural networks have shown superior performance in classification problems.

• Convolutional Neural Networks (CNNs) or a combination of 2D CNNs using
multiple views have been previously used in lung nodule detection.

Image Processing Techniques

Digital image processing is a rapidly developing area of computer science. It is a
highly visible field of research because its techniques are used in
a wide range of applications such as human-computer interfaces, medical visualization,
image enhancement, law enforcement, and digital watermarking for security purposes.
Information is stored in digital form as pictures, text, audio, and
video. Image processing is the study of the operations that can be performed on an
image, through which the image can be improved and important data can be extracted from
it. An image can be defined as a two-dimensional function f(x, y), where x and y are the
spatial coordinates and the amplitude of f at any pair of coordinates (x, y) is called the
intensity of the image at that point.

When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a
digital image. In simple words, a digital image can be represented as a two-dimensional array
arranged in rows and columns. A digital image is made up of a finite number of elements, each of
which has a specific value at a particular location. These elements are referred to as picture
elements, image elements, or pixels. Pixel is the term most widely used for the elements of a
digital image.
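
The definition above can be illustrated with a minimal Python sketch. The pixel values, the helper name `intensity`, and the convention that x indexes columns and y indexes rows are all assumptions made for illustration; they are not taken from the dissertation's MATLAB system.

```python
# A tiny 3x4 grayscale "image": 3 rows and 4 columns of intensity values (0-255).
# The values are made up for illustration.
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]


def intensity(img, x, y):
    """Return f(x, y), assuming x is the column index and y is the row index."""
    return img[y][x]


rows, cols = len(image), len(image[0])
print(rows, cols)              # number of rows and columns
print(intensity(image, 2, 1))  # pixel in column 2 of row 1
```

Each entry of the nested list plays the role of one pixel, so the whole image is a finite set of elements with a value at each location.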

In analog image processing, images are manipulated by electrical means by varying the electrical
signal. A common example is the television image. Digital image processing has
overtaken analog image processing over time because of its widespread use.

Digital image processing deals with developing digital systems that operate on digital
images. A digital image is a two-dimensional signal made up of a
collection of pixels.

A signal is an electric or electromagnetic quantity used to carry data from one device or
network to another. Mathematically, a signal is a function that conveys
some information. A signal can be one-dimensional, two-dimensional, or
higher-dimensional.

A one-dimensional signal is measured over time; a common example is a
voice signal. Two-dimensional signals are measured over some other physical
quantities; an example is a digital image. Anything that transfers
information between two objects or people is a signal, including speech (the human
voice) and images. When we talk, our voice is converted into a sound
wave, and that signal is converted back into voice when it reaches the person we are
talking to. Likewise, capturing an image with a digital camera involves transferring a signal
from one part of the system to another.

Artificial intelligence is the study of building human-like intelligence into machines.
Artificial intelligence has many applications in image processing. For example, a
computer-aided diagnosis system can help doctors interpret images from X-rays, MRIs,
etc. and then highlight the specific region to be examined by the doctor.

1.2 Motivation
Images make up a significant part of multimedia content. Examples of images
are digital art, medical images, illustrative diagrams, digitized pictures of cultural heritage
paintings, and digital photographs. Advances in the
computing environment have created problems for copyright protection and content
security. For example, images can be copied, modified, and distributed easily.
Digital image processing is a potentially good technique for protecting content. Encryption
can offer privacy and integrity in content security, and the decrypted content can be
additionally protected using digital image processing.

• The main motivation behind the development of this project is to build a system for
easy and quick detection of cancer cells in the lungs at an early stage.
• This is a MATLAB-based system that performs a classification task to identify
cancer from CT scan images.

The following factors motivated me:

1. Easy sharing of digital image.

2. Editing of digital images.

3. Copying of digital image without quality degradation.

1.3 Objectives of Dissertation Work


• The goal of a CT screening program is to detect lung cancer early.
• It is difficult for doctors to interpret CT scan images and identify cancer from them. The
proposed system can therefore help doctors identify cancerous cells accurately.
• The aim of this research is to extract features for accurate image comparison, such as pixel
percentage and mask labelling.

1.4 Organization of Dissertation


This dissertation is divided into 5 chapters. Chapter 1 is the introductory chapter, which
gives an overview of our research. Chapter 2 presents the literature
review. Chapter 3 describes the proposed work, including the working of the machine learning
algorithm and an explanation with a block diagram. Chapter 4 presents the results and analysis.
Chapter 5 presents the conclusion and future scope.
CHAPTER 2
LITERATURE REVIEW

Even today, much research is being done on the main causes of cancer. To
date, no single definitive cause of cancer is known. Cancer is a serious disease with many
types, one of which is lung cancer. Lung cancer can also occur for genetic reasons, but
this cannot be said with certainty. Lung cancer mostly occurs in
smokers and in those who use gutkha or drugs. The following sections describe how lung cancer
occurs and each of its stages.

What are the symptoms of lung cancer?

Many symptoms can arise at the onset of lung cancer. These include severe cough,
shortness of breath, excessive mucus discharge, chest pain, and blood in the mucus.
According to a health website, most respiratory problems are due to lung disease, in which
the patient does not get enough oxygen. Other symptoms of lung cancer include pain in
the back and shoulders, increased tiredness, and yellowing of the eyes.

This is how cancer begins

The structure of the body is such that the process of creation and destruction of cells goes
on continuously. New cells replace old cells.

Every day, a very large number of cells die in the body. New cells are produced in the same
proportion as cells are destroyed.

In the midst of this process, for many different reasons, a cell may keep on growing
continuously. The body is unable to stop the disordered growth of that cell. As a result,
the cell multiplies and eventually takes the form of cancer.

What is stage three of lung cancer?

In any type of cancer, stage 3 is the stage at which the cancer starts to spread beyond the
part of the patient's body where it developed. In lung cancer, this means that the
cancer has developed in the lungs and, up to the third stage, continues to spread within and
around the lungs.

The fourth stage of lung cancer is very dangerous

The fourth stage of lung cancer is considered the most dangerous and deadly.
Patients who reach this stage are in danger of death. If lung cancer is
controlled before it reaches this stage, the chances of defeating the disease increase
considerably. In stage three, lung cancer spreads to the lymph nodes and the back of the
lungs. Once it starts spreading to other organs, it has reached its final stage, which
is called stage 4.

Who is most affected by lung cancer?

Lung cancer mostly affects older people and is less common in people under 40.
Cancer is more likely to grow in lungs that have already started to deteriorate.

Stage 4 of lung cancer is divided into two parts: stage 4A and stage 4B.

Stage 4A

In this stage, the cancer has spread to distant places in the body. This may involve the lymph
nodes surrounding the lung or the previously healthy lung.

Stage 4B

In stage 4B lung cancer, the disease has spread to one or more distant organs or bones.

Stage 4 lung cancer treatment

Stage 4 of lung cancer is treated with chemotherapy, radiation therapy, surgery, targeted
therapy, and photodynamic therapy.

[Figure: non-nodule image and nodule image]

In this research, to obtain more accurate results we divided our work into the following three
stages:

Image Enhancement stage

This stage improves the image and removes noise, corruption, or interference. The
following three methods are used for this purpose: the Gabor filter (which gives the best
results), an auto-enhancement algorithm, and the Fast Fourier Transform (FFT), which shows
the worst results for image segmentation.
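
To make the Gabor filter concrete, here is a minimal Python sketch that builds a real-valued Gabor kernel from the standard formula: a Gaussian envelope multiplied by a cosine carrier along the filter orientation. The function name `gabor_kernel` and every parameter value are illustrative assumptions; the dissertation does not state the settings used in its MATLAB implementation.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Build a size x size real Gabor kernel (size is assumed to be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; gamma controls the aspect ratio.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            # Cosine carrier with the given wavelength and phase offset psi.
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical settings chosen only for illustration.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, wavelength=4.0)
```

Convolving an image with kernels of several orientations and wavelengths emphasizes texture at those orientations and scales, which is why the filter is popular for enhancement.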

Image Segmentation stage

This stage divides and segments the enhanced images. The algorithms are applied to the ROI of
the image (just the two lungs). The methods used are the thresholding approach and the
marker-controlled watershed segmentation approach; the latter gives better results than
thresholding.
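
The thresholding approach can be sketched in a few lines of Python. This sketch assumes a single global threshold set to the mean intensity, which is only one simple way of picking a threshold; the dissertation does not say how its threshold is chosen, and the marker-controlled watershed method is not shown here. The sample pixel values are made up.

```python
def mean_intensity(img):
    """Average intensity over all pixels of a 2-D list image."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def threshold_segment(img, t):
    """Return a binary mask: 1 where the intensity exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in img]


# Made-up 3x4 image with a dark region (left) and a bright region (right).
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 14, 15, 215],
]
t = mean_intensity(image)           # assumed threshold choice
mask = threshold_segment(image, t)  # bright pixels become 1
```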

Features Extraction stage

This stage obtains the general features of the enhanced, segmented image using the
binarization and masking approaches.
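
As a hedged illustration of a binarization-style feature, the sketch below computes the percentage of white pixels in a binary mask. The helper name `white_pixel_percentage` and the sample mask are assumptions; how the dissertation compares this percentage against a decision threshold is not specified in this section.

```python
def white_pixel_percentage(mask):
    """Percentage of pixels equal to 1 in a binary mask (2-D list of 0/1)."""
    total = sum(len(row) for row in mask)
    white = sum(sum(row) for row in mask)
    return 100.0 * white / total


# Made-up binary mask, for example the output of a segmentation step.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
pct = white_pixel_percentage(mask)
```

A feature like this reduces a whole segmented image to one number, which can then be compared across normal and abnormal images.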

Machine Learning Algorithms :

Many types of algorithms are used in machine learning; this section discusses some of the
most common ones.

Linear Regression :
Linear regression is a supervised machine learning algorithm that describes the relationship
between a dependent variable and an independent variable by a line.

The equation of that line can be written as follows:

Y = mX + b
Here Y is the dependent variable, X is the independent variable, m is the slope, and b is the
intercept, the point where the line crosses the y axis.
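
A minimal sketch of how m and b can be estimated from data, assuming the ordinary least-squares criterion (the dissertation does not say which fitting method it has in mind). The function name `fit_line` and the sample points are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of the slope m and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Sample points generated from Y = 2X + 1, so the fit should recover m = 2, b = 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
```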

Logistic Regression :
Logistic regression is also a supervised machine learning algorithm. Its output is always a
binary value (0 or 1, True / False, Yes / No); that is, the dependent variable always takes
one of two values. It is also called 'logit regression'. Typical questions it answers are:
Will there be a match tomorrow or not? Will it rain tomorrow or not?
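
The binary output comes from passing a linear score through the sigmoid function and applying a cutoff. The sketch below shows only this prediction step for a single feature; the weight `w`, bias `b`, and cutoff of 0.5 are hypothetical values, and training the weights is not shown.

```python
import math


def sigmoid(z):
    """Map any real score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b, cutoff=0.5):
    """Return (probability, class label) for a single feature value x."""
    p = sigmoid(w * x + b)
    return p, 1 if p >= cutoff else 0


# Hypothetical weight and bias, for illustration only.
p_low, label_low = predict(-2.0, w=1.0, b=0.0)
p_high, label_high = predict(2.0, w=1.0, b=0.0)
```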

Decision Tree :
A decision tree is a very popular supervised machine learning algorithm. It can be used for
both regression and classification, and it assigns input data to a particular class. The
decision tree model is trained so that, whenever it is given unknown
input data, it can determine the class to which that data belongs. For example, an insurance
company that wants to sell its policies can use a decision tree to estimate
how many people are likely to buy insurance based on their age.

K-Means :
K-Means is an unsupervised machine learning algorithm used to solve clustering
problems. The data set is partitioned into clusters, where a cluster is a
group of similar data points. Here K represents
the number of clusters. The K-Means algorithm picks one representative point for each
cluster; these points are called centroids.
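
The assign-then-update loop behind K-Means can be sketched on one-dimensional data. The function name `kmeans_1d`, the fixed iteration count, the starting centroids, and the sample points are all assumptions for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points by repeatedly assigning them and updating centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Two well-separated groups; K = 2 with hypothetical starting centroids.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
```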

Random Forest :
A random forest is a collection of decision trees, which is why it is also
called a random decision forest. It is a supervised learning algorithm that can be used
for both classification and regression,
but it is mostly used for classification. Simply put, a random forest
builds many decision trees and
merges their outputs to get a more accurate result.

KNN Algorithm :
The K-Nearest Neighbor (KNN) algorithm is a very simple supervised machine learning
algorithm. It is used to classify data: given a new input, it determines the category to
which that input belongs. It can also be used for
regression, but it is mostly used for classification. Because KNN is supervised, it uses
labeled data; whenever new unlabeled data is passed to the KNN model, the model classifies it
with the help of the labeled data provided during training. KNN is also called a lazy learning
algorithm because it does not build a model in advance; it stores the training data and defers
all computation until a new data point has to be classified.

To understand the need for the KNN algorithm, consider an example with two
types of data, category-1 and category-2. Whenever a new data point arrives and
we have to find out which category it belongs to, we can use KNN.
The working of the KNN algorithm can be understood through the following steps:
1. First, choose a value for the variable K, which is the number of nearest neighbors to
consider. K is usually set to an odd value so that the algorithm can reach a decision
without ties.
2. In the second step, calculate the distance from the new data point to its neighbors
using the Euclidean distance formula.
3. In the third step, take the K nearest neighbors of the new data point and count how many
of them belong to each category.
4. In the fourth step, assign the new data point to the category that contains the most of
its nearest neighbors.
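
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the dissertation's MATLAB implementation: the function name `knn_classify`, the two-dimensional sample points, and the category labels are all hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify query from train, a list of (feature_vector, label) pairs."""
    # Step 1 is the choice of k (passed in as an argument; odd to avoid ties).
    # Step 2: Euclidean distance from the query to every labeled point.
    distances = [(math.dist(features, query), label) for features, label in train]
    # Step 3: keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 4: majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled data for the two categories used in the text.
train = [
    ((1.0, 1.0), "category-1"),
    ((1.0, 2.0), "category-1"),
    ((2.0, 1.0), "category-1"),
    ((6.0, 6.0), "category-2"),
    ((7.0, 6.0), "category-2"),
    ((6.0, 7.0), "category-2"),
]
result = knn_classify(train, (2.0, 2.0), k=3)
```

Note that nothing is computed until a query arrives, which is the "lazy learning" behavior of KNN.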

2.1 Review of Digital Image Processing Techniques

Priyanka Basak, Dr. Asokke Nath [2017] proposed a lung cancer detection system
that uses image processing techniques to classify the presence of cancer cells in the lung and
their stages from CT scan images. The system applies various enhancement and segmentation
techniques, aiming for accurate results. The tumor is identified accurately from the
original image. The Gabor filter and watershed segmentation give the best results in the
pre-processing stage.

Wei Yang, Yumbi Liu, Liyan Lin, Zhakiyang Yun, Zhentai Lu [2017] aimed
to develop a practical and useful method for automatically segmenting lung
fields in CXRs. The core of the proposed method is the effective use of the lung boundary
map produced by SED. The proposed SEDUCM method was evaluated on three CXR
datasets and provides reasonable segmentation results on many cases of
the CRASS dataset. These results provide evidence that the trained SED
generalizes well and that the proposed SEDUCM method is effective and
robust.

F. Taher, A. Kunshu & H. Alhmad [2016] present “A new hybrid watermarking
algorithm for MRI medical images using DWT and hash functions” [7]. This algorithm
authenticates magnetic resonance tomography images and also provides copyright
protection. It contains both robust and fragile watermarks and works in both the
spatial and transform domains.
Furqan & Munish [2015] studied and analyzed “A robust watermarking technique
using MATLAB”. They use the DWT-SVD scheme in the algorithm. In this paper, a
signature or secret message is secretly inserted in the cover image. Using a 2D DWT, they
divided the cover image into 4 sub-bands and then applied SVD to each sub-band by
updating its singular values.

Dr. Thomas Geeorge, Dr. Narain Ponraj [2019] surveyed numerous
strategies for detecting lung tumours in their early stages. The manual
examination of samples is tedious and inaccurate, and it requires intensively trained personnel
to eliminate diagnostic errors. From the results obtained, the authors conclude that the
Local Binary Pattern (LBP) performs better than other basic textural patterns, as the histogram
features obtained were greater than those of the latter. To improve
the performance of EEG signal classification, the LBP patterns were computed at
the key points of the EEG signals. The histogram features extracted were lengthy
and increased the efficiency.

Saxena & Garg [2012] present a DWT-SVD based semi-blind image watermarking
technique. They focused only on the high-frequency band of the cover image. Semi-blind
means that the cover image is not used at the time of extraction. The technique has high
robustness under geometric attacks [14].

Akram et al. implemented an automated pulmonary nodule detection system using
Artificial Neural Networks based on hybrid features consisting of 2D and 3D geometric
and intensity-based statistical features.

Ezoe et al. and Taino et al. used a nearest cluster method to classify the detected
nodule candidates and estimated the probability density function of the intensity values of
the trained ground glass opacity nodules. A user-interactive framework for lung
segmentation with a K-nearest neighbour classifier was used to perform
classification.

Jafaar et al. used a genetic algorithm to segment the lung region from the original
image, and then used morphology and the SUSAN thinning algorithm to detect the lung edges.
The authors present an intelligent medical system for lung cancer cell identification
based on a two-layer rule-based fuzzy knowledge model.

Saravanan Sreckara M and Manikantan K [2016] present a digital image watermarking
technique based on DWT and SVD transforms along with DFT for colour images [6]. A lung
cancer detection system using image processing techniques was also proposed to classify
the presence of cancer cells in the lung and their stages from CT scan images using
various enhancement and segmentation techniques, aiming for accuracy in the result.
CHAPTER 3
PROPOSED WORK

A chest X-ray is often the first test a doctor will do to look for any abnormal areas in the
lungs. Imaging tests use X-rays to create pictures of the inside of the body. Imaging tests
might be done for a number of reasons both before and after a diagnosis of lung cancer,
including:

• To look at suspicious areas that might be cancer

• To learn how far cancer might have spread

• To help determine if treatment is working

• To look for possible signs of cancer coming back after treatment

In this proposed method, there are two steps to segment lung tissues from CT scan image,
namely

1. Lung Field Extraction

2. Boundary Analysis and Segmentation of lung


The proposed method involves three stages. Initially, the CT lung images are pre-processed
and segmented. The next stage is feature extraction, which is done with a Gabor filter. The
third stage is classification with K-NN and optimization by a Genetic Algorithm. The steps
of the K-NN classification are shown in Fig. 3.2.

[Figure: flowchart of the steps: Selection of k value, Calculation of distance, Minimize the
nearest, Classification using KNN, Finding main class]

Fig. 3.2 - Steps of k nearest neighbor classification


The aim of this technique is to identify the k nearest neighbours of a sample, that is, the
k training examples with the shortest distance to it, and to classify the sample according
to the most similar class. Similarity is defined in terms of the Euclidean distance. An
arbitrary instance x is described by the feature vector

$$\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$$

Here $a_r(x)$ represents the value of the r-th feature of instance x. The distance between
two instances $x_i$ and $x_j$ is then defined as $d(x_i, x_j)$:

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2}$$

Subsequently, the instance is assigned to the most similar class by the nearest neighbour
method, which can also be used to estimate the actual value for an unidentified sample.
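
As a minimal sketch of the distance defined above, the following Python function computes the Euclidean distance between two feature vectors. The function name and the sample vectors are illustrative assumptions and are not taken from the thesis implementation.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    # Sum of squared differences over the features r = 1..n, then the square root
    return math.sqrt(sum((ar - br) ** 2 for ar, br in zip(a, b)))

# Hypothetical feature vectors <a_1(x), ..., a_n(x)> for two instances
x_i = [3.0, 4.0, 0.0]
x_j = [0.0, 0.0, 0.0]
print(euclidean_distance(x_i, x_j))  # 5.0
```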

The K-Nearest Neighbor (KNN) algorithm is a supervised machine learning algorithm, so it
uses labeled data. Whenever new unlabeled data is passed to the KNN model, the model
classifies it with the help of the labeled data given during training. The KNN algorithm is
also called a lazy learning algorithm because it does not build an explicit model in
advance: it stores the labeled data and defers all computation until a decision has to be
taken for a new sample.

To understand the need for the KNN algorithm, consider an example in which we have two
types of data, category-1 data and category-2 data. Whenever a new data point arrives and
we have to find out which category it belongs to, we can use KNN.

K-NN Algorithm

1. In the KNN algorithm, we first choose a value for the variable K, which gives the
   number of nearest neighbours to consider. K is set to an odd value so that the
   algorithm can reach a decision without ties.
Apply the first-level selection of the number K of neighbours.

K is generally an odd number if the number of classes is 2.

Alternatively, set k = sqrt(n), where n is the number of training samples and k is
rounded to a positive integer.

2. At the second level, calculate the Euclidean distance between the new data point and
the training points. For two points (x, y) and (a, b), the Euclidean distance is

$$d = \sqrt{(x - a)^2 + (y - b)^2}$$

The K points with the smallest Euclidean distances are selected, and the calculation is
performed on them.

3. Use KNN as per the calculated Euclidean distance.

The following formula is used to calculate the Euclidean distance between two points:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Where,
$(x_1, y_1)$ are the coordinates of the first point.
$(x_2, y_2)$ are the coordinates of the second point.

4. The KNN algorithm takes the K nearest neighbours of the new data point and counts how
many of them belong to each category. The new data point is assigned to the category with
the maximum number of neighbours. In this way the KNN algorithm predicts the class of the
query point.
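
The four steps above can be sketched in Python as follows. This is an illustrative sketch only: the function name, the sample points and the category labels are assumptions made for the example, and the thesis itself implements the method in MATLAB.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Step 1: k is chosen by the caller (an odd k avoids ties for two classes)
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Step 2: Euclidean distance from the query to every training point
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 3: keep the k smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count the labels among the k neighbours and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-category data
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
labels = ["category-1"] * 3 + ["category-2"] * 3
print(knn_classify(points, labels, (2.0, 2.0), k=3))  # category-1
```

Sorting all distances is adequate for small data sets; for large training sets a spatial index would normally replace the full sort.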

The KNN algorithm is a very simple machine learning algorithm based on supervised
learning. With the help of KNN, the category to which any new input data belongs can be
determined. It can also be used for regression, but it is mostly used to solve
classification problems.

Image pre-processing:-

It is a technique in which operations are performed on an image to enhance it and to
extract important information and data from it.

Following are the steps in image processing:-

• First, the image is scanned.

• After scanning, the image is stored.

• After this, the image is enhanced to improve its quality.

• Finally, the image is interpreted.

Image Enhancement

In the image enhancement stage, three techniques are used: Gabor filter, auto-enhancement
and Fast Fourier Transform. The Gabor function has been recognized as a very useful tool
in computer vision and image processing, especially for texture analysis, owing to its
localization properties in both the spatial and frequency domains. Auto-enhancement
automatically adjusts the image (brightness, colour and contrast) to optimum levels. The
Fast Fourier Transform technique operates on the Fourier transform of the image. The Fast
Fourier Transform is a fast algorithm for computing the Discrete Fourier Transform (DFT).

It is difficult for doctors to interpret and identify the infection from images. An
automated system can therefore help doctors to identify the cancerous or infected cells
accurately. It performs a classification task to identify the infection from scan images.
It is difficult to achieve high performance metrics using only a single feature class.
Choosing the right set of features can improve the overall accuracy of the system.
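
To illustrate the Gabor filter used in the enhancement stage, the sketch below builds the real part of a 2-D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) in Python. The parameter names and values are assumptions chosen for the example. The thesis does not state the filter parameters it used.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size list of lists (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope (spatial localization)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            # Cosine carrier (frequency localization)
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Hypothetical parameters: 5x5 kernel, horizontal orientation
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, wavelength=4.0)
```

Convolving the image with a bank of such kernels at several orientations and wavelengths yields the texture responses that are used as features.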

In this proposed method, there are two steps to segment lung tissues from CT scan image,
namely

• Lung Field Extraction

• Boundary Analysis and Segmentation of lung


We will implement our project in MATLAB and testing is done with CT scan images.

[Figure: block diagram of the stages: Collected images, Image pre-processing, Feature
extraction, Classification process]

Fig. 3.3:- Image Classification



• Features Extraction – In this stage, features such as area, perimeter, centroid, diameter,
eccentricity and mean intensity are extracted. These features are later used as training
features to develop the classifier. This process is used to determine the normality or
abnormality of the images.

• Classification – This stage classifies the presence of lung cancer in CT images and blood
samples. Since CT scan reports are more effective than mammography, patient CT scan
images are categorized as normal or abnormal.
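
Some of the features named in the extraction stage can be computed from a segmented binary mask as in the following Python sketch. The function name, the perimeter approximation (counting foreground pixels that touch the background) and the sample data are assumptions made for illustration. The thesis does not specify how each feature was computed.

```python
def region_features(mask, image):
    """Area, perimeter, centroid and mean intensity of the foreground (value 1) pixels."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 1]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    def is_background(r, c):
        return r < 0 or r >= rows or c < 0 or c >= cols or mask[r][c] == 0

    # Perimeter approximated as the number of foreground pixels
    # that have at least one 4-connected background neighbour
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = sum(
        1 for r, c in pixels
        if any(is_background(r + dr, c + dc) for dr, dc in neighbours)
    )
    centroid = (sum(r for r, _ in pixels) / area, sum(c for _, c in pixels) / area)
    mean_intensity = sum(image[r][c] for r, c in pixels) / area
    return {"area": area, "perimeter": perimeter,
            "centroid": centroid, "mean_intensity": mean_intensity}

# Hypothetical 5x5 example: a 3x3 nodule in the centre of a uniform image
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
image = [[100] * 5 for _ in range(5)]
features = region_features(mask, image)
```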

[Figure: proposed modules: Lung Cancer Detection, Malignant Nodule Segmentation, Lung
Cancer Stage Identification]

Fig. 3.4- Proposed Modules

We have enhanced the k nearest neighbor technique for identifying lung cancer tissues in
images and classifying them into malignant and benign classes. The image classification
system is based on the nearest neighbor method, which provides better results in the
classification of lung cancer images.

The following program is sample code for the nearest neighbor classification technique; it
identifies the nearest neighbors in a set of points. The Euclidean distance is used as the
distance metric.

Usage:

1. [neighborIds, neighborDistances] = kNearestNeighbors(dataMatrix, queryMatrix, k)
2. dataMatrix (N x D) - N vectors with dimensionality D (within which we search for the
   nearest neighbors)
3. queryMatrix (M x D) - M query vectors with dimensionality D
4. k (1 x 1) - Number of nearest neighbors desired

    function [neighborIds, neighborDistances] = kNearestNeighbors(dataMatrix, queryMatrix, k)
        % Preallocate one row of k neighbor indices and distances per query vector
        neighborIds = zeros(size(queryMatrix, 1), k);
        neighborDistances = neighborIds;

        numDataVectors = size(dataMatrix, 1);
        numQueryVectors = size(queryMatrix, 1);
        for i = 1:numQueryVectors
            % Squared Euclidean distance from query i to every data vector
            dist = sum((repmat(queryMatrix(i, :), numDataVectors, 1) - dataMatrix).^2, 2);
            % Sort ascending and keep the k closest data vectors
            [sortval, sortpos] = sort(dist, 'ascend');
            neighborIds(i, :) = sortpos(1:k);
            neighborDistances(i, :) = sqrt(sortval(1:k));
        end
    end

Fig. 3.5. Examples of abnormal CXRs

GRAPH CUT BASED LUNG SEGMENTATION:

Here, lung segmentation is performed using the training masks we have collected.
Segmentation is difficult because the images normally have poor contrast owing to
hardware constraints and anatomical shape variations.

Step 1: Align the training masks linearly to a given input CXR.

Step 2: Compute the histogram representation for the lung model.

Step 3: Note the CXR and its calculated lung model.

In computer vision, the graph cut algorithm formulates labelling problems such as
segmentation and disparity estimation as energy minimization over an undirected weighted
graph G = (V, E). The set of vertices V represents pixel properties such as intensity, and
the set of edges E connects these vertices.

The edge weights typically represent the distance between the vertices.

CURVATURE DESCRIPTOR HISTOGRAM:

It denotes the angle of the digital image. Normalisation with respect to intensity makes
this descriptor independent of image brightness.

HOG:
Histograms of oriented gradients (HOG) are feature descriptors used in computer vision and
image processing for the purpose of object detection. The technique counts occurrences of
gradient orientation in localized portions of an image.
This method is very similar to that of edge orientation histograms but differs in that it is
computed on a dense grid of uniformly spaced cells and uses overlapping local contrast
normalization to improve accuracy. HOG is weighted according to gradient magnitude. The
image is divided into small connected regions, and for each region a histogram of gradient
directions or edge orientations for the pixels within the region is computed. The
combination of these histograms represents the descriptor. HOG is successfully used in
many detection systems.
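
The per-region histogram described above can be sketched in Python as follows. This covers only the magnitude-weighted orientation histogram of a single cell. The block normalization step of a full HOG descriptor is omitted, and the bin count of 9 and the sample cell are assumptions for illustration.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Interior pixels only, so that central differences are defined
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into the unsigned range [0, 180)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Hypothetical 4x4 cell containing a vertical edge
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(cell)
```

For this cell every interior gradient points horizontally, so all of the weight falls into the first orientation bin.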

LBP:

Local binary patterns (LBP) are a feature used for classification in computer vision. Since
LBP was first described, it has been found to be a powerful feature for texture
classification. It has further been determined that combining LBP with the HOG descriptor
improves the detection performance considerably on some data sets.
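
As a minimal sketch of the basic LBP operator, the Python function below thresholds the eight neighbours of a 3x3 patch against the centre pixel and packs the results into an 8-bit code. The clockwise bit ordering and the sample patch are assumptions, since implementations differ in the ordering they use.

```python
def lbp_code(patch):
    """8-bit local binary pattern code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

# Hypothetical grey-level patch
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # 241
```

The histogram of these codes over an image region is what serves as the texture feature.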

Table 3.6. Performance analysis of EKNN method

Technique      Accuracy   Error rate   Distance
K-NN           80%        20           –
EKNN           92.85%     7.15         –
EKNN method    96.57%     3.43         0.41876

According to the experimental results, the proposed KNN technique is the best method for
identifying cancer tissues and is efficient for classifying the lung images as benign or
malignant. Based on the results, the proposed method achieves 96.57% classification accuracy
and also minimises the processing time.

CHAPTER 4
RESULTS AND DISCUSSION

4.1 Results

The experiments were performed on a computer with the Windows 8.1 OS and an Intel(R)
Core(TM) i3 CPU with 4 GB RAM. The data for the experimental work was collected from
various research papers and online sources. 1 GB of DRAM is used as hardware to support
the process.
MATLAB is a high-level technical language, and it is used as the software to support our
process. It is a useful interactive environment for algorithm development, data
visualisation, data analysis and numerical computation.
Outside MATLAB, images may be of three types, i.e. black and white, grey scale and
coloured. In MATLAB, however, there are four types of images. Black and white images are
called binary images, containing 1 for white and 0 for black. Grey scale images are called
intensity images, containing numbers in the range of 0 to 255 or 0 to 1. Coloured images
may be represented as RGB images or indexed images. An indexed image consists of two
matrices, namely the image matrix and the map matrix. Each colour in the image is given an
index number, and in the image matrix each colour is represented by its index number.

Table 4.1: Tools and System Configuration

Description              Requirement
IDE                      MATLAB R2020a
RAM                      4 GB
Hard Disk                500 GB
Processor                Intel Core i5 CPU @ 2.4GHz
Operating System Type    64-bit OS Windows 10

In the proposed system, the input lung CT image is read first. Conversion to binary format
is regarded as rich in data, so the detail obtained from the preprocessing techniques is
chosen for further use, and the Complement Filter, Dilation, Erosion and ROI are kept as
they are. In the following step, SVM is applied to the points of interest of the first
image to get the U, S and V values. At the same time, the SVM network for classification is
applied on the watermark image to make it suitable for insertion into the cover image. We
used the LIDC and LUNA16 data sets for this process.

STAGE 1 - Tumor less than 3 cm. There is no metastasis.

STAGE 2 - Tumor less than 6 cm. Single metastases are observed.

STAGE 3 - Tumor more than 6 cm. Metastases in the lymph nodes.

A reliable screening system for cancer detection on radiographs is therefore a big step
towards more powerful cancer detection. Cancer-related abnormalities in chest X-rays are
diffuse, and discriminating between normal anatomical structures and abnormal patterns is
difficult. The Gabor filters used here are a traditional choice for obtaining localised
frequency information.

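However, there are two limitations. The maximum bandwidth of a Gabor filter is limited to
approximately one octave, and Gabor filters are not optimal if one is seeking broad
spectral information with maximal spatial localization. The log-Gabor function is an
alternative, with the following transfer function on a linear frequency scale:

$$G(w) = \exp\left( \frac{-\left(\log(w / w_0)\right)^2}{2 \left(\log(k / w_0)\right)^2} \right)$$

Here $w_0$ is the centre frequency of the filter and $k$ controls its bandwidth.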
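
The transfer function G(w) above can be evaluated numerically as in this Python sketch. The parameter values in the example (a ratio k/w0 of 0.55) are assumptions chosen for illustration and do not come from the thesis.

```python
import math

def log_gabor(w, w0, k):
    """Log-Gabor transfer function G(w) with centre frequency w0 and bandwidth parameter k."""
    if w <= 0:
        return 0.0  # the function has no DC component
    return math.exp(-(math.log(w / w0) ** 2) / (2.0 * math.log(k / w0) ** 2))

# Hypothetical parameters: centre frequency 1.0, k / w0 = 0.55
peak = log_gabor(1.0, w0=1.0, k=0.55)        # response at the centre frequency
octave_up = log_gabor(2.0, w0=1.0, k=0.55)   # one octave above the centre
octave_down = log_gabor(0.5, w0=1.0, k=0.55) # one octave below the centre
```

The response peaks at the centre frequency and is symmetric on a logarithmic frequency axis, so the two octave values coincide.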

e. Comparison of classifier accuracy:- For this comparison, approximately 150 lung images
can be used to identify the classifier with the best accuracy.

TABLE 4.3: Comparison of Classifier Accuracy.

TECHNIQUES                            NUMBER OF IMAGES   ACCURACY (NORMAL ANALYSIS)   ACCURACY (ABNORMAL ANALYSIS)
k-NN                                  160                65%                          68%
Binary Support Vector machine         140                89%                          77%
Multi-class Support vector machine    170                70%                          80%

The geometrical and statistical features perimeter, diameter, irregularity index and area have
been estimated from the segmented lung nodule images. The number of pixels in the image that
have the value 1 gives the area of the segmented cancer region. The pixels with the value 0
give the background of the image, which is black. A lung cancer nodule is characterized by
irregularity in its border.

For this investigation, the irregularity of the cancer is calculated by the irregularity index:

$$I = \frac{4 \pi A}{P^2}$$

Where P is the perimeter of the cancer and A is the area of the cancer in pixels. The
irregularity index is equal to 1 only for a circle and is less than 1 for any other shape.
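
The index I = 4πA/P² can be checked numerically with the short Python sketch below. The function name and the sample shapes are illustrative assumptions.

```python
import math

def irregularity_index(area, perimeter):
    """Irregularity index I = 4*pi*A / P^2 (1 for a circle, smaller for other shapes)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2

# Hypothetical shapes: a circle of radius 10 and a square of side 10
circle = irregularity_index(math.pi * 10 ** 2, 2 * math.pi * 10)
square = irregularity_index(10 * 10, 4 * 10)
```

The circle gives an index of 1 and the square gives π/4 (about 0.785), consistent with the statement that any non-circular shape scores below 1.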

TABLE 4.4:- THRESHOLD VALUES OF LUNG CANCER NODULES ON SIZE IN STAGING STANDARDS

STAGES   SUB-STAGES   NODULE SIZE
T1       a            Between 2mm and 20mm
T1       b            Between 20mm and 30mm
T1       c            Between 30mm and 40mm
T2       a            Between 20mm and 50mm
T2       b            Between 30mm and 60mm
T3       No Staging   Between 40mm and 60mm
T4       No Staging   Greater than 60mm
INTENSITY HISTOGRAMS:
Intensity histograms aim at enhancing the colour of the image. The grey-level image is
shown in colour.

GRADIENT MAGNITUDE HISTOGRAMS:

These are used to describe the gradients (such as changes in the pixel value). We calculated
the performance of the improved nearest neighbor classifier in terms of classification
accuracy.

The classification accuracy is calculated by the following formula:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Where TP is True Positive (correctly classified positive cases), TN is True Negative
(correctly classified negative cases), FP is False Positive (negative cases wrongly
classified as positive) and FN is False Negative (positive cases wrongly classified as
negative). In this research, we have developed an enhanced solution for the identification
of lung cancer tissues using an image mining algorithm.
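
The accuracy formula, (TP + TN) divided by the total number of cases, can be sketched in Python as follows. The confusion-matrix counts in the example are hypothetical and are not results from the thesis.

```python
def accuracy(tp, tn, fp, fn):
    """Classification accuracy: correctly classified cases divided by all cases."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one case is required")
    return (tp + tn) / total

# Hypothetical confusion-matrix counts for 100 test images
print(accuracy(tp=45, tn=40, fp=5, fn=10))  # 0.85
```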

Elimination of Noise in CT Images of Lung Cancer using Image Preprocessing Filtering
Techniques

In medical imaging, elimination of noise is a challenging task. To overcome this challenge,
preprocessing is a crucial step for eliminating noise in medical images. In this paper,
image pre-processing techniques, namely the Mean, Median and Wiener noise filters, are
applied to CT scan images of lung cancer in order to segment the image for further analysis
of cancer detection.
The performances of the different filters applied on CT images were evaluated using image
quality assessment metrics such as Mean-Squared Error (MSE) and Peak Signal-to-Noise
Ratio (PSNR).

The experimental study finds that the Median filter is more effective than the other filters
in removing noise present in CT images of lung cancer, as it gives low MSE values and high
PSNR values.
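
A 3x3 median filter of the kind compared above can be sketched in Python as follows. Leaving the border pixels unchanged is a simplifying assumption, and the sample image with a single noise spike is hypothetical. The thesis applies the filter in MATLAB.

```python
import statistics

def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2-D list; border pixels are left unchanged."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Collect the 3x3 neighbourhood and replace the pixel with its median
            window = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = statistics.median(window)
    return out

# Hypothetical 5x5 image with one impulse-noise pixel in the centre
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
filtered = median_filter_3x3(noisy)
```

Because the median ignores isolated extreme values, the spike is removed while the uniform background is preserved.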

Performance Evaluation

We have calculated the imperceptibility and robustness. To measure the performance of the
proposed technique, three quality measures are used: PSNR, SNR and RMSE. In the current
scheme we focus on two properties, imperceptibility and robustness.

PSNR (Peak Signal to Noise Ratio): The PSNR is used to quantify imperceptibility. It measures
the similarity between the original cover image and the watermarked image obtained by the
embedding procedure, based on the difference between the resultant image and the original
image.

In general, the value of PSNR should be in the range of 20 dB to 50 dB. For a better
watermarking technique, the value of PSNR should be high. So, our proposed technique is
robust. For 8-bit images, PSNR is defined by the following formula:

$$\text{PSNR} = 10 \log_{10}\left( \frac{255 \times 255}{\text{MSE}} \right)$$
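
The MSE and PSNR measures can be computed as in the Python sketch below, using the standard definition PSNR = 10·log10(255²/MSE) for 8-bit images. The function names and the small sample images are assumptions made for illustration.

```python
import math

def mse(original, processed):
    """Mean squared error between two equally sized 2-D images."""
    rows, cols = len(original), len(original[0])
    total = sum((original[r][c] - processed[r][c]) ** 2
                for r in range(rows) for c in range(cols))
    return total / (rows * cols)

def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)

# Hypothetical 2x2 images
original = [[100, 100], [100, 100]]
processed = [[110, 100], [100, 90]]
print(mse(original, processed))   # 50.0
print(psnr(original, processed))  # about 31.1 dB
```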

Finally, using decision rules and thresholds, the classifier outputs its confidence in
classifying an input chest X-ray as a cancer-positive case. The method combines intensity
data with lung models derived from the training set. We also performed the analysis with
various classifier architectures. The lung segmentation achieves performance comparable to
other schemes. These comparisons test our scheme under realistic field conditions. The
feature sets and most of the classifier architectures we tested provide similar
performance. The accuracy of the system is 89.3%.

A drawback of this technique is that it is not applicable to images from portable X-ray
sets.

CHAPTER 5
CONCLUSION & FUTURE SCOPE

5.1 Conclusion

Our proposed method gives effective outcomes in contrast to methods utilizing constant
scaling factors. The technique applies a CNN classifier method to determine whether a CT
image of the lung is cancerous or non-cancerous. The test results of the proposed technique
were obtained on the LUNA16 (Lung Nodules Analysis) dataset (CT scans with labeled nodules).

5.2 Future Scope

In future, we will perform the experiments on a larger amount of data and apply more
features such as nodule size, texture and position. We will also try to apply
state-of-the-art deep CNN methods for higher accuracy. We can improve the robustness and
imperceptibility of the images.
Curriculum Vitae

Urvashi Shri
M. Tech. (Computer Science & Engg.)

Education
▪ [Pursuing] M. Tech. (CSE) from Madan Mohan Malaviya University of
Technology, Gorakhpur (IN) with aggregate CGPA 7.2.

▪ [Aug 2015 – May 2019] B. Tech. (CSE) from Babasaheb Bhimrao
Ambedkar University, Lucknow (IN) with 73.61 %.

Personal Information

Father’s Name : Mr. Pramod Kumar


Email : Urvashishri850@gmail.com

Language Known -
English
Hindi

Address -
128-BLK Cantt,
Dilkusha, Sadar
Lucknow, Uttar Pradesh (India)-226002