Precision Medicine in Digital Pathology Via Image Analysis and Machine Learning
Peter D. Caie, BSc, MRes, PhD¹, Neofytos Dimitriou, B.Sc², Ognjen Arandjelovic, M.Eng. (Oxon), Ph.D. (Cantab)²
¹School of Medicine, QUAD Pathology, University of St Andrews, St Andrews, United Kingdom
²School of Computer Science, University of St Andrews, St Andrews, United Kingdom
Precision medicine
The field of medicine is currently striving toward more accurate and effective clinical decision-making for individual patients. This can take many forms of analysis and be put into effect at multiple stages of a patient's disease progression and treatment journey. However, the overarching goal is higher treatment success rates with fewer side effects from potentially ineffectual, but toxic, therapies and, of course, better patient well-being and overall survival. For example, it is important to understand whether a specific treatment may help an individual patient's cancer or in fact be detrimental to their overall survival, as is the case with cetuximab treatment in colorectal cancer. This process of treating the patient as an individual, and not as a member of a broader and heterogeneous population, is commonly termed precision medicine. It has traditionally been driven by advances in targeted drug discovery with accompanying translatable companion molecular tests. These tests report on biomarkers measured from patient samples and indicate whether specific drugs will be effective for patients, normally based on the molecular profile of their diagnostic tissue sample. The tests may derive from our knowledge of biological processes, such as designing inhibitors against EGFR pathways, or from machine learning-based mining of large multiomic datasets to identify novel drug targets or resistance mechanisms. In the current era of digital medicine, precision medicine is being applied throughout the clinical workflow, from diagnosis to prognosis and prediction.
Large flows of data can now be tapped from multiple sources, no longer solely through molecular pathology. This is made possible by the digitization, and availability, of patient history and lifestyle records and clinical reports, and by the adoption of digital pathology and image analysis in both research and the clinic. In fact, before digitized histopathological datasets were interrogated by image analysis, in vitro high-content, biology-based drug screens were being developed by the pharmaceutical industry [1]. These screens also applied image analysis, but to cultured cells exposed to genetic or small-molecule manipulation, segmenting and classifying cellular structures before capturing large multiparametric datasets that inform on novel targets or drug efficacy. Similar methodology taken from high-content biology and image analysis can be applied to digital pathology, whether by classical object threshold-based image analysis or by artificial intelligence. The overarching aim is either to quantify and report on specific biomarkers or histological patterns of known importance, or to capture unbiased multiparametric data and recognize patterns across segmented objects and whole-slide images (WSI). Ultimately, distilling and reporting on the data extracted from digital pathology images is intended to stratify patients into distinct groups that may inform the clinician of their optimal, personalized treatment regimen.
Artificial Intelligence and Deep Learning in Pathology. 149
Copyright © 2021 Elsevier Inc. All rights reserved.
150 CHAPTER 8 Precision medicine in digital pathology via image
Digital pathology
Digital pathology, the high-resolution digitization of glass-mounted histopathological specimens, is becoming more commonplace in the clinic. This disruptive technology is on track to replace the reporting of glass slides down a microscope, as has been the tradition for centuries. Certain obstacles to the wide-scale adoption of digital pathology remain, such as IT infrastructure, scanning workflow costs, and the willingness of the pathology community. However, as more institutes move toward full digitization, these obstacles diminish. Digital pathology holds advantages over traditional microscopy in teaching, remote reporting, and image sharing, and not least in the ability to perform image analysis on the resultant digitized specimens.
In essence, the technology is currently moving from the glass slide and microscope to the digital image and high-resolution screen, although manual viewing and diagnosis remain the same. Clinical applications using digital pathology and WSIs are currently restricted to the primary diagnosis of H&E-stained slides. However, future applications, such as immunofluorescence, can bring further advantages to digital pathology. Indeed, scanner vendors are frequently combining brightfield and multiplexed immunofluorescence visualization in their platforms. Immunofluorescence allows the identification, classification, and quantification of multiple cell types or biomarkers colocalized at single-cell resolution on a single tissue section. The importance of this capability is becoming increasingly apparent as we realize that the complex intercellular and molecular interactions within the tumor microenvironment, on top of cellular morphology, play a vital role in a tumor's progression and aggressiveness.
Humans are as adept at reporting from digital pathology samples as they are from glass-mounted ones. They do this by identifying diagnostic and prognostic patterns in the tissue while disregarding artifacts and nonessential histology. However, an inherent human limitation in pathology has been the standardization and consistency of inter- and intraobserver reporting. This is especially true for the identification and semiquantification of more discrete and subtle morphologies, as well as, for example, the counting of specific cell types, mitotic figures, or biomarker expression across WSIs [2–4]. Automated analysis of such features by computer algorithms can overcome these flaws and provide objective and standardized quantification. With ongoing research providing much-needed evidence of the capability of image analysis and artificial intelligence to accurately quantify and report on molecular and morphological features, it is only a matter of time before these too translate into the clinic.
Applications of image analysis and machine learning 151
The use of image analysis and artificial intelligence can be knowledge driven or
data driven. The next section of this chapter will discuss how both methodologies
can be applied to digital pathology before we expand on the theory and concepts
behind the various artificial intelligence models commonly used in the field.
the specific regions are that they want to segment. They do so by selecting examples
across a training subset of images. Once the algorithm has learned the difference be-
tween the regions to be differentiated, it can be applied to a larger sample set in order
to automatically segment the image of the tissue. An example of this would be using an antibody against cytokeratin to label the tumor and using this marker, on top of the cancer cell morphology, to differentiate the tumor from the stroma (Fig. 8.1A). Once this is performed, one can employ the threshold-based image analysis mentioned earlier to quantify, for example, Ki67 expression only in the tumor (Fig. 8.1B), tumor buds only at the invasive margin [10], or CD3- and CD8-positive lymphocytic infiltration within either the tumor core or the invasive margin [6] (Fig. 8.2).
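As a minimal sketch of restricting threshold-based quantification to a segmented compartment, the function below computes a Ki67 labeling index only over cells that fall inside a tumor mask. It assumes that per-cell measurements (a mean nuclear Ki67 intensity and a tumor/stroma flag) have already been exported by the image analysis software; the function name, the toy values, and the threshold of 0.5 are hypothetical.

```python
import numpy as np


def ki67_index_in_tumor(ki67_intensity, in_tumor, threshold):
    """Percentage of Ki67-positive cells among cells inside the tumor mask.

    ki67_intensity: per-cell mean nuclear Ki67 intensity
    in_tumor: per-cell boolean flag, True if the cell lies in the tumor region
    threshold: intensity above which a cell is called positive (hypothetical)
    """
    ki67_intensity = np.asarray(ki67_intensity, dtype=float)
    in_tumor = np.asarray(in_tumor, dtype=bool)
    tumor_cells = ki67_intensity[in_tumor]   # stromal cells are excluded entirely
    if tumor_cells.size == 0:
        return float("nan")                  # no tumor cells detected
    return 100.0 * float(np.mean(tumor_cells > threshold))


# Toy example: four tumor cells followed by two stromal cells
intensity = [0.9, 0.2, 0.7, 0.1, 0.95, 0.8]
tumor_mask = [True, True, True, True, False, False]
print(ki67_index_in_tumor(intensity, tumor_mask, threshold=0.5))  # 50.0
```

Applying the same threshold to all six cells, stroma included, would give 4/6 (about 66.7%) instead, which illustrates why segmenting the tumor first matters.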
FIGURE 8.2 Quantification of lymphocytic infiltration with the tumor's invasive margin or
(A) A whole slide image of a colorectal cancer tissue section labeled for tumor cells (green) and nuclei (blue). The image analysis in this figure is performed using Indica Labs HALO software. The red outline is the automatically detected deepest invasion of tumor cells, and the inset shows a zoomed-in example of the image segmented into an invasive margin (green) and the tumor core (blue). The purple square denotes where panels (B) and (C) originate from in the invasive margin. (B) CD3 (yellow) and CD8 (red) positive lymphocytes visualized by multiplexed immunofluorescence. (C) Automated quantification of these lymphocytes within just the invasive margin of the tumor.
to differentiate regions of interest for image analysis. Furthermore, the tissue specimen is imperfect, and so, therefore, is the digitized WSI. These imperfections can originate from multiple stages of tissue preparation. They range from folds, tears, or uneven thickness of the tissue to more subtle morphological differences due to ischemia or fixation times. If immunolabeling is performed, nonspecific staining, edge effects, or autofluorescence can create further confounding issues for automated image analysis. Examples of such artifacts can be seen in Fig. 8.3. All of the above may cause inaccurate reporting, such as false positives, when applying image analysis and machine learning to segment tissue across large patient cohorts. However, many of these issues can be overcome by employing deep learning architectures to segment the digital WSI. To do this, trained experts annotate the regions of interest selected for quantification, and also identify tissue artifacts so that the algorithm can be trained to ignore them [11]. The human brain is extraordinary at pattern recognition, and the pathologist routinely and automatically ignores imperfect tissue in order to home in on the area containing the information needed to make a diagnosis or prognosis. The ever-developing sophistication of deep learning architectures now allows this level of analysis by automated algorithms.
For accurate tissue segmentation, using either machine learning or deep learning methodology, a strong and standardized signal-to-noise ratio is required in the feature used to differentiate regions of interest. However, the inherent heterogeneity of the sample is also reflected in the marker used for segmentation, which may vary in brightness or intensity between patient samples and even within the same sample. Image color and intensity standardization algorithms can be applied before the image is analyzed by artificial intelligence [12]. This can lead to more accurate, fully automated tissue segmentation across large and diverse patient cohorts; however, it is debated whether this is the best way to achieve this goal and whether standardization may remove real diagnostic information.
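One of the simplest forms of intensity standardization matches the per-channel mean and standard deviation of each image to those of a chosen reference image. The sketch below does this directly on RGB channels; it is a simplified, Reinhard-style normalization rather than the specific algorithm of [12] (Reinhard's original method operates in a decorrelated color space), and the function name and toy data are hypothetical.

```python
import numpy as np


def match_channel_statistics(image, reference):
    """Give each channel of `image` the mean and standard deviation of `reference`.

    image, reference: float arrays of shape (H, W, C) with values in [0, 1].
    A simplified, Reinhard-style normalization applied directly to RGB channels.
    """
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = np.empty_like(image)
    for c in range(image.shape[-1]):
        src, ref = image[..., c], reference[..., c]
        src_std = src.std()
        scale = ref.std() / src_std if src_std > 0 else 1.0   # avoid dividing by zero
        out[..., c] = (src - src.mean()) * scale + ref.mean()
    return np.clip(out, 0.0, 1.0)   # keep values in the valid intensity range


# Toy example: a dimmer, lower-contrast copy of a reference tile is mapped back onto it
rng = np.random.default_rng(0)
reference_tile = rng.uniform(0.3, 0.7, size=(8, 8, 3))
faded_tile = 0.5 * reference_tile + 0.1
restored = match_channel_statistics(faded_tile, reference_tile)
print(np.allclose(restored, reference_tile))  # True
```

Because the output is clipped to [0, 1], the matched statistics are only approximate when many pixels saturate, and, as noted above, any such transformation risks altering real diagnostic signal.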
Deep learning is not only being applied to segment tissue prior to biomarker or
cellular quantification but also has the ability to recognize histopathological patterns
in digitized H&E-stained tissue sections (Fig. 8.4). This has been demonstrated in
studies where the expert pathologist has annotated features of significance in order
to train the deep learning algorithms to identify these in unseen test and validation
sets [13,14]. The clinical application and methodology that allow the computer to visualize and recognize more subtle, prognostically associated patterns are covered in more detail elsewhere in this book. Algorithms such as these are being developed to
aid the pathologist in their diagnosis, and it is only a matter of time until they are
applied routinely in the clinical workflow of pathology departments. Currently, these
algorithms are not being designed to report on patient samples automatically, without any human verification, but rather to act as an aid to the pathologist: for example, by highlighting areas of interest to increase the speed of reporting, by triaging urgent cases, or by providing a second opinion. However, as deep learning becomes more sophisticated, there is a strong possibility that computer vision algorithms may in future perform some aspects of autonomous clinical reporting. Later in this chapter, we discuss which regulatory concerns to address when designing an algorithm for translation into the clinic.
Deep learning architectures are now able to predict patient molecular subgroups
from H&E-labeled histology. They do this from only the morphology and histolog-
ical pattern of the tissue sample [15,16]. This may be quite remarkable to imagine;
however, the histopathologist would most likely have predicted this to be possible.
They have long been aware of the complex and important variations of morphology present in the tissue and how these affect patient outcome, even if they have not been able to link them to molecular subtypes. This has real implications beyond academic research. Molecular testing is expensive and requires complex instrumentation. If the same information, relevant to personalized pathology, can be gleaned from the routine and inexpensive H&E-labeled section, this could have a significant impact on health economics.
Spatial resolution
We have briefly covered the importance of reporting the tissue architecture in the
field of precision medicine when quantifying biomarkers of interest, as opposed to
other molecular technology that destroys the tissue and thus the spatial resolution
of its cellular components. Histopathologists know that context is key to an accurate
diagnosis and prognosis. The tumor microenvironment is complex, with many cellular and molecular interactions that play a role in inhibiting or driving tumor progression and responses to therapy. The quantification of a stand-alone biomarker, even by image analysis, may not be enough for accurate prognosis or prediction for an individual patient. This is the case for PD-L1 immunohistochemical testing, where even patients with PD-L1-positive tumors may not respond to anti-PD-L1 therapy [17]. Similarly, there is an advantage to quantifying prognostic histopathological features, such as lymphocytic infiltration or tumor budding, in distinct regions within the tumor microenvironment. Traditionally, image analysis has quantified a single prognostic feature across a single tissue section, with proven success at patient stratification. However, with multiplexed immunofluorescence, it is now possible to visualize multiple biomarkers and histological features within a single tissue section. Image analysis software can furthermore calculate and export the exact x and y spatial coordinates of each feature of interest across the WSI, or recognize specific patterns within the interacting cellular milieu of the tumor microenvironment (Fig. 8.5). This not only brings the advantage of measuring more than one prognostic feature, but also allows new insights into disease progression by quantifying novel interactions at the spatial resolution of the tissue. This was demonstrated by Nearchou et al., who showed that the densities of tumor budding and immune infiltrate were significantly associated with survival in stage II colorectal cancer, and furthermore that their specific interaction added value to a combined prognostic model [6]. Studies such as this show that complex spatial analysis may be key to accurate prognosis for the individual patient.
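To illustrate how exported x and y coordinates can be turned into an interaction feature, the sketch below counts, for each tumor bud, the lymphocytes lying within a fixed radius. The radius, function name, and toy coordinates are hypothetical, and this is not claimed to be the interaction measure used by Nearchou et al. [6].

```python
import numpy as np


def lymphocytes_near_buds(bud_xy, lymph_xy, radius):
    """Count, for each tumor bud, the lymphocytes whose centroid lies within `radius`.

    bud_xy, lymph_xy: (n, 2) arrays of exported x, y centroids in the same units.
    """
    bud_xy = np.asarray(bud_xy, dtype=float).reshape(-1, 2)
    lymph_xy = np.asarray(lymph_xy, dtype=float).reshape(-1, 2)
    diff = bud_xy[:, None, :] - lymph_xy[None, :, :]   # (n_buds, n_lymph, 2)
    dist = np.sqrt((diff ** 2).sum(axis=-1))           # pairwise Euclidean distances
    return (dist <= radius).sum(axis=1)


buds = [[0, 0], [100, 100]]
lymphocytes = [[3, 4], [10, 0], [50, 50], [101, 100]]
counts = lymphocytes_near_buds(buds, lymphocytes, radius=20)
print(counts, counts.mean())  # [2 1] 1.5
```

A per-slide summary such as the mean count per bud could then enter a prognostic model alongside the separate bud and lymphocyte densities. For whole-slide numbers of cells, a spatial index such as SciPy's `cKDTree` would typically replace the dense distance matrix used here.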
and is difficult, if not impossible, to sort and analyze by eye. Machine learning can be applied to large datasets from both molecular and digital pathology in order to understand the optimal features that allow patient stratification, and thus clinical decision-making, in the field of personalized medicine. However, caution must be taken when deciding which machine learning algorithm to apply to your data. If one model is superior at analyzing one dataset, based on, e.g., the area under the receiver operating characteristic curve, it does not mean that the same algorithm will be as successful at analyzing a second and distinct dataset under similar computational restrictions. This forms part of the "no free lunch theorem" [18], which we will touch upon again later in the chapter. In plain terms, some machine learning algorithms outperform others on specific datasets. For example, some algorithms (such as k-nearest neighbors) excel at analyzing data with low dimensionality but become intractable or return poor results as the dimensionality of the data increases. On the other hand, random forests excel at analyzing high-dimensional data. Similarly, some models are better than others at separating data in a linear fashion. As we rarely know a priori which model, or which settings of its hyperparameters, is optimal for separating a specific dataset, it is prudent to test multiple machine learning methodologies on a single dataset. Automated workflows can be designed that split data into balanced training and test sets, apply feature reduction algorithms (if needed), test multiple machine learning models, and set their hyperparameters before returning the model and features that best separate one's data and answer the clinical question being asked [19].
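A minimal, NumPy-only sketch of such a workflow is shown below. It assumes binary labels coded 0/1 and only two hypothetical candidate model families (a nearest-centroid classifier and k-nearest neighbors over a small grid of k), scored by balanced accuracy in place of the area under the curve; feature reduction and nested cross-validation, which a real pipeline such as that of [19] would include, are omitted. All names are illustrative.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, less flattering than raw accuracy under class imbalance."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))


def stratified_split(y, test_fraction, rng):
    """Split sample indices into train/test sets that preserve class proportions."""
    train, test = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * idx.size))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(train), np.array(test)


def nearest_centroid(X_fit, y_fit, X_new):
    classes = np.unique(y_fit)
    centroids = np.stack([X_fit[y_fit == c].mean(axis=0) for c in classes])
    d2 = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]


def knn(X_fit, y_fit, X_new, k):
    d2 = ((X_new[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    votes = y_fit[np.argsort(d2, axis=1)[:, :k]]
    return (votes.mean(axis=1) > 0.5).astype(int)   # majority vote for 0/1 labels


def select_model(X, y, test_fraction=0.25, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    train, test = stratified_split(y, test_fraction, rng)
    candidates = {"nearest_centroid": nearest_centroid}
    for k in (1, 3, 5, 7):                            # hyperparameter grid for k-NN
        candidates[f"knn_k{k}"] = lambda a, b, c, k=k: knn(a, b, c, k)
    folds = np.array_split(rng.permutation(train), n_folds)   # identical folds for every model
    cv_scores = {}
    for name, model in candidates.items():
        scores = []
        for i, val in enumerate(folds):
            fit = np.concatenate([f for j, f in enumerate(folds) if j != i])
            scores.append(balanced_accuracy(y[val], model(X[fit], y[fit], X[val])))
        cv_scores[name] = float(np.mean(scores))
    best = max(cv_scores, key=cv_scores.get)
    # The held-out test set is used once, for the selected model only
    test_score = balanced_accuracy(y[test], candidates[best](X[train], y[train], X[test]))
    return best, cv_scores, test_score


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
best, cv_scores, test_score = select_model(X, y)
print(best, round(test_score, 3))
```

Keeping the test set out of every selection decision is the design choice that matters most here: the reported test score then estimates performance of the whole selection procedure, not just of the winning model.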
An example that demonstrates the usefulness of combining many of the topics discussed in this chapter is that of Schmidt et al., who designed an image analysis and deep learning workflow to better predict a patient's response to ipilimumab [20]. They combined basic image analysis thresholding, to classify infiltrating lymphocytes, with deep learning, to exclude artifacts and necrosis from the images. Furthermore, they used pathologist annotations to train for specific regions of interest, compartmentalizing the tumor and stroma prior to quantification of the lymphocytic contexture. Finally, they applied multiple methods of analyzing the resultant data before reporting the final optimal model that predicts response to treatment.
Beyond augmentation
To make deep learning an effective method for patient diagnosis in the clinic, there must be a sufficient number of labeled annotations and a wide variety of samples representative of the larger population. In the case of deep learning, this requires training and validation on thousands of patient samples, obtained from multiple international institutes and prepared by multiple individuals. This is neither an easy nor a fast task. A major drawback for the field is the lack of such large, well-annotated, and curated datasets available to data scientists. However, there is a wealth of data held in each hospital's glass slide archives, going back decades. These data can be traced back to each patient treated in that institute, along with their clinical reports. A tissue section from each patient in the archive will have been stained with H&E, and as each prospective patient's samples are also stained with H&E, it makes sense to concentrate deep learning efforts on digital pathology samples prepared with this stain. The bottleneck is therefore not access to patient samples but the experts' digitized annotations of the regions of interest that pertain to diagnosis. To overcome this bottleneck, researchers are forgoing region-level annotations within the image and developing weakly supervised deep learning architectures that rely only on slide-level labels, namely the diagnosis of the patient. Simply put, the computer is not told where in the image the cancer is, only that somewhere in the image there are cancer cells. This methodology has been shown to be effective and relies on the data-driven analysis of patients, where the computer vision algorithm identifies subtle and complex morphologies that relate to diagnosis or prognosis. Thus, the machine can now inform the human of what is of pathological significance within the patient's individual tissue specimen. These patterns may have gone unnoticed to date, or have been too complex for a standardized reporting protocol to be produced by human effort. This type of methodology has been tested in colorectal cancer [21], prostate cancer, and lymph node metastases of breast cancer [22]. Further information on these methodologies can be found in the following chapter.
Practical concepts and theory of machine learning 159
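The weakly supervised setting can be sketched as multiple instance learning: each slide is a "bag" of tile feature vectors, only the slide carries a label, and a tile-level scorer is learned such that the pooled score of each slide matches its label. The sketch below is an illustrative assumption, not the architecture used in [21] or [22]: it uses a linear (logistic) tile scorer in place of a deep network, and log-sum-exp pooling, a smooth stand-in for max pooling that gives every tile a gradient from a zero initialization. It also assumes every slide has been tiled into the same number of tiles; the feature, data, and names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pooled_logits(bags, w, b, r=5.0):
    """Log-sum-exp pooling of tile logits: a smooth maximum over each slide's tiles.

    Returns the slide logits and the softmax weights, which are the derivatives
    of each slide logit with respect to its tile logits.
    """
    z = bags @ w + b                               # (n_slides, n_tiles) tile logits
    z_max = z.max(axis=1, keepdims=True)
    e = np.exp(r * (z - z_max))                    # shifted for numerical stability
    slide_logit = z_max[:, 0] + np.log(e.sum(axis=1)) / r
    return slide_logit, e / e.sum(axis=1, keepdims=True)


def train_mil(bags, labels, lr=0.5, epochs=5000):
    """Fit a linear tile scorer from slide-level labels only.

    bags: (n_slides, n_tiles, n_features); labels: (n_slides,) with 1 = tumor present.
    """
    n_slides, _, n_features = bags.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        slide_logit, weights = pooled_logits(bags, w, b)
        err = sigmoid(slide_logit) - labels        # d(cross-entropy)/d(slide logit)
        # Chain rule through the pooling: each tile contributes in proportion to its weight
        grad_w = np.einsum("s,st,std->d", err, weights, bags) / n_slides
        grad_b = err.mean()                        # pooling weights sum to 1 per slide
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Hypothetical data: one "atypia" feature per tile; positive slides hide two high-scoring tiles
rng = np.random.default_rng(0)
n_slides, n_tiles = 40, 10
labels = np.arange(n_slides) % 2
bags = rng.uniform(0.0, 0.3, size=(n_slides, n_tiles, 1))
for s in np.flatnonzero(labels == 1):
    tumor_tiles = rng.choice(n_tiles, size=2, replace=False)
    bags[s, tumor_tiles, 0] = rng.uniform(0.7, 1.0, size=2)

w, b = train_mil(bags, labels)
slide_prob = sigmoid(pooled_logits(bags, w, b)[0])
print("slide accuracy:", np.mean((slide_prob > 0.5) == (labels == 1)))
```

After training, `sigmoid(bags[s] @ w + b)` gives per-tile scores for slide `s`; the highest-scoring tiles are those that drove the slide-level call, which is how such a model can point the pathologist toward regions it was never explicitly shown.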
diagnostic tasks or prognostic stratification (e.g., high vs. low risk). Others play a more assistive role for a human expert, providing automatic or semiautomatic segmentation of images, classification of cells, detection of structures of interest (cells, tissue types, tumors, etc.) in images, and so on. Although future work may lead to the fully automatic diagnosis of patients, the current drive is for clinically transferable tools to aid the pathologist in their diagnosis.
Common techniques
The common clinical questions highlighted in the previous section can all broadly be cast in the form of the two most frequently encountered machine learning paradigms: classification (e.g., "is a disease present or not?" and "is the patient in the high-risk category or not?") and regression (e.g., "what is the severity of disease?" and "what is the patient's life expectancy?"). It therefore comes as little surprise that most of the work in the area to date involves the adoption and adaptation of well-known and well-understood existing techniques and their application to pathology data. Here we give the reader a flavor of the context, and the pros and cons, of some of those that have been applied to analyzing the multiparametric data extracted from the image analysis of digital pathology specimens.
Supervised learning
Even the mere emergence of digitization in clinical pathology, that is, the use of computerized systems for storing, logging, and linking information, has led to the availability of vast amounts of labeled data as well as the associated metadata. The reduced cost of computing power and storage, increased connectivity, and widespread adoption of technology have all contributed greatly to this trend, which has escalated further with the rising recognition of the potential of artificial intelligence. Consequently, supervised machine learning in its various forms has attracted a great amount of research attention and continues to be one of the key focal points of ongoing research efforts.
Both shallow and deep learning algorithms have been successfully applied to clinical problems. Deep learning strategies based on the layer depth and architecture of neural network-based models have been discussed in detail in Chapters 2–4. This section will therefore focus on the mathematical underpinnings of shallow learning algorithms such as naïve Bayes, logistic regression, support vector machines, and random forests. In contrast, the mathematics of deep learning models is highly complex, as they encompass multilayer perceptron models, probabilistic graphical models, residual and recurrent networks, reinforcement and evolutionary learning, and so on. A detailed overview of these is presented in Chapters 2–4, but the mathematics involved is beyond the scope of this book. For a rigorous mathematical treatment of deep learning, see Deep Learning by Goodfellow, Bengio, and Courville, or any of the other modern texts in the field.
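Under the assumption that the features $x_1, \ldots, x_d$ of an example are conditionally independent given its class, a standard statement of the naïve Bayes classification rule, offered here for orientation, is:

$$\hat{C} = \arg\max_{j} \; P(C_j) \prod_{i=1}^{d} p(x_i \mid C_j),$$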
where $P(C_j)$ is the prior probability of the class $C_j$ and $p(x_i \mid C_j)$ is the conditional probability of the feature $x_i$ given class $C_j$ (readily estimated from data using a supervised learning framework) [23].
The key potential weakness of naïve Bayes-based algorithms, be they regression or classification oriented, is easily spotted: it lies in the unrealistic assumption of feature independence. Yet, somewhat surprisingly at first sight, these simple approaches often work remarkably well in practice and often outperform models that are more complex and, as regards the fundamental assumptions of feature relatedness, more expressive and more flexible [19].
There are a few reasons why this might be the case. One of these is the structure of errors: if the conceptual structure of data relatedness under a given representation is in a sense symmetrical, errors in the direction of overestimating conditional probabilities and those in the direction of underestimating them can cancel out in the aggregate, leading to more accurate overall estimates [24]. Another equally important factor contributing to the often surprisingly good performance of methods that make the naïve Bayes assumption emerges from the relationship between the amount of available training data (given a problem of a specific complexity) and the number of free parameters of the adopted model. It is often the case, especially considering that class imbalance poses major practical issues in digital pathology, that more complex models cannot be trained sufficiently well; thus, even if they are in principle able to learn more complex functional behavior, this theoretical superiority cannot be exploited.
While a good starting point and a sensible baseline, naïve Bayes-based methods are, in the right circumstances, outperformed by more elaborate models, some of which we summarize next.
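A compact Gaussian naïve Bayes classifier illustrates how the product of per-feature likelihoods becomes a sum of log-likelihoods in practice. Modeling each $p(x_i \mid C_j)$ as a univariate Gaussian is an assumption of this sketch (a common default for continuous image-analysis features), and the class name and toy data are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with an independent univariate Gaussian per feature and class."""

    def fit(self, X, y, var_floor=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Priors P(C_j) are the class frequencies in the training data
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        self.mean_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        # A small floor keeps variances positive for features that are constant in a class
        self.var_ = np.stack([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        return self

    def log_joint(self, X):
        """log P(C_j) + sum_i log p(x_i | C_j) for every sample and class."""
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.mean_[None, :, :]          # (n, classes, features)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None])
        return self.log_prior_[None, :] + log_lik.sum(axis=2)  # product -> sum in log space

    def predict(self, X):
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]


X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.1, 0.1], [3.0, 3.0]]))  # [0 1]
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied, which matters for the high-dimensional feature vectors produced by image analysis.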
The model is trained (i.e., the weight parameter w learned) by maximizing the
likelihood of the model on the training dataset, given by:
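$$\prod_{i=1}^{n} \Pr(y_i \mid x_i; w) = \prod_{i=1}^{n} \frac{1}{1 + e^{-y_i w^T x_i}} \qquad (8.3)$$
(with class labels coded as $y_i \in \{-1, +1\}$), penalized by the complexity of the model:
$$\frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2\sigma^2} w^T w}, \qquad (8.4)$$
which can be restated (up to an additive constant, after multiplying through by $\sigma^2$, so that $C = \sigma^2$) as the minimization of the following regularized negative log-likelihood:
$$L = C \sum_{i=1}^{n} \log\left(1 + e^{-y_i w^T x_i}\right) + \frac{1}{2} w^T w. \qquad (8.5)$$
A coordinate descent approach, such as the one described by Yu et al. [25], can be used to minimize $L$.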
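Equation (8.5) can be minimized in a few lines of NumPy. The sketch below uses plain gradient descent rather than the coordinate descent method of Yu et al. [25], purely for brevity; the gradient is $\nabla L = -C \sum_i \sigma(-y_i w^T x_i)\, y_i x_i + w$, with $\sigma$ the logistic function. Function names, step size, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np


def fit_l2_logistic(X, y, C=1.0, lr=0.01, n_iter=5000):
    """Minimize L(w) = C * sum_i log(1 + exp(-y_i w^T x_i)) + 0.5 * w^T w.

    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    Plain gradient descent is used for brevity instead of coordinate descent.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ w)
        s = 0.5 * (1.0 - np.tanh(0.5 * margins))   # sigma(-margin), computed without overflow
        grad = -C * (X.T @ (y * s)) + w
        w -= lr * grad
    return w


def objective(w, X, y, C=1.0):
    margins = y * (X @ w)
    return C * np.logaddexp(0.0, -margins).sum() + 0.5 * w @ w   # log(1 + e^-m), stably


rng = np.random.default_rng(1)
features = rng.normal(size=(40, 2))
y = np.where(features[:, 0] + features[:, 1] > 0, 1.0, -1.0)
X = np.hstack([features, np.ones((40, 1))])   # constant column acts as an (also regularized) intercept
w = fit_l2_logistic(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Larger values of `C` weight the data term more heavily relative to the $\frac{1}{2} w^T w$ penalty, which, following Eq. (8.4), corresponds to a broader Gaussian prior on the weights.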
is sought by minimizing
$$\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i c_i \, k(x_i, x_j) \, y_j c_j - \sum_{i=1}^{n} c_i \qquad (8.7)$$
subject to the constraints $\sum_{i=1}^{n} c_i y_i = 0$ and $0 \le c_i \le 1/(2n\lambda)$ for all $i$. The regularizing parameter $\lambda$ controls the trade-off between the width of the margin and the penalty on prediction errors. Support vector-based approaches usually perform well even with relatively small training datasets and have the advantage of well-understood mathematical behavior (which is an important consideration in the context of regulatory compliance, among others).
Stepping back for a moment from the technical detail, intuitively what is
happening here is that the algorithm is learning which class exemplars are the
“most problematic” ones, i.e., which exemplars are nearest to the class boundaries
and thus most likely to be misclassified. These are the support vectors that give
the approach its name. Inspecting them is insightful. Firstly, a large number of support vectors (relative to the total amount of training data) should immediately raise eyebrows, as it suggests overfitting. Secondly, by examining which exemplars end up as support vectors, one can gain an understanding of the nature of the learning that took place, as well as of the structure of the problem and the data representation, which can lead to useful and novel clinical insight.
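The dual problem in Eq. (8.7) can be explored with a small projected gradient sketch. To keep it short, this version drops the bias term, which removes the equality constraint $\sum_i c_i y_i = 0$ and leaves only the box constraint $0 \le c_i \le 1/(2n\lambda)$; it also assumes a radial basis function kernel. Both simplifications, and all names and parameter values, are illustrative assumptions rather than a production SVM solver. Exemplars with $c_i > 0$ are the support vectors discussed above, so their number can be inspected directly.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def fit_dual_svm(X, y, lam=0.01, gamma=0.5, n_iter=2000):
    """Projected gradient descent on the bias-free SVM dual.

    Minimizes 0.5 * c^T Q c - sum(c), Q_ij = y_i y_j k(x_i, x_j),
    subject to 0 <= c_i <= 1 / (2 n lam).
    """
    n = len(y)
    upper = 1.0 / (2 * n * lam)
    Q = np.outer(y, y) * rbf_kernel(X, X, gamma)
    step = 1.0 / np.linalg.eigvalsh(Q).max()   # 1 / Lipschitz constant of the gradient
    c = np.zeros(n)
    for _ in range(n_iter):
        c = np.clip(c - step * (Q @ c - 1.0), 0.0, upper)   # gradient step, then project onto the box
    return c


def decision_function(c, X_train, y_train, X, gamma=0.5):
    return rbf_kernel(X, X_train, gamma) @ (c * y_train)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
c = fit_dual_svm(X, y)
support = np.flatnonzero(c > 1e-8)          # exemplars that define the decision boundary
accuracy = np.mean(np.sign(decision_function(c, X, y, X)) == y)
print(f"{support.size} support vectors out of {len(y)}; training accuracy {accuracy:.2f}")
```

Printing `X[support]` lists the exemplars that actually shape the boundary; if that list approached the full training set, it would be the warning sign of overfitting described above.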
Random forests
Random forest classifiers fall under the broad umbrella of ensemble-based learning methods [30]. They are simple to implement, fast in operation, and have proven to be extremely successful in a variety of domains [31,32]. The key principle underlying the random forest approach is the construction of many "simple" decision trees in the training stage and a majority vote (mode) across them in the classification stage. Among other benefits, this voting strategy has the effect of correcting for the undesirable tendency of decision trees to overfit training data [33]. In the training stage, random forests apply the general technique known as bagging to individual trees in the ensemble. Bagging repeatedly selects a random sample with replacement from the training set and fits trees to these samples. Each tree is grown without any pruning. The number of trees in the ensemble is a free parameter, which is readily learned automatically using the so-called out-of-bag error [29].
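The training procedure just described (bootstrap sampling, unpruned trees, majority voting, and the out-of-bag error) can be sketched from scratch as below. In addition to bagging, this sketch samples a random subset of roughly $\sqrt{d}$ features at each split, a common random forest choice that the text does not specify; the Gini splitting criterion, function names, and toy data are likewise assumptions. Production work would normally rely on an established library implementation.

```python
import numpy as np


def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def majority(labels):
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]


def grow_tree(X, y, rng, n_try):
    """Grow an unpruned tree; internal nodes are (feature, threshold, left, right) tuples."""
    if np.unique(y).size == 1:
        return y[0]                                        # pure node becomes a leaf
    best = None
    for f in rng.choice(X.shape[1], size=n_try, replace=False):   # random feature subset
        for t in np.unique(X[:, f])[:-1]:                  # thresholds leaving both sides non-empty
            left = X[:, f] <= t
            impurity = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or impurity < best[0]:
                best = (impurity, f, t, left)
    if best is None:                                       # sampled features are constant here
        return majority(y)
    _, f, t, left = best
    return (f, t, grow_tree(X[left], y[left], rng, n_try),
            grow_tree(X[~left], y[~left], rng, n_try))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_try = max(1, int(np.sqrt(d)))
    trees, in_bag = [], np.zeros((n_trees, n), dtype=bool)
    for m in range(n_trees):
        idx = rng.integers(0, n, size=n)                   # bootstrap sample, with replacement
        in_bag[m, idx] = True
        trees.append(grow_tree(X[idx], y[idx], rng, n_try))
    return trees, in_bag


def predict_forest(trees, x):
    return majority([predict_tree(t, x) for t in trees])  # majority vote (mode)


def oob_error(trees, in_bag, X, y):
    """Each sample is scored only by the trees that never saw it during training."""
    wrong = []
    for i in range(len(y)):
        votes = [predict_tree(t, X[i]) for t, bag in zip(trees, in_bag) if not bag[i]]
        if votes:
            wrong.append(majority(votes) != y[i])
    return float(np.mean(wrong))


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.array([0] * 30 + [1] * 30)
X[30:, :2] += 3.0                                          # only the first two features are informative
trees, in_bag = fit_forest(X, y)
print("out-of-bag error:", oob_error(trees, in_bag, X, y))
```

Tracking `oob_error` as trees are added gives a stopping criterion for the ensemble size without a separate validation set. Because the trees depend on the random generator, fixing `seed` makes the forest reproducible from run to run, while changing it yields a different ensemble.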
Much like naïve Bayes- and k-nearest neighbor-based algorithms, random forests are popular in part due to their simplicity on the one hand, and generally good performance on the other. However, unlike the former two approaches, random forests exhibit a degree of unpredictability as regards the structure of the final trained model. This is an inherent consequence of the stochastic nature of tree building. As we will explore in more detail shortly, one of the key reasons why this characteristic of random forests can be a problem is regulatory: clinical adoption often demands a high degree of repeatability not only in terms
Unsupervised learning
We have already mentioned the task of patient stratification. Indeed, the need for
stratification emerges frequently in digital pathology, for example, due to the hetero-
geneity of many diseases or differential response of different populations to treat-
ment or the disease itself [19].
Given that the relevant strata are often unknown a priori, often because of the limitations imposed by the previously exclusive reliance on manual interpretation of data and by the scale at which the data would need to be examined to draw reliable conclusions, it is frequently desirable to stratify automatically. A common way of doing this is by means of unsupervised learning, applying a clustering algorithm such as a Gaussian mixture model or, more frequently due to its simplicity and fewer free parameters, the k-means algorithm [21]. Any subsequent learning can then be performed in a more targeted fashion by learning separate models for each of the clusters individually.
Let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of $d$-dimensional feature vectors. The k-means algorithm partitions the points into $k$ clusters, $X_1, \ldots, X_k$, so that each datum belongs to one and only one cluster. In addition, an attempt is made to minimize the sum of squared distances between each data point and the empirical mean of the corresponding cluster. In other words, the k-means algorithm attempts to minimize the following objective function:
$$J(X_1, \ldots, X_k) = \sum_{i=1}^{k} \sum_{x \in X_i} \lVert c_i - x \rVert^2, \qquad (8.8)$$
where $c_i$ is the empirical mean (center) of cluster $X_i$.
The exact minimization of the objective function in Eq. (8.8) is an NP-hard problem [34]. Instead, the k-means algorithm only guarantees convergence to a local minimum. Starting from an initial guess of cluster centers, the algorithm iteratively updates the cluster centers and data-cluster assignments until (1) a local minimum is attained, or (2) an alternative stopping criterion is met (e.g., the maximal desired number of iterations or a sufficiently small sum of squared distances).
Often, the initial guess is obtained simply by choosing $k$ data points at random as the centers of the initial clusters, although more sophisticated initialization methods have been proposed [35,36]. Then, at each iteration $t = 0, 1, \ldots$, the new datum-cluster assignment is computed:
$$X_i^{(t)} = \left\{ x : x \in X \wedge \arg\min_{j} \lVert x - c_j^{(t)} \rVert^2 = i \right\}. \qquad (8.10)$$
In other words, each datum is assigned to the cluster with the nearest (in the Euclidean sense) empirical mean. Lastly, the locations of the cluster centers are recomputed from the new assignments by finding the mean of the data assigned to each cluster:
$$c_i^{(t+1)} = \frac{1}{\lvert X_i^{(t)} \rvert} \sum_{x \in X_i^{(t)}} x. \qquad (8.11)$$
Image-based digital pathology 165
Thus, deep learning algorithms too are not endowed with a quasi-magical ability
to overcome this fundamental limitation but are also constrained by some prior
knowledge (and are hence not agnostic). In particular, a deep neural network con-
tains constraints, which emerge from the type of its layers, their order, and the num-
ber of neurons in each layer, the connectedness of layers, and other architectural
aspects. It is important to keep this in mind and not to see deep learning as necessarily superior to “conventional” approaches, or as the universal solution to any
greatest similarity to a new example under consideration can be brought up. With support vector-based methods, the closest examples (in a kernel sense) or the support vectors that define the class boundaries can similarly be used to gain insight into what was learned and how a decision was made. With random forests, a well-known method of substituting features with dummy (e.g., randomly permuted) features can be used to quantify which features are the most important in decision-making, and these can then be “sanity checked” by an expert. This process can be not only confirmatory but can also lead to novel clinical insight. While a lack of explainability has long been seen as a potential disadvantage of deep learning-based approaches, which used to be regarded as proverbial “black boxes” (cf. the human memory and brain: can one localize where the concept of “cake” is stored?), recent years have seen huge strides of progress in this area [57]. For example, looking at which neurons in a network fire together (cf. the mantra of biological Hebbian learning, “what fires together wires together”) can be insightful. A higher (semantic) level of insight can be gained by looking at different layers and visualizing the features learned: layers closer to the input typically learn simple, low-level appearance elements, which are then combined downstream to compose more complex visual patterns [58]. Another ingenious technique is so-called “occlusion,” whereby parts of an input image are occluded by a uniform pattern (thus effecting localized information loss) and the impact of such occlusion on the decision is quantified: the greater the impact, the greater the importance of that particular image locus [59].
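The occlusion idea can be sketched in a few lines. In this minimal illustration, images are lists of rows of floats, the occluding pattern is a constant `fill` value, and `toy_model` is a hypothetical stand-in for a trained network's scalar output (it simply averages the top-left 2 × 2 pixels, so occluding that region should matter most). The non-overlapping patch grid and patch size are illustrative choices; a real analysis might use overlapping, strided patches.

```python
def occlusion_map(image, model, patch=2, fill=0.0):
    """Score drop caused by occluding each non-overlapping patch x patch block.

    Larger values mean the occluded region mattered more to the model's output.
    """
    h, w = len(image), len(image[0])
    baseline = model(image)
    heatmap = []
    for top in range(0, h, patch):
        row = []
        for left in range(0, w, patch):
            occluded = [list(r) for r in image]  # copy so the input is untouched
            for i in range(top, min(top + patch, h)):
                for j in range(left, min(left + patch, w)):
                    occluded[i][j] = fill  # uniform pattern: localized information loss
            row.append(baseline - model(occluded))
        heatmap.append(row)
    return heatmap


def toy_model(image):
    """Hypothetical stand-in for a classifier score: mean of the top-left 2x2 pixels."""
    return sum(image[i][j] for i in range(2) for j in range(2)) / 4


image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, toy_model))  # [[1.0, 0.0], [0.0, 0.0]]
```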
The issues of repeatability and reproducibility have gained much prominence in academic circles, and more widely, in recent years. The difference between the two concepts has been convincingly highlighted and discussed by Drummond [60], but this nuance does not yet appear to have penetrated regulatory processes. In particular, the practical impossibility of perfectly reproducing experimental outcomes for some machine learning algorithms, due to the stochastic nature of their operation, poses a major obstacle to their adoption in clinical practice. We have already alluded to this in the previous section in our overview of random forests. Even if the same training data are used and the same manner of training is employed, the parameters of a trained random forest or neural network will differ from instance to instance. This is an inherent consequence of the stochastic elements of their training processes and is something that does not sit comfortably with some. While this sentiment is not difficult to relate to, it does illustrate what can be argued to be an inconsistency in how human and machine expertise are treated. In particular, the former has been shown to exhibit major interpersonal variability (two different, competent, and experienced pathologists arriving at different conclusions from the same data), as well as intrapersonal variability (e.g., depending on the degree of fatigue, the time of day, whether the decision is made in the anteprandial or postprandial period, etc.). This kind of double standard is consistent with a broad range of studies involving human-human and human-machine interaction and is neurologically well understood, with the two types of interaction engaging important brain circuitry, such as the ventromedial prefrontal cortex and the amygdala, differently [61].
Generalizability is a concept that pervades machine learning. It refers to the ability of an algorithm to learn from training data
the information that would allow it to make good decisions on previously unseen input, often seemingly rather different from anything seen in the training stage. The issue of generalizability underlies the tasks of data representation, problem abstraction, mathematical modeling of learning, etc. Herein we are referring to generalizability in a very specific context. Namely, a major challenge in the practice of pathology concerns the different protocols and conditions under which data are acquired. Put simply, the question being asked is: what performance can I expect from an algorithm, evaluated using data that a technician acquired with particular equipment on one cohort of patients in a specific lab, when it is applied to data acquired by a different technician in a different lab from a different cohort? It can be readily seen that slight changes in data acquisition (such as the duration of exposure to a dye), different physical characteristics of lab instruments, or indeed different patient demographics all pose reasonable grounds for concern. Indeed, at present, most of the work in digital pathology is rather limited in this regard, in no small part due to the variety of obstacles to data sharing: there are issues of ethics and privacy, as well as financial interests at stake.
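Returning to the reproducibility concern raised above, the effect of stochastic training can be seen even in a toy setting. The sketch below is not a random forest or a neural network; it fits a straight line by stochastic gradient descent, where the random elements are the initial parameters and the order in which samples are visited. The function name `train_sgd` and all settings are illustrative assumptions. Runs with the same seed produce bit-identical parameters (on the same software and hardware), whereas different seeds yield different parameters from the same data and the same training procedure.

```python
import random


def train_sgd(data, epochs=5, lr=0.01, seed=None):
    """Fit y ~ w*x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # stochastic element 1: random init
    data = list(data)  # local copy, so the caller's list is never reordered
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic element 2: random visiting order
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b


data = [(x / 10, 2 * x / 10 + 1) for x in range(10)]
run_a = train_sgd(data, seed=1)
run_b = train_sgd(data, seed=2)
run_c = train_sgd(data, seed=1)
print(run_a == run_c)  # True: same seed, identical parameters
print(run_a == run_b)  # False: different seed, different parameters
```

Fixing seeds is a common practical mitigation, but it only makes a run repeatable; it does not remove the dependence of the learned parameters on the seed itself, and larger frameworks can have further sources of non-determinism.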
Acknowledgments
The authors would like to acknowledge Inés Nearchou for kindly providing images for the figures within this chapter.
References
[1] Caie PD, Walls RE, Ingleston-Orme A, Daya S, Houslay T, Eagle R, et al. High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics 2010;9(6):1913–26.
[2] Deans GT, Heatley M, Anderson N, Patterson CC, Rowlands BJ, Parks TG, et al. Jass’ classification revisited. Journal of the American College of Surgeons 1994;179(1):11–7.
[3] Lim D, Alvarez T, Nucci MR, Gilks B, Longacre T, Soslow RA, et al. Interobserver variability in the interpretation of tumor cell necrosis in uterine leiomyosarcoma. The American Journal of Surgical Pathology 2013;37(5):650–8.
[4] Chandler I, Houlston RS. Interobserver agreement in grading of colorectal cancers: findings from a nationwide web-based survey of histopathologists. Histopathology
[5] Caie PD, Zhou Y, Turnbull AK, Oniscu A, Harrison DJ. Novel histopathologic feature identified through image analysis augments stage II colorectal cancer clinical reporting. Oncotarget 2016;7(28):44381–94.
[6] Nearchou IP, Lillard K, Gavriel CG, Ueno H, Harrison DJ, Caie PD. Automated analysis of lymphocytic infiltration, tumor budding, and their spatial relationship improves prognostic accuracy in colorectal cancer. Cancer Immunology Research 2019;7(4):609–20.
[7] Khameneh FD, Razavi S, Kamasak M. Automated segmentation of cell membranes to evaluate HER2 status in whole slide images using a modified deep learning network. Computers in Biology and Medicine 2019;110:164–74.
[8] Widmaier M, Wiestler T, Walker J, Barker C, Scott ML, Sekhavati F, et al. Comparison of continuous measures across diagnostic PD-L1 assays in non-small cell lung cancer using automated image analysis. Modern Pathology 2019;33.
[9] Puri M, Hoover SB, Hewitt SM, Wei BR, Adissu HA, Halsey CHC, et al. Automated computational detection, quantitation, and mapping of mitosis in whole-slide images for clinically actionable surgical pathology decision support. Journal of Pathology Informatics 2019;10:4.
[10] Brieu N, Gavriel CG, Nearchou IP, Harrison DJ, Schmidt G, Caie PD. Automated tumour budding quantification by machine learning augments TNM staging in muscle-invasive bladder cancer prognosis. Scientific Reports 2019;9(1):5174.
[11] Brieu N, Gavriel CG, Harrison DJ, Caie PD, Schmidt G. Context-based interpolation of coarse deep learning prediction maps for the segmentation of fine structures in immunofluorescence images. SPIE; 2018.
[12] Roy S, Kumar Jain A, Lal S, Kini J. A study about color normalization methods for histopathology images. Micron 2018;114:42–61.
[13] Cruz-Roa A, Gilmore H, Basavanhally A, Feldman M, Ganesan S, Shih NNC, et al. Accurate and reproducible invasive breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Scientific Reports 2017;7:46450.
[14] Litjens G, Bandi P, Ehteshami Bejnordi B, Geessink O, Balkenhol M, Bult P, et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 2018;7(6).
[15] Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nature Medicine 2018;24(10):1559–67.
[16] Sirinukunwattana K, Domingo E, Richman S, Redmond KL, Blake A, Verrill C, et al. Image-based consensus molecular subtype classification (imCMS) of colorectal cancer using deep learning. bioRxiv 2019:645143.
[17] Havel JJ, Chowell D, Chan TA. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nature Reviews Cancer 2019;19(3):133–50.
[18] Wolpert DH, Macready WG. No free lunch theorems for optimization. Transactions on Evolutionary Computation 1997;1(1):67–82.
[19] Dimitriou N, Arandjelovic O, Harrison DJ, Caie PD. A principled machine learning framework improves accuracy of stage II colorectal cancer prognosis. NPJ Digital Medicine 2018;1(1):52.
[20] Harder N, Schonmeyer R, Nekolla K, Meier A, Brieu N, Vanegas C, et al. Automatic discovery of image-based signatures for ipilimumab response prediction in malignant melanoma. Scientific Reports 2019;9(1):7449.
[21] Yue X, Dimitriou N, Arandjelovic O. Colorectal cancer outcome prediction from H&E whole slide images using machine learning and automatically inferred phenotype profiles. February 01, 2019. Available from:
[22] Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 2019;25(8):1301–9.
[23] Bishop CM. Pattern recognition and machine learning. New York, USA: Springer-Verlag; 2007.
[24] Vente D, Arandjelovic O, Baron V, Dombay E, Gillespie S. Using machine learning for automatic counting of lipid-rich tuberculosis cells in fluorescence microscopy images.
[45] Xing F, Yang L. Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: a comprehensive review. IEEE Reviews in Biomedical Engineering 2016;9:234–63.
[46] Fan J, Arandjelovic O. Employing domain specific discriminative information to address inherent limitations of the LBP descriptor in face recognition. In: Proc. IEEE international joint conference on neural networks; 2018. p. 3766–72.
[47] Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 2003;60(2):91–110.
[48] Arandjelovic O. Object matching using boundary descriptors. In: Proc. British machine vision conference; 2012.
[49] Mehta N, Raja’S A, Chaudhary V. Content based sub-image retrieval system for high-resolution pathology images using salient interest points. In: 2009 Annual international conference of the IEEE engineering in medicine and biology society. IEEE; 2009. p. 3719–22.
[50] Karunakar Y, Kuwadekar A. An unparagoned application for red blood cell counting using marker controlled watershed algorithm for android mobile. In: 2011 Fifth international conference on next generation mobile applications, services and technologies. IEEE; 2011. p. 100–4.
[51] Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision; 2017. p. 2223–32.
[52] Jamaluddin MF, Fauzi MFA, Abas FS. Tumor detection and whole slide classification of H&E lymph node images using convolutional neural network. In: Proc. IEEE International conference on signal and image processing applications; 2017. p. 90–5.
[53] Albayrak A, Ünlü A, Çalık N, Bilgin G, Türkmen I, Çakır A, Çapar A, Töreyin BU, Ata LD. Segmentation of precursor lesions in cervical cancer using convolutional neural networks. In: Proc. Signal processing and communications applications conference; 2017. p. 1–4.
[54] Mackillop WJ. The importance of prognosis in cancer medicine. TNM Online; 2003.
[55] Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH. Patch-based convolutional neural network for whole slide tissue image classification. In: Proc. IEEE conference on computer vision and pattern recognition; 2016. p. 2424–33.
[56] Zhu X, Yao J, Zhu F, Huang J. WSISA: making survival prediction from whole slide histopathological images. In: IEEE conference on computer vision and pattern recognition; 2017. p. 7234–42.
[57] Erhan D, Bengio Y, Courville A, Vincent P. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341(3):1.
[58] Cooper J, Arandjelovic O. Visually understanding rather than merely matching ancient coin images. In: Proc. INNS conference on big data and deep learning; 2019.
[59] Schlag I, Arandjelovic O. Ancient Roman coin recognition in the wild using deep learning based recognition of artistically depicted face profiles. In: Proc. IEEE international conference on computer vision; 2017. p. 2898–906.
[60] Drummond C. Replicability is not reproducibility: nor is it good science. 2009.
[61] Kätsyri J, Hari R, Ravaja N, Nummenmaa L. The opponent matters: elevated fMRI reward responses to winning against a human versus a computer opponent during interactive video game playing. Cerebral Cortex 2013;23(12):2829–39.