The Application of Bayesian Theorem
1. Introduction
Bayesian theorem, formulated by Thomas Bayes, has long attracted the attention of the mathematical community. This outstanding achievement, which has had an important influence on both modern probability theory and mathematical statistics, remains highly popular after more than 200 years and has been widely used and studied in many fields, such as [1], [2], [3]. In 1774, Laplace published a paper giving a comprehensive and accessible exposition of Bayesian ideas [4]. Then, in 1781 and 1786, Laplace published two more papers on Bayesian theory, improving and expanding on the ideas of 1774. Through these articles, readers can gain a deeper and more direct understanding of Bayes' thought. Later, de Finetti proposed and developed the theory of subjective probability [5], and von Neumann and Morgenstern established modern utility theory. Building on these foundations, Savage proposed the most complete theoretical form of Bayesian theory, known as classical Bayesian theory.
The theorem describes how to calculate the probability of event A occurring given that event B has occurred. In mathematical terms,

P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad [6].

The prior probability P(A) is the probability people assign from a subjective judgment of the event. The conditional probability P(A|B) is that probability modified on the basis of objective investigation.
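As a brief numerical illustration (the numbers here are ours, chosen only for exposition): suppose P(A) = 0.3, P(B|A) = 0.8, and P(B) = 0.6. Then

P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.8 \times 0.3}{0.6} = 0.4,

so observing B raises the probability of A from the prior 0.3 to the posterior 0.4.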
discovery of Bayesian theorem greatly promotes the development of probabilistic statistics.
Compared with traditional probability estimation that cannot be modified, Bayesian theorem can be
constantly modified, which substantially improves the practicability of probability statistics [7].
Compared with traditional classical estimation, Bayesian theorem takes subjectivity as the starting
point and shows great advantages. However, Bayesian theorem also has certain limitations, as it is
based on subjective judgment and has strong subjectivity [8]. Due to each person's different
interpretation of prior information, the prior probabilities obtained are different, and the posterior
probabilities obtained are also varied, which lacks scientific objectivity [7].
Based on the above understanding of Bayesian theorem, its application and promotion are introduced and explained below in three fields: finance,
computer science, and medicine. In finance, Bayesian theorem helps decision-makers weigh important evidence in the decision-making process by combining the information in existing data with subsequent sampling results, so investors can improve the effectiveness of their decisions and reduce the blindness of investment. In computer science, algorithms based on Bayesian theorem can correctly identify and filter spam, which greatly enhances the efficiency of cyberspace use and saves users valuable time. Finally, Bayesian theorem also performs well in medicine: with it, doctors can combine expert opinion with previous cases to make a more reasonable and effective diagnosis for patients.
The wide application of Bayesian theorem stems from these characteristics, and it has brought considerable novelty and convenience to daily life. As society develops, Bayesian theorem will continue to develop and find use in more emerging fields.
h(\theta) = \frac{1}{\sqrt{2\pi}\,\tau}\, e^{-\frac{(\theta-\mu)^2}{2\tau^2}}    (5)
To obtain the expression for k(x), we use functions (1) and (2) together with Theorem 2.1.1. After calculation, the result is as follows:
f(x \mid \theta)\, h(\theta) = \frac{1}{2\pi\sigma\tau}\, e^{-\frac{\rho}{2}\left[\theta - \frac{1}{\rho}\left(\frac{\mu}{\tau^2} + \frac{x}{\sigma^2}\right)\right]^2}\, e^{-\frac{(\mu - x)^2}{2(\sigma^2 + \tau^2)}}, \quad \text{where } \rho = \frac{1}{\tau^2} + \frac{1}{\sigma^2} = \frac{\tau^2 + \sigma^2}{\tau^2 \sigma^2}.    (6)
From equations (4), (5), and (6) above, the marginal density function of x, namely k(x), can be calculated:
k(x) = \frac{1}{\sqrt{2\pi\rho}\,\sigma\tau}\, e^{-\frac{(x-\mu)^2}{2(\sigma^2 + \tau^2)}}    (7)
As a result, we can use functions (6) and (7) to obtain the expression for the posterior density h(θ|x), which is an especially vital clue in the decision-making process:
h(\theta \mid x) = \sqrt{\frac{\rho}{2\pi}}\, e^{-\frac{\rho}{2}\left[\theta - \frac{1}{\rho}\left(\frac{\mu}{\tau^2} + \frac{x}{\sigma^2}\right)\right]^2}    (8)
Based on the above theorems, the posterior density h(θ|x) follows a normal distribution:
h(\theta \mid x) \sim N\!\left(\mu(x), \tfrac{1}{\rho}\right),    (9)
where \mu(x) = \frac{1}{\rho}\left(\frac{\mu}{\tau^2} + \frac{x}{\sigma^2}\right) = \frac{\sigma^2}{\sigma^2+\tau^2}\,\mu + \frac{\tau^2}{\sigma^2+\tau^2}\,x.
These results show that posterior information is a combination of prior information and sample information. More specifically, μ(x) is exactly the weighted average of the prior mean μ and the observed value x, with weights σ²/(σ²+τ²) and τ²/(σ²+τ²), respectively. Hence, the application under discussion makes good use of both prior and sample information, helping decision-makers better master the details of different situations.
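To make the update concrete, the following short Python sketch implements equations (8)-(9) for a single observation. The function name and the illustrative numbers are ours, not from the original analysis.

```python
def normal_posterior(mu, tau, x, sigma):
    """Posterior of theta for a N(mu, tau^2) prior after observing
    x ~ N(theta, sigma^2); returns (mu(x), 1/rho) as in (8)-(9)."""
    rho = 1 / tau**2 + 1 / sigma**2            # posterior precision
    w_prior = sigma**2 / (sigma**2 + tau**2)   # weight on the prior mean
    w_obs = tau**2 / (sigma**2 + tau**2)       # weight on the observation
    post_mean = w_prior * mu + w_obs * x       # mu(x), a weighted average
    post_var = 1 / rho                         # posterior variance
    return post_mean, post_var

# Illustrative numbers: a prior belief of a 5% return, one noisy sample of 9%.
mean, var = normal_posterior(mu=0.05, tau=0.02, x=0.09, sigma=0.04)
print(mean, var)  # 0.058, 0.00032: the posterior mean lies between 0.05 and 0.09
```

A tight prior (small τ) keeps the posterior mean near μ, while a noisy observation (large σ) is down-weighted, exactly as the weighted-average form predicts.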
If possible, researchers in this field will spare no effort to test its efficiency by applying it to more practical cases and by investigating the corresponding expressions for discrete random variables so as to broaden its range of use.
2.2. Computer Science: Spam Filtering
With the development of the Internet, e-mail has become popular as a fast and economical means of communication and is one of the most widely used applications on the Internet [9]. When large volumes of unrecognized spam fill our mailboxes, network bandwidth and server storage space are wasted, seriously hindering the high-speed operation of the network. To reduce this waste, many researchers have studied the area in detail. Viewed at the level of the mail itself, mail classification can be regarded as a binary classification problem that divides mail into legitimate mail and spam, so various text classification methods can be used to filter spam. Accordingly, the rule-based Ripper algorithm [10], the decision tree C4.5 algorithm [11], the Boosting method [12], the Rough Set method [13], the kNN algorithm [14], and Bayesian classification methods have all been applied to this problem. Among them, the Bayesian classification method has been widely studied and discussed due to its unique and outstanding performance [2].
In the following analysis, we make assumptions and explain this topic based on naive Bayes. To better understand the problem, we must first introduce the related concepts of the theory.
Definition 2.2.1 [2] A Bayesian network is a binary tuple B = <G, θ>. Here, G is a directed acyclic graph in which nodes represent random variables X_i and directed edges between nodes
represent conditional dependencies between the random variables. θ is the parameter vector of the nodes; each component is a conditional probability table that defines the local probability distribution of the corresponding node.
The structure of a Bayesian network indicates that a node X_i is independent of its non-descendant nodes given its parent nodes. A Bayesian classifier is a Bayesian network used for classification tasks; it contains a node C representing the category variable and nodes X_i representing the feature variables. Taking X with realization x = (x_1, x_2, …, x_n) as an example, the Bayesian network allows us to compute the probability of each possible category c_k, P(C = c_k | X = x), and the task of classification is to find the c_k that maximizes P(C = c_k | X = x).
Theorem 2.2.1 [2]
P(C = c_k \mid X = x) = \frac{P(X = x \mid C = c_k)\, P(C = c_k)}{P(X = x)}    (10)
In this formula, P(X = x) is the same for every c_k, so it can be ignored when comparing categories. The prior probability P(C = c_k) can be estimated as the proportion of vectors belonging to category c_k among all vectors in the sample space. To calculate P(C = c_k | X = x), we adopt naive Bayes, which assumes that the feature variables X_i are mutually independent given the category variable C. This classifier takes the variable-independence hypothesis in its initial and most restrictive form [15].
In the Bayesian network shown in Figure 1, there are only arcs from the category variable to the feature variables, and no arcs between the feature variables X_i. As a result,
P(X = x \mid C = c_k) = \prod_i P(X_i = x_i \mid C = c_k).
In the calculation process, the maximum likelihood estimate of x_i obtained from the sample is used as P(X_i = x_i) given c_k. Therefore,
P(X_i = x_i \mid C = c_k) = \frac{n_{x_i\_in\_c_k}}{n_{c_k}} \quad [2],
where n_{c_k} is the number of samples in category c_k, and n_{x_i\_in\_c_k} is the number of samples in category c_k whose feature variable X_i equals x_i [2].
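The following Python sketch shows this counting scheme end to end on a toy spam example; the feature encoding, function names, and data are ours, for illustration only (a production filter would also smooth the counts to avoid zero probabilities).

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate P(C=c_k) and the counts behind P(X_i=x_i | C=c_k)."""
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}
    cond = defaultdict(Counter)  # cond[(i, c)][x_i] = n_{x_i_in_c_k}
    for x, c in zip(samples, labels):
        for i, xi in enumerate(x):
            cond[(i, c)][xi] += 1
    return prior, cond, class_count

def classify(x, prior, cond, class_count):
    """Pick the c_k maximizing P(C=c_k) * prod_i P(X_i=x_i | C=c_k)."""
    best_c, best_p = None, -1.0
    for c, p in prior.items():
        for i, xi in enumerate(x):
            p *= cond[(i, c)][xi] / class_count[c]
        if p > best_p:
            best_c, best_p = c, p
    return best_c

# Toy features: (contains "offer", sender is known); labels: spam vs. ham.
X = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 1)]
y = ["spam", "spam", "ham", "ham", "ham", "ham"]
prior, cond, counts = train_naive_bayes(X, y)
print(classify((1, 0), prior, cond, counts))  # -> "spam"
```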
The efficiency of naive Bayes is relatively high. Let m be the number of feature variables and N the number of training samples; then learning the naive Bayes model takes O(mN) time and classification at runtime takes O(m) time [2]. Moreover, many classification applications, including spam filtering, have achieved surprising results with this method. Extensions that relax the independence assumption to further improve performance have been widely studied, yielding several variants of naive Bayes. TAN (Tree Augmented Naive Bayes) is one such variant: it loosens the independence hypothesis and extends the structure of naive Bayes by allowing the variables other than the category variable to form a tree structure, as shown in Figure 2.
2.3. Medicine: Disease Diagnosis
P(D⁺|T⁺) is the probability of illness given a positive diagnostic test result, called the positive predictive value. The prevalence, P(D⁺), is the frequency of illness in the subject population. P(T⁺|D⁺) is the probability that a patient tests positive, called the sensitivity. P(T⁻|D⁻) is the probability that a person without the disease tests negative, namely the specificity.
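A minimal Python sketch of how these quantities combine through Bayesian theorem; the function name and the sample numbers are ours, chosen to illustrate the rare-disease effect discussed below.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D+|T+) = P(T+|D+)P(D+) / [P(T+|D+)P(D+) + P(T+|D-)P(D-)]."""
    false_positive_rate = 1 - specificity       # P(T+|D-)
    true_positives = sensitivity * prevalence   # P(T+|D+)P(D+)
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# With 99% sensitivity and 99% specificity but a 0.1% prevalence,
# only about 9% of positive results actually indicate disease.
print(positive_predictive_value(0.001, 0.99, 0.99))  # ~0.0902
```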
Such applications have yielded good feedback in practice; for example, the specificity of antigen and antibody tests often exceeds 99%. However, we should not ignore that this model supports reliable inferences about disease diagnosis only when a large amount of dependable data is available. This is a disappointing limitation for research on rare diseases, which may greatly affect patients' daily lives, owing to the lack of study samples.
Here is a clinical example showing how Bayesian theorem can be used in a real case [3]:
To address several vital questions about breast lump disease, a hospital performed a retrospective analysis of breast lump diagnoses over the previous 12 months. After analysis, the hospital had diagnosed 240 cases of fibro tumor, 160 cases of breast disease, and 50 cases of breast cancer. The patients' basic age, disease status, and lump surface characteristics are shown in Table 1.
Table 1. Clinical features of 450 cases of breast lump

                 Fibro tumor (D1)   Breast disease (D2)   Breast cancer (D3)
Age (T1)  <40    192 (80.0%)        133 (83.1%)           7 (14.0%)
          >40    48 (20.0%)         27 (16.9%)            43 (86.0%)
As shown in Table 2, using the single criterion of an irregular lump surface, the patient had a 48% chance of fibroadenoma, a 34% chance of breast disease, and an 18% chance of breast cancer according to the Bayesian formula.
In general, it is very difficult to use multiple indicators jointly as conditions for further probability calculation, so we assume that the indicators are independent, which is reasonable for this example. Hence, for m indicators I_i we have:
P(D_i \mid I_1 I_2 \cdots I_m) = \frac{P(D_i)\, P(I_1 \mid D_i)\, P(I_2 \mid D_i) \cdots P(I_m \mid D_i)}{\sum_j P(D_j)\, P(I_1 \mid D_j)\, P(I_2 \mid D_j) \cdots P(I_m \mid D_j)}    (12)
When the two indicators, irregular lump surface and age > 40 years, are used simultaneously, the posterior probabilities are as follows. Let A = 0.5333 × 0.2 × 0.513, B = 0.3556 × 0.169 × 0.538, and C = 0.1111 × 0.86 × 0.92. Then:
P(D_1 \mid A_2 S_2) = \frac{A}{A+B+C} = 0.3127    (13)
P(D_2 \mid A_2 S_2) = \frac{B}{A+B+C} = 0.1845    (14)
P(D_3 \mid A_2 S_2) = \frac{C}{A+B+C} = 0.5028    (15)
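These posterior probabilities can be reproduced with a few lines of Python. The per-disease factors below are the prior P(D_i) and P(age > 40 | D_i) from Table 1, together with the irregular-surface frequencies attributed to Table 2, so this is only a check of the arithmetic in (13)-(15).

```python
# Factors per disease: prior P(D_i), P(age>40 | D_i), P(irregular surface | D_i).
weights = {
    "D1 fibro tumor":    0.5333 * 0.2   * 0.513,
    "D2 breast disease": 0.3556 * 0.169 * 0.538,
    "D3 breast cancer":  0.1111 * 0.86  * 0.92,
}
total = sum(weights.values())  # the denominator of formula (12)
for disease, w in weights.items():
    print(f"{disease}: {w / total:.4f}")
# -> approximately 0.3128, 0.1848, 0.5024, matching (13)-(15) up to rounding
```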
3. Conclusion
The content and development of Bayesian theorem were briefly introduced at the beginning to give a clear picture of its strengths and development prospects. The applications of Bayesian theorem were then discussed in three main fields: finance, computer science, and medicine. In the financial field, investors often face the problem of making decisions under uncertainty. Given this risk, a method to improve the correctness of decisions is presented: conduct sampling experiments first. Using the information provided by the sampling results, decision-makers can deepen their understanding of the various natural states that affect a decision before making it, which greatly reduces the blindness and risk of investment. In computer science, the principle of the Bayesian classifier was clearly explained; in addition, to identify spam effectively, the naive Bayes classification algorithm and one of its improvements were introduced and compared. In medicine, doctors can use Bayesian theorem to combine information from diagnoses that have already occurred with their existing experience, improving both the efficiency of clinical medicine and the accuracy of disease diagnosis. The advantages of Bayesian theorem in this field were also verified with a real case.
References
[1] N. Zhang, J. Li, Risk Decision of Project Investment Based on Bayes Method, Optimization of Capital
Construction, vol. 27, 2006, pp. 54-56. DOI: CNKI: SUN: JJYH.0.2006-03-017
[2] M. Zhang, Y. Li, W. Li, Survey of Application of Bayesian Classifying Method to Spam Filtering, Application Research of Computers, vol. 22, 2005, pp. 14-19. DOI: 10.3969/j.issn.1001-3695.2005.08.004
[3] B. Tang, The Application of Bayesian Theorem in Disease Diagnosis, Science and Technology &
Innovation, 2019, pp. 152-153. DOI: 10.15913/j.cnki.kjycx.2019.01.152
[4] Z. Yu, Who Established Bayesian School? Bayes or Laplace? Journal of Statistics and Information, 2008,
pp. 85-90. DOI: 10.3969/j.issn.1007-3116.2008.01.017
[5] X. Ren, Z. Li, A General Survey of the Development and Research Trend of Bayesian Decision Theory,
Studies in Philosophy of Science and Technology, vol. 30, 2013, pp. 1-7. DOI: CNKI: SUN:
KXBZ.0.2013-02-002
[6] X. Yao, Analysis of Bayes theorem and its application, Education Modernization, vol. 4, 2017, pp. 254-256. DOI: 10.16541/j.cnki.2095-8420.2017.06.093
[7] C. Liao, A brief analysis of Bayes theorem and its application, Motherland, 2019, pp. 63-64. DOI: CNKI: SUN: ZUGU.0.2019-12-026
[8] J. Yang, D. Chen, X. Cheng, Several applications of Bayes' formula, College Mathematics, vol. 27, 2011, pp. 166-169. DOI: 10.3969/j.issn.1672-1454.2011.02.037
[9] W. Luo, X. Gao, X. Ou, Y. Liu, D. Li, The Data Security and Secure Content of E-mail, Journal of
Computer Applications, 2002, pp. 22-24. DOI: CNKI: SUN: JSJY.0.2002-03-006
[10] W. Cohen, Fast Effective Rule Induction, in Machine Learning: Proceedings of the 12th International Conference, Lake Tahoe, California, Morgan Kaufmann, 1995, pp. 115-123. DOI: 10.1016/b978-1-55860-377-6.50023-2
[11] X. Carreras, L. Marquez, Boosting Trees for Anti-Spam E-mail Filtering, Proceedings of Euro Conference
Recent Advances in NLP (RANLP-2001), 2001, pp. 58-64. DOI: 10.1016/0375-6505(85)90011-2
[12] I. Androutsopoulos, G. Paliouras, E. Michelakis, Learning to Filter Unsolicited Commercial E-mail, Technical Report 2004/2, NCSR Demokritos, vol. 46, 2004, pp. 153-154. DOI: 10.1109/MAP.2004.1388867
[13] Y. Liu, X. Du, X. Huang, Z. Hou, C. Guo, E. Zhou, H. Luo, The Design of Spam Intelligent Filtering System, Computer Technology and Development, 2003, pp. 1-3. DOI: CNKI: SUN: WJFZ.0.2003-04-001
[14] H. Drucker, D. Wu, V. N. Vapnik, Support Vector Machines for Spam Categorization, IEEE Transactions
on Neural Networks, vol. 10, 1999, pp. 1048-1054. DOI: 10.1109/72.788645
[15] P. Langley, W. Iba, K. Thompson, An Analysis of Bayesian Classifiers, Proceedings of the 10th National Conference on Artificial Intelligence, Menlo Park, AAAI Press, 1992, pp. 223-228.
[16] S. Gao, Application of Bayesian Theorem in Disease Diagnosis, Mathematics Learning and Research,
2008, pp. 116. DOI: CNKI: SUN: SXYG.0.2008-12-103