Probability

• We use probabilities p(x) to represent our beliefs B(x) about the
  states x of the world.
• There is a formal calculus for manipulating uncertainties
  represented by probabilities.
• Any consistent set of beliefs obeying the Cox Axioms can be
  mapped into probabilities.
  1. Rationally ordered degrees of belief:
     if B(x) > B(y) and B(y) > B(z) then B(x) > B(z)
  2. Belief in x and its negation x̄ are related: B(x) = f[B(x̄)]
  3. Belief in conjunction depends only on conditionals:
     B(x and y) = g[B(x), B(y|x)] = g[B(y), B(x|y)]

Expectations and Moments

• Expectation of a function a(x) is written E[a] or ⟨a⟩:
  E[a] = ⟨a⟩ = Σ_x p(x) a(x)
  e.g. mean = Σ_x x p(x), variance = Σ_x (x − E[x])² p(x)
• Moments are expectations of higher order powers.
  (Mean is first moment. Autocorrelation is second moment.)
• Central moments have lower moments subtracted away
  (e.g. variance, skew, kurtosis).
• Deep fact: knowledge of all orders of moments
  completely defines the entire distribution.
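• As a concrete (non-original) illustration, a minimal Python/NumPy sketch that computes these expectations for a made-up discrete distribution; the values of x and p below are arbitrary:

    import numpy as np

    # A made-up discrete distribution over x in {0, 1, 2, 3}.
    x = np.array([0, 1, 2, 3])
    p = np.array([0.1, 0.2, 0.3, 0.4])           # must sum to 1

    def expectation(a_of_x, p):
        """E[a] = sum_x p(x) a(x) for a discrete distribution."""
        return np.sum(p * a_of_x)

    mean = expectation(x, p)                     # first moment
    second_moment = expectation(x**2, p)         # E[x^2]
    variance = expectation((x - mean)**2, p)     # second central moment
    print(mean, second_moment, variance)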
Joint Probability

• Key concept: two or more random variables may interact.
  Thus, the probability of one taking on a certain value depends on
  which value(s) the others are taking.
• We call this a joint ensemble and write
  p(x, y) = prob(X = x and Y = y)
  [Figure: joint distribution p(x, y, z) over axes x, y, z]

Conditional Probability

• If we know that some event has occurred, it changes our belief
  about the probability of other events.
• This is like taking a "slice" through the joint table:
  p(x|y) = p(x, y)/p(y)
  [Figure: conditional slice p(x, y|z) of the joint p(x, y, z)]
• Another equivalent definition: p(x) = Σ_y p(x|y) p(y).
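• A small Python sketch (not from the notes) illustrating slicing and marginalizing a made-up 2×2 joint table:

    import numpy as np

    # A made-up joint table p(x, y), x indexing rows and y indexing columns.
    p_xy = np.array([[0.10, 0.20],
                     [0.30, 0.40]])        # entries sum to 1

    p_y = p_xy.sum(axis=0)                 # marginal p(y) = sum_x p(x, y)
    p_x_given_y = p_xy / p_y               # "slice": p(x|y) = p(x, y) / p(y)

    # The equivalent definition of the marginal: p(x) = sum_y p(x|y) p(y)
    p_x = (p_x_given_y * p_y).sum(axis=1)
    assert np.allclose(p_x, p_xy.sum(axis=1))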
Independence & Conditional Independence

• Two variables are independent iff their joint factors:
  p(x, y) = p(x) p(y)
  [Figure: a factored joint p(x, y) and its marginal p(x)]
• Two variables are conditionally independent given a third if, for every
  value of the conditioning variable, the corresponding slice factors:
  p(x, y|z) = p(x|z) p(y|z)  ∀z
  [Figure: the joint p(x, y, z) and a conditional slice p(x, y|z)]

Entropy

• Measures the amount of ambiguity or uncertainty in a distribution:
  H(p) = − Σ_x p(x) log p(x)
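• The sketch below (added for illustration; the joint table is made up) checks the factorization test for independence and computes the entropy H(p):

    import numpy as np

    def entropy(p):
        """H(p) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0."""
        p = np.asarray(p, dtype=float)
        nz = p > 0
        return -np.sum(p[nz] * np.log(p[nz]))

    p_xy = np.array([[0.12, 0.28],          # a made-up joint that factorizes:
                     [0.18, 0.42]])         # p(x, y) = p(x) p(y)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    print(np.allclose(p_xy, np.outer(p_x, p_y)))               # True -> independent
    print(entropy(p_xy.ravel()), entropy(p_x) + entropy(p_y))  # equal when independent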
Probability Densities

• Probability density functions p(x) (for continuous variables) or
  probability mass functions p(x = k) (for discrete variables) tell us
  how likely it is to get a particular value for a random variable
  (possibly conditioned on the values of some other variables).
• We can consider various types of variables: binary/discrete
  (categorical), continuous, interval, and integer counts.
• For each type we'll see some basic probability models which are
  parametrized families of distributions.

Exponential Family

• For a (continuous or discrete) random variable x,
  p(x|η) = h(x) exp{η⊤T(x) − A(η)}
         = (1/Z(η)) h(x) exp{η⊤T(x)}
  is an exponential family distribution with natural parameter η.
• Function T(x) is a sufficient statistic.
• Function A(η) = log Z(η) is the log normalizer.
• Key idea: all you need to know about the data is captured in the
  summarizing function T(x).
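• A minimal Python/NumPy sketch of this generic form; the helper expfam_log_prob and its arguments are hypothetical names, with the Bernoulli (next slide) plugged in as an example:

    import numpy as np

    def expfam_log_prob(x, eta, T, A, log_h):
        """log p(x|eta) = log h(x) + eta . T(x) - A(eta)."""
        return log_h(x) + np.dot(np.atleast_1d(eta), np.atleast_1d(T(x))) - A(eta)

    # Bernoulli written in this form (see the next slide):
    # eta = log(pi/(1-pi)), T(x) = x, A(eta) = log(1 + e^eta), h(x) = 1.
    pi = 0.3
    eta = np.log(pi / (1 - pi))
    logp1 = expfam_log_prob(1, eta, T=lambda x: x,
                            A=lambda e: np.log1p(np.exp(e)),
                            log_h=lambda x: 0.0)
    print(np.exp(logp1))   # ~0.3, the chance of heads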
Bernoulli

• For a binary random variable with p(heads) = π:
  p(x|π) = π^x (1 − π)^(1−x)
         = exp{ x log[π/(1 − π)] + log(1 − π) }
• Exponential family with:
  η = log[π/(1 − π)]
  T(x) = x
  A(η) = −log(1 − π) = log(1 + e^η)
  h(x) = 1
• The logistic function relates the natural parameter and the chance
  of heads:
  π = 1/(1 + e^(−η))

Multinomial

• For a set of integer counts on k trials:
  p(x|π) = [k!/(x_1! x_2! ··· x_n!)] π_1^{x_1} π_2^{x_2} ··· π_n^{x_n} = h(x) exp{ Σ_i x_i log π_i }
• But the parameters are constrained: Σ_i π_i = 1.
  So we define the last one: π_n = 1 − Σ_{i=1}^{n−1} π_i.
  p(x|π) = h(x) exp{ Σ_{i=1}^{n−1} log(π_i/π_n) x_i + k log π_n }
• Exponential family with:
  η_i = log π_i − log π_n
  T(x_i) = x_i
  A(η) = −k log π_n = k log Σ_i e^{η_i}
  h(x) = k!/(x_1! x_2! ··· x_n!)
• The softmax function relates the basic and natural parameters:
  π_i = e^{η_i} / Σ_j e^{η_j}
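• A short sketch (not part of the notes) of the two mappings just described, the logistic for the Bernoulli and the softmax for the multinomial; the function names are mine:

    import numpy as np

    def logistic(eta):
        """Bernoulli: natural parameter -> chance of heads, pi = 1/(1 + e^-eta)."""
        return 1.0 / (1.0 + np.exp(-eta))

    def softmax(eta):
        """Multinomial: natural parameters -> probabilities, pi_i = e^eta_i / sum_j e^eta_j."""
        e = np.exp(eta - np.max(eta))      # subtract the max for numerical stability
        return e / e.sum()

    print(logistic(np.log(0.3 / 0.7)))       # recovers pi = 0.3
    print(softmax(np.log([0.2, 0.3, 0.5])))  # recovers [0.2, 0.3, 0.5]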
Poisson

• For an integer count variable with rate λ:
  p(x|λ) = λ^x e^(−λ) / x!
         = (1/x!) exp{ x log λ − λ }
• Exponential family with:
  η = log λ
  T(x) = x
  A(η) = λ = e^η
  h(x) = 1/x!
• e.g. number of photons x that arrive at a pixel during a fixed
  interval given mean intensity λ
• Other count densities: binomial, exponential.
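• For illustration, a small Python check (standard library only) that the direct Poisson log probability and its exponential family form agree; the values 3 and 2.5 are arbitrary:

    import math

    def poisson_log_prob(x, lam):
        """Direct form: log p(x|lambda) = x log(lambda) - lambda - log(x!)."""
        return x * math.log(lam) - lam - math.lgamma(x + 1)

    def poisson_log_prob_expfam(x, lam):
        """Exponential family form: log h(x) + eta*T(x) - A(eta), with
        eta = log(lambda), T(x) = x, A(eta) = e^eta, h(x) = 1/x!."""
        eta = math.log(lam)
        return -math.lgamma(x + 1) + eta * x - math.exp(eta)

    print(poisson_log_prob(3, 2.5), poisson_log_prob_expfam(3, 2.5))  # identical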
Gaussian (Normal)

• For a continuous real-valued variable x with mean µ and variance σ²:
  p(x|µ, σ²) = (2πσ²)^(−1/2) exp{−(x − µ)²/2σ²}

Parameterizing Conditionals

• When the variable(s) being conditioned on (parents) are discrete,
  we just have one density for each possible setting of the parents,
  e.g. a table of natural parameters in exponential models or a table
  of tables for discrete models.
• When the conditioned variable is continuous, its value sets some of
  the parameters for the other variables.
• A very common instance of this for regression is the
  "linear-Gaussian": p(y|x) = gauss(θ⊤x; Σ).
• For discrete children and continuous parents, we often use a
  Bernoulli/multinomial whose parameters are some function f(θ⊤x).

Energy Functions

• We can be even more general and define distributions by arbitrary
  energy functions proportional to the log probability:
  p(x) ∝ exp{− Σ_k H_k(x)}
• A common choice is to use pairwise terms in the energy:
  H(x) = Σ_i a_i x_i + Σ_{pairs ij} w_ij x_i x_j
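• A brief sketch, not from the notes, of a pairwise energy model over three binary variables; the parameters a and W are made up, and the normalizer is computed by brute force since the state space is tiny:

    import itertools
    import numpy as np

    # Made-up parameters for a pairwise energy over 3 binary variables.
    a = np.array([0.5, -1.0, 0.2])                 # unary terms a_i
    W = np.array([[0.0, 0.3, -0.2],
                  [0.0, 0.0, 0.8],
                  [0.0, 0.0, 0.0]])                # w_ij for pairs i < j

    def energy(x):
        """H(x) = sum_i a_i x_i + sum_{pairs ij} w_ij x_i x_j."""
        return a @ x + x @ W @ x

    # p(x) is proportional to exp{-H(x)}; normalize over all 2^3 configurations.
    configs = [np.array(c) for c in itertools.product([0, 1], repeat=3)]
    unnorm = np.array([np.exp(-energy(x)) for x in configs])
    probs = unnorm / unnorm.sum()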
Special Variables

• If certain variables are always observed we may not want to model
  their density, for example inputs in regression or classification.
  This leads to conditional density estimation.
• If certain variables are always unobserved, they are called hidden or
  latent variables. They can always be marginalized out, but can
  make the density modeling of the observed variables easier.
  (We'll see more on this later.)

Likelihood Function

• So far we have focused on the (log) probability function p(x|θ),
  which assigns a probability (density) to any joint configuration of
  variables x given fixed parameters θ.
• But in learning we turn this on its head: we have some fixed data
  and we want to find parameters.
• Think of p(x|θ) as a function of θ for fixed x:
  L(θ; x) = p(x|θ)
  ℓ(θ; x) = log p(x|θ)
  This function is called the (log) "likelihood".
• Choose θ to maximize some cost function c(θ) which includes ℓ(θ):
  c(θ) = ℓ(θ; D)          maximum likelihood (ML)
  c(θ) = ℓ(θ; D) + r(θ)   maximum a posteriori (MAP) / penalized ML
  (also cross-validation, Bayesian estimators, BIC, AIC, ...)
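• A minimal sketch (with made-up coin-flip data and an arbitrary quadratic penalty r(θ)) contrasting the ML and penalized objectives for the Bernoulli model:

    import numpy as np

    x = np.array([1, 1, 0, 1, 0, 1])          # made-up coin flips (1 = heads)

    def loglik(theta, x):
        """l(theta; D) = sum_m log p(x^m | theta) for the Bernoulli model."""
        return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

    thetas = np.linspace(0.01, 0.99, 99)
    ml_obj  = np.array([loglik(t, x) for t in thetas])
    map_obj = ml_obj - 5.0 * (thetas - 0.5) ** 2   # add a penalty r(theta)

    print(thetas[np.argmax(ml_obj)], thetas[np.argmax(map_obj)])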
IID Data

• A single observation of the data X is rarely useful on its own.
• Generally we have data including many observations, which creates
  a set of random variables: D = {x^1, x^2, . . . , x^M}
• Two very common assumptions:
  1. Observations are independently and identically distributed
     according to the joint distribution of the graphical model: IID samples.
  2. We observe all random variables in the domain on each
     observation: complete data.

Maximum Likelihood

• For IID data:
  p(D|θ) = Π_m p(x^m|θ)
  ℓ(θ; D) = Σ_m log p(x^m|θ)
• Idea of maximum likelihood estimation (MLE): pick the setting of
  parameters most likely to have generated the data we saw:
  θ*_ML = argmax_θ ℓ(θ; D)
• Very commonly used in statistics.
  Often leads to "intuitive", "appealing", or "natural" estimators.
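• For illustration, a sketch of the IID sum of log probabilities, here with a Poisson model and a crude grid search for the maximizer (the data are made up):

    from math import lgamma, log
    import numpy as np

    def log_lik_iid(log_prob_one, data, theta):
        """For IID data, l(theta; D) = sum_m log p(x^m | theta)."""
        return sum(log_prob_one(x, theta) for x in data)

    def poisson_logp(x, lam):
        return x * log(lam) - lam - lgamma(x + 1)

    data = [2, 0, 3, 1, 4, 2]                       # made-up counts
    grid = np.linspace(0.1, 6.0, 60)
    theta_ml = grid[np.argmax([log_lik_iid(poisson_logp, data, t) for t in grid])]
    print(theta_ml, np.mean(data))                  # grid argmax is near the sample mean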
Example: Bernoulli Trials

• We observe M iid coin flips: D = H, H, T, H, . . .
• Model: p(H) = θ, p(T) = (1 − θ)
• Likelihood:
  ℓ(θ; D) = log p(D|θ)
          = log Π_m θ^{x^m} (1 − θ)^{1−x^m}
          = log θ Σ_m x^m + log(1 − θ) Σ_m (1 − x^m)
          = N_H log θ + N_T log(1 − θ)
• Take derivatives and set to zero:
  ∂ℓ/∂θ = N_H/θ − N_T/(1 − θ)
  ⇒ θ*_ML = N_H/(N_H + N_T)

Example: Univariate Normal

• We observe M iid real samples: D = 1.18, −.25, .78, . . .
• Model: p(x) = (2πσ²)^(−1/2) exp{−(x − µ)²/2σ²}
• Likelihood (using probability density):
  ℓ(θ; D) = log p(D|θ)
          = −(M/2) log(2πσ²) − (1/2) Σ_m (x^m − µ)²/σ²
• Take derivatives and set to zero:
  ∂ℓ/∂µ = (1/σ²) Σ_m (x^m − µ)
  ∂ℓ/∂σ² = −M/(2σ²) + (1/(2σ⁴)) Σ_m (x^m − µ)²
  ⇒ µ_ML = (1/M) Σ_m x^m
  ⇒ σ²_ML = (1/M) Σ_m (x^m)² − µ²_ML
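• A quick numerical check of both closed-form estimators (the data below are placeholders, extending the samples quoted on the slide):

    import numpy as np

    # Bernoulli trials: theta_ML = N_H / (N_H + N_T).
    flips = np.array([1, 1, 0, 1, 1, 0, 1])          # made-up data, 1 = heads
    theta_ml = flips.sum() / len(flips)

    # Univariate normal: mu_ML = sample mean, sigma2_ML = E[x^2] - mu_ML^2.
    xs = np.array([1.18, -0.25, 0.78, 0.46, -1.30])  # made-up real samples
    mu_ml = xs.mean()
    sigma2_ml = (xs ** 2).mean() - mu_ml ** 2
    assert np.isclose(sigma2_ml, xs.var())           # equals the (biased) sample variance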
Example: Linear Regression

• In linear regression, some inputs (covariates, parents) and all
  outputs (responses, children) are continuous valued variables.
• For each child and setting of discrete parents we use the model:
  p(y|x, θ) = gauss(y|θ⊤x, σ²)
• The likelihood is the familiar "squared error" cost:
  ℓ(θ; D) = −(1/(2σ²)) Σ_m (y^m − θ⊤x^m)²
• The ML parameters can be solved for using linear least-squares:
  ∂ℓ/∂θ = (1/σ²) Σ_m (y^m − θ⊤x^m) x^m
  ⇒ θ*_ML = (X⊤X)^(−1) X⊤Y
  [Figure: least-squares fit of y versus x]

Sufficient Statistics

• A statistic is a function of a random variable.
• T(X) is a "sufficient statistic" for X if
  T(x¹) = T(x²) ⇒ L(θ; x¹) = L(θ; x²)  ∀θ
• Equivalently (by the Neyman factorization theorem) we can write:
  p(x|θ) = h(x, T(x)) g(T(x), θ)
• Example: exponential family models:
  p(x|θ) = h(x) exp{η⊤T(x) − A(η)}
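• A sketch, assuming NumPy, of the least-squares solution; the synthetic data and true parameter vector are arbitrary, and the normal-equation solution is compared against a standard solver:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                    # made-up inputs (covariates)
    theta_true = np.array([1.0, -2.0, 0.5])
    Y = X @ theta_true + 0.1 * rng.normal(size=100)  # linear-Gaussian outputs

    # ML estimate: theta = (X^T X)^{-1} X^T Y, here via a least-squares solver.
    theta_ml, *_ = np.linalg.lstsq(X, Y, rcond=None)
    theta_normal_eq = np.linalg.solve(X.T @ X, X.T @ Y)
    print(np.allclose(theta_ml, theta_normal_eq))    # True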
MLE for Exponential Family Models

• Recall the probability function for exponential models:
  p(x|θ) = h(x) exp{η⊤T(x) − A(η)}
• For iid data, the sufficient statistic is Σ_m T(x^m):
  ℓ(η; D) = log p(D|η) = (Σ_m log h(x^m)) − M A(η) + η⊤ Σ_m T(x^m)
• Take derivatives and set to zero:
  ∂ℓ/∂η = Σ_m T(x^m) − M ∂A(η)/∂η
  ⇒ ∂A(η)/∂η |_{η_ML} = (1/M) Σ_m T(x^m)
  recalling that the natural moments of an exponential distribution
  are the derivatives of the log normalizer.

Fundamental Operations with Distributions

• Generate data: draw samples from the distribution. This often
  involves generating a uniformly distributed variable in the range
  [0,1] and transforming it. For more complex distributions it may
  involve an iterative procedure that takes a long time to produce a
  single sample (e.g. Gibbs sampling, MCMC).
• Compute log probabilities.
  When all variables are either observed or marginalized the result is a
  single number which is the log prob of the configuration.
• Inference: compute expectations of some variables given others
  which are observed or marginalized.
• Learning: set the parameters of the density functions given some
  (partially) observed data to maximize likelihood or penalized likelihood.
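• As an illustration (not from the notes), moment matching for the Poisson case, where T(x) = x and ∂A/∂η = e^η = λ, so the ML rate is just the sample mean:

    import numpy as np

    data = np.array([2, 0, 3, 1, 4, 2])            # made-up counts
    mean_T = data.mean()                           # (1/M) sum_m T(x^m)
    eta_ml = np.log(mean_T)                        # natural parameter at the MLE
    print(mean_T, np.exp(eta_ml))                  # dA/deta at eta_ML matches mean_T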
Basic Learning Problems

• Let's remind ourselves of the basic problems we discussed on the
  first day: density estimation, clustering, classification and regression.
• Density estimation is hardest. If we can do joint density estimation
  then we can always condition to get what we want:
  Regression:     p(y|x) = p(y, x)/p(x)
  Classification: p(c|x) = p(c, x)/p(x)
  Clustering:     p(c|x) = p(c, x)/p(x),  c unobserved

Learning from Data

• In AI the bottleneck is often knowledge acquisition.
• Human experts are rare, expensive, unreliable, slow.
• But we have lots of data.
• We want to build systems automatically based on data and a small
  amount of prior information (from experts).
Jensen's Inequality

• For any concave function f and any distribution on x,
  f(E[x]) ≥ E[f(x)]
  [Figure: a concave f, showing f(E[x]) lying above E[f(x)]]
• e.g. log(·) and √· are concave.
• This allows us to bound expressions like log p(x) = log Σ_z p(x, z).
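• A tiny numerical check (added for illustration, with an arbitrary distribution over positive values) that log(E[x]) ≥ E[log x]:

    import numpy as np

    # Jensen's inequality for the concave function log.
    x = np.array([0.5, 1.0, 2.0, 4.0])
    p = np.array([0.1, 0.4, 0.3, 0.2])
    print(np.log(np.sum(p * x)), np.sum(p * np.log(x)))   # first >= second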