18.6501x Fundamentals of Statistics
Probability  Previous studies showed that the drug was 80% effective. Then we can anticipate that, for a study on 100 patients, on average 80 will be cured, and at least 65 will be cured with 99.99% probability.

Statistics  Observe that 78 out of 100 patients were cured. We (will be able to) conclude that we are 95% confident that, for other studies, the drug will be effective on between 69.88% and 86.11% of patients.

Probability Redux

Let X1, ..., Xn be i.i.d. random variables with E[X] = µ and Var(X) = σ². Let F denote the CDF of X.

Law of Large Numbers
  X̄n = (1/n) Σ_{i=1}^n Xi →(P, a.s.) µ as n → ∞.

Central Limit Theorem
  √n (X̄n − µ)/σ →(d) N(0, 1) as n → ∞.
Equivalently,
  √n (X̄n − µ) →(d) N(0, σ²) as n → ∞.

Hoeffding's Inequality  Let n be a positive integer and X, X1, ..., Xn be i.i.d. random variables such that E[X] = µ and X ∈ [a, b] almost surely. Then,
  P(|X̄n − µ| ≥ ε) ≤ 2 exp(−2nε²/(b − a)²) for all ε > 0.

Quantiles  The quantile of order 1 − α of X is the number qα such that
  • F(qα) = 1 − α
  • If F is invertible, then qα = F^{-1}(1 − α)
  • P(X > qα) = α
  • If X ~ N(0, 1), P(|X| > q_{α/2}) = α
[Figure: standard Gaussian density with the quantile qα; area 1 − α to its left, area α in the right tail.]

Three Types of Convergence

Almost Surely (a.s.) Convergence
  Tn →(a.s.) T  ⇐⇒  P({ω : Tn(ω) → T(ω) as n → ∞}) = 1.

Convergence in Probability
  Tn →(P) T  ⇐⇒  P(|Tn − T| ≥ ε) → 0 as n → ∞, for all ε > 0.

Convergence in Distribution
  Tn →(d) T  ⇐⇒  E[f(Tn)] → E[f(T)] as n → ∞,
for all continuous and bounded functions f.

Slutsky's Theorem  Let Tn →(d) T and Un →(P) u, where T is a random variable and u is a given real number. Then,
  • Tn + Un →(d) T + u
  • Tn Un →(d) T u
  • If, in addition, u ≠ 0, then Tn/Un →(d) T/u.

Continuous Mapping Theorem  If f is a continuous function, then
  Tn →(a.s./P/(d)) T  ⇒  f(Tn) →(a.s./P/(d)) f(T).

Foundation of Inference

Statistical Model  Let the observed outcome of a statistical experiment be a sample X1, ..., Xn of n i.i.d. random variables in some measurable space E (usually E ⊆ R) and denote by P their common distribution. A statistical model associated to that statistical experiment is a pair
  (E, (Pθ)θ∈Θ)
where
  • E is called the sample space;
  • (Pθ)θ∈Θ is a family of probability measures on E;
  • Θ is any set, called the parameter set.
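As a quick numerical companion to the opening drug example (a Bernoulli model with unknown p), here is a minimal Python sketch. The 80%, 100-patient and 78-cured figures are the ones quoted above; using scipy.stats.norm for the Gaussian approximation is an implementation choice, not part of the original text.

    import numpy as np
    from scipy import stats

    # Probability viewpoint: if p = 0.8 and n = 100, how likely is it that
    # at least 65 patients are cured?  CLT / Gaussian approximation of X_bar_n.
    p, n = 0.8, 100
    z = (0.65 - p) / np.sqrt(p * (1 - p) / n)
    print(1 - stats.norm.cdf(z))               # approx 0.9999

    # Statistics viewpoint: observe 78 cured out of 100 and build a 95%
    # asymptotic confidence interval for p (plug-in estimate of the variance).
    p_hat, alpha = 0.78, 0.05
    q = stats.norm.ppf(1 - alpha / 2)          # approx 1.96
    half_width = q * np.sqrt(p_hat * (1 - p_hat) / n)
    print(p_hat - half_width, p_hat + half_width)   # approx (0.699, 0.861)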
Parametric, Nonparametric and Semiparametric Models

• Usually, we will assume that the statistical model is well specified, i.e., defined such that there exists θ such that P = Pθ. This particular θ is called the true parameter and is unknown.
• We often assume that Θ ⊆ R^d for some d ≥ 1. The model is then called parametric.
• Sometimes we could have Θ be infinite dimensional, in which case the model is called nonparametric.
• If Θ = Θ1 × Θ2, where Θ1 is finite dimensional and Θ2 is infinite dimensional, then we have a semiparametric model. In these models, we only care to estimate the finite-dimensional parameter; the infinite-dimensional one is called a nuisance parameter.

Identifiability

The parameter θ is called identifiable if and only if the map θ ∈ Θ ↦ Pθ is injective, i.e.,
  θ ≠ θ′  ⇒  Pθ ≠ Pθ′,
or, equivalently,
  Pθ = Pθ′  ⇒  θ = θ′.

Parameter Estimation

Statistic  Any measurable function of the sample, e.g., X̄n, max_i Xi, etc.

Estimator of θ  Any statistic whose expression does not depend on θ.

• An estimator θ̂n of θ is weakly (resp. strongly) consistent if
  θ̂n →(P) θ (resp. →(a.s.)) as n → ∞ (w.r.t. P).
• An estimator θ̂n of θ is asymptotically normal if
  √n (θ̂n − θ) →(d) N(0, σ²).

Bias of an Estimator

• Bias of an estimator θ̂n of θ:
  bias(θ̂n) = E[θ̂n] − θ.
• If bias(θ̂n) = 0, we say that θ̂n is unbiased.

Jensen's Inequality

• If the function f(x) is convex,
  E[f(X)] ≥ f(E[X]).
• If the function g(x) is concave,
  E[g(X)] ≤ g(E[X]).

Quadratic Risk

• We want estimators to have low bias and low variance at the same time.
• The risk (or quadratic risk) of an estimator θ̂n ∈ R is
  R(θ̂n) = E[(θ̂n − θ)²] = variance + bias².
• Low quadratic risk means that both bias and variance are small.

Confidence Intervals

Let (E, (Pθ)θ∈Θ) be a statistical model based on observations X1, ..., Xn, and assume Θ ⊆ R. Let α ∈ (0, 1).

• Confidence interval (C.I.) of level 1 − α for θ: any random (depending on X1, ..., Xn) interval I whose boundaries do not depend on θ and such that
  Pθ[I ∋ θ] ≥ 1 − α, for all θ ∈ Θ.
• C.I. of asymptotic level 1 − α for θ: any random interval I whose boundaries do not depend on θ and such that
  lim_{n→∞} Pθ[I ∋ θ] ≥ 1 − α, for all θ ∈ Θ.

Example  We observe R1, ..., Rn i.i.d. ~ Ber(p) for some unknown p ∈ (0, 1).
• Statistical model: ({0, 1}, (Ber(p))_{p∈(0,1)}).
• From the CLT:
  √n (R̄n − p)/√(p(1 − p)) →(d) N(0, 1) as n → ∞.
• It yields
  I = [ R̄n − q_{α/2} √(p(1 − p))/√n , R̄n + q_{α/2} √(p(1 − p))/√n ].
• But this is not a confidence interval because it depends on p! Three solutions:
  1. Conservative bound (use p(1 − p) ≤ 1/4)
  2. Solving the (quadratic) equation for p
  3. Plug-in (replace p by R̄n)

The Delta Method

Let (Zn)_{n≥1} be a sequence of random variables that satisfies
  √n (Zn − θ) →(d) N(0, σ²)
for some θ ∈ R and σ² > 0 (the sequence (Zn)_{n≥1} is said to be asymptotically normal around θ). Let g : R → R be continuously differentiable at the point θ. Then,
• (g(Zn))_{n≥1} is also asymptotically normal around g(θ).
• More precisely,
  √n (g(Zn) − g(θ)) →(d) N(0, g′(θ)² σ²) as n → ∞.

Introduction to Hypothesis Testing

Statistical Formulation  Consider a sample X1, ..., Xn of i.i.d. random variables and a statistical model (E, (Pθ)θ∈Θ). Let Θ0 and Θ1 be disjoint subsets of Θ. Consider the two hypotheses:
  • H0 : θ ∈ Θ0
  • H1 : θ ∈ Θ1
H0 is the null hypothesis and H1 is the alternative hypothesis.

Asymmetry in the hypotheses  H0 and H1 do not play a symmetric role: the data are only used to try to disprove H0. Lack of evidence does not mean that H0 is true.

A test is a statistic ψ ∈ {0, 1} such that:
  • If ψ = 0, H0 is not rejected.
  • If ψ = 1, H0 is rejected.

Errors

• Rejection region of a test ψ:
  Rψ = {x ∈ E^n : ψ(x) = 1}.
• Type 1 error of a test ψ:
  αψ : Θ0 → R (or [0, 1]),  θ ↦ Pθ[ψ = 1].
• Type 2 error of a test ψ:
  βψ : Θ1 → R,  θ ↦ Pθ[ψ = 0].
• Power of a test ψ:
  πψ = inf_{θ∈Θ1} (1 − βψ(θ)).

Level, test statistic and rejection region

• A test ψ has level α if
  αψ(θ) ≤ α for all θ ∈ Θ0.
• A test ψ has asymptotic level α if
  lim_{n→∞} αψ(θ) ≤ α for all θ ∈ Θ0.
• In general, a test has the form
  ψ = 1{Tn > c}
for some statistic Tn and threshold c ∈ R. Tn is called the test statistic, and the rejection region is Rψ = {Tn > c}.

p-value  The (asymptotic) p-value of a test ψα is the smallest (asymptotic) level α at which ψα rejects H0.

Methods of Estimation

Let (E, (Pθ)θ∈Θ) be a statistical model associated with a sample of i.i.d. r.v. X1, ..., Xn. Assume that there exists θ* ∈ Θ such that X1 ~ Pθ*.

Statistician's goal: given X1, ..., Xn, find an estimator θ̂ = θ̂(X1, ..., Xn) such that Pθ̂ is close to Pθ* for the true parameter θ*.

Total Variation Distance

The total variation distance between two probability measures Pθ and Pθ′ is defined by
  TV(Pθ, Pθ′) = max_{A⊂E} |Pθ(A) − Pθ′(A)|.

Total Variation Distance between Discrete Measures  Assume that E is discrete (i.e., finite or countable). The total variation distance between Pθ and Pθ′ is
  TV(Pθ, Pθ′) = (1/2) Σ_{x∈E} |pθ(x) − pθ′(x)|.

Total Variation Distance between Continuous Measures  Assume that E is continuous. The total variation distance between Pθ and Pθ′ is
  TV(Pθ, Pθ′) = (1/2) ∫ |fθ(x) − fθ′(x)| dx.

Properties of Total Variation
• TV(Pθ, Pθ′) = TV(Pθ′, Pθ)  (symmetric)
• 0 ≤ TV(Pθ, Pθ′) ≤ 1  (positive)
• If TV(Pθ, Pθ′) = 0, then Pθ = Pθ′  (definite)
• TV(Pθ, Pθ′) ≤ TV(Pθ, Pθ″) + TV(Pθ″, Pθ′)  (triangle inequality)
These imply that the total variation is a distance between probability distributions.
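A minimal numerical check of the discrete total-variation formula; the two Bernoulli parameters below are arbitrary choices for illustration.

    import numpy as np

    # TV = (1/2) * sum over x of |p_theta(x) - p_theta'(x)| on E = {0, 1}
    def tv_bernoulli(p, q):
        pmf_p = np.array([1 - p, p])
        pmf_q = np.array([1 - q, q])
        return 0.5 * np.abs(pmf_p - pmf_q).sum()

    print(tv_bernoulli(0.5, 0.5))   # 0.0  (definite: identical distributions)
    print(tv_bernoulli(0.3, 0.8))   # 0.5  (= |0.3 - 0.8| for two Bernoullis)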
Kullback-Leibler (KL) Divergence

The Kullback-Leibler (KL) divergence between two probability measures Pθ and Pθ′ is defined by
  KL(Pθ, Pθ′) = Σ_{x∈E} pθ(x) log( pθ(x)/pθ′(x) )   if E is discrete,
  KL(Pθ, Pθ′) = ∫_E fθ(x) log( fθ(x)/fθ′(x) ) dx    if E is continuous.
KL-divergence is also known as relative entropy.

Properties of KL-divergence
• KL(Pθ, Pθ′) ≠ KL(Pθ′, Pθ) in general
• KL(Pθ, Pθ′) ≥ 0
• If KL(Pθ, Pθ′) = 0, then Pθ = Pθ′  (definite)
• KL(Pθ, Pθ′) ≰ KL(Pθ, Pθ″) + KL(Pθ″, Pθ′) in general (no triangle inequality)

Maximum Likelihood Estimation

Likelihood, Discrete Case  Let (E, (Pθ)θ∈Θ) be a statistical model associated with a sample of i.i.d. r.v. X1, ..., Xn. Assume that E is discrete (i.e., finite or countable).

Definition  The likelihood of the model is the map Ln (or just L) defined as
  Ln : E^n × Θ → R
  (x1, ..., xn; θ) ↦ Pθ[X1 = x1, ..., Xn = xn] = Π_{i=1}^n Pθ[Xi = xi].

Likelihood, Continuous Case  Let (E, (Pθ)θ∈Θ) be a statistical model associated with a sample of i.i.d. r.v. X1, ..., Xn. Assume that all the Pθ have density fθ.

Definition  The likelihood of the model is the map L defined as
  L : E^n × Θ → R
  (x1, ..., xn; θ) ↦ Π_{i=1}^n fθ(xi).

Maximum Likelihood Estimator  Let X1, ..., Xn be an i.i.d. sample associated with a statistical model (E, (Pθ)θ∈Θ) and let L be the corresponding likelihood.

Definition  The maximum likelihood estimator of θ is defined as
  θ̂n^MLE = argmax_{θ∈Θ} L(X1, ..., Xn, θ),
provided it exists.

Log-likelihood  In practice, we use the fact that
  θ̂n^MLE = argmax_{θ∈Θ} log L(X1, ..., Xn, θ).

Concave and Convex Functions

A twice-differentiable function h : Θ ⊂ R → R is said to be concave if its second derivative satisfies
  h″(θ) ≤ 0 for all θ ∈ Θ.
It is said to be strictly concave if the inequality is strict: h″(θ) < 0. Moreover, h is said to be (strictly) convex if −h is (strictly) concave, i.e. h″(θ) ≥ 0 (h″(θ) > 0).

Multivariate Concave Functions  More generally, for a multivariate function h : Θ ⊂ R^d → R, d ≥ 2, define the
• gradient vector:
  ∇h(θ) = ( ∂h(θ)/∂θ1, ..., ∂h(θ)/∂θd )^T ∈ R^d,
• Hessian matrix:
  Hh(θ) = ( ∂²h(θ)/∂θi ∂θj )_{1≤i,j≤d} ∈ R^{d×d}.
Then
  h is concave ⇐⇒ x^T Hh(θ) x ≤ 0 for all x ∈ R^d, θ ∈ Θ,
  h is strictly concave ⇐⇒ x^T Hh(θ) x < 0 for all x ∈ R^d, θ ∈ Θ.

Covariance  In general, when θ ∈ Θ ⊂ R^d, d ≥ 2, its coordinates are not necessarily independent. The covariance between two random variables X and Y is
  Cov(X, Y) := E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y].

Properties
• Cov(X, X) = Var(X)
• Cov(X, Y) = Cov(Y, X)
• If X and Y are independent, then Cov(X, Y) = 0.

Covariance Matrix  The covariance matrix of a random vector X = (X^(1), ..., X^(d))^T ∈ R^d is given by
  Σ = Cov(X) = E[(X − E[X])(X − E[X])^T].
This is a matrix of size d × d.

If X ∈ R^d and A, B are matrices:
  Cov(AX + B) = Cov(AX) = A Cov(X) A^T = A ΣX A^T.

The Multivariate Gaussian Distribution  If (X, Y)^T is a Gaussian vector, then its PDF depends on 5 parameters:
  E[X], Var(X), E[Y], Var(Y), and Cov(X, Y).
A Gaussian vector X ∈ R^d is completely determined by its expected value and covariance matrix Σ:
  X ~ Nd(µ, Σ).
It has PDF over R^d given by:
  f(x) = ((2π)^d det(Σ))^{-1/2} exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) ).

The Multivariate CLT  Let X1, ..., Xn ∈ R^d be independent copies of a random vector X such that E[X] = µ and Cov(X) = Σ. Then
  √n (X̄n − µ) →(d) Nd(0, Σ) as n → ∞.

Multivariate Delta Method  Let (Tn)_{n≥1} be a sequence of random vectors in R^d such that
  √n (Tn − θ) →(d) Nd(0, Σ) as n → ∞,
for some θ ∈ R^d and some covariance Σ ∈ R^{d×d}. Let g : R^d → R^k (k ≥ 1) be continuously differentiable at θ. Then,
  √n (g(Tn) − g(θ)) →(d) Nk(0, ∇g(θ)^T Σ ∇g(θ)) as n → ∞,
where ∇g(θ) = ∂g(θ)/∂θ = ( ∂gj/∂θi )_{1≤i≤d, 1≤j≤k} ∈ R^{d×k}.

Fisher Information

Define the log-likelihood for one observation as
  ℓ(θ) = log L1(X, θ),  θ ∈ Θ ⊂ R^d.
Assume that ℓ is a.s. twice differentiable. Under some regularity conditions, the Fisher information of the statistical model is defined as
  I(θ) = E[∇ℓ(θ) ∇ℓ(θ)^T] − E[∇ℓ(θ)] E[∇ℓ(θ)]^T = −E[Hℓ(θ)].
If Θ ⊂ R, we get
  I(θ) = Var(ℓ′(θ)) = −E[ℓ″(θ)].

Consistency of Maximum Likelihood Estimator  Under mild regularity conditions, we have
  θ̂n^MLE →(P) θ* as n → ∞.

Asymptotic Normality of the MLE

Theorem  Let θ* ∈ Θ (the true parameter). Assume the following:
  1. The parameter is identifiable.
  2. For all θ ∈ Θ, the support of Pθ does not depend on θ.
  3. θ* is not on the boundary of Θ.
  4. I(θ) is invertible in a neighborhood of θ*.
  5. A few more technical conditions.
Then, θ̂n^MLE satisfies
  • θ̂n^MLE →(P) θ* w.r.t. Pθ*;
  • √n (θ̂n^MLE − θ*) →(d) Nd(0, I^{-1}(θ*)) w.r.t. Pθ*.

The Method of Moments

Moments  Let X1, ..., Xn be an i.i.d. sample associated with a statistical model (E, (Pθ)θ∈Θ). Assume that E ⊆ R and Θ ⊆ R^d, for some d ≥ 1.

Population Moments  Let mk(θ) = Eθ[X1^k], 1 ≤ k ≤ d.

Empirical Moments  Let m̂k = (1/n) Σ_{i=1}^n Xi^k, 1 ≤ k ≤ d.

From the LLN,
  m̂k →(P/a.s.) mk(θ) as n → ∞.
More compactly, we say that the whole vector converges:
  (m̂1, ..., m̂d) →(P/a.s.) (m1(θ), ..., md(θ)) as n → ∞.
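A small simulation sketch of this convergence of empirical moments; the N(2, 4) population and the sample sizes are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 2.0          # population N(2, 4): m1 = 2, m2 = sigma^2 + mu^2 = 8

    for n in (100, 10_000, 1_000_000):
        x = rng.normal(mu, sigma, size=n)
        m1_hat = x.mean()         # empirical first moment
        m2_hat = (x ** 2).mean()  # empirical second moment
        print(n, round(m1_hat, 3), round(m2_hat, 3))   # approaches (2, 8) as n grows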
Moments Estimator

Let
  M : Θ → R^d,  θ ↦ M(θ) = (m1(θ), ..., md(θ)).
Assume M is one-to-one:
  θ = M^{-1}(m1(θ), ..., md(θ)).
The moments estimator of θ replaces the population moments by the empirical ones:
  θ̂n^MM = M^{-1}(m̂1, ..., m̂d).

M-Estimation  More generally, to estimate a quantity µ* associated with P, choose a function ρ(x, µ) such that Q(µ) = E[ρ(X, µ)] achieves its minimum at µ = µ*.

Examples (1)
• If E = M = R and ρ(x, µ) = (x − µ)², for all x, µ ∈ R: µ* = E[X].
• If E = M = R^d and ρ(x, µ) = ‖x − µ‖²₂, for all x, µ ∈ R^d: µ* = E[X] ∈ R^d.
• If E = M = R and ρ(x, µ) = |x − µ|, for all x, µ ∈ R: µ* is a median of P.

Example (2)  If E = M = R, α ∈ (0, 1) is fixed and ρ(x, µ) = Cα(x − µ), for all x, µ ∈ R: µ* is an α-quantile of P.

Check Function
  Cα(x) = −(1 − α) x  if x < 0,
  Cα(x) = α x         if x ≥ 0.
[Figure: the check function Cα, piecewise linear with slope −(1 − α) on (−∞, 0) and slope α on [0, ∞).]

Parametric Hypothesis Testing

Hypotheses
  H0 : Δc = Δd  vs.  H1 : Δd > Δc.
Since the data are Gaussian by assumption, we don't need the CLT:
  X̄n ~ N(Δd, σd²/n)  and  Ȳm ~ N(Δc, σc²/m).
Then,
  ( X̄n − Ȳm − (Δd − Δc) ) / √( σd²/n + σc²/m ) ~ N(0, 1).

Asymptotic test  Assume that m = c·n and n → ∞. Using Slutsky's theorem, we also have
  ( X̄n − Ȳm − (Δd − Δc) ) / √( σ̂d²/n + σ̂c²/m ) →(d) N(0, 1) as n → ∞,
where
  σ̂d² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²  and  σ̂c² = (1/(m − 1)) Σ_{i=1}^m (Yi − Ȳm)².

Student's T test (one-sample, two-sided)

  H0 : µ = 0  vs.  H1 : µ ≠ 0.
Test statistic:
  Tn = √n X̄n / √(S̃n).
Since √n X̄n/σ ~ N(0, 1) (under H0) and S̃n/σ² ~ χ²_{n−1}/(n − 1) are independent by Cochran's theorem, we have
  Tn ~ t_{n−1}.
Student's test with (non-asymptotic) level α ∈ (0, 1):
  ψα = 1{ |Tn| > q_{α/2} },
where q_{α/2} is the (1 − α/2)-quantile of t_{n−1}.
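A minimal sketch of this two-sided one-sample test in Python; the simulated N(0.2, 1) sample is only for illustration, and scipy.stats.ttest_1samp is used as a cross-check of the hand-computed statistic.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(0.2, 1.0, size=50)      # simulated sample; H0: mu = 0

    # Manual statistic: Tn = sqrt(n) * x_bar / sqrt(S_tilde_n)
    n = len(x)
    t_manual = np.sqrt(n) * x.mean() / x.std(ddof=1)

    # Same test via scipy (two-sided by default); reject H0 at level alpha if p < alpha
    t_scipy, p_value = stats.ttest_1samp(x, popmean=0.0)
    print(t_manual, t_scipy, p_value)      # the two statistics coincide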
Student's T test (one-sample, one-sided)

  H0 : µ ≤ µ0  vs.  H1 : µ > µ0.
Test statistic:
  Tn = √n (X̄n − µ0)/√(S̃n) ~ t_{n−1} (under H0).
Student's test with (non-asymptotic) level α ∈ (0, 1):
  ψα = 1{Tn > qα},
where qα is the (1 − α)-quantile of t_{n−1}.

Two-sample T-test

  ( X̄n − Ȳm − (Δd − Δc) ) / √( σ̂d²/n + σ̂c²/m ) ~ tN,
where N is given by the Welch-Satterthwaite formula
  N = ( σ̂d²/n + σ̂c²/m )² / ( σ̂d⁴/(n²(n − 1)) + σ̂c⁴/(m²(m − 1)) ) ≥ min(n, m).

Likelihood Ratio Test

Likelihood function
  Ln : R^n × R^d → R,  (x1, ..., xn; θ) ↦ Π_{i=1}^n pθ(xi).
The likelihood ratio test in this set-up is of the form
  ψC = 1{ Ln(x1, ..., xn; θ1) / Ln(x1, ..., xn; θ0) > C },
where C is a threshold to be specified.

A test based on the log-likelihood  Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (Pθ)θ∈Θ), where Θ ⊆ R^d (d ≥ 1). Suppose the null hypothesis has the form
  H0 : (θ_{r+1}, ..., θ_d) = (θ⁰_{r+1}, ..., θ⁰_d),
for some fixed and given numbers θ⁰_{r+1}, ..., θ⁰_d. Let
  θ̂n = argmax_{θ∈Θ} ℓn(θ)   (MLE)
and
  θ̂nᶜ = argmax_{θ∈Θ0} ℓn(θ)  (constrained MLE),
where Θ0 = {θ ∈ Θ : (θ_{r+1}, ..., θ_d) = (θ⁰_{r+1}, ..., θ⁰_d)}.
Test statistic:
  Tn = 2( ℓn(θ̂n) − ℓn(θ̂nᶜ) ).

Wilks' Theorem  Assume H0 is true and the MLE technical conditions are satisfied. Then,
  Tn →(d) χ²_{d−r} as n → ∞.
Likelihood ratio test with asymptotic level α ∈ (0, 1):
  ψ = 1{Tn > qα},
where qα is the (1 − α)-quantile of χ²_{d−r}.

Wald's Test

A test based on the MLE  Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (Pθ)θ∈Θ), where Θ ⊆ R^d (d ≥ 1), and let θ0 ∈ Θ be fixed and given. θ* is the true parameter. Consider the following hypotheses:
  H0 : θ* = θ0  vs.  H1 : θ* ≠ θ0.
Let θ̂n^MLE be the MLE. Assume the MLE technical conditions are satisfied. If H0 is true, then
  √n I(θ̂n^MLE)^{1/2} (θ̂n^MLE − θ0) →(d) Nd(0, Id) as n → ∞.

Wald's test
  Tn := n (θ̂n^MLE − θ0)^T I(θ̂n^MLE) (θ̂n^MLE − θ0) →(d) χ²_d as n → ∞.

Goodness of Fit Tests

Let X be a r.v. We want to know if the hypothesized distribution is a good fit for the data. Key characteristic of goodness of fit tests: no parametric modeling.

Discrete distribution  Let E = {a1, ..., aK} be a finite space and (Pp)_{p∈ΔK} be the family of all probability distributions on E. For a fixed p⁰ ∈ ΔK, test H0 : p = p⁰ vs. H1 : p ≠ p⁰.

Categorical Likelihood
• Likelihood of the model:
  Ln(X1, ..., Xn; p) = p1^{N1} p2^{N2} ... pK^{NK},
where Nj = #{i = 1, ..., n : Xi = aj}.
• Let p̂ be the MLE:
  p̂j = Nj/n,  j = 1, ..., K.
  p̂ maximizes log Ln(X1, ..., Xn, p) under the constraint.

χ² test  If H0 is true, then √n (p̂ − p⁰) is asymptotically normal, and the following holds.

Theorem  Under H0:
  Tn = n Σ_{j=1}^K (p̂j − p⁰j)² / p⁰j →(d) χ²_{K−1} as n → ∞.

CDF and empirical CDF  Let X1, ..., Xn be i.i.d. real random variables. The CDF of X1 is defined as
  F(t) = P[X1 ≤ t], for all t ∈ R.
It completely characterizes the distribution of X1.
The empirical CDF of the sample X1, ..., Xn is defined as
  Fn(t) = (1/n) Σ_{i=1}^n 1{Xi ≤ t} = #{i = 1, ..., n : Xi ≤ t}/n, for all t ∈ R.

Consistency  By the LLN, for all t ∈ R,
  Fn(t) →(a.s.) F(t) as n → ∞.

Glivenko-Cantelli Theorem (Fundamental theorem of statistics)
  sup_{t∈R} |Fn(t) − F(t)| →(a.s.) 0 as n → ∞.

Asymptotic normality  By the CLT, for all t ∈ R,
  √n (Fn(t) − F(t)) →(d) N(0, F(t)(1 − F(t))) as n → ∞.

Donsker's Theorem  If F is continuous, then
  √n sup_{t∈R} |Fn(t) − F(t)| →(d) sup_{0≤t≤1} |B(t)| as n → ∞,
where B is a Brownian bridge on [0, 1].
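Donsker's theorem is what makes the Kolmogorov-Smirnov statistic described next usable in practice. A minimal sketch with scipy.stats.kstest; the simulated samples and the N(0, 1) null are illustrative choices.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.standard_normal(200)

    # KS statistic sup_t |Fn(t) - F(t)| against the hypothesized CDF, here N(0, 1)
    stat, p_value = stats.kstest(x, "norm")
    print(stat, p_value)                          # large p-value: no evidence against H0

    # Same test on data that do not follow N(0, 1)
    stat2, p2 = stats.kstest(rng.exponential(size=200), "norm")
    print(stat2, p2)                              # tiny p-value: H0 is rejected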
Pivotal Distribution  The Kolmogorov-Smirnov statistic Tn = √n sup_{t∈R} |Fn(t) − F(t)| is called a pivotal statistic: if H0 is true, the distribution of Tn does not depend on the distribution of the Xi's.
[Figure: the empirical CDF Fn (step function) versus F for a small sample, illustrating the vertical gaps (e.g. at X(2)) that enter the supremum.]

Other Goodness of Fit Tests

Kolmogorov-Smirnov
  d(Fn, F) = sup_{t∈R} |Fn(t) − F(t)|,  X ~ F.

Anderson-Darling
  d²(Fn, F) = ∫_R [Fn(t) − F(t)]² / ( F(t)(1 − F(t)) ) dF(t).

Kolmogorov-Lilliefors Test

We want to test if X has a Gaussian distribution with unknown parameters. In this case, Donsker's theorem is no longer valid. Instead, we compute the quantiles for the test statistic
  sup_{t∈R} |Fn(t) − Φ_{µ̂,σ̂²}(t)|,
where µ̂ = X̄n, σ̂² = Sn², and Φ_{µ̂,σ̂²}(t) is the CDF of N(µ̂, σ̂²). These quantiles do not depend on unknown parameters.

Quantile-Quantile (Q-Q) Plots

• Not a formal test, but a quick and easy check to see if a distribution is plausible.
• Main idea: we want to check visually if the plot of Fn is close to that of F or, equivalently, if the plot of Fn^{-1} is close to F^{-1}.
• Check if the points
  ( F^{-1}(1/n), Fn^{-1}(1/n) ), ..., ( F^{-1}((n−1)/n), Fn^{-1}((n−1)/n) )
are near the line y = x.
• Fn is not technically invertible, but we define
  Fn^{-1}(i/n) = X(i),
the ith largest observation.
[Figures: Normal Q-Q plots (sample quantiles vs. theoretical quantiles) for several samples; e.g. panel 4 shows light tails.]

Bayesian Statistics

Introduction to Bayesian Statistics

Prior and Posterior
• Consider a probability distribution on a parameter space Θ with some PDF π(·): the prior distribution.
• Remark: Ln(X1, ..., Xn|θ) is the likelihood used in the frequentist approach.
• The conditional distribution of θ given X1, ..., Xn is called the posterior distribution. Denote by π(·|X1, ..., Xn) its PDF.

Bayes' formula
  π(θ|X1, ..., Xn) ∝ π(θ) Ln(X1, ..., Xn|θ), for all θ ∈ Θ.

Example (Bernoulli with a Beta prior)
• Given p, X1, ..., Xn i.i.d. ~ Ber(p), so
  Ln(X1, ..., Xn|p) = p^{Σ Xi} (1 − p)^{n − Σ Xi}.
• Hence, with a Beta(a, a) prior π(p) ∝ p^{a−1}(1 − p)^{a−1},
  π(p|X1, ..., Xn) ∝ p^{a−1+Σ Xi} (1 − p)^{a−1+n−Σ Xi}.

Non-informative Priors
• We can still use a Bayesian approach if we have no prior information about the parameter.
• Good candidate: π(θ) ∝ 1, i.e., constant PDF on Θ.
• If Θ is bounded, this is the uniform prior on Θ.
• If Θ is unbounded, this does not define a proper PDF on Θ.
• An improper prior on Θ is a measurable, non-negative function π(·) defined on Θ that is not integrable:
  ∫_Θ π(θ) dθ = ∞.
• In general, one can still define a posterior distribution using an improper prior, using Bayes' formula.

Jeffreys Prior  A prior used in the Bayesian framework, as well as an example of a non-informative prior:
  πJ(θ) ∝ √(det I(θ)),
where I(θ) is the Fisher information matrix of the statistical model associated with X1, ..., Xn in the frequentist approach (provided it exists).

Examples
• Bernoulli experiment: πJ(p) ∝ 1/√(p(1 − p)), p ∈ (0, 1): the prior is Beta(1/2, 1/2).
• Gaussian experiment: πJ(θ) ∝ 1, θ ∈ R, is an improper prior.
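Tying the two Bernoulli examples above together, here is a minimal sketch of the posterior update under the Jeffreys prior Beta(1/2, 1/2); the 78-out-of-100 count reuses the drug-example numbers purely for illustration.

    from scipy import stats

    # Prior Beta(a, a) with a = 1/2 (Jeffreys); data: k successes out of n trials
    a, n, k = 0.5, 100, 78
    posterior = stats.beta(a + k, a + n - k)    # Beta(a + sum Xi, a + n - sum Xi)

    print(posterior.mean())                     # posterior mean, a Bayesian point estimate
    print(posterior.ppf([0.025, 0.975]))        # central 95% Bayesian confidence region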
Jeffreys prior satisfies a reparametrization invariance principle: if η is a reparametrization of θ (i.e., η = φ(θ) for some one-to-one map φ), then the PDF π̃(·) of η satisfies
  π̃(η) ∝ √(det Ĩ(η)),
where Ĩ(η) is the Fisher information of the statistical model parametrized by η instead of θ.

Bayesian confidence regions  For α ∈ (0, 1), a Bayesian confidence region with level α is a random subset R of the parameter space Θ, which depends on the sample X1, ..., Xn, such that
  P[θ ∈ R | X1, ..., Xn] = 1 − α.
Note that R depends on the prior π(·). Bayesian confidence region and confidence interval are two distinct notions.

Bayesian estimation  Summaries of the posterior distribution can be reported, e.g.:
  – conditional quantiles
  – conditional variance (not information about location)

Linear Regression

We focus on modeling the regression function
  f(x) = E[Y | X = x].
Restrict to simple functions. The simplest is
  f(x) = a + bx,  a linear (or affine) function.

Probabilistic Analysis  Let X and Y be two r.v. (not necessarily independent) with two moments and such that Var(X) > 0. The theoretical linear regression of Y on X is the line x ↦ a* + b*x, where
  (a*, b*) = argmin_{(a,b)∈R²} E[(Y − a − bX)²],
which gives
  b* = Cov(X, Y)/Var(X),
  a* = E[Y] − b* E[X] = E[Y] − (Cov(X, Y)/Var(X)) E[X].

Noise  The points are not exactly on the line x ↦ a* + b*x if Var(Y | X = x) > 0. The random variable ε = Y − (a* + b*X) is called noise and satisfies
  Y = a* + b*X + ε.

In the multivariate case:
• Let X be the n × p matrix whose rows are X1^T, ..., Xn^T. X is called the design matrix.
• Let ε = (ε1, ..., εn)^T ∈ R^n be the unobserved noise. Then,
  Y = Xβ* + ε,  β* unknown.
• The LSE β̂ satisfies
  β̂ = argmin_{β∈R^p} ‖Y − Xβ‖²₂.

Closed Form Solution  Assume that rank(X) = p. Then,
  β̂ = (X^T X)^{-1} X^T Y.

Geometric Interpretation of the LSE  Xβ̂ is the orthogonal projection of Y onto the subspace spanned by the columns of X:
  Xβ̂ = PY,  where P = X (X^T X)^{-1} X^T.

Statistical Inference  To make inference, we need more assumptions.
• The design matrix X is deterministic and rank(X) = p.
• The model is homoscedastic: ε1, ..., εn are i.i.d.
• The noise vector ε is Gaussian:
  ε ~ Nn(0, σ² In).
Tests on a single coefficient βj are then based on the t_{n−p} distribution, where q_{α/2}(t_{n−p}) is the (1 − α/2)-quantile of t_{n−p}.

Bonferroni's test  Test whether a group of explanatory variables is significant in the linear regression.
• H0 : βj = 0 for all j ∈ S  vs.  H1 : βj ≠ 0 for some j ∈ S, where S ⊆ {1, ..., p}.
• Bonferroni's test:
  R_{S,α} = ∪_{j∈S} R_{j, α/k},  where k = |S|.

Generalized Linear Model

Generalization  A generalized linear model (GLM) generalizes normal linear regression models in the following directions:
1. Random component:
  Y | X = x ~ some distribution.
2. Regression function:
  g(µ(x)) = x^T β,
where g is called the link function and µ(x) = E[Y | X = x] is the regression function.

In GLM, we have Y | X = x ~ a distribution in the exponential family. Then,
  E[Y | X = x] = f(X^T β).

Exponential Family

A family of distributions {Pθ : θ ∈ Θ}, Θ ⊂ R^k, is said to be a k-parameter exponential family on R^q if there exist real-valued functions
• η1, ..., ηk and B(θ),
• T1, ..., Tk, and h(y), y ∈ R^q,
such that the density function of Pθ can be written as
  fθ(y) = exp( Σ_{i=1}^k ηi(θ) Ti(y) − B(θ) ) h(y).

Examples of discrete distributions  The following distributions form discrete exponential families of distributions with PMF:
• Bernoulli(p): p^y (1 − p)^{1−y}, y ∈ {0, 1}
• Poisson(λ): (λ^y / y!) e^{−λ}, y = 0, 1, ...

Examples of continuous distributions  The following distributions form continuous exponential families of distributions with PDF:
• Gamma(a, b): (1/(Γ(a) b^a)) y^{a−1} e^{−y/b}
• Inverse Gamma(α, β): (β^α/Γ(α)) y^{−α−1} e^{−β/y}
• Inverse Gaussian(µ, σ²): √(σ²/(2πy³)) exp( −σ²(y − µ)²/(2µ²y) )

One-parameter Canonical Exponential Family

  fθ(y) = exp( (yθ − b(θ))/φ + c(y, φ) )
for some known functions b(θ) and c(y, φ).
• If φ is known, this is a one-parameter exponential family with θ being the canonical parameter.
• If φ is unknown, this may/may not be a two-parameter exponential family.
• φ is called the dispersion parameter.

Expected value  Note that
  ℓ(θ) = (Yθ − b(θ))/φ + c(Y; φ),
which leads to
  E[Y] = b′(θ).

Variance
  Var(Y) = b″(θ) · φ.

Link function  β is the parameter of interest. A link function g relates the linear predictor X^T β to the mean parameter µ:
  X^T β = g(µ) = g(µ(X)).
g is required to be monotone increasing and differentiable, and
  µ = g^{-1}(X^T β).

Canonical Link  The function g that links the mean µ to the canonical parameter θ is called the canonical link:
  g(µ) = θ.
Since µ = b′(θ), the canonical link is given by
  g(µ) = (b′)^{-1}(µ).
If φ > 0, the canonical link function is strictly increasing.

Example  Bernoulli distribution:
  p^y (1 − p)^{1−y} = exp( y log(p/(1 − p)) + log(1 − p) ) = exp( yθ − log(1 + e^θ) ).
Hence, θ = log(p/(1 − p)) and b(θ) = log(1 + e^θ), and
  b′(θ) = e^θ/(1 + e^θ) = µ  ⇐⇒  θ = log(µ/(1 − µ)).
The canonical link for the Bernoulli distribution is the logit link.

Model and Notation

Let (Xi, Yi) ∈ R^p × R, i = 1, ..., n, be independent random pairs such that the conditional distribution of Yi given Xi = xi has density in the canonical exponential family:
  fθi(yi) = exp( (yi θi − b(θi))/φ + c(yi, φ) ).

Back to β  Given a link function g, note the following relationship between β and θ:
  θi = (b′)^{-1}(µi) = (b′)^{-1}( g^{-1}(Xi^T β) ) ≡ h(Xi^T β),
where h is defined as
  h = (b′)^{-1} ∘ g^{-1} = (g ∘ b′)^{-1}.
If g is the canonical link function, g = (b′)^{-1}, then h is the identity.

Log-likelihood  The log-likelihood is given by
  ℓn(Y, X, β) = Σ_i ( Yi θi − b(θi) )/φ + constant
             = Σ_i ( Yi h(Xi^T β) − b(h(Xi^T β)) )/φ + constant.
When we use the canonical link function, we obtain the expression
  ℓn(Y, X, β) = Σ_i ( Yi Xi^T β − b(Xi^T β) )/φ + constant.

Strict concavity  The log-likelihood is strictly concave (if rank(X) = p) when the canonical link is used and φ > 0. As a consequence, the maximum likelihood estimator is unique. On the other hand, if another parametrization is used, the likelihood function may not be strictly concave, leading to several local maxima.

Recommended Resources

• Probability and Statistics (DeGroot and Schervish)
• Mathematical Statistics and Data Analysis (Rice)
• Fundamentals of Statistics [Lecture Slides] (http://www.edx.org)

Please share this cheatsheet with friends!