
Summary of the first ten chapters:

1. What is statistics?


Statistics is a way of collecting data and then using them to make measurements. It is a way to get
information from data.

Important terms:
Descriptive statistics: methods for describing and analysing the data we have right now.
Inferential statistics: methods for drawing conclusions about data we have not observed, e.g. the probability of a future event.
Exit polls: surveys conducted with voters immediately after they have cast their vote in an election. The
purpose of exit polls is to collect data on how people voted, along with their demographic characteristics
and other relevant information. They are often used to predict election results and to analyse voter behaviour.
Parameter: a descriptive measure of a population; depending on the context it could be the mean, the
median, the standard deviation and so on.
Population: the entire group we are trying to get information about.
Sample: a subset drawn from the population; we measure on the sample instead of the whole population.

2. Graphical descriptive techniques


Important terms:

Chapter summary:
Graphical descriptive techniques are an easy way to present information from nominal data. Nominal data
are data where the order of the categories has no meaning.
Here we can use bar charts, pie charts and frequency distributions to summarize a single set of nominal data,
showing the frequency and proportion of each category.

If the dataset has two nominal variables, we can use a cross-classification table or bar charts.
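
As a quick illustration (the categories and counts below are made up, not from the book), here is a minimal Python sketch that computes the frequency and relative frequency of each category of a nominal variable; these are the numbers a bar chart or pie chart would display:

```python
from collections import Counter

# Hypothetical nominal data: preferred newspaper of 10 respondents
responses = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]

counts = Counter(responses)            # frequency of each category
n = len(responses)

for category, freq in counts.items():
    proportion = freq / n              # relative frequency (proportion)
    print(f"{category}: frequency={freq}, proportion={proportion:.2f}")
```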

3. Graphical descriptive techniques


Chapter summary:

We also described the difference between time-series data and cross-sectional data. See the picture below:

Histograms are used to describe a single set of interval data. To analyse the relationship between two
interval variables, we draw a scatter diagram and look for a linear relationship (correlation).

Scatter diagram:
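
A minimal sketch of both charts, assuming matplotlib is available and using made-up interval data:

```python
import matplotlib.pyplot as plt

# Hypothetical interval data (made up for illustration)
heights = [168, 172, 181, 165, 177, 190, 174, 169, 185, 178]
weights = [62, 70, 85, 58, 74, 95, 71, 64, 88, 77]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(heights, bins=5)              # histogram: one interval variable
ax1.set_title("Histogram of heights")

ax2.scatter(heights, weights)          # scatter diagram: two interval variables
ax2.set_title("Scatter diagram")
ax2.set_xlabel("height")
ax2.set_ylabel("weight")

plt.tight_layout()
plt.show()
```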
4. Numerical descriptive techniques
The chapter extended our discussion of descriptive statistics, which deals with methods of
summarizing and presenting the essential information contained in a set of data. Now we can describe our
dataset with numerical methods. The three most popular measures of central location are the
mean, the median and the mode.
But they do not say how much the data vary. Information about the variability
of interval data is conveyed by numerical measures such as the range, variance and standard deviation.

Range: the difference between the largest and smallest values in the dataset.
Variance: a measure of how much individual data points deviate from the mean (average) of
the dataset. It quantifies the average of the squared differences between each data point and the mean.
Standard deviation: the standard deviation is widely used because it provides a measure of dispersion that is
in the same units as the original data. It is easier to interpret than the variance, and a smaller standard
deviation indicates less variability.

In summary, the range measures the spread by considering the difference between the highest and lowest
values, while the variance and standard deviation provide more detailed information by considering how all
data points deviate from the mean. The standard deviation is particularly useful because it is both
interpretable and commonly used in statistics to assess the variability within a dataset.

In practice, the standard deviation is used more often than the variance, because it gives a more meaningful
measure of the variation that is easier to interpret and to compare with the original data points. The
variance can still be useful in some statistical calculations and analyses, even though it is usually not the
most intuitive measure of spread.
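
To make the definitions concrete, here is a minimal Python sketch with made-up numbers that computes the range, the sample variance and the sample standard deviation directly from their definitions:

```python
import math

data = [4, 8, 6, 5, 3, 7, 9, 5]        # hypothetical sample of interval data
n = len(data)

mean = sum(data) / n
data_range = max(data) - min(data)                        # largest minus smallest
variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance s^2
std_dev = math.sqrt(variance)                             # s, same units as the data

print(mean, data_range, variance, std_dev)
```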

For the special case in which a sample of measurements has a mound-shaped distribution, the Empirical
Rule provides a good approximation of the percentages of measurements that fall within one, two and
three standard deviations of the mean. Chebysheff's Theorem applies to all sets of data no matter the shape
of the histogram. Measures of relative standing that were presented in this chapter are percentiles and
quartiles. The linear relationship between two interval variables is measured by the covariance, the
coefficient of correlation, the coefficient of determination and the least squares line.
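
A short sketch of the measures of linear relationship mentioned above, computed on made-up paired data (sample covariance, coefficient of correlation, coefficient of determination and the least squares line):

```python
import math

# Hypothetical paired interval data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and coefficient of correlation
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
r = cov_xy / (s_x * s_y)

# Least squares line: y_hat = b0 + b1 * x
b1 = cov_xy / (s_x ** 2)
b0 = mean_y - b1 * mean_x

print(cov_xy, r, r ** 2, b0, b1)       # r**2 is the coefficient of determination
```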

Important terms:
5. Data collection and sampling
Summary:
- Because most populations are very large, it is extremely costly and impractical to investigate each
member of the population to determine the values of the parameters. As a practical alternative, we
take a sample from the population and use the sample statistics to draw inferences about the
parameters. Care must be taken to ensure that the sampled population is the same as the target
population.
- We can choose from among several different sampling plans, including simple random
sampling, stratified random sampling and cluster sampling. Whatever sampling plan is
used, it is important to realize that both sampling error and non-sampling error will occur
and to understand what the sources of these errors are.
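
As a small illustration of the idea (the population values are generated at random, purely for illustration), a simple random sample is drawn and its mean is used as an estimate of the population mean:

```python
import random

# Hypothetical population: account balances of 1,000 customers
population = [round(random.uniform(50, 500), 2) for _ in range(1000)]

# Simple random sample of 30 members, used to estimate the population mean
sample = random.sample(population, k=30)
sample_mean = sum(sample) / len(sample)

print(sample_mean)   # a sample statistic used to draw inferences about the parameter
```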

Important terms:
6. Probability
Summary:

Important terms:

7. Random variables and discrete probability distributions



CHAPTER SUMMARY: There are two types of random variables. A discrete random variable is one
whose values are countable. A continuous random variable can assume an uncountable number of
values. In this chapter, we discussed discrete random variables and their probability distributions.
We defined the expected value, variance and standard deviation of a population represented by a
discrete probability distribution. Also introduced in this chapter were bivariate discrete
distributions, on which an important application in finance was based. Finally, the two most
important discrete distributions, the binomial and the Poisson, were presented.
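
A minimal sketch of these ideas with made-up numbers: the expected value and variance of a discrete distribution, plus the binomial and Poisson probability formulas (the helper names binomial_pmf and poisson_pmf are just labels for this sketch):

```python
from math import comb, exp, factorial

# Hypothetical discrete probability distribution: P(X = x)
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

expected = sum(x * p for x, p in dist.items())                    # E(X)
variance = sum((x - expected) ** 2 * p for x, p in dist.items())  # V(X)

# Binomial: P(X = k) for n trials with success probability p
def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Poisson: P(X = k) with mean mu
def poisson_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

print(expected, variance, binomial_pmf(3, 10, 0.2), poisson_pmf(2, 1.5))
```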

Important terms:
8. Continuous probability distributions
Important terms:

This chapter dealt with continuous random variables and their distributions. Because a continuous
random variable can assume an infinite number of values, the probability that the random variable
equals any single value is zero. Consequently, we addressed the problem of computing the
probability of a range of values. We showed that the probability of any interval is the area in the
interval under the curve representing the density function. We introduced the most important
distribution in statistics, the normal distribution, and showed how to compute the probability
that a normal random variable falls into any interval. Additionally, we demonstrated how to use
the normal table backwards to find values of a normal random variable given a probability. Next,
we introduced the exponential distribution, a distribution that is particularly useful in several
management science applications. Finally, we presented three more continuous random variables
and their probability density functions. The Student t, chi-squared and F distributions will be used
extensively in statistical inference.
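
A minimal sketch of these probability calculations, assuming scipy is available (the specific numbers are arbitrary):

```python
from scipy.stats import norm, expon

# P(Z < 1.645) for the standard normal: the area under the curve to the left
p_left = norm.cdf(1.645)

# Using the table "backwards": the z-value with 5% of the area to its right
z_alpha = norm.ppf(0.95)

# Exponential distribution with mean 10: P(X > 15)
p_exp = 1 - expon.cdf(15, scale=10)

print(p_left, z_alpha, p_exp)
```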
9. Sampling distributions
The sampling distribution of a statistic is created by repeated sampling from one population. In this
chapter, we introduced the sampling distribution of the mean, the proportion and the difference
between two means. We described how these distributions are created theoretically and
empirically.
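
A small sketch of building a sampling distribution of the mean empirically, using a made-up population:

```python
import random
import statistics

# Hypothetical population (made up): uniform values between 0 and 100
population = [random.uniform(0, 100) for _ in range(10_000)]

# Empirically build the sampling distribution of the mean: draw many samples
# of size n and record each sample mean
n = 25
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(1000)]

# The sample means cluster around the population mean, with a standard deviation
# of roughly sigma / sqrt(n) (the standard error of the mean)
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```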

Important terms:
10. Introduction to estimation

This chapter introduced the concepts of estimation and the estimator of a population mean when the
population variance is known. It also presented a formula to calculate the sample size necessary to estimate
a population mean.
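
A minimal sketch with made-up numbers (scipy assumed for the normal quantile): the confidence interval estimator of µ when σ is known, x̄ ± z_{α/2}·σ/√n, and the sample size needed to estimate µ to within a bound B:

```python
import math
from scipy.stats import norm

x_bar, sigma, n = 500.0, 80.0, 64      # hypothetical sample mean, known sigma, n
alpha = 0.05
z = norm.ppf(1 - alpha / 2)            # z_{alpha/2} = 1.96 for 95% confidence

half_width = z * sigma / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

# Sample size needed to estimate mu to within B = 10 (B is a made-up bound)
B = 10
n_needed = math.ceil((z * sigma / B) ** 2)

print(ci, n_needed)
```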

Important terms:
11. Introduction to hypothesis testing

A criminal trial is an example of hypothesis testing without statistics.

 H0: The defendant is innocent

The alternative hypothesis or research hypothesis is:


 H1: The defendant is guilty

 Example: H0: µ = 400,000 kr
 H1: µ ≠ 400,000 kr (i.e. more or less than 400,000)

There are two possible errors.


- A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury
convicts an innocent person.
- A Type II error occurs when we do not reject a false null hypothesis. That is, a guilty person is acquitted
(is guilty but walks free).

The probability of a Type I error is denoted by the Greek letter alpha (α).


The probability of a Type II error is denoted by the Greek letter beta (β).

See below

There are two hypotheses. One is called the null hypothesis and the other the alternative or
research hypothesis. The usual notation is:

H0: — the ‘null’ hypothesis


H1: — the ‘alternative’ or ‘research’ hypothesis
1. There are two hypotheses. One is called the null hypothesis and the other is
called the alternative (research) hypothesis.
2. The testing procedure begins with the assumption that the null hypothesis
is true.
3. The goal of the process is to determine whether there is enough evidence to
infer that the alternative hypothesis is true.
4. There are two possible decisions:
 To conclude there is enough evidence to support the alternative hypothesis.
 To conclude there is not enough evidence to support the alternative
hypothesis.
5. Two possible errors can be made in any test. A Type I error occurs when we
reject a true null hypothesis, and a Type II error occurs when we don't reject
a false null hypothesis. The probabilities of Type I and Type II errors are α and β, respectively.

A firm produces 350 units per hour:

Null hypothesis: H0:µ = 350

Therefore our research hypothesis becomes:

H1:µ ≠ 350

Conclude that there is enough evidence to support the alternative hypothesis


(also stated as: rejecting the null hypothesis in favor of the alternative)

Conclude that there is not enough evidence to support the alternative hypothesis
- (also stated as: not rejecting the null hypothesis in favor of the alternative).

NOTE: we do not say that we accept the null hypothesis…

Once the null and alternative hypotheses are stated, the next step is to randomly sample the
population and calculate a test statistic (in this example, the sample mean).

For example, if we are trying to decide whether the mean is not equal to 350, a large value of the
sample mean x̄ (say, 600) would provide enough evidence.

If x̄ is close to 350 (say, 355), we could not say that this provides a great deal of evidence to infer
that the population mean is different from 350.
11.2 TESTING THE POPULATION MEAN WHEN THE POPULATION STANDARD
DEVIATION IS KNOWN

There are two approaches to testing the hypothesis.


- The first is called the rejection region method.
- The second is the p-value approach.

Rejection Region

It seems reasonable to reject the null hypothesis if the sample mean is much larger than 170, say
500. But a sample mean close to 170, say 171, does not by itself make us reject the null hypothesis.
- In this example:

The hypotheses:
H0: µ ≤ 170 (do not install the new system)
H1: µ > 170 (install the new system)

Information:
n = 400 (number of observations)
x̄ = 178 (sample mean)
µ = 170 (hypothesized mean)
σ = 65 (standard deviation)
Rejection Region:
The rejection region is a range of values such that if the statistic falls into that range, we decide to
reject the null hypothesis in favor of the alternative hypothesis.

Suppose that we define x̄_L as the value of the sample mean that is just large enough to reject the null
hypothesis. The rejection region is then

x̄ > x̄_L

We know from Section 9-1 that the sampling distribution of x̄ is normal with mean µ and standard deviation σ/√n.

To calculate the rejection region, we need a value for α, the significance level. Suppose that the
manager chose α to be 5%. It follows that z_α = z_0.05 = 1.645. We can now calculate the value of
x̄_L = µ + z_α · σ/√n = 170 + 1.645 · 65/√400 = 175.34.
The sample mean was computed to be 178. Because the test statistic (the sample mean) is in the
rejection region (it is greater than 175.34), we reject the null hypothesis. Thus there is sufficient
evidence to infer that the mean monthly account is greater than €170.

Therefore we reject the null hypothesis in favour of the research hypothesis.
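
A minimal sketch of the rejection-region calculation above (scipy assumed for the normal quantile):

```python
import math
from scipy.stats import norm

mu0, sigma, n, alpha = 170, 65, 400, 0.05
x_bar = 178

z_alpha = norm.ppf(1 - alpha)               # 1.645 for alpha = 0.05
x_L = mu0 + z_alpha * sigma / math.sqrt(n)  # rejection region: x_bar > x_L

print(x_L)              # about 175.34
print(x_bar > x_L)      # True, so reject H0
```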

11-2b Standardized Test Statistic

An easier method specifies that the test statistic be the standardized value of x̄; that is, we use the
standardized test statistic:

z = (x̄ − µ) / (σ/√n)

The rejection region consists of all values of z that are greater than z_α (see below).

Because 2.46 is greater than 1.645 we reject the null hypothesis and conclude that there is enough
evidence to infer that the mean monthly account is greater than 170.

I will refer to the standardized test statistic simply as the test statistic.


We can also see that both methods give the same result: the conclusions from using the test statistic x̄ and
the standardized test statistic z are identical.
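
A quick check of the standardized test statistic z = (x̄ − µ)/(σ/√n) with the numbers from the example:

```python
import math

x_bar, mu0, sigma, n = 178, 170, 65, 400
z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(z)            # about 2.46, which is greater than 1.645, so reject H0
```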

11-2c p-Value
The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

If the p-value is less than 1%, there is overwhelming evidence to support the alternative hypothesis.


If the p-value is between 1% and 5%, there is strong evidence.
If the p-value is between 5% and 10%, there is weak evidence.
If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

We observe a p-value of 0.0069, hence there is overwhelming evidence to support H1: µ > 170
Compare the p-value with the selected value of the significance level:

If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis.

If the p-value is greater than α, we do not reject the null hypothesis.

Since p-value = 0.0069 < α = 0.05, we reject H0 in favour of H1
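
The p-value here is the probability, assuming H0 is true, of observing a test statistic at least as extreme as the one computed. A quick check with scipy:

```python
from scipy.stats import norm

z = 2.46
p_value = 1 - norm.cdf(z)    # one-tailed: P(Z > 2.46)
print(p_value)               # about 0.0069, less than alpha = 0.05, so reject H0
```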

If we reject the null hypothesis, we conclude that there is enough evidence to infer that the
alternative hypothesis is true.

If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to
infer that the alternative hypothesis is true.

Remember: The alternative hypothesis is the more important one. It represents what we are
investigating

11-2d Interpreting the p-value


The sampling distribution allows us to make probability statements about a sample statistic
assuming knowledge of the population parameter. Thus the probability of observing a sample
mean at least as large as 178 from a population whose mean is 170 is 0.0069, which is very small.

In other words, it is very unlikely to observe a sample mean of 178 or more when the population mean is 170.
The p-value of a test provides valuable information because it is a measure of the amount of
statistical evidence that supports the alternative hypothesis.

Chapter 12 - Statistics for Management and Economics, Gerald Keller and


Nicoleta Gaciu
Inference about a population
https://www.moodle.aau.dk/pluginfile.php/3139785/mod_resource/content/1/Lecture%203.pdf
Slides for chapter 12

The previous two chapters were about inferential statistics: estimation and hypothesis testing. They
assumed that the standard deviation (STD) was known, but in practice the standard deviation is usually
unknown, so they are a bit unrealistic.

In Section 12-1, we describe how to make inferences about the population mean under the more
realistic circumstance when the population standard deviation is unknown.

In Section 12-2, we continue to deal with interval data, but our parameter of interest becomes the
population variance.
the z-statistic is replaced by the t-statistic, where the number of “degrees of freedom” ν, is n–1.

Section 12-3 discusses inference about the proportion p.

In Section 12-4, we present an important application in marketing: market segmentation.

12-1: INFERENCE ABOUT A POPULATION MEAN WHEN THE STANDARD DEVIATION IS UNKNOWN
https://www.youtube.com/watch?v=vrod7OScpC4
Confidence Intervals about the Mean, Population Standard Deviation Unknown:

https://www.youtube.com/watch?v=tI6mdx3s0zk
Using the t Table to Find the P-value in One-Sample t Tests

Watch these videos if you are in doubt:

In this section, we take a more realistic approach by acknowledging that if the population mean is
unknown, then so is the population standard deviation.

Instead, we substitute the sample standard deviation s in place of the unknown population
standard deviation σ.

The result is called the t-statistic:


The formula is shown below: t = (x̄ − µ) / (s/√n), with ν = n − 1 degrees of freedom.

We will no longer use the z-statistic and the z-estimator of μ. All future inferential problems
involving a population mean will be solved using the t-statistic and t-estimator of μ shown in the
preceding boxes.

Recipe (start by finding x̄: we do that by taking the sum divided by the number of observations, i.e. the average).

Then compute s², and take the square root to get s. Finally we find t, using our degrees of freedom,
and put the numbers into the formula.
To find the critical value at a 1% significance level, we look in a t-table under the right number of degrees
of freedom. We get that as n − 1 = degrees of freedom. For example, with n = 148 (from the book, at the
start of Section 12-1), that gives 148 − 1 = 147. Then we look it up in the t-table:

147 is closer to 150 than to 140, so we use that row: 2.351.

Next we find the sample mean. First we find the sum of the observations, ∑x. Then we find ∑x², which
means squaring each observation (^2) and summing.

With the numbers above we find the mean (we take the sum ∑x and divide it by the number of
observations, 148).

Then we find s². We do that with the following formula:

Since that gives s², we convert it to s by taking the square root.
Now that we have found s (the standard deviation), we can put the numbers into our formula.

We use the value µ from the null hypothesis (H0: µ = 2.0).

This gives the value 2.23.

2.23 is not greater than 2.351, which means we cannot reject the null hypothesis in favour of the
alternative hypothesis. In English:

Because 2.23 is not greater than 2.351, we cannot reject the null hypothesis in favour of the
alternative.
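
A minimal sketch of the same recipe with made-up data (scipy assumed for the t critical value): compute x̄ and s from ∑x and ∑x², then the t-statistic, and compare it with the critical value at n − 1 degrees of freedom:

```python
import math
from scipy.stats import t

# Hypothetical sample and hypothesized mean (H0: mu = 2.0)
data = [1.8, 2.4, 2.1, 2.6, 1.9, 2.3, 2.7, 2.2, 2.0, 2.5]
mu0 = 2.0
n = len(data)

sum_x = sum(data)                          # sum of x
sum_x2 = sum(x ** 2 for x in data)         # sum of x^2
x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # shortcut formula for s^2
s = math.sqrt(s2)

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
t_crit = t.ppf(0.99, df=n - 1)             # one-tailed critical value at the 1% level

print(t_stat, t_crit, t_stat > t_crit)     # reject H0 only if t_stat > t_crit
```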

12-2 INFERENCE ABOUT A POPULATION VARIANCE

In Section 12-1, where we presented the inferential methods about a population mean, we were
interested in acquiring information about the central location of the population. As a result, we
tested and estimated the population mean. If we are interested instead in drawing inferences
about a population's variability, the parameter we need to investigate is the population variance σ².

In Section 12-1 we used the inferential methods about a population mean, i.e. about where the middle
of the population lies. Now we want to look at the variability of the population.

There are many uses for this: in an example illustrating the use of the normal distribution in Section 8-2, we
showed why variance is a measure of risk with stocks, and quality technicians attempt to ensure
that their company's products consistently meet specifications, and so on. You name it.

We begin by identifying the best estimator. That estimator has a sampling distribution, from which
we produce the test statistic and the interval estimator.

The statistic s² has the desirable characteristics presented in Section 10-1; that is, s² is an
unbiased, consistent estimator of σ².
Statisticians have shown that the sum of squared deviations from the mean, ∑(xᵢ − x̄)², which is
equal to (n − 1)s², divided by the population variance is chi-squared distributed with ν = n − 1
degrees of freedom, provided that the sampled population is normal. The statistic

χ² = (n − 1)s² / σ²

is called the chi-squared statistic (the χ² statistic).

EXAMPLE 12.3
Remember to read the book carefully.

To find 13.85 we do the following: we want a 95% confidence level.
We look at 0.95 with 24 degrees of freedom, and from there we find 13.85.
(Table 5 in the appendix, at the bottom of the page.)

For χ², the area is 1 − 0.95 = 0.05, and there are 24 degrees of freedom (25 − 1 = 24).
For χ²₁, the area is 1 − 0.05 = 0.95, and there are 24 degrees of freedom (25 − 1 = 24).
We can estimate that the variance of fills is a number that lies between
0.3333 and 1.537.
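
A minimal sketch of the interval estimator of σ², which is ((n − 1)s²/χ²_{α/2}, (n − 1)s²/χ²_{1−α/2}); the sample variance below is a made-up number, and scipy is assumed for the chi-squared quantiles:

```python
from scipy.stats import chi2

n, s2, alpha = 25, 0.5, 0.05           # hypothetical sample size and sample variance
df = n - 1

chi2_upper = chi2.ppf(1 - alpha / 2, df)   # chi-squared value cutting off alpha/2 above
chi2_lower = chi2.ppf(alpha / 2, df)       # chi-squared value cutting off alpha/2 below

lcl = (df * s2) / chi2_upper
ucl = (df * s2) / chi2_lower

print(lcl, ucl)    # interval estimate of the population variance
```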

Important terms:

Chapter 13: Inference about comparing two populations.


https://www.moodle.aau.dk/pluginfile.php/3143826/mod_resource/content/1/Lecture%204.pdf
Slides for chapter 13

Chapter 15 - Chi-squared tests (the lecturer has chosen not to include chapter 14)
https://www.moodle.aau.dk/pluginfile.php/3144732/mod_resource/content/1/Lecture%205.pdf
Link to the slides

https://www.youtube.com/watch?v=rpKzq64GA9Y
Video to watch if you are in doubt about what it is

There are four different types of chi-squared tests.

If the test statistic is greater than our critical value, we reject the null hypothesis.

If the p-value is greater than alpha, we do not reject the null hypothesis. If the p-value is less than alpha,
we reject the null hypothesis.

- The test statistic in a statistical test gives a measure of how much the data deviate from what is expected.
- For the chi-squared test, it indicates how much the observed frequencies deviate from the
expected frequencies.
o Larger values indicate larger deviations. The statistic is compared with a critical value or a
p-value to assess whether the deviation is significant and whether the null hypothesis can be
rejected.
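
A minimal sketch of a chi-squared goodness-of-fit test with made-up observed and expected frequencies (scipy assumed for the critical value and p-value):

```python
from scipy.stats import chi2

observed = [45, 55, 60, 40]                 # hypothetical observed frequencies
expected = [50, 50, 50, 50]                 # expected frequencies under H0

# Chi-squared test statistic: sum of (observed - expected)^2 / expected
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1
critical = chi2.ppf(0.95, df)               # critical value at alpha = 0.05
p_value = 1 - chi2.cdf(chi2_stat, df)

print(chi2_stat, critical, p_value)
print(chi2_stat > critical)                 # True would mean reject H0
```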

Chapter 16: Simple linear regression


https://www.youtube.com/watch?v=ZkjP5RJLQF4&list=PLIeGtxpvyG-LoKUpV0fSY8BGKIMIdmfCi

The video linked above is good:

Regression is a model that allows us to see the relationship between two or more variables.

With only one variable and no other information, the best prediction of the next observation would be
to use the mean.

Residuals: a residual is the difference between the fitted line and an observed data point. Residuals are also
called errors, because they measure how far the observed data are from the best-fit line.

The residuals always add up to zero:


Recall that for the standard deviation we took the deviations from the mean and squared them. We
are going to do the exact same thing here. But why do we square the residuals?

First, it makes all the numbers positive, and second, it emphasizes larger
deviations.

See below:

If we add all these squared residuals up, we get the sum of squared residuals, also called the sum of squared
errors (SSE).

The goal of a linear regression model is to minimize the sum of squared errors.

So we create a different line through the data when we introduce an independent variable: the line
that reduces the size of these squares the most is the best-fit line for our data.

The only problem is that so far we are only using the dependent variable; we need an independent
variable.
- When we introduce the independent variable into our regression model, it absorbs some of the
sum of squared errors (see the sketch below).

When we say a regression model is good, we mean that it reduces the sum of squared errors
by a large amount. So a simple regression model is always compared with what
we would have if we only had the dependent variable.
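
A minimal sketch of this comparison with made-up data: the sum of squared errors when predicting every y with the mean alone versus the SSE of the least squares line:

```python
# Hypothetical data: x is the independent variable, y the dependent variable
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.8, 4.1, 4.5, 5.9, 7.2]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# SSE using only the dependent variable: predict every y with the mean
sse_mean_only = sum((yi - mean_y) ** 2 for yi in y)

# Least squares coefficients b1 (slope) and b0 (intercept)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

# SSE of the regression line: residual = observed y minus fitted y
sse_line = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(sse_mean_only, sse_line)   # the line reduces the sum of squared errors
```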
