Lesson 1: Brief History of Statistics
STATISTICS - first applied to the political science concerned with the facts of a state or community XVIII; all derived
immediately from German statistisch adj., statistik sb.; whence statistician XIX.
Statistics is concerned with exploring, summarising, and making inferences about the state of complex systems. As
summarised in Table 1.1, the development of statistics in Europe was strongly motivated by the need to make sense of
the large amount of data collected by population surveys in the emerging nation states. At the same time, the
mathematical foundations for statistics advanced significantly due to breakthroughs in probability theory inspired by
games of chance (gambling). For more information about the history of statistics refer to the books by Johnson and Kotz
(1998) and Kotz and Johnson (1993).
Lesson 2: Application of Statistics
The field of statistics has numerous applications in business. Because of technological advancements, businesses now
generate large amounts of data, and these data are used to make decisions. Better decisions help improve the running
of a department, a company, or the entire economy.
Marketing
According to Philip Kotler and Gary Armstrong, marketing “identifies customer needs and wants, determines which target
markets the organisation can serve best, and designs appropriate products, services and programs to serve these
markets.”
Marketing is all about creating and growing customers profitably, and statistics is used in almost every aspect of it.
Statistics is used extensively in deciding how to sell products to customers, and intelligent use of statistics helps
managers design marketing campaigns targeted at potential customers. Marketing research is the systematic and
objective gathering, recording and analysis of data about aspects related to marketing. IMRB International, TNS India,
RNB Research, Nielsen, Hansa Research and Ipsos Indica Research are some of the popular market research companies in
India. Web analytics is about tracking the online behaviour of potential customers and studying how visitors browse
various websites.
The use of statistics is indispensable in forecasting sales, market share and demand for various types of industrial
products. Factor analysis, conjoint analysis and multidimensional scaling are invaluable tools, based on statistical
concepts, for designing products and services based on customer response.
Finance
Uncertainty is the hallmark of the financial world. All financial decisions are based on “expectation”, which is best
analysed with the help of probability theory and statistical techniques. Probability and statistics are used
extensively in designing new insurance policies and in fixing their premiums. Statistical tools and techniques are also
used for analysing and quantifying risk, valuing derivative instruments, and comparing the return on investment in two
or more instruments or companies.
The beta of a stock or equity is a statistical measure for comparing its volatility with that of the market, and is
highly useful for selecting a portfolio of stocks.
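As an illustration, beta is commonly computed as the covariance between a stock's returns and the market's returns, divided by the variance of the market's returns. The sketch below assumes that standard definition; the function name and the return figures are hypothetical and only serve to show the arithmetic.

```python
from statistics import mean


def beta(stock_returns, market_returns):
    """Beta = covariance(stock, market) / variance(market)."""
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two return series of equal length (at least 2 periods)")
    s_bar, m_bar = mean(stock_returns), mean(market_returns)
    cov = sum((s - s_bar) * (m - m_bar) for s, m in zip(stock_returns, market_returns))
    var = sum((m - m_bar) ** 2 for m in market_returns)
    return cov / var  # the (n - 1) divisors of covariance and variance cancel


# Hypothetical monthly returns, for illustration only
market = [0.01, 0.02, -0.01, 0.03]
stock = [0.02, 0.04, -0.02, 0.06]  # moves twice as much as the market

print(round(beta(stock, market), 2))
```

A beta above 1 suggests the stock has been more volatile than the market over the period, and a beta below 1 suggests it has been less volatile.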
The most sophisticated traders in today’s stock markets are those who trade in “derivatives”, i.e. financial
instruments whose price depends on the price of some other underlying asset.
Economics
Statistical data and methods render valuable assistance in the proper understanding of economic problems and the
formulation of economic policies. Most economic phenomena and indicators can be quantified and dealt with using
statistically sound logic.
In fact, statistics became so integrated with economics that it led to the development of a new subject called
econometrics, which deals with economic issues through the use of statistics.
Operations
The field of operations is about transforming various resources into products and services in the place, quantity, cost,
quality and time required by customers. Statistics plays a very useful role at the input stage through sampling
inspection and inventory management, at the process stage through statistical quality control and the Six Sigma method,
and at the output stage through sampling inspection. The term Six Sigma quality refers to a situation where there are
only 3.4 defects per million opportunities.
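The figure of 3.4 defects per million opportunities (DPMO) can be checked with a short calculation. The sketch below assumes the usual Six Sigma convention of allowing a 1.5 sigma long-term drift in the process mean, so that a "six sigma" process is evaluated at 4.5 standard deviations on one side of a normal distribution. The function names and the sample counts are hypothetical.

```python
from statistics import NormalDist


def dpmo(defects, units, opportunities_per_unit):
    """Observed defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def long_term_dpmo(sigma_level, shift=1.5):
    """Expected DPMO for a sigma level, assuming a normal process and
    the conventional 1.5 sigma long-term shift (an assumption)."""
    tail = 1 - NormalDist().cdf(sigma_level - shift)
    return tail * 1_000_000


# Hypothetical inspection data: 17 defects in 1,000 units with 5,000 opportunities each
print(round(dpmo(17, 1000, 5000), 1))
print(round(long_term_dpmo(6), 1))
```

Both calls come out at about 3.4, the first from counted defects and the second from the normal-distribution tail area.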
Human Resources
Human Resource departments are, inter alia, entrusted with the responsibility of evaluating performance, developing
rating systems, and evolving compensatory reward and training systems. All these functions involve designing forms and
collecting, storing, retrieving and analysing a mass of data, and they can be performed efficiently and effectively
with the help of statistics.
Information Systems
Information Technology (IT) and statistics share a similar systematic approach to problem solving. IT uses statistics in
areas such as optimising server time and assessing the performance of a program by measuring the time taken and the
resources used by the program. Statistics is also used in software testing.
Data Mining
In marketing, data mining can be used for market analysis and management, target marketing, CRM, market basket
analysis, cross selling, market segmentation, customer profiling and managing web-based marketing, etc.
In risk analysis and management, it is used for forecasting, customer retention, quality control, competitive analysis
and detection of unusual patterns.
In finance, it is used in corporate planning and risk evaluation, financial planning and asset evaluation, cash flow
analysis and prediction, contingent claim analysis to evaluate assets, cross-sectional and time series analysis,
customer credit rating, and detection of money laundering and other financial crimes.
In operations, it is used for resource planning and for summarising and comparing resources and spending.
In the retail industry, it is used to identify customer behaviours, patterns and trends, as well as for designing more
effective goods transportation and distribution policies, etc.
Lesson 3: Basic Statistical Terms
STATISTICS
Two Divisions of Statistics
1. Descriptive Statistics – it is concerned with organizing and summarizing information about a collection of actual
observations.
2. Inferential Statistics – it is concerned with generalizing beyond the actual observations.
Population – it is the totality of the people, objects, events or things of any form being investigated,
studied or researched.
Parameter – it is a property of the population that can be expressed as a number.
Sample – it is a smaller group taken from a population.
Statistic – it is a property of the sample that can be expressed as a number.
2. 40% of 1,211 students at a particular elementary school got below a 3 on a standardized test.
3. 65% of the 1st year students in BSSW have been involved in outreach activities.
Statistic – describes a sample.
Variables
Kinds of Variables
Independent Variable
It is a variable that can affect another variable. Independent variables are the individual variables that may have an
effect on the dependent variable.
- These are also called “explanatory variables”, “manipulated variables” or “controlled variables”.
Dependent Variable
TYPES OF DATA
Qualitative Data – A set of observations where any single observation is a word or code that represents a class or
category.
Quantitative Data – A set of observations where any single observation is a number that represents an amount or count.
Magnitude. It refers to the ability to know if one score is greater than, equal to, or less than another score.
Equal Intervals. It means that the possible scores are each an equal distance from each other.
Absolute Zero. It refers to a point where a score of zero can be assigned.
LEVELS OF MEASUREMENT
MODULE 2:
Introduction
- Today, we cannot deny that a vast amount of data is all around us. We use these data to
understand existing facts and ideas and to help us develop in different fields. Sourcing these data has
become easy with the help of modern technologies, but people tend to find this vast amount of
numbers tedious, boring, and/or overwhelming. Such data need to be properly organized to extract
meaningful information from them.
- In this module, the different learning tools will help you identify the appropriate method of collecting and
presenting data to arrive at important information that can be used in business decision-making.
Sample Size
Effect size
Tolerance for errors in decision (Type 1 or Type 2 error)
Statistical Power
Statistical Test
1. Effect size
Effect size measures the size of an effect as it exists in the population, in a way that is independent of certain
details of the experiment such as the sizes of the samples used.
In a very influential book on power analysis, Cohen (1988) defined some standards for interpreting effect sizes
(expressed as a standardized mean difference, Cohen's d):
- A small effect = 0.20
- A medium effect = 0.50
- A large effect = 0.80
2. Tolerance error
3. Statistical Power
- Likelihood that a study will detect an effect when there is an effect there to be detected.
- Statistical power is inversely related to beta or the probability of making a Type II error. In short, power = 1 – β.
- If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact,
there is one, goes down.
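The two ideas above, effect size and power, can be made concrete with a small calculation. The sketch below is illustrative only. It assumes the effect size is a standardized mean difference (Cohen's d with a pooled standard deviation) and that power is wanted for a two-tailed one-sample z-test. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    shift = effect_size * sqrt(n)       # how far the true effect moves the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scores for two groups
d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
power = z_test_power(effect_size=0.5, n=32)
print(round(d, 2), round(power, 2), round(1 - power, 2))  # effect size, power, beta
```

Raising the sample size or the effect size raises the power, which lowers beta, the probability of a Type II error.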
G*Power software
Software for determining the sample size from the following computed inputs:
Effect size
Marginal error
Statistical power
Statistical test
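To show how these inputs combine, here is a minimal sketch of a sample-size calculation for one simple case, a two-tailed one-sample z-test. It is not G*Power's own procedure, only the textbook normal-approximation formula n = ((z_alpha + z_power) / d)^2. The function name and default values are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(effect_size, alpha=0.05, power=0.80):
    """Smallest n for a two-tailed one-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # from the tolerated Type I error
    z_power = z.inv_cdf(power)          # from the desired statistical power
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


print(required_sample_size(0.5))  # medium effect
print(required_sample_size(0.2))  # smaller effect, so a larger sample is needed
```

Other statistical tests use different formulas, which is why the test itself is one of the inputs.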
Sampling Techniques
1. Random Sampling
A random sample is a sample obtained from the population in such a manner that all samples of the same size
have equal likelihood of being selected. Any method of obtaining random samples is called random sampling.
2. Systematic Sampling
From a list of members of a population, choose a starting point by chance and then select every nth element on
the list (for some appropriate value of n). The result is a systematic sample.
3. Stratified Sampling
If the population is divided into subpopulations, called strata, and we take a random sample from each stratum,
the resulting sample is a stratified sample.
Stratified sampling can use either proportional allocation or equal allocation.
4. Cluster Sampling
A cluster sample is obtained by dividing the population into groups (clusters), selecting some of these groups at
random, and then sampling from each selected group.
5. Purposive Sampling
A non-probability sampling technique in which the researcher sets criteria or requirements for who will be
included in the sample.
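The first three techniques above can be sketched in a few lines of code. This is a minimal illustration, not a full survey tool. The population, the strata names and the helper function names are all hypothetical, and the stratified version assumes proportional allocation.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th element."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, total_n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_stratum = round(total_n * len(members) / population_size)  # rounding may shift the total slightly
        sample[name] = rng.sample(members, n_stratum)
    return sample


rng = random.Random(42)              # fixed seed so the run is repeatable
population = list(range(1, 101))     # hypothetical list of 100 member IDs
strata = {"first_year": list(range(1, 61)), "second_year": list(range(61, 101))}

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print({name: len(chosen) for name, chosen in stratified_sample(strata, 10, rng).items()})
```

With equal allocation, each stratum would instead contribute the same number of members regardless of its size.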
In summary...
SAMPLING TECHNIQUES
1. Probability Sampling – These sampling techniques make use of a “chance process” in selecting the sample of the
study.
1. Random Sampling – Every member of the population is given an equal chance of being selected.
2. Stratified Random Sampling – The population is first divided into homogeneous subgroups, from which simple random
samples are then drawn.
4. Cluster Sampling – It is also referred to as “area sampling” because it is used when the population is spread out
over a wide area.
2. Non-Probability Sampling – These sampling techniques do not use a “chance process” in obtaining the
sample of a study.
1. Convenience Sampling – It simply uses subjects or results that are readily available.
2. Purposive Sampling – The objects of the study have a unique characteristic; hence, not just anybody can be
included in the sample of the study.
In statistics, a sample is a subset of the total population. You can use the data from a sample to make inferences
about the population as a whole. For example, the standard deviation of a sample can be used to approximate the
standard deviation of a population. Finding a sample size can be one of the most challenging tasks in statistics and
depends upon many factors including the size of your original population.
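A quick simulation can illustrate how a sample statistic approximates a population parameter. The sketch below builds a hypothetical population of 10,000 simulated scores (the values, seed and sizes are assumptions for illustration), draws a random sample of 500, and compares the two standard deviations.

```python
import random
from statistics import pstdev, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated scores centred on 100 with spread 15
population = [rng.gauss(100, 15) for _ in range(10_000)]
sample = rng.sample(population, 500)

population_sd = pstdev(population)  # parameter: uses every member of the population
sample_sd = stdev(sample)           # statistic: uses only the sample (n - 1 divisor)

print(round(population_sd, 2), round(sample_sd, 2))
```

The two values will usually be close but not identical, and the gap tends to shrink as the sample size grows.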
General Tips
Step 1: Conduct a census if you have a small population. A “small” population will depend on your budget and time
constraints. For example, it may take a day to take a census of a student body at a small private university of 1,000
students but you may not have the time to survey 10,000 students at a large state university.
Step 2: Use a sample size from a similar study. Chances are, your type of study has already been undertaken by someone
else. You’ll need access to academic databases to search for a study (usually your school or college will have access). A
pitfall: you’ll be relying on someone else correctly calculating the sample size. Any errors they have made in their
calculations will transfer over to your study.
Step 3: Use a table to find your sample size. If you have a fairly generic study, then there is probably a table for it. For
example, if you have a clinical study, you may be able to use a table published in Machin et al.’s Sample Size Tables for
Clinical Studies, Third Edition.
Step 5: Use a formula. There are many different formulas you can use, depending on what you know (or don’t know)
about your population. If you know some parameters about your population (like a known standard deviation), you can
use the techniques below. If you don’t know much about your population, use Slovin’s formula.
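Slovin's formula is usually written as n = N / (1 + N e^2), where N is the population size and e is the margin of error you are willing to tolerate. The sketch below applies it to the two university sizes mentioned in Step 1. The function name and the 5% margin of error are assumptions for illustration.

```python
from math import ceil


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return ceil(population_size / (1 + population_size * margin_of_error ** 2))


print(slovin_sample_size(1_000, 0.05))   # small private university
print(slovin_sample_size(10_000, 0.05))  # large state university
```

Note how a tenfold increase in the population raises the required sample only modestly at the same margin of error.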
HYPOTHESIS TESTING
- A hypothesis is a statement with many-faceted ideas whose truth is still to be determined.
- It is a wise or intellectual guess.
Two types of hypothesis
1. Null Hypothesis (Ho) – it always expresses the idea of no significant difference.
2. Alternative Hypothesis (Ha) – it is the negation of the null hypothesis.
Types of Errors
1. Type I (α-error) – Rejecting the null hypothesis when in fact it is true.
2. Type II (β-error) – Accepting the null hypothesis when in fact it is false.
Types of Tests
1. One-sided (directional) – it is called a one-tailed test. It is used when the alternative hypothesis uses a
comparative word such as higher, better, bigger, more intelligent, etc.
2. Two-sided (non-directional) – it is called a two-tailed test. It is used when the alternative hypothesis uses the
following words/phrases: significant difference, not + comparative word, not equal, not the same, etc.
T-test. It is used when the sample standard deviation is given.
Z-test. It is used when the population standard deviation is given.
1. Formulate the null hypothesis (Ho) and the alternative hypothesis (Ha) .
2. Set the level of significance.
*** Choose between 1% and 10%, depending on the risk of error the researcher is willing to take. A 1% level of
significance means the researcher accepts a 1% chance of rejecting a true null hypothesis and is 99% confident that
the decision is right.
3. Identify the statistical test.
*** For a z-test, use the row for df = ∞ at the bottom of the t-table.
For two sample groups: df = n1 + n2 - 2
6. Decide whether to accept or to reject the null hypothesis.
*** Accept (fail to reject) the null hypothesis if the absolute computed value is less than the tabular (critical) value.
*** Reject the null hypothesis if the absolute computed value is greater than the tabular (critical) value.
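The decision rule above can be sketched for a one-sample z-test, where the population standard deviation is known. This is a minimal illustration under stated assumptions: the function name and the numbers are hypothetical, and the one-tailed branch assumes the alternative hypothesis claims a higher mean.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, two_tailed=True):
    """Return (computed z, tabular/critical z, reject_null)."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))   # computed value
    tail_area = alpha / 2 if two_tailed else alpha      # split alpha across two tails if needed
    critical = NormalDist().inv_cdf(1 - tail_area)      # tabular value
    statistic = abs(z) if two_tailed else z             # one-tailed: assumes Ha says "higher"
    return z, critical, statistic > critical


# Hypothetical data: sample of 100 with mean 103; population mean 100, SD 15
z, critical, reject = z_test(103, 100, 15, 100, alpha=0.05, two_tailed=True)
print(round(z, 2), round(critical, 2), reject)
```

Here the computed value of 2.0 exceeds the tabular value of about 1.96, so the null hypothesis is rejected at the 5% level. At a stricter 1% level the same data would not lead to rejection.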
The discussion below will give you an idea how hypothesis testing is used in business.
Hypothesis testing is a step-by-step process to determine whether a stated hypothesis about a given population is true.
It is an important tool in business development. By testing different theories and practices, and the effects they produce
on your business, you can make more informed decisions about how to grow your business moving forward. Hypothesis
testing can keep you from wasting time on initiatives that have no effect on growing your business, and it can help you
maximize your resources and manpower by focusing them toward measures that can produce the biggest effects. Once
you understand how hypothesis testing works and the steps involved, it is easy to apply it to your business decisions.
Hypothesis testing examines whether a hypothesis about a given population is true. It does so by reframing the
supposition as a pair of opposing hypotheses. The first is called the null hypothesis. The null hypothesis means no effect,
or no change was observed in the population that cannot be explained by random chance. Opposing the null hypothesis
is the alternative hypothesis, which states that any change seen in the population was too improbable to be explained
by random chance. Ultimately, the goal of hypothesis testing is to either accept the null
hypothesis or reject it. This requires working your way through four steps to arrive at a conclusion.
The first step of hypothesis testing is to state your hypothesis as a set of opposing theories so only one can be right. For
example, if you want to know whether Toronto residents’ IQs differ from Canadians in general, you might set your
hypotheses up as follows: The null hypothesis is that no difference exists between Toronto residents’ IQs and Canadians’
IQs that cannot be explained by random chance. The alternative hypothesis is that the two sets of IQs differ. Once you
decide on your hypothesis and frame it in the proper way, you can advance to the testing process.
Once your hypothesis is in place and your parameters are set, you can gather the data you need and begin analyzing it.
Suppose your sample from Toronto consists of 500 randomly selected adults. During step three of the
hypothesis testing process, you would gather the IQs of these people and compare them to
the IQs of Canadians at large. Using an appropriate test of statistical significance, you would calculate the
probability that a difference at least as large as the one seen would arise from random chance alone.
After gathering the necessary data, analyzing it, and measuring the level of statistical significance, it’s time to
interpret the results by comparing them to the threshold you set in step two. How your results compare to the
parameters you set determines whether to accept the null hypothesis or reject it. If your result meets or exceeds the
required level of statistical significance, then the null hypothesis is rejected. Otherwise, the null hypothesis is
retained, because the data do not provide enough evidence against it.
Hypothesis testing has many uses for helping you develop your business. Suppose you are training your outside sales
force, and want to know whether a specific sales technique results in a higher close ratio than the methods currently
employed by your company. To make this determination, you can take the same steps as outlined above for the Toronto
IQ experiment. Your null hypothesis would be that the new technique has no effect on sales that isn’t explained by
random chance, while your alternative hypothesis would be that the method does have an effect, whether positive or
negative. If you conclude that the technique has an effect, and it is positive, then you can implement the new method
with confidence, knowing it is likely to bring you results. Hypothesis testing sounds complicated, but it is a simple
process when broken down into steps, and it can help you make better business decisions.
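The sales-technique example can be sketched as a two-proportion z-test, one common way to compare two close ratios. The source does not name a specific test, so this choice is an assumption, and the counts below are hypothetical numbers invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-tailed z-test for a difference between two proportions.

    Returns (z, p_value). Uses the pooled proportion under the null
    hypothesis that both groups share the same close ratio.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: current method closes 45 of 200 calls, new technique 65 of 200
z, p_value = two_proportion_z_test(45, 200, 65, 200)
alpha = 0.05  # significance level chosen before looking at the data
print(round(z, 2), round(p_value, 3), p_value < alpha)
```

With these made-up numbers the p-value falls below 0.05 and z is positive, so the null hypothesis of no effect would be rejected in favour of the new technique. Real data could just as easily point the other way.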