Title: Subject: Name of The Student: Roll No: Guide Name
The correlations and interactions among different biological entities make up a biological system.
Although interactions that have already been revealed contribute to the understanding of existing systems,
researchers face many questions every day regarding inter-relationships among entities. Their queries have a
potential role in exploring new relations, which may open up new areas of investigation. In this paper, we
introduce a text-mining-based method for answering biological queries in terms of statistical computation, so
that researchers can arrive at new knowledge. It lets users submit a query in natural language, which is then
treated as a hypothesis. Our proposed approach analyzes the hypothesis and measures its p-value with respect
to the existing literature. Based on the measured value, the system either accepts or rejects the hypothesis from
a statistical point of view. Moreover, even if it does not find any direct relationship among the entities of the
hypothesis, it presents a network that gives an integrated overview of all the entities through which they might
be related. This also helps researchers widen their view and thus think of new hypotheses for further
investigation. It gives researchers a quantitative evaluation of their assumptions so that they can reach a logical
conclusion, and thus aids relevant research in biological knowledge discovery. The system also provides a
graphical interactive interface through which researchers can submit their hypotheses for assessment more
conveniently.
Testing statistical hypotheses is one of the most important areas of statistical analysis. In many situations,
researchers in the field of data analysis are interested in testing a hypothesis about a population parameter.
In traditional testing, the sample observations are crisp and a statistical test leads to a binary decision.
However, in real life, the data sometimes cannot be recorded precisely. Statistical hypothesis testing under
fuzzy environments has been studied by many authors. Arnold discussed fuzzy hypothesis testing with crisp
data. Neyman–Pearson type hypothesis testing was proposed by Casals and Gil, and by Son et al. Saade
considered binary hypothesis testing and discussed fuzzy likelihood functions in the decision-making
process. Casals and Gil considered Bayesian sequential tests for fuzzy parametric hypotheses from fuzzy
information. In the human sciences, Niskanen discussed applications of soft statistical hypotheses. Wu
proposed statistical hypothesis testing for fuzzy data based on the notions of degrees of optimism and
pessimism. Akbari and Rezaei investigated a bootstrap method for inference about the variance based on
fuzzy data.
Viertl investigated some methods to construct confidence intervals and statistical tests for fuzzy data. Wu
proposed some approaches to construct fuzzy confidence intervals for an unknown fuzzy parameter. Arefi and
Taheri developed an approach to test fuzzy hypotheses using a fuzzy test statistic for vague data. Fuzzy tests
for hypothesis testing with vague data were proposed by Grzegorzewski, Montenegro et al., Baloui Jamkhaneh
and Nadi Ghara, and Watanabe and Imaizumi. A new approach to the problem of testing statistical hypotheses
for fuzzy data, using the relationship between confidence intervals and hypothesis testing, was introduced by
Chachi et al.
In this paper, we propose a new statistical hypothesis testing procedure about population means when the
data of the two given samples are real intervals. We provide the decision rules used to accept or reject
the null and alternative hypotheses. In the proposed test, we split the given interval data into two sets of
crisp data, namely upper level data and lower level data. We then find the test statistic values for the two sets
of crisp data and obtain a decision about the population means on the basis of the decision rules. This
testing procedure does not use degrees of optimism and pessimism or the h-level set. To illustrate the
proposed testing procedure, a numerical example is given. Further, we extend the proposed test to statistical
hypotheses with fuzzy data.
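The splitting idea described above can be sketched in Python. This is only an illustration under stated assumptions: the text does not say which test statistic or which decision rules are used, so the pooled two-sample t statistic, the rule for combining the two decisions, the function names and the sample data are all hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled_var = ((n1 - 1) * variance(sample_a)
                  + (n2 - 1) * variance(sample_b)) / (n1 + n2 - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def interval_mean_test(intervals_a, intervals_b, t_critical):
    """Test equality of means separately on the lower and upper interval ends."""
    # Split each interval sample into lower level and upper level crisp data.
    t_lower = pooled_t([lo for lo, _ in intervals_a], [lo for lo, _ in intervals_b])
    t_upper = pooled_t([hi for _, hi in intervals_a], [hi for _, hi in intervals_b])
    reject_lower = abs(t_lower) > t_critical
    reject_upper = abs(t_upper) > t_critical
    # Assumed combination rule (not from the text): decide only when both agree.
    if reject_lower and reject_upper:
        decision = "reject H0"
    elif not reject_lower and not reject_upper:
        decision = "do not reject H0"
    else:
        decision = "undecided"
    return t_lower, t_upper, decision


# Made-up interval observations (lower, upper) for two samples.
sample_a = [(10, 12), (11, 13), (9, 12), (12, 14), (10, 13)]
sample_b = [(14, 16), (15, 18), (13, 16), (16, 17), (15, 17)]
# 2.306 is the two-tailed 5% t table value for 5 + 5 - 2 = 8 degrees of freedom.
t_lower, t_upper, decision = interval_mean_test(sample_a, sample_b, t_critical=2.306)
print(round(t_lower, 3), round(t_upper, 3), decision)
```

Returning "undecided" when the two crisp tests disagree is one possible design choice; the actual decision rules of the proposed procedure may differ.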
Null Hypothesis: A null hypothesis is a specific baseline statement to be tested, and it usually takes such forms
as “no effect” or “no difference.” An alternative (research) hypothesis is the denial of the null hypothesis [7]. We
always state the null hypothesis in a form such as “There is no significant difference between x and y”.
Alternative Hypothesis: The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample
observations are influenced by some non-random cause. Rejection of the null hypothesis leads to the acceptance
of the alternative hypothesis, e.g. null hypothesis: “x = y.”
2) Set up a suitable level of significance, e.g. 1%, 5% or 10%.
6) Draw conclusions.
If the calculated value is greater than the table value, the null hypothesis is rejected.
Applications: To find significant difference between two sample proportions P1 and P2.
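$$Z = \frac{P_1 - P_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}, \quad Q = 1 - P$$
To test a single sample mean, when population S.D. is not known:
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
To test the difference between two sample means, when population S.D. is not known:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
To test the difference between two sample standard deviations, when population S.D. is not known:
$$Z = \frac{s_1 - s_2}{\sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}}$$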
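The large-sample Z statistics of this section can be computed with a short Python sketch. It assumes the usual textbook forms: the pooled proportion P = (n1·P1 + n2·P2)/(n1 + n2) for two proportions, s/√n as the standard error of a mean, and the square root of s1²/(2n1) + s2²/(2n2) as the standard error of a difference of standard deviations. All function names are hypothetical, and the two-tailed decision rule is an assumption.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z for the difference of two sample proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion, same as (n1*p1 + n2*p2)/(n1 + n2)
    q = 1 - p
    return (p1 - p2) / math.sqrt(p * q * (1 / n1 + 1 / n2))


def one_mean_z(xbar, mu, s, n):
    """Z for a single sample mean, with sample S.D. s replacing the population S.D."""
    return (xbar - mu) / (s / math.sqrt(n))


def two_mean_z(xbar1, s1, n1, xbar2, s2, n2):
    """Z for the difference of two sample means."""
    return (xbar1 - xbar2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def two_sd_z(s1, n1, s2, n2):
    """Z for the difference of two sample standard deviations."""
    return (s1 - s2) / math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))


def reject_null(z, alpha=0.05):
    """Two-tailed rule: reject H0 when |calculated Z| exceeds the table value."""
    table_value = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > table_value


# Made-up example: 60 of 200 versus 40 of 200 successes.
z = two_proportion_z(60, 200, 40, 200)
print(round(z, 3), reject_null(z))
```

The table value is taken from the standard normal distribution, which is appropriate only for large samples; for small samples the t-test is used instead.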
The chi-square (χ2) test is an important test of significance and was developed by Karl Pearson in 1900. It is
based on frequencies and not on parameters such as the mean or S.D.
Applications: The chi-square test is used to compare observed and expected frequencies objectively. It can be
used (i) as a test of goodness of fit and (ii) as a test of independence of attributes.
Conditions for applying χ2 Test:-
(ii) No expected cell frequency should be smaller than 10. If this problem occurs, the difficulty is
overcome by grouping two or more classes before calculating (O-E).
(a) Chi Square Test as a test of goodness of fit:-
The chi-square test lets us see how well an assumed theoretical distribution (such as the Binomial
distribution, Poisson distribution or Normal distribution) fits the observed data.
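Formula: $$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
O_ij = observed frequency of the cell in the ith row and jth column.
E_ij = expected frequency of the cell in the ith row and jth column.
Degrees of freedom = (r − 1)(c − 1), for a table with r rows and c columns.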
e.g. it may help in finding whether a new drug is effective in curing a disease or not.
Formula: $$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O represents the observed frequency and E is the expected frequency under the null hypothesis,
computed by
$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$
For the goodness-of-fit test, a theoretical relationship is used to calculate the expected frequencies. For
the test of independence, only the observed frequencies are used to calculate the expected frequencies.
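Both uses of the chi-square statistic can be sketched in Python. The function names and the example frequencies are hypothetical. The table values 3.841 (1 degree of freedom) and 11.070 (5 degrees of freedom) are the standard 5% chi-square table values.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Sum of (O - E)^2 / E over all classes; expected comes from the assumed theory."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    sample_size = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency uses only the observed margins.
            e = row_totals[i] * col_totals[j] / sample_size
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Goodness of fit: a die rolled 60 times, expected 10 per face if the die is fair.
gof = chi_square_goodness_of_fit([8, 12, 9, 11, 10, 10], [10] * 6)
print(gof, gof > 11.070)  # compare with the 5% table value for 5 d.f.

# Independence: made-up 2 x 2 table (rows: drug / no drug, columns: cured / not cured).
chi2, dof = chi_square_independence([[60, 40], [40, 60]])
print(chi2, dof, chi2 > 3.841)  # compare with the 5% table value for 1 d.f.
```

In the goodness-of-fit call the expected frequencies are supplied from theory, while in the independence call they are derived from the row and column totals of the observed table, which mirrors the distinction drawn above.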
Student’s t-test:
The t statistic was developed by William S. Gosset and was published under the pseudonym Student.
Applications: t-test is used to test the significance of sample mean, difference of two sample means or two
related sample means in case of small samples when population variance is unknown.
Applications:- It is used to test whether the mean of a sample deviates significantly from a stated value when
variance of population is unknown.
Formula: $$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$ when S.D. is given.
When S.D. is not given, find s by using the formula $$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
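A minimal Python sketch of the one-sample t-test follows. It assumes the standard error s/√n and the sample S.D. computed with divisor n − 1. The function name and the data are hypothetical, and 2.262 is the two-tailed 5% t table value for 9 degrees of freedom.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for testing the sample mean against mu."""
    n = len(sample)
    s = stdev(sample)  # sample S.D. with divisor n - 1
    t = (mean(sample) - mu) / (s / math.sqrt(n))
    return t, n - 1


# Made-up sample of 10 measurements, tested against a stated value of 48.
sample = [48, 52, 50, 49, 51, 53, 47, 50, 52, 48]
t_value, dof = one_sample_t(sample, mu=48)
print(round(t_value, 3), dof, abs(t_value) > 2.262)
```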
Applications: It is used to compare the mean of two samples of size n1 and n2 when population variances are
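Formula: $$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$ when S.D. of the two samples is known,
and $$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$$
where S_1 and S_2 are the standard deviations of the two samples, each computed with divisor n_1 and n_2 respectively.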
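The two-sample t-test can be sketched in Python as follows. The sketch assumes the pooled form S² = (n1·S1² + n2·S2²)/(n1 + n2 − 2), where S1 and S2 are sample standard deviations computed with divisors n1 and n2, and it assumes n1 + n2 − 2 degrees of freedom. The function name and the data are hypothetical.

```python
import math
from statistics import mean, pvariance


def pooled_two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) for the difference of two sample means."""
    n1, n2 = len(sample1), len(sample2)
    # pvariance divides by n, so n * pvariance is the sum of squared deviations.
    s_squared = (n1 * pvariance(sample1) + n2 * pvariance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / (math.sqrt(s_squared) * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up samples of size 5 each.
t_value, dof = pooled_two_sample_t([10, 11, 9, 12, 10], [14, 15, 13, 16, 15])
# 2.306 is the two-tailed 5% t table value for 8 degrees of freedom.
print(round(t_value, 3), dof, abs(t_value) > 2.306)
```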
In modern manufacturing plants, people still seldom attach importance to hypothesis testing, which they regard
as merely a matter of theory. However, the application of hypothesis testing in quality management should be
promoted. Both parametric tests (t-test and z-test) and nonparametric tests (sign test and Wilcoxon rank-sum
test) are appropriate for use in a manufacturing environment.
Data collection establishes the foundation for appraising quality of a product or service. But without correct data
processing, it becomes challenging to make an objective conclusion. Sometimes, the observation is wrongly
For instance, suppose that the fallout rates of samples drawn from two different groups are 15% and 10%,
respectively. Concluding from these figures alone that one group is better than the other would be a hasty
judgment, because the difference may be due to sampling variation. In such cases, hypothesis testing helps
explain the observed phenomena. Unfortunately, in many manufacturing facilities people tend to focus merely
on descriptive statistics such as the arithmetic mean and range. Simply put, the application of
hypothesis testing is indispensable to better understand quality data and provide guidance to production control.
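The fallout-rate example can be made concrete with a small Python sketch of the two-proportion Z test. The sample sizes are assumptions, since the text gives only the rates, and the function name is hypothetical. The same 15% versus 10% difference is not significant at the 5% level with 100 units per group, but it is significant with 1000 units per group.

```python
import math
from statistics import NormalDist


def fallout_rate_z(rate1, n1, rate2, n2):
    """Z statistic for the difference between two fallout rates (sample proportions)."""
    pooled = (n1 * rate1 + n2 * rate2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (rate1 - rate2) / std_error


table_value = NormalDist().inv_cdf(0.975)  # two-tailed 5% level, about 1.96

for n in (100, 1000):  # assumed sample sizes per group
    z = fallout_rate_z(0.15, n, 0.10, n)
    verdict = "significant" if abs(z) > table_value else "not significant"
    print(f"n = {n}: Z = {z:.3f}, {verdict} at the 5% level")
```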
To estimate the population mean, a confidence interval is introduced because the sample mean generally
differs from the population mean. For spot checks in the manufacturing process, there are two risks of drawing
a wrong conclusion: Type I and Type II risks. Type I risk (α) is the probability of rejecting qualified products
(the producer's risk); Type II risk (β) is the probability of accepting nonconforming products (the customer's risk).
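In symbols, the two risks and a confidence interval for the population mean can be written as below. The notation is the usual textbook one and is assumed here rather than taken from the text: H_0 is the null hypothesis (for example, "the lot is acceptable"), σ is the population S.D., n is the sample size, and z_{α/2} is the standard normal table value.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{accept } H_0 \mid H_0 \text{ is false})

\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
```

For a 95% confidence level, α = 0.05 and z_{α/2} is approximately 1.96.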
With the correct use of the tests discussed above, valid results can be obtained. Care should therefore be taken
when selecting tests of hypothesis for large and small samples; otherwise, one gets invalid results. That is why
the selection of a correct statistical test is so important.
[1] https://www.ijsr.net/archive/v4i5/SUB153997.pdf
[2] https://www.researchgate.net/publication/321003737_HYPOTHESIS_TESTING
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=
[4] Minhaz Fahim Zibran, Chi-Squared Test of Independence, Department of Computer Science, University of
Calgary, Alberta, Canada.
[5] Bhattacharya, Dipak Kumar: Research Methodology, New Delhi, Excel Books.
[6] Panneerselvam, R., 2014: Research Methodology, New Delhi, Prentice Hall of India Pvt. Ltd.
[7] Bali N.P., Gupta P.N., Gandhi C.P., 2008: Quantitative Techniques, New Delhi, and University Science
[8] Cochran W.G., 1963: Sampling Techniques, New York, John Wiley & Sons.
[9] Chance, William A., 1975: Statistical Methods for Decision Making, Bombay, D.B. Taraporevala Sons & Co.
Pvt. Ltd.