Statistics

=====Significance=====
{{main|Statistical significance}}

Statistics rarely give a simple yes/no answer to the question under analysis. Interpretation often depends on the level of statistical significance applied to the numbers, which is commonly expressed through the [[p-value]]: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

[[File:P-value in statistical significance testing.svg|upright=1.8|thumb|right|In this graph the black line is the probability distribution for the [[test statistic]], the [[Critical region#Definition of terms|critical region]] is the set of values to the right of the observed data point (observed value of the test statistic), and the [[p-value]] is represented by the green area.]]

The standard approach<ref name="Piazza"/> is to test a null hypothesis against an alternative hypothesis. A [[Critical region#Definition of terms|critical region]] is the set of values of the estimator that leads to rejecting the null hypothesis. The probability of a type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true ([[statistical significance]]), and the probability of a type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true. The [[statistical power]] of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
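
As a minimal sketch of these definitions, the following Python snippet computes the critical region, the two error probabilities, and the power for a one-sided z-test of a mean. The test itself, the function name <code>error_rates</code>, and all numeric values (null mean 100, alternative mean 105, known standard deviation 15, sample size 36) are assumptions chosen only for illustration.

```python
from statistics import NormalDist

def error_rates(mu0, mu1, sigma, n, alpha):
    """One-sided z-test of H0: mean = mu0 against H1: mean = mu1 (mu1 > mu0).

    Hypothetical helper for illustration; assumes a known standard deviation.
    """
    se = sigma / n ** 0.5  # standard error of the sample mean
    # Critical region: sample means above this cutoff lead to rejecting H0.
    cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)
    type_i = 1 - NormalDist(mu0, se).cdf(cutoff)  # P(reject H0 | H0 true)
    type_ii = NormalDist(mu1, se).cdf(cutoff)     # P(do not reject H0 | H1 true)
    power = 1 - type_ii                           # P(reject H0 | H1 true)
    return cutoff, type_i, type_ii, power

# Illustrative numbers only.
cutoff, type_i, type_ii, power = error_rates(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)
print(f"cutoff={cutoff:.2f}, type I={type_i:.3f}, type II={type_ii:.3f}, power={power:.3f}")
```

Under these assumptions the type I error probability equals the chosen significance level by construction, while the type II error probability depends on how far the alternative mean lies from the null mean and on the sample size.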

Statistical significance does not necessarily mean that the overall result is significant in real-world terms. For example, a large study of a drug may show that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.
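
The drug example can be made concrete with invented numbers. The sketch below assumes a two-sample z-test with equal group sizes and a common known standard deviation; the function name <code>two_sample_z</code>, the group means, and the sample size are hypothetical and do not come from any real study.

```python
from statistics import NormalDist

def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means (hypothetical helper).

    Assumes equal group sizes and a common, known standard deviation.
    """
    se = sd * (2 / n_per_group) ** 0.5      # standard error of the difference
    z = (mean_a - mean_b) / se
    p = 2 * NormalDist().cdf(-abs(z))       # two-sided p-value
    d = (mean_a - mean_b) / sd              # standardized effect size
    return z, p, d

# Invented data: treated group averages 119.5, control group 120.0.
z, p_value, effect = two_sample_z(mean_a=119.5, mean_b=120.0, sd=15.0, n_per_group=50_000)
print(f"z={z:.2f}, p={p_value:.2e}, standardized effect={effect:.3f}")
```

With 50,000 participants per group, a difference of half a unit is overwhelmingly significant, yet the standardized effect is only about a thirtieth of a standard deviation.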

Although in principle the acceptable level of statistical significance may be subject to debate, the [[significance level]] is the largest p-value that allows the test to reject the null hypothesis. Here the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the [[test statistic]]. Therefore, the smaller the significance level, the lower the probability of committing a type I error.
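
The relationship between the p-value and the significance level can be sketched as a decision rule. The snippet below assumes a one-sided z-test with known standard deviation; the function name <code>one_sided_p_value</code> and the sample values are illustrative assumptions.

```python
from statistics import NormalDist

def one_sided_p_value(sample_mean, mu0, sigma, n):
    """P(result at least as extreme as the observed one | H0 true).

    Hypothetical helper: one-sided z-test with known standard deviation.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)  # observed test statistic
    return 1 - NormalDist().cdf(z)

alpha = 0.05  # significance level, chosen before looking at the data
p_value = one_sided_p_value(sample_mean=104.5, mu0=100, sigma=15, n=36)
reject_null = p_value <= alpha  # reject H0 only if the p-value does not exceed alpha
print(f"p={p_value:.4f}, reject H0 at alpha={alpha}: {reject_null}")
```

Here the observed statistic is 1.8 standard errors above the null mean, so the null hypothesis is rejected at the 0.05 level but would not be rejected at a stricter level of 0.01.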

Some problems are usually associated with this framework (see [[Statistical hypothesis testing#Criticism|criticism of hypothesis testing]]):
* A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this. One response involves going beyond reporting only the [[significance level]] to include the [[p-value|''p''-value]] when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the [[effect size|size]] or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report [[confidence interval]]s. Although these are produced from the same calculations as those of hypothesis tests or ''p''-values, they describe both the size of the effect and the uncertainty surrounding it.
* Fallacy of the transposed conditional, also known as the [[prosecutor's fallacy]]: criticisms arise because the hypothesis testing approach forces one hypothesis (the [[null hypothesis]]) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not the probability of the null hypothesis given the observed result. An alternative to this approach is offered by [[Bayesian inference]], although it requires establishing a [[prior probability]].<ref name=Ioannidis2005>{{Cite journal | last1 = Ioannidis | first1 = J.P.A. | author-link1 = John P.A. Ioannidis| title = Why Most Published Research Findings Are False | journal = PLOS Medicine | volume = 2 | issue = 8 | pages = e124 | year = 2005 | pmid = 16060722 | pmc = 1182327 | doi = 10.1371/journal.pmed.0020124 | doi-access = free }}</ref>
* Rejecting the null hypothesis does not automatically prove the alternative hypothesis.
* As with everything in [[inferential statistics]], it relies on sample size, and therefore under [[fat tails]] p-values may be seriously miscomputed.{{clarify|date=October 2016}}
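
The fallacy of the transposed conditional listed above can be illustrated with [[Bayes' theorem]]. The sketch below is a simplified model, not a general result: it treats the test outcome as a binary event (rejection or no rejection), and the prior probability of the null hypothesis (0.9), the significance level (0.05), and the power (0.8) are assumed values. The function name <code>posterior_null</code> is hypothetical.

```python
def posterior_null(prior_null, alpha, power):
    """P(H0 true | test rejected H0), computed with Bayes' theorem.

    Hypothetical helper; treats the test outcome as a binary event.
    """
    p_reject_and_null = prior_null * alpha        # false rejections
    p_reject_and_alt = (1 - prior_null) * power   # correct rejections
    return p_reject_and_null / (p_reject_and_null + p_reject_and_alt)

# Assumed values: 90% of tested null hypotheses are true.
post = posterior_null(prior_null=0.9, alpha=0.05, power=0.8)
print(f"P(H0 | rejection) = {post:.2f}")  # prints "P(H0 | rejection) = 0.36"
```

Under these assumptions the probability of a rejection given a true null hypothesis is 0.05, but the probability that the null hypothesis is true given a rejection is 0.36, which shows why the two conditional probabilities should not be interchanged.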