Line 154: | Line 154: | ||
{{qnq|But it’s not all so obvious...}} | {{qnq|But it’s not all so obvious...}} | ||
...because epistemology evolves continually, even in medicine: | ...because epistemology evolves continually, even in medicine:<blockquote>'''P-value''': In medicine, for example, we rely on statistical inference to confirm experimental results, specifically the {{Tooltip|P-value|2=The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis <math> H_0 </math> is true; it is not the probability that the results are due to chance. It should not be used as a binary criterion (e.g., <math> p < 0.05 </math>) for scientific decisions, as values near the threshold require additional verification, such as cross-validation. ''p-hacking'' (repeating tests until significance is achieved) increases false positives. Rigorous experimental design and transparency about all tests conducted can mitigate this risk. The probability of at least one Type I error increases with multiple tests: for <math> N </math> independent tests at threshold <math> \alpha </math>, the Family-Wise Error Rate (FWER) is <math> FWER = 1 - (1 - \alpha)^N </math>. The Bonferroni correction divides the threshold by the number of tests, <math> p < \frac{\alpha}{N} </math>, but can increase false negatives. The Benjamini-Hochberg procedure, which controls the False Discovery Rate (FDR), is less conservative, allowing more true discoveries with an acceptable proportion of false positives. The Bayesian approach combines prior knowledge with the data to obtain a posterior distribution, offering a valid alternative to the p-value. To combine p-values from multiple independent studies, meta-analysis uses methods like Fisher's: <math> \chi^2 = -2 \sum \ln(p_i) </math>. In summary, the p-value remains useful when contextualized and integrated with other measures, such as confidence intervals and Bayesian approaches.}}, which underlies the "significance test" used to judge whether data are compatible with the null hypothesis. Yet, even this entrenched concept is now being challenged. 
A recent commentary in the journal "Nature" campaigned against using the P-value as a threshold for "statistical significance".<ref name=":1" /> Signed by over 800 scientists, this campaign marks a "quiet revolution" in statistical inference, encouraging a reflective and modest approach to significance.<ref name=":2" /><ref name=":3" /><ref name=":4" /> The American Statistical Association contributed to this discussion by releasing a special issue of "The American Statistician" titled "Statistical Inference in the 21st Century: A World Beyond p < 0.05." It offers new ways to express research significance and embraces uncertainty.<ref name="wasser" /></blockquote> | ||
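The multiple-testing formulas quoted in the P-value tooltip can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names and the example p-values are hypothetical and do not come from the cited sources, the tests are assumed independent, and every p-value is assumed to be strictly greater than zero.

```python
import math


def fwer(alpha, n_tests):
    """Family-wise error rate for n independent tests: 1 - (1 - alpha)^N."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 only where p < alpha / N (Bonferroni correction)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / N) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at most that k.
    reject = [False] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


def fisher_combined(p_values):
    """Fisher's method: chi2 = -2 * sum(ln p_i), with 2k degrees of freedom.

    For an even number of degrees of freedom the chi-squared survival
    function has a closed form, so no external library is needed.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to show the two corrections differ.
    example = [0.001, 0.008, 0.025, 0.041, 0.20]
    print("FWER for 20 tests at alpha = 0.05:", round(fwer(0.05, 20), 4))
    print("Bonferroni:", bonferroni_reject(example))
    print("Benjamini-Hochberg:", benjamini_hochberg_reject(example))
    print("Fisher combined:", fisher_combined(example))
```

With these example values the Benjamini-Hochberg procedure rejects one more hypothesis than Bonferroni, which is the sense in which the tooltip calls it less conservative.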
{| | {| | ||
|- | |- | ||
| | | | ||
*'''P-value''': In medicine, for example, we rely on statistical inference to confirm experimental results, specifically the {{Tooltip|P-value|2=The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis <math> H_0 </math> is true; it is not the probability that the results are due to chance. It should not be used as a binary criterion (e.g., <math> p < 0.05 </math>) for scientific decisions, as values near the threshold require additional verification, such as cross-validation. ''p-hacking'' (repeating tests until significance is achieved) increases false positives. Rigorous experimental design and transparency about all tests conducted can mitigate this risk. The probability of at least one Type I error increases with multiple tests: for <math> N </math> independent tests at threshold <math> \alpha </math>, the Family-Wise Error Rate (FWER) is <math> FWER = 1 - (1 - \alpha)^N </math>. The Bonferroni correction divides the threshold by the number of tests, <math> p < \frac{\alpha}{N} </math>, but can increase false negatives. The Benjamini-Hochberg procedure, which controls the False Discovery Rate (FDR), is less conservative, allowing more true discoveries with an acceptable proportion of false positives. The Bayesian approach combines prior knowledge with the data to obtain a posterior distribution, offering a valid alternative to the p-value. To combine p-values from multiple independent studies, meta-analysis uses methods like Fisher's: <math> \chi^2 = -2 \sum \ln(p_i) </math>. In summary, the p-value remains useful when contextualized and integrated with other measures, such as confidence intervals and Bayesian approaches.}}, which underlies the "significance test" used to judge whether data are compatible with the null hypothesis. Yet, even this entrenched concept is now being challenged. 
A recent commentary in the journal "Nature" campaigned against using the P-value as a threshold for "statistical significance".<ref>{{cita libro | *'''P-value''': In medicine, for example, we rely on statistical inference to confirm experimental results, specifically the {{Tooltip|P-value|2=The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis <math> H_0 </math> is true; it is not the probability that the results are due to chance. It should not be used as a binary criterion (e.g., <math> p < 0.05 </math>) for scientific decisions, as values near the threshold require additional verification, such as cross-validation. ''p-hacking'' (repeating tests until significance is achieved) increases false positives. Rigorous experimental design and transparency about all tests conducted can mitigate this risk. The probability of at least one Type I error increases with multiple tests: for <math> N </math> independent tests at threshold <math> \alpha </math>, the Family-Wise Error Rate (FWER) is <math> FWER = 1 - (1 - \alpha)^N </math>. The Bonferroni correction divides the threshold by the number of tests, <math> p < \frac{\alpha}{N} </math>, but can increase false negatives. The Benjamini-Hochberg procedure, which controls the False Discovery Rate (FDR), is less conservative, allowing more true discoveries with an acceptable proportion of false positives. The Bayesian approach combines prior knowledge with the data to obtain a posterior distribution, offering a valid alternative to the p-value. To combine p-values from multiple independent studies, meta-analysis uses methods like Fisher's: <math> \chi^2 = -2 \sum \ln(p_i) </math>. In summary, the p-value remains useful when contextualized and integrated with other measures, such as confidence intervals and Bayesian approaches.}}, which underlies the "significance test" used to judge whether data are compatible with the null hypothesis. Yet, even this entrenched concept is now being challenged. A recent commentary in the journal "Nature" campaigned against using the P-value as a threshold for "statistical significance".<ref name=":1">{{cita libro | ||
| autore = Amrhein V | | autore = Amrhein V | ||
| autore2 = Greenland S | | autore2 = Greenland S | ||
Line 174: | Line 173: | ||
| DOI = 10.1038/d41586-019-00857-9 | | DOI = 10.1038/d41586-019-00857-9 | ||
| OCLC = | | OCLC = | ||
}} Mar;567(7748):305-307.</ref> Signed by over 800 scientists, this campaign marks a "quiet revolution" in statistical inference, encouraging a reflective and modest approach to significance.<ref>{{cita libro | }} Mar;567(7748):305-307.</ref> Signed by over 800 scientists, this campaign marks a "quiet revolution" in statistical inference, encouraging a reflective and modest approach to significance.<ref name=":2">{{cita libro | ||
| autore = Rodgers JL | | autore = Rodgers JL | ||
| titolo = The epistemology of mathematical and statistical modeling: a quiet methodological revolution | | titolo = The epistemology of mathematical and statistical modeling: a quiet methodological revolution | ||
Line 187: | Line 186: | ||
| DOI = 10.1037/a0018326 | | DOI = 10.1037/a0018326 | ||
| OCLC = | | OCLC = | ||
}} Jan;65(1):1-12.</ref><ref>{{cita libro | }} Jan;65(1):1-12.</ref><ref name=":3">{{cita libro | ||
| autore = Meehl P | | autore = Meehl P | ||
| titolo = The problem is epistemology, not statistics: replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions | | titolo = The problem is epistemology, not statistics: replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions | ||
Line 200: | Line 199: | ||
| DOI = | | DOI = | ||
| OCLC = | | OCLC = | ||
}}</ref><ref>{{cita libro | }}</ref><ref name=":4">{{cita libro | ||
| autore = Sprenger J | | autore = Sprenger J | ||
| autore2 = Hartmann S | | autore2 = Hartmann S |