
n2doc

(47,953 posts)
Tue Mar 10, 2015, 11:47 AM Mar 2015

Goodbye P value: is it time to let go of one of science’s most fundamental measures?

How should scientists interpret their data? Emerging from their labs after days, weeks, months, even years spent measuring and recording, how do researchers draw conclusions about the results of their experiments? Statistical methods are widely used but our recent research in Nature Methods reveals that one of the classic science statistics, the P value, may not be as reliable as we like to think.

Scientists like numbers, because they can be compared with other numbers. And often these comparisons are made with statistical analyses, to formalise the process. The broad idea behind all statistical analyses is that they allow the researcher to make somewhat objective assessments of the results of their experiments.

Which drug is more effective?

Scientists often conduct experiments to investigate whether there is a difference between two conditions: do people get better more quickly after taking the blue pill (condition one) or the red pill (condition two)? The most common method for assessing whether the pills differ in their effectiveness is to undertake statistical analysis of a trial where some patients were given the blue pill and some the red, and from this determine whether there is strong evidence that one colour is more effective than the other.

To assess experimental results, scientists very often use a “P value” (P is for probability). This is used to show how convincing these results are: if the P value is small, they think that the findings are real and not just a fluke. In our pill example, if P is small this is considered good evidence that there is a difference in effectiveness of the two colours of pill.

more

https://theconversation.com/goodbye-p-value-is-it-time-to-let-go-of-one-of-sciences-most-fundamental-measures-38057
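To make the pill example concrete, here is a minimal sketch (not from the article) of how such a two-group comparison is typically run in Python with SciPy; the recovery times below are invented purely for illustration.

import numpy as np
from scipy import stats

# Hypothetical recovery times in days for the two pill groups (made-up numbers).
blue_pill = np.array([6.1, 5.8, 7.0, 6.5, 5.9, 6.8, 6.2, 6.4])
red_pill = np.array([5.2, 5.9, 5.5, 6.0, 5.1, 5.7, 5.4, 5.6])

# Two-sample t-test: a small P suggests the observed difference in means
# would be unlikely if the two pills were in fact equally effective.
t_stat, p_value = stats.ttest_ind(blue_pill, red_pill)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")

A P below the conventional 0.05 would usually be read as "good evidence" of a difference, which is exactly the habit the article goes on to question.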

8 replies

phantom power

(25,966 posts)
1. What this tells me is that people using p-values didn't learn what they are
Tue Mar 10, 2015, 12:28 PM
Mar 2015

Nothing in that article should be news to anybody who actually took an undergraduate stats class, and paid attention.

bananas

(27,509 posts)
3. In the real world, as the article points out,
Thu Mar 12, 2015, 06:01 AM
Mar 2015
... studies with low P values are thought to be convincing, and so are not often repeated to be sure the results are correct. This might seem reasonable because there is limited money and time in science – results from a study that seem very clear perhaps do not warrant double-checking when there are new discoveries out there to be made.

<snip>

This weakness could well explain why famous scientific findings from the past, central to the foundations of many disciplines, are not being confirmed now that the original studies are finally being re-examined.

These include a lack of reproducibility in cancer research, ...

<snip>



phantom power

(25,966 posts)
2. And another thing. If people are *really* reporting p-values without noting magnitude differences...
Tue Mar 10, 2015, 12:31 PM
Mar 2015

they must either be exceptionally stupid, or deliberately attempting to avoid reporting that the results are statistically significant but not large enough to be useful.
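A quick hypothetical sketch of that point (the numbers and the 0.3-unit "effect" are invented): with a large enough sample, a practically negligible difference still produces a tiny P value, so reporting P without an effect size hides that the result is too small to matter.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000  # very large groups
control = rng.normal(loc=100.0, scale=15.0, size=n)
treatment = rng.normal(loc=100.3, scale=15.0, size=n)  # tiny true effect: 0.3 units

t_stat, p_value = stats.ttest_ind(treatment, control)

# Standardised effect size (Cohen's d): the difference in SD units.
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"P = {p_value:.2e}")           # "significant" almost every time
print(f"Cohen's d = {cohens_d:.3f}")  # but the effect is negligible (~0.02 SD)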

bananas

(27,509 posts)
5. They repeat the experiment until they get a low p value, then they submit for publication.
Thu Mar 12, 2015, 06:22 AM
Mar 2015

Negative results don't get published, they don't even get submitted for publication.
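As a rough illustration of that "repeat until significant" behaviour (a simulation sketched here, not taken from the linked editorial below): even when there is no real effect at all, re-running the experiment up to ten times and stopping at the first P < 0.05 inflates the false-positive rate from the nominal 5% to roughly 40%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ALPHA, MAX_TRIES, RUNS = 0.05, 10, 2000

false_positives = 0
for _ in range(RUNS):
    for _ in range(MAX_TRIES):
        # Both groups drawn from the SAME distribution: there is no real effect.
        a = rng.normal(size=20)
        b = rng.normal(size=20)
        if stats.ttest_ind(a, b).pvalue < ALPHA:
            false_positives += 1  # "significant" -- stop and publish
            break

print(f"Nominal error rate: {ALPHA:.0%}")
print(f"Observed rate after re-running up to {MAX_TRIES} times: {false_positives / RUNS:.0%}")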

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3917235/

Negativity towards negative results: a discussion of the disconnect between scientific worth and scientific culture

Natalie Matosin,1,2,* Elisabeth Frank,1,2 Martin Engel,1,2 Jeremy S. Lum,1,2 and Kelly A. Newell1,2

Abstract

Science is often romanticised as a flawless system of knowledge building, where scientists work together to systematically find answers. In reality, this is not always the case. Dissemination of results are straightforward when the findings are positive, but what happens when you obtain results that support the null hypothesis, or do not fit with the current scientific thinking? In this Editorial, we discuss the issues surrounding publication bias and the difficulty in communicating negative results. Negative findings are a valuable component of the scientific literature because they force us to critically evaluate and validate our current thinking, and fundamentally move us towards unabridged science.

“What gets us into trouble is not what we don’t know, it’s what we know for sure that just ain’t so.” – Mark Twain.

The impact of negative findings

Increasingly, there is pressure on scientists to choose investigative avenues that result in high-impact knowledge. This challenge has, in many cases, swayed scientists to pursue paths of investigation that are not necessarily logical or hypothesis-driven. Rather than approaching a research question in a systematic manner, it seems that scientists are encouraged to pursue non-linear lines of investigation in search of significance, and many that have the luxury are known to tuck away negative findings (the ‘file-drawer’ effect) and focus on their positive outcomes (Scargle, 1999). This behaviour likely stems from an ever-heightening hurdle that scientists need to jump: high publication output with a high citation rate in order to win competitive grants to drive their research, move up the rungs and pay the bills.

Published a few years ago in PLoS ONE, Daniele Fanelli states that “Papers are less likely to be published and to be cited if they report ‘negative’ results” (Fanelli, 2010). Because scientists are involuntarily finding themselves engaged in competition for positions and funding, many are choosing not to proceed with their non-significant findings (those that support the null hypothesis) that yield less scientific interest and fewer citations. Consequently, the amount of non-significant data reported is progressively declining (Fanelli, 2012). Although it could be argued that this is due to an increasing quality of science, it is more likely attributable to the selectiveness of ‘high impact’ journals that, in our opinion, might as well have a bold statement in the submission form: negative results are not accepted. However, there seems to be a gap between results that are positive and results that are high impact. Logically there is no connection, but it seems scientific culture assumes that they are analogous. Why aren’t negative results considered to be of the same value?

<snip>

Examples from psychiatric research

At a recent conference, two colleagues discovered that they had both unsuccessfully attempted to alter depression-like behaviour in the CD1 mouse strain (a widely used animal model for toxicology studies) with a variety of classical antipsychotics. These findings were surprising in light of the many studies demonstrating efficacy of antipsychotic drugs in different experimental models. Realising that this had occurred in two separate labs, they considered that this might not have been a lack of experiments performed using CD1 mice, but rather a lack of publications on negative findings. They have since corroborated their findings with others. Because the results were unpublished, research groups had continued to follow the same lines of thought and the same paths of investigation, only to all fail in the same way, ultimately wasting time and resources. To our knowledge, these results remain unpublished.

<snip>


phantom power

(25,966 posts)
6. Oh man, that is a perfect way to pollute your p-values
Thu Mar 12, 2015, 09:30 AM
Mar 2015


Interesting fact: there is a way to correct for this problem, but it assumes you realize you have a problem and/or you aren't doing it on purpose:
http://en.wikipedia.org/wiki/Bonferroni_correction#Informal_introduction
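A minimal sketch of the idea behind that correction, using made-up p-values from five hypothetical tests: to keep the family-wise error rate at alpha across m comparisons, each individual P is compared against alpha/m rather than alpha.

# Bonferroni correction (per the linked Wikipedia article).
p_values = [0.005, 0.021, 0.048, 0.13, 0.74]  # made-up results from 5 tests
alpha, m = 0.05, len(p_values)

for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < alpha / m else "not significant"
    print(f"test {i}: p = {p:.3f} -> {verdict} at corrected threshold {alpha / m:.3f}")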

bananas

(27,509 posts)
4. "The fickle P value generates irreproducible results"
Thu Mar 12, 2015, 06:07 AM
Mar 2015

You'd think it was published in the Journal of Irreproducible Results, but nope!

http://www.nature.com/nmeth/journal/v12/n3/full/nmeth.3288.html

The fickle P value generates irreproducible results

Lewis G Halsey, Douglas Curran-Everett, Sarah L Vowler & Gordon B Drummond

Nature Methods 12, 179–185 (2015)
doi:10.1038/nmeth.3288

Published online
26 February 2015

The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value. We explain why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.

<snip>
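Their central point, that P varies wildly from sample to sample, is easy to see in a small simulation (a sketch of my own, not from the paper): the same modest true effect, the same design, ten repeats, and P bounces from clearly "significant" to clearly not.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Same true effect (0.5 SD), same sample size, repeated 10 times:
# the P value varies dramatically from run to run.
for run in range(1, 11):
    a = rng.normal(loc=0.0, scale=1.0, size=20)
    b = rng.normal(loc=0.5, scale=1.0, size=20)
    print(f"run {run:2d}: P = {stats.ttest_ind(a, b).pvalue:.3f}")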


Jim__

(14,082 posts)
7. Scientists unknowingly tweak experiments
Wed Mar 18, 2015, 09:31 AM
Mar 2015

From Phys.org

...

The study used text mining to extract p-values - a number that indicates how likely it is that a result occurs by chance - from more than 100,000 research papers published around the world, spanning many scientific disciplines, including medicine, biology and psychology.

"Many researchers are not aware that certain methods could make some results seem more important than they are. They are just genuinely excited about finding something new and interesting," Dr Head said.

"I think that pressure to publish is one factor driving this bias. As scientists we are judged by how many publications we have and the quality of the scientific journals they go in.

...

Dr Head said the study found a high number of p-values that were only just over the traditional threshold that most scientists call statistically significant.

...
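For a sense of what "text mining to extract p-values" might look like in practice, here is a hypothetical sketch (not the study's actual pipeline): a regular expression pulls reported values out of raw paper text, after which one can count how many cluster near the 0.05 threshold.

import re

# Toy input text; a real pipeline would run over full paper text.
text = "Group A improved more than group B (p = 0.049); weight did not differ (P=0.32)."
pattern = re.compile(r"[Pp]\s*[=<]\s*(0?\.\d+)")

p_values = [float(m) for m in pattern.findall(text)]
print(p_values)  # [0.049, 0.32]
print(sum(0.04 <= p <= 0.06 for p in p_values), "value(s) near the 0.05 threshold")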



Sancho

(9,070 posts)
8. Hmmm...statistical significance is a very old debate.
Thu Mar 19, 2015, 08:40 AM
Mar 2015

All the way back to the days of Fisher there were discussions of "alpha levels" and p-values. Of course, the .05, .01, and .1 levels were the values commonly published for decades in printed tables for use in hypothesis testing with common parametric and non-parametric statistical distributions. People simply got used to using what was in the back of the text they were taught from in statistics courses.

Many years ago (I saw it in the '70s), researchers were already teaching about replication, CIs, avoiding the automatic p<.05, and so on.

I don't think journals now are as caught up in a particular "p value," but reviewers expect the results to support the conclusions and the analysis to be appropriate for the data.

The Conversation article misses the idea that it is the researcher who draws an incorrect conclusion (Type I and II errors, rival hypotheses, threats to validity, bad measures, lack of power, etc.). The problem is not the use of probability, but incorrect conclusions that are widely disseminated; it takes years before anyone pays attention to replications that disagree.

It has long been true that results that fail to reject the null (non-significant) may be interesting but not attractive to journals... the author has the responsibility to describe why not finding an effect is important!
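On the "lack of power" point, a simulation-based power check is a simple guard against that kind of incorrect conclusion; this sketch assumes a hypothetical 0.5 SD effect and 20 subjects per group, both numbers invented for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def estimated_power(effect_size, n_per_group, alpha=0.05, sims=5000):
    """Fraction of simulated experiments that reject the null at the given alpha."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        b = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# Hypothetical 0.5 SD effect with 20 subjects per group:
print(f"power ~ {estimated_power(0.5, 20):.2f}")  # roughly 0.33 -- badly underpowered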

