A study demonstrating a disappointing lack of reproducibility in psychological research has recently received much publicity in scientific journals as well as in the popular media, including the New York Times. Without going into the details of the study, only 39% of the effect sizes in 100 articles could be reproduced. In other words, an admittedly oversimplified interpretation is that only 39% of the findings in psychological science journals could be reproduced [citation at the end of this post].
There are all sorts of potential explanations. However, one of the tenets of scientific research is that the findings of a study should be reproducible (or replicable). If this is not the case, the validity of the study is called into question. Sometimes something as simple as a data-entry error or faulty use of a statistical package such as Stata, SAS, or SPSS may be at fault.
How much research in scientific psychology is in fact "true"? More generally, how much research in the medical, epidemiologic, and social sciences is reproducible? In clinical studies, replicating findings using the actual original data is still difficult, but it is becoming easier because Congress now mandates that data collected with NIH funding be shared; a data sharing plan is required in NIH proposals. Even so, replication remains all too rare and cumbersome.
The public is frequently confused when apparently conflicting results from clinical studies are publicized in the press. There are all sorts of examples, ranging from the relative contributions of diet and exercise to the prevention of cardiovascular disease; to the relative benefits of medication and therapy in treating affective disorders such as depression; to whether vitamin D supplementation attenuates the risk of diseases ranging from coronary heart disease to multiple sclerosis to affective disorders. Because vitamin D may play a role in immune modulation, some studies suggest that vitamin D deficiency attenuates the immune response to a variety of pathogens. I am sure that all of you can think of examples.
The most downloaded paper in the PLoS journals, published years before the replicability study in Science, is titled "Why most published research findings are false" [citation below]. This paper is written mostly about clinical research findings, which are of fundamental importance in medicine and public health, but its argument extends to many of the empirical biological and social sciences. The reasons for false findings include inadequate study design and statistical power; sampling and study bias; chasing statistical significance rather than specifying the study power a priori and calculating the necessary sample size accordingly; and low pre- and post-test probabilities, which yield low positive predictive values (aptly called by my colleague Noel Weiss "the predictive value of a positive test").
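To make the positive-predictive-value point concrete, here is a minimal sketch of my own (an illustration in the spirit of the cited paper, not code from it) of how the share of "statistically significant" findings that are actually true depends on the prior probability that a tested hypothesis is true, the study's power, and the significance threshold. The function name and the example numbers are my assumptions, chosen only for illustration.

```python
def ppv(prior, power, alpha):
    """Positive predictive value of a 'significant' finding.

    prior: probability that a tested hypothesis is true before the study
    power: probability of detecting a true effect (1 - beta)
    alpha: significance level (false-positive rate under the null)
    """
    true_positives = prior * power          # true effects correctly detected
    false_positives = (1 - prior) * alpha   # null effects flagged by chance
    return true_positives / (true_positives + false_positives)

# A well-powered study of a plausible hypothesis:
print(round(ppv(prior=0.5, power=0.8, alpha=0.05), 2))   # 0.94

# An underpowered study of a long-shot hypothesis:
print(round(ppv(prior=0.1, power=0.2, alpha=0.05), 2))   # 0.31
```

The second case shows the mechanism at work: with low power and a low prior, fewer than a third of "significant" results reflect true effects, even before considering bias or multiple testing.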
Despite the fundamental need for replication, it is all too difficult to accomplish. My own experience, and that of innumerable colleagues in the social sciences, epidemiology, and the clinical sciences, is that it is difficult to obtain funding to replicate prior research, and equally difficult to publish research that replicates earlier work. I have seen this as a member of many NIH study sections (funding panels), National Academy of Sciences and Institute of Medicine committees, and journal editorial boards. I have experienced it in submitting research papers for publication, and most colleagues with whom I have spoken tell the same story. A damning critique of research, sometimes regarded as a fatal flaw, is that it is "derivative" and unoriginal; in other words, it has been done before. This bias makes replication virtually impossible, even though it is fundamentally important in both science and social science.
It is equally dismaying that a subpopulation of researchers wedded exclusively to open qualitative interviews deny the importance of replicability and generalizability altogether, stating that the whole point of the effort is not to generalize but to understand the subjective experiences of perhaps 10-20 research respondents. While I do not deny that there is much to be learned from an in-depth understanding of subjective experience, some of these researchers almost proudly declare that the generalizability of their work is virtually nil. This will be the subject of a subsequent post.
If 1) much psychological science is not reproducible; 2) many, perhaps most, research findings are wrong; and 3) it is very difficult to obtain funding for "derivative" research or to publish it, then I wonder whether much of our understanding is misplaced. In clinical epidemiology and in clinical trials, how many findings are so false that treatments at the individual level, or community-based public health interventions, are fundamentally incorrect? How many medications are administered on the basis of faulty research, how much surgery is performed out of misplaced trust in its scientific basis, and how much public health policy and intervention rests upon findings that cannot be replicated and that may, in fact, be "false"? How much evidence-based medicine is based upon incorrect and faulty evidence?
This questioning might be extended to the policy sciences in general. For example, economic and fiscal policies are presumed to rest upon a trusted understanding of the factors underlying inflation, unemployment, and economic growth; but is that understanding itself based on reproducible findings?
In short, how much of our understanding in general is incorrect? Unfortunately, that remains unknown.
[Open Science Collaboration, Estimating the reproducibility of psychological science. Science 2015;349:943; DOI: 10.1126/science.aac4716]
[Ioannidis JPA. Why most published research findings are false. PLoS Med 2005;2(8):e124]