Thoughts on Recent Critiques of Psychology Research
A number of credible outlets have recently run stories questioning psychology research and whether its results can be trusted. For example, Vox recently posted a great article that addresses the situation. (If you read it, please read all the way to the end.) This comes after articles in the NYT and Science covered the topic as well. It's certainly a discussion worth having, but one that tends to paint psychology in an unfavorable (albeit mildly optimistic) light. As someone involved in psychology research, I wanted to give my own take on the situation.
Disclaimer: The views I express in this post are my own, and do not represent the views of any of my colleagues or the institutions with which I am affiliated.
The Challenges of Research
Many of these articles discuss psychology research specifically, though they generally underplay the fact that many of these challenges apply to other fields as well. In research, there are a lot of assumptions that need to be made, and there are often limits to what can be controlled. This may be more true of psychology research than of most fields, but it is by no means unique to psychology.
It's also important to understand the complexity of research, especially when humans are involved as participants, and this is something these articles have not explored in much depth. To help start this discussion, it's worth walking through some of the more prominent difficulties we face in our research.
In research, we are generally trying to understand a population. Psychology looks at the behavior and cognitive processes of humans, medicine looks more at biological functioning, and so on. But it's not realistic to study every person in a population (whether that of a state, a country, or the world). To get around this problem, we study a sample of the population.
Ideally, sampling happens via a random process. That is, all members of the population are equally likely to be selected as participants. If that is the case, then theoretically the sample will be fairly equivalent to the population (if the sample has enough people included). For example, if the population is 50% female, then the random sample would also be expected to be 50% female just by the process of random selection.
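To make that concrete, here's a quick simulation in Python (the population size and the exact 50/50 split are made up for illustration):

```python
import random

# Hypothetical population: 50% female (1) and 50% male (0), purely illustrative.
population = [1] * 500_000 + [0] * 500_000

for n in (20, 200, 2000):
    sample = random.sample(population, n)  # simple random sampling
    pct_female = 100 * sum(sample) / n
    print(f"n = {n:>4}: {pct_female:.1f}% female")
# Larger random samples tend to land closer to the population's true 50%.
```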
Unfortunately, truly random sampling is almost never achieved. Samples are made up of people who are willing to participate in research, which is only a portion of the population. Even then, other restrictions narrow down who can be included in the study: location, transportation, time and availability, and so forth. This limits how representative our samples are (location especially). National studies offer some hope of making samples more representative, though they are very difficult to actually undertake. In addition, it's incredibly difficult to recruit large numbers of participants, which requires resources that are becoming harder to obtain for research studies.
Because of these challenges, we know that a given study really only tells us about a specific group. That is, the results only suggest something about the people who actually participated. Because of this, we wouldn't necessarily expect the same results in another group (which we'll come back to). It's a limitation, certainly, but one we almost always mention in our publications.
When conducting statistical analyses, there are further assumptions that need to be made. Without going into too much depth, let's just say that statistical analyses work best (i.e., are theoretically most accurate) when the data are "neat" and fit expected patterns (e.g., being normally distributed). When it comes to humans, things are rarely simple and straightforward. The data may all be there and collected properly, but whether the analyses we use are working as intended is a fairly big assumption to make. Many researchers know the assumption is problematic, yet much of psychology research would be impractical to conduct without it.
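As a rough sketch of what checking one of these assumptions can look like, here's an example that tests simulated "neat" and "messy" data for normality (the data and all the numbers are invented for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
neat = rng.normal(loc=50, scale=10, size=200)    # "neat" bell-shaped data
messy = rng.exponential(scale=10, size=200)      # skewed, more human-like data

for label, data in [("neat", neat), ("messy", messy)]:
    stat, p = stats.shapiro(data)                # Shapiro-Wilk normality test
    print(f"{label}: W = {stat:.3f}, p = {p:.4f}")
# A small p here suggests the data depart from normality, so analyses
# that assume normality may not behave as intended.
```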
This "messiness" is because there are so many different components to the lives of each person. Random sampling can help to cancel out many of those differences, but there are often still some remaining factors that influence the data.
Over time, more sophisticated analyses have been developed to help us capture more of what's happening and to better handle inconsistent data. But these developments come with costs: the new analyses take time to learn, they require interpretation that introduces some subjectivity, and they are harder to communicate in an accessible way. That's not to say they shouldn't be pursued, just that there are challenges still being worked out.
Another component of the statistics that is often mentioned is "p-hacking." Many readers of this post may have heard of the mystical "p < .05" that researchers hope to obtain (even though the .05 cut-off is admittedly arbitrary). The p-value represents the probability of obtaining results at least as extreme as the ones observed, assuming the relationship being tested does not actually exist. It's a distinction that can be easy to forget, but it's important: a lower p-value does not mean "more significant," and there are many ways to influence the p-value. For example, large samples can produce significant results for very small differences that have no real clinical importance.
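To illustrate that last point, here's a small simulated example (all numbers invented) showing how a practically trivial difference becomes "significant" once the sample gets large enough:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_difference = 0.5   # half a point on a 0-100 scale: trivial in practice

for n in (50, 500, 50_000):
    group_a = rng.normal(50.0, 15.0, size=n)
    group_b = rng.normal(50.0 + true_difference, 15.0, size=n)
    t, p = stats.ttest_ind(group_a, group_b)
    print(f"n = {n:>6} per group: p = {p:.4f}")
# With enough participants per group, even this trivial difference
# crosses the p < .05 threshold despite its practical irrelevance.
```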
In response to the challenges of the p-value, some researchers are advocating for alternative approaches to determining significance (like the Bayesian approach). These alternatives will likely benefit psychology (and other fields!) going forward, though they will require a huge shift in how we analyze and think about our data.
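To give a flavor of the Bayesian mindset, here's a minimal sketch using a toy preference question with made-up counts (this is just one simple Bayesian calculation, not a full replacement for significance testing):

```python
from scipy import stats

# Toy example: did 14 of 20 participants prefer option A over option B?
successes, trials = 14, 20

# Bayesian updating with a flat Beta(1, 1) prior over the preference rate.
posterior = stats.beta(1 + successes, 1 + (trials - successes))

lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"Posterior mean: {posterior.mean():.2f}")
print(f"95% credible interval: [{lo:.2f}, {hi:.2f}]")
print(f"P(preference rate > 0.5): {1 - posterior.cdf(0.5):.2f}")
# Instead of a binary significant/not-significant verdict, we get a full
# distribution of plausible values for the quantity of interest.
```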
As other articles have mentioned, the challenge is the perceived need to have significant results. Papers have historically been easier to get published if the results are statistically significant, so researchers would hunt for p < .05. It's a huge problem for science as a whole, and not something that only occurs in psychology by any means. Thankfully there has been a growing emphasis on accepting papers based on methods rather than results, so this will hopefully become less of an issue going forward.
As I briefly mentioned above, there are countless things happening at the same time, all influencing one another to various extents. To help focus research, the ideal is to control everything except the variables of interest. This is possible in lab settings that work with tightly controlled materials, but not so much with humans. All participants have their own histories, their own genetics, their own culture, and so forth. We also can't control things like diet, media exposure, social interactions, light exposure, education, and on and on. At best, we can control their situation during the experiment. Beyond that, we statistically "control" for other variables by taking them into account during the analyses, or we assume they are "controlled" for by the random sampling/assignment process (i.e., they'll be equally represented across participants/groups). It's better than nothing, but it is not a true solution to the problem.
Statistical control does not really control for other variables the way controlling for them in the study design does, and there are also limits to how many variables can be included in an analysis. Add too many, and their overlapping effects begin to reduce the numbers to noise: we can only estimate how much each variable explains, so error accumulates as more and more variables are introduced. At best, we can control a few things as part of the study design and a few more statistically, and then we simply don't know what role the rest play.
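To show what statistical "control" looks like in practice, here's a minimal sketch using simulated data; the variables (treatment, age, sleep) and all the numbers are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # the variable of interest
    "age": rng.normal(40, 12, n),         # covariates we "control" for
    "sleep": rng.normal(7, 1.5, n),
})
# Simulated outcome influenced by the treatment AND the covariates.
df["outcome"] = (2.0 * df["treatment"] + 0.1 * df["age"]
                 + 1.5 * df["sleep"] + rng.normal(0, 3, n))

# Statistical "control": include the covariates alongside the treatment,
# so the treatment estimate is adjusted for age and sleep.
model = smf.ols("outcome ~ treatment + age + sleep", data=df).fit()
print(model.params)
# Any variable we didn't measure (diet, culture, history, ...) is still
# uncontrolled, which is exactly the limitation described above.
```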
Which leads us to one of the biggest things I feel is being missed in these articles criticizing psychology research...
Replication vs. Generalization
Replication means repeating a study with a different sample to see whether the original results hold. If they do, the relationship in question is considered better supported. Too often, though, replication is treated as validation: if something fails to replicate, it is assumed to be false.
Failed replication can certainly mean the original results are false, because we always have a risk of a false positive (and in fact, statistically some of the positive published results must be false positives). But what is less often discussed is that failed replication may just be a failure for something to generalize. That is, it may be that the original results are in fact true, but only for the group that was sampled in that original study, as I mentioned above.
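Some quick back-of-the-envelope arithmetic shows why some published positives must be false. The 10% base rate of true hypotheses and the 80% power below are assumptions chosen purely for illustration:

```python
# Suppose 10% of tested hypotheses are actually true, studies have 80%
# power to detect real effects, and we use the conventional alpha of .05.
prior_true = 0.10
power = 0.80
alpha = 0.05

true_positives = prior_true * power            # real effects detected
false_positives = (1 - prior_true) * alpha     # null effects "detected"
share_false = false_positives / (true_positives + false_positives)
print(f"Share of 'significant' findings that are false: {share_false:.0%}")
# Under these assumptions, roughly a third of positive results are false
# positives, even before selective publication makes things worse.
```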
If that's the case, then failed replication should be viewed as an opportunity to better understand diversity and what other factors are important, rather than a "who's right and who's wrong?" battle. Replication studies are subject to the same weaknesses as other studies, so it may simply be that something that applied to the original study no longer applies to the newer study (e.g., culture, experiences of people in the area).
And once more, this is not something unique to psychology. Many fields are subject to these same difficulties.
What's the Answer?
One solution to some of these challenges is a more open (and collaborative) system of research. Open Science advocates for making as much of the research process publicly available as possible, including notes and the analyses that are conducted. This is similar to Open Access, which supports making all published research articles freely available to the public. By making things more publicly accessible, we open the process up to more scrutiny and input, which can hopefully result in corrected mistakes, better problem solving of methods, and so forth. An additional benefit is being able to more easily discover minor differences between studies, so we can get a sense of why something may not have replicated and what it suggests about the variables of interest.
While I support both movements, there is one component of Open Science that worries me for some areas of research. In fields that work with human participants, privacy and confidentiality have to be taken into account. If researchers share data publicly, or even some notes, the materials would certainly be "de-identified": things like names and birth dates would be removed. But even if participants are represented by random ID numbers, there's still a lot of information there. In an era where it's becoming easier to match small bits of information together to identify someone, we'll need to show that sharing the data is safe before making it available. I'm not sure yet how that will be done, but it's a hurdle I expect psychology, medicine, and similar fields to run into in the near future.
What Can You Do?
As all of this information comes to light, it is becoming increasingly important for people to know how to think critically when hearing about research findings. No matter what field the findings are from, we should never treat the results of a single study as conclusive. Even if many studies show the same trend, we still run into the problem of not knowing how many studies failed to find the effect and were never published. Things like meta-analyses, which look at a group of studies together, try to take this into account in a systematic way. But really it's up to us, as readers of scientific findings (or articles based on those findings), to always remain skeptical.
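To give a sense of how meta-analyses pool results, here's a minimal sketch of inverse-variance (fixed-effect) weighting; the five studies and all their numbers are hypothetical:

```python
import numpy as np

# Hypothetical effect sizes and standard errors from five studies.
effects = np.array([0.30, 0.10, 0.45, 0.05, 0.25])
std_errs = np.array([0.15, 0.10, 0.20, 0.08, 0.12])

# Fixed-effect meta-analysis: weight each study by its precision (1 / SE^2),
# so larger, more precise studies count for more.
weights = 1 / std_errs**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"Pooled effect: {pooled:.2f} +/- {1.96 * pooled_se:.2f} (95% CI)")
# Note: this pooling can't see studies that were never published, which
# is the "file drawer" problem described above.
```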
I will say that this skepticism may be especially important for psychology. Yes, there is plenty of misinformation spread about health and medicine as well. However, many of the "psychology" articles I see online are based on opinion, belief, personal experience, and so forth. These are not representative of psychology as a science. Even when something seems like "common sense," we are learning all the time that "common sense" claims can actually be false. The human mind is very good at finding a way to make sense of the information it receives, even if that information is wrong or random. Be aware of that when reading things online, be critical, and don't assume something is true just because it seems like it is.
(And yes, that last part is backed up by research)
Having written all of this, here is my main point: the difficulties being highlighted within psychology research are not unique to psychology. These challenges apply to other fields, and researchers are figuring out ways to overcome these challenges. Just because there have been some failed replications does not mean a large portion of psychology research is false, or that we are in a crisis. Instead, it demonstrates the limitations of our past research, and prompts us to continue improving our methodologies.
In the meantime, consider all research with skepticism. Try to challenge the results logically, and don't assume they are true just because a journal published the study. The same goes for articles not based on research findings. Look at trends across findings, remain skeptical even of those, and stay flexible as new findings come in.
So, what are your thoughts on this topic? Do you have any recommendations for what can help to improve this situation? Let me know in the comments!