Title: Is this _____ evidence-based? Rethinking how we evaluate evidence
Authors: Nathaniel Hansford & Rachel Schechter, Ph.D.
There is a need to improve how scientific research in education is reported, especially for the purpose of informing teacher practices. Practices are often labeled either evidence-based (supported by research on that particular practice’s effectiveness) or evidence-informed (also described as research-based, meaning other related research is used to inform the practice). However, many factors affect how rigorous the evidence for an individual practice is, including the quality, quantity, and magnitude of impact of the research. All of these factors should be considered when examining the evidence base for a program or instructional practice, and a set of studies should be evaluated on a continuum using these factors rather than through an either/or binary review. For example, we have strong evidence that phonics instruction works, in terms of research magnitude, quality, and quantity. Comparatively, we have limited but high-quality research on sound articulation training, which has shown mixed results. Building their evaluation skills will help educators gain confidence in navigating the research and using data-driven decision-making for their students’ instructional plans.
Dr. John Hattie brought the idea of evidence-based learning to the forefront when he wrote his book Visible Learning in 2008. Hattie focused on analyzing the outcomes of meta-analyses by their magnitude of effect. By doing this, he showed which pedagogical factors had the highest impact across a multitude of studies. He hypothesized that factors with a low magnitude of effect were not evidence-based and that factors with a high magnitude of effect were. By looking only at meta-analyses and their impact, he accounted for two important research factors: study quantity and impact. However, Hattie’s work has been criticized for a multitude of reasons, perhaps most vociferously by the late Dr. Robert Slavin, who believed that Hattie was ignoring the factor of study quality. This matters because low-quality studies tend to show larger effects.
In response, Dr. Slavin’s Center for Research and Reform in Education modeled its research quality criteria on guidelines from the What Works Clearinghouse (WWC) to help bring information about quality to educators. These research clearinghouses (check out this article to learn a little more about these and others) designate which pedagogical factors and programs are evidence-based by focusing first and foremost on study quality. Within this model, something is often deemed evidence-based if even a single high-quality study shows a positive outcome. This model is problematic for three reasons.
Firstly, the vast majority of published studies show a positive effect, because studies that don’t show positive effects are rarely published. For example, Fountas and Pinnell’s Leveled Literacy Intervention (LLI) has two Tier 1 ESSA studies showing a mean effect size of .13 (https://www.evidenceforessa.org/programs/reading/fountas-pinnell-leveled-literacy-intervention-lli). However, Cohen’s benchmarks for interpreting effect sizes treat anything below .20 as negligible in practical terms. In other words, the high-quality studies available suggest that LLI had no meaningful impact.
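For readers less familiar with effect sizes, Cohen’s d expresses the difference between a treatment group and a control group in pooled standard deviation units:

d = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{SD_{\text{pooled}}}

Cohen’s conventional benchmarks are roughly 0.2 (small), 0.5 (medium), and 0.8 (large). An effect of .13 therefore means the average LLI student scored about 0.13 pooled standard deviations above the average control student, which, assuming roughly normally distributed scores, corresponds to moving from about the 50th to about the 55th percentile.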
Secondly, the majority of studies in education are qualitative or case studies (with no control groups) that do not meet the design requirements for high-quality scientific research, which completely excludes them from this model. Most meta-analyses already correct for this problem by excluding studies without control groups. However, ESSA and WWC take this a step further by also excluding most of the remaining studies that do have control groups, due to design problems. While this approach does mean that the resulting research is of very high quality (a noble idea), it also means that well over 90% of the scientific research has been set aside as irrelevant.
Lastly, studies that meet ESSA/WWC standards are very expensive to run. This means the vast majority of companies are effectively excluded from demonstrating the efficacy of their products via scientific research; only the wealthiest companies can afford to execute a study that meets the WWC guidelines.
While John Hattie and Robert Slavin both produced interesting models for interpreting whether something is evidence-based, they are not the only ones reporting within this field. Newspapers and magazines also report on science and attempt to label pedagogies as evidence-based or not. However, they often report on the latest study, regardless of its quality, the quantity of supporting research, or its impact. This leaves the false impression that scientists are constantly flip-flopping, which can hurt the public’s relationship with the research community. (For a perfect example of this, search for newspaper articles claiming milk is healthy or unhealthy.)
Pedagogies and pedagogical programs are frequently labeled as evidence-based or not, a dichotomous approach that treats the concept as binary. For something to be considered truly evidence-based, with great certainty, we would likely need to consider three primary research factors: the quantity, quality, and impact of the research. Ideally, there should be many studies, systematically examined and replicated, of high quality, showing a high level of impact. Truthfully, very few types of instruction in education research meet all of these criteria. Among those that do: phonemic awareness, phonics, repeated reading, morphology, vocabulary, comprehension, and explicit instruction.
For most other pedagogical constructs, we tend to be missing at least one of these three factors. And truthfully, the more specific the pedagogy or pedagogical idea we examine, the less likely the research is to meet all three criteria. For example, as discussed here (https://www.teachingbyscience.com/a-meta-analysis-of-language-programs), language programs based on the principles of structured literacy seem to (on average, though not always) outperform language programs based on the theories of balanced literacy. However, multiple characteristics separate balanced literacy programs from structured literacy programs. Structured literacy programs usually use decodable texts rather than leveled texts, place a greater emphasis on phonics and phonemic awareness instruction, include more multi-sensory instruction, and do not use three-cueing instruction.
While the data show that a structured literacy approach leads to more positive outcomes, there is not enough research to pinpoint which components contributed to those outcomes. For example, to the best of our knowledge, there are only 6 studies with control groups on the topic of decodable texts (for more information, see https://www.teachingbyscience.com/decodables). Moreover, only 2 of these studies had a particularly strong design, and both showed a low (but positive) magnitude of effect. To the question “Are decodables supported by the science of reading (SOR)?”, it is unfair to answer simply yes or no. There is a strong theoretical argument for decodable texts. We also have strong indirect evidence for their use, given that programs that include them tend to produce stronger research outcomes. Therefore, it might make better sense to evaluate the scientific validity of decodable texts not on a binary model of evidence-based or not, but rather on a continuum of overall research strength. Unfortunately, most constructs in education research do not have a vast preponderance of high-quality, high-magnitude research, so it makes the most sense to evaluate most ideas on a continuum rather than as a binary.
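To make the continuum idea more concrete, here is a minimal sketch of what a three-factor rating could look like in practice. This is our own illustrative rubric, not a validated instrument: the field names, cut-points, and example numbers below are assumptions chosen only to show how quantity, quality, and impact can be reported together instead of collapsed into a yes/no verdict.

# Hypothetical rubric for rating an evidence base on a continuum (illustrative only).
from dataclasses import dataclass

@dataclass
class EvidenceBase:
    controlled_studies: int       # quantity: number of studies with control groups
    high_quality_fraction: float  # quality: share of those studies with rigorous designs (0-1)
    mean_effect_size: float       # impact: mean effect size (Cohen's d) across studies

def describe_evidence(e: EvidenceBase) -> str:
    """Return a descriptive rating instead of a binary 'evidence-based: yes/no'."""
    if e.controlled_studies >= 20:
        quantity = "many controlled studies"
    elif e.controlled_studies >= 6:
        quantity = "a moderate number of controlled studies"
    else:
        quantity = "few controlled studies"
    quality = "mostly high-quality designs" if e.high_quality_fraction >= 0.5 else "mostly weaker designs"
    if e.mean_effect_size >= 0.5:
        impact = "large average effects"
    elif e.mean_effect_size >= 0.2:
        impact = "small-to-moderate positive effects"
    elif e.mean_effect_size > 0:
        impact = "positive but negligible effects (below Cohen's 0.2 benchmark)"
    else:
        impact = "no positive effect"
    return f"{quantity}; {quality}; {impact}"

# Placeholder numbers for illustration only (not drawn from any specific study set).
print(describe_evidence(EvidenceBase(controlled_studies=6,
                                     high_quality_fraction=0.33,
                                     mean_effect_size=0.15)))
# -> "a moderate number of controlled studies; mostly weaker designs;
#     positive but negligible effects (below Cohen's 0.2 benchmark)"

Again, the cut-points here are arbitrary; the point is simply that reporting all three factors preserves nuance that a binary label throws away.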
This brings us to our central point: educators and decision-makers need to build their research skills so they can better analyze, interpret, and evaluate scientific research. The evidence-based or not evidence-based model is simple, easier for marketing, and easier to understand, but it is also less scientifically valid. The goal of Pedagogy Non-Grata has always been to communicate scientific research to teachers in an accessible way. However, for any science communicator striving to make science accessible, there is always a trade-off between scientific validity and clarity or applicability for teachers. Truthfully, the easier a science article is to read, the more likely it is that the article has oversimplified the science and removed necessary nuance. This was certainly true of early Pedagogy Non-Grata articles.
For those trying to communicate or discern scientific research for the general public, we would advocate including a discussion of the quantity, quality, and impact of the research. While we still think it is more than fair to call some pedagogies evidence-based (such as phonics, repeated reading, and morphology), we think most factors should be classified in a more nuanced way; for example: “There is strong theoretical evidence for decodable books, strong indirect evidence for their use, and a small number of direct studies that have demonstrated a small but positive effect.”
This model for science discourse is not too complicated for the average person. Science reporting should share key findings and build an understanding of the process of research. The alternative is a sage-on-the-stage model: the intelligent and charismatic leader who benevolently communicates science to the public. However, this model is problematic, because the scholars who succeed at promoting themselves as experts often excel at marketing, not at communicating science with fidelity and integrity.
Last Edited 11/21/2022