Morphology Instruction: A Secondary Meta-Analysis


Over the past decade, we have seen a rise in evidence-based literacy instruction, driven by the dyslexia advocacy community and focused on phonics instruction. It might be fair to say that this movement in many ways characterizes itself as a group of outsiders, or rebels, revolting against mainstream Whole Language instruction for the welfare of dyslexic children. Recently, however, I have noticed a new and growing sect within this movement: parents and teachers who reject both traditional Whole Language instruction and phonics instruction, and who instead believe that morphology instruction coupled with phonological instruction is the best way to help struggling readers learn how to read.


I wanted to review the topic of morphological instruction (MI) and to address the question of whether or not it is evidence-based. I recently interviewed Dr. Peter Bowers on the topic, and I believe that he gave an overwhelmingly compelling argument that in “English orthography, morphology provides the primary organizing structure, as phonemes are written with graphemes, and graphemes only happen within morphemes.” I do want to clarify that while Dr. Bowers argues that morphology provides the primary organizing structure, he does not argue for a morphology-only approach; rather, he argues that morphology and phonology are inextricably linked. Indeed, he states that “In theory, isolated phonics could teach the available graphemes for English phonemes. But without referencing morphology or Etymology, there is no way to teach which grapheme to pick out of a set of possibilities.”

Despite finding Dr. Bowers’s conceptual argument compelling, I was originally quite skeptical that MI was a practical enough instructional strategy to actually be efficacious in the classroom, especially because the mainstream consensus appears to be that a phonics-only approach is the best approach to literacy instruction. I will explore the research in depth below, but having now read it, I think it is hard to argue that the current meta-analytic evidence does not suggest that MI is a high-yield strategy.

I also wanted to examine the question of whether MI or phonological instruction (PI) is better for language instruction, in part because some critics of Dr. Pete Bowers have argued that there is no need for MI because PI is superior. To be clear, it is not the position of Dr. Pete Bowers, or of myself (having concluded this research), that MI should be done in isolation. My starting hypothesis and bias were that PI would be superior for earlier grades and that MI would be superior for later grades. However, the evidence I examined for this article did not confirm that hypothesis. That being said, I do not think the evidence is sufficient at this time to fully answer this question. One thing that has to be taken into consideration when discussing this literature is how much more time, research, and expertise has gone into developing phonics instruction. The results of this secondary meta-analysis generally show phonics and morphology instruction to have equivalent results, but morphology instruction is in many ways still in its infancy. While I do not think there are enough studies on this topic, that should not be used as an excuse to dismiss its legitimacy; rather, it should spur more research.


Personally, I would like to see multiple studies that compare MI to PI directly, and then a meta-analysis of those studies. To the best of my knowledge, no such meta-analysis currently exists, and there are few experimental studies with this design. I therefore decided to review multiple meta-analyses that focused separately on phonics and on MI and to compare their results, which gives us some ability to compare the efficacy of these two instructional methods. However, I do want to stress that there are very few studies directly comparing these two strategies.

The results of my analysis showed phonics being superior overall, but by a statistically insignificant amount. It also showed that morphological instruction might be superior in specific circumstances. That being said, I am not comfortable at this moment saying that either instructional method is superior, for several reasons. Firstly, the difference between the overall efficacy of the two instructional methods was small. Secondly, the meta-analyses I looked at were very different, each having unique strengths and weaknesses. Thirdly, none of the papers included in these meta-analyses specifically compared MI to PI. While I do think we can compare pedagogical methods using this approach, I think it is harder when the overall results are similar. Had the differences been greater, I think I would have been more willing to make a judgement call.

A Discussion of the Morphology Instruction Papers Included:

In terms of morphology, I was able to find four applicable meta-analyses. The first was done by Dr. Peter Bowers, John Kirby, and Hélène Deacon in 2010, for the Review of Educational Research. The second was done by Deborah Reed in 2008, for the journal Learning Disabilities Research & Practice. The third was done by Dr. Goodwin, et al. in 2010, for Vanderbilt University. Lastly, the fourth was done in 2020 by Galuschka, et al., for the journal Educational Psychologist. Within my analysis, I have included the data from all four papers separately, as I felt each had unique strengths and weaknesses.

Reed:

The 2008 paper by Reed looked at 7 studies and was significantly underpowered in comparison to the other papers. However, I did like some of its inclusion criteria. Specifically, the Reed paper excluded all papers that were not about English language instruction, whereas the Bowers paper included 4 (out of 22) studies that were not specifically on English language instruction. Moreover, I liked that the Reed paper, unlike the other meta-analyses, excluded studies in which the experimental group did not do almost exclusively MI. Comparatively, the Bowers paper only excluded papers in which MI was less than 30% of total instructional time. I think this could be interpreted as a strength of the Reed paper, as we can be more sure that its results can be directly attributed to MI. With the other three meta-analyses, it is possible that we are not really looking at the benefit of MI specifically, but rather at the benefit of adding MI to normal instruction. That is not to say that these studies are not valuable. However, the Reed paper does give us some very specific insights.

That being said, the Reed paper cannot be used to draw any conclusions about emerging readers. While its inclusion criteria covered Kindergarten to grade 12, it only actually included studies of grades 3-8. This is particularly problematic, as the NRP paper showed that phonics instruction is only really effective prior to grade 3. It is therefore not entirely fair to compare the effect sizes found for PI in other meta-analyses to the effect sizes found for MI in the 2008 Reed paper. The 2010 Bowers paper and the 2010 Goodwin paper were much more useful in this regard, as the Bowers paper included 6 papers with experimental groups below grade 3 and the Goodwin paper included 13 studies with students in grade 3 or lower. Tangentially, before I conducted this analysis, I predicted that PI would show better results for emerging readers than MI. However, within the Bowers paper, the effect size for MI in primary grades was substantially higher than the effect size for PI in primary grades.

One problem with the 2008 Reed paper was in how the effect sizes were calculated. In a true meta-analysis, the raw data is pooled and the effect sizes are recalculated to weight for sample size. When the raw data is unavailable, one could take the mean of the effect sizes; however, this does not weight for sample size. In the 2008 paper, the author instead provides the range of effect sizes. While one could perhaps argue that this is a fairer way to display effect sizes, it makes it very difficult to compare the effect sizes of her paper to those of other meta-analyses. For this reason, I went through the individual results tabulated in her paper and manually calculated the mean effect size for each factor.
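To illustrate why weighting for sample size matters, here is a minimal sketch comparing a simple mean of effect sizes with a sample-size-weighted mean. The numbers are hypothetical and are not taken from the Reed paper or any other study discussed here. The function names are my own, and weighting directly by sample size is a simplification (published meta-analyses more commonly weight by inverse variance).

```python
def unweighted_mean(effect_sizes):
    """Simple average: every study counts equally, regardless of its size."""
    return sum(effect_sizes) / len(effect_sizes)


def sample_size_weighted_mean(effect_sizes, sample_sizes):
    """Average in which each study counts in proportion to its sample size."""
    total_n = sum(sample_sizes)
    weighted_sum = sum(d * n for d, n in zip(effect_sizes, sample_sizes))
    return weighted_sum / total_n


# Hypothetical data: one large study with a small effect and
# two small studies with larger effects.
effect_sizes = [0.20, 0.50, 1.10]
sample_sizes = [200, 60, 20]

unweighted = unweighted_mean(effect_sizes)
weighted = sample_size_weighted_mean(effect_sizes, sample_sizes)

print(f"Unweighted mean effect size: {unweighted:.2f}")
print(f"Sample-size-weighted mean effect size: {weighted:.2f}")
```

In this made-up example the simple mean is noticeably higher than the weighted mean, because the small studies with large effects count just as much as the large study with a small effect.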


Bowers, et al:

The 2010 Bowers paper, on the other hand, provided much more detailed effect sizes, as well as more information about how it calculated them, which makes it easier to interpret. One interesting thing the 2010 Bowers paper did was tabulate both the overall effect size and the effect size of studies that included alternative/equivalent training for the control groups. This means that if the experimental group received additional training in MI, the control group received training in a different pedagogical method for an equivalent period of time. This is done to ensure that the results of the experiment reflect the specific intervention, and not just the presence of an intervention. In other words, it controls for the placebo effect of receiving any intervention at all.


Most of the effect sizes calculated only from studies that used this experimental design were not statistically significant, with two important exceptions. The alternative training (AT) morphological instruction effect size for pre-kindergarten to grade 2 was 1.25, which is very large and substantially higher than the NRP phonics effect size for this age group. Interestingly, the AT morphological instruction effect size for less able students was also 1.25. This effect size is also very large and much higher than the NRP phonics effect size for students with reading disabilities. However, I’m not sure that this is a fair comparison, as the definitions are similar but not the same. That being said, the Bowers paper suggests possibly much higher effect sizes for MI, both for primary students and for struggling readers, which are the two types of students that PI is most supposed to help.

One thing to consider when looking at these AT effect sizes, is that studies with more rigorous experiment designs almost always have lower effect sizes than studies with less rigorous experiment designs, so it might be unfair to compare these effect sizes with other meta-analysis data that did not also have the same level of rigor. 


Galuschka et al: 

The Galuschka paper looked at 28 studies, of which 19 studies were RCT and 10 were not; however, all studies had a control group. The paper focused specifically on students with dyslexia. While its inclusion criteria included earlier grades, the authors state that the majority of included studies were not of primary-age students, which makes it hard to compare to phonics instruction. Interestingly, the Galuschka paper looks at both phonics and morphology. One weakness of this paper is that it not only looked at languages other than English, but its largest effect sizes also came from studies of German instruction, not English instruction.


Goodwin, et al:

The Goodwin paper looked at 27 different studies. All studies had a control group, were done in English, and were published after 1980. Goodwin’s paper included the most recent data and the most papers on students in grade 3 or lower, with 13 primary-aged studies. All of the Goodwin effect sizes were calculated as Hedges’ g.
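For readers unfamiliar with Hedges' g, the sketch below shows the textbook calculation: a standardized mean difference (Cohen's d, using the pooled standard deviation) multiplied by a small-sample correction factor. This is a general illustration under my own assumptions, not a reproduction of the Goodwin paper's procedure. The function name and the example scores are hypothetical.

```python
import math


def hedges_g(mean_treatment, mean_control, sd_treatment, sd_control,
             n_treatment, n_control):
    """Standardized mean difference with the usual small-sample correction."""
    df = n_treatment + n_control - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_treatment - 1) * sd_treatment ** 2
         + (n_control - 1) * sd_control ** 2) / df
    )
    cohens_d = (mean_treatment - mean_control) / pooled_sd
    # Common approximation of the correction factor: 1 - 3 / (4 * df - 1).
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d * correction


# Hypothetical post-test scores for an intervention group and a control group.
g = hedges_g(mean_treatment=55, mean_control=50,
             sd_treatment=10, sd_control=10,
             n_treatment=20, n_control=20)
print(f"Hedges' g: {g:.3f}")
```

The correction shrinks the effect size slightly, and it matters most when the groups are small, which is often the case in classroom intervention studies.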

A Discussion of the Phonics Instruction Meta-Analysis Data:


For several of my phonics effect sizes, I used John Hattie’s research. I personally love John Hattie’s research because it draws on such a large sample. For example, the overall phonics effect size I provided in this analysis is from John Hattie, and it is based on 1102 studies. Comparatively, we only have 22 studies in the 2010 Bowers study and 7 studies in the 2008 Reed study. That being said, Hattie has often been criticized for how he does his meta-analysis. He is not recalculating the raw data, as I discussed earlier, but taking the mean effect size of all experiments done on the topic. This provides us with a very representative sample; however, it can also make unfair comparisons. The effect sizes being averaged often used different types of effect size calculations, different age groups, and different experiment designs. This can be problematic, as these comparisons are not always reasonable. For example, it would be unfair to compare an RCT study design to a simple pre-test post-test design, as these different types of experiment designs usually yield very different results.

However, Hattie’s results are usually similar to those of other, more specific meta-analyses; indeed, his overall phonics effect size is actually lower than the effect size found in the NRP paper. This is likely because his research includes both the most rigorous and the least rigorous study designs, so the outlier effect sizes at either end of the spectrum end up cancelling each other out. While I do not think his research can be used as a sole source to determine efficacy, I think it is a phenomenal starting place for getting a generalized understanding of the efficacy of an intervention.

For all other phonics effect sizes, I cited either the National Reading Panel (NRP) meta-analysis of phonics instruction or Dr. Linnea Ehri’s secondary meta-analysis of the NRP paper. The NRP paper has been criticized by Dr. Jeffrey Bowers for some of the studies that it included and excluded. Rather than discuss the specific criticisms of this paper at length, I will list Dr. Bowers’s paper critiquing it in the references. However, I will note that most of the effect sizes listed in the NRP paper were lower than the effect sizes listed by Dr. Pete Bowers in the 2010 paper. It is also important to note that the NRP paper is the largest paper ever written on the topic, and is often cited as the gold standard.

Final Thoughts

Overall, the effect size found by Hattie and the NRP paper for phonics and analytic phonics is higher than the effect size for morphology instruction found by Dr. Bowers, Dr. Goodwin, Dr. Galuschka, and Dr. Reed. However, some of the more specific effect sizes found in the morphology meta-analyses, specifically for the effect of MI on less able students and on primary students, are exceptionally impressive when compared to the equivalent effect sizes for phonics instruction in the NRP paper. Moreover, the differences found between the overall effect sizes for phonics instruction and morphology instruction are small. That being said, it is hard to draw firm conclusions when we consider the differences in statistical power. The phonics effect sizes are drawing from literally over a thousand studies, whereas the morphology instruction meta-analyses are each drawing from fewer than 30 studies. Moreover, the most impressive morphology instruction effect sizes are drawn from even fewer studies. For example, the largest effect size found came from only 6 of these studies.

Ultimately, while I think it is difficult to conclude whether or not morphological instruction is better than phonics instruction, I think it is clear that morphological instruction is a high-yield strategy, is at least equivalent to phonics instruction in terms of efficacy, and may potentially be better for early instruction and for students with reading disabilities. Considering the current evidence, I think the best practice recommendation should likely be to include both phonological and morphological instruction in the primary grades and for students with reading disabilities, and that morphological instruction should likely continue longer than phonological instruction. However, morphological instruction, similar to phonological instruction, appears less useful in later grades, likely because students are more fluent readers by these ages. That all being said, the jury is clearly still out, and we need further studies on morphological instruction, studies comparing morphological instruction with phonics instruction, and studies comparing phonics instruction with morphological instruction paired with phonological instruction.


Written by Nathaniel Hansford

Special thanks to the brilliant Dr. Pete Bowers, who not only reviewed this article, but helped provide additional research for it. 


Bowers, Peter, Kirby, John, & Deacon, Hélène. (2010). The Effects of Morphological Instruction on Literacy Skills: A Systematic Review of the Literature. Review of Educational Research, 80, 144-179. 10.3102/0034654309359353.

Reed, Deborah. (2008). A Synthesis of Morphology Interventions and Effects on Reading Outcomes for Students in Grades K–12. Learning Disabilities Research & Practice, 23, 36-49. 10.1111/j.1540-5826.2007.00261.x.

Bowers, J. S. (2020). Reconsidering the Evidence That Systematic Phonics Is More Effective Than Alternative Methods of Reading Instruction. Educational Psychology Review, 32, 681–705.

Ehri, Linnea, et al. (2001). Systematic Phonics Instruction Helps Students Learn to Read: Evidence From the National Reading Panel’s Meta-Analysis. Ontario Institute for Studies in Education. Retrieved from <>.

Hattie, J. (2021). Visible Learning MetaX. Retrieved from <>.