
Program Description:

SpellRead is a structured literacy program from Halifax. Its original founder was Dr. Kay MacPhee, whose son was born profoundly deaf. In 1994, MacPhee launched SpellRead as a reading intervention program. Today, the program is stewarded by Sarah Arnold and the team at Halifax Learning, who bring decades of experience delivering and researching reading instruction in the science of reading field. Halifax Learning (HLC) has three lines of operation: an in-person student clinic, an online student clinic, and in-person and virtual teacher training programs. According to its creators, “SpellRead is an explicit, intensive, and comprehensive science-based reading intervention that integrates the five essential elements of reading instruction: phonemic awareness, phonics, fluency, vocabulary, and comprehension. The program is divided into three phases. Each session includes phonemic and phonetic activities, language-based reading, and writing. Phase A teaches each of the 44 sounds of the English language. Phase B teaches the secondary vowels and consonant blends, and takes students to the two-syllable level. Phase C teaches the clusters, verb endings and syllabication to a polysyllabic level.”


Previous Reviews:

Evidence for ESSA previously examined the SpellRead program, rated the research as “strong,” and calculated a mean effect size of .90 based on a study by Torgesen (2007). The What Works Clearinghouse (WWC) also reviewed SpellRead and found the program research to be “promising,” but did not calculate an effect size.

Figure 1. Evidence for ESSA Results.

Analysis Method:

Systematic Search:

SpellRead contacted Pedagogy Non Grata to request this review and sent five papers directly. We also conducted a systematic search of the company website and the Education Source database; no additional relevant studies were found. To accept a study without reservations, we require that it compare a treatment group to a control group, include at least 20 participants, and report enough detail for us to calculate a Cohen’s d or Hedges’ g effect size. Two of the studies reviewed met these criteria (Rashotte, 2001; Torgesen, 2007).
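The three acceptance criteria amount to a simple screening rule. The sketch below is a minimal illustration only, assuming a hypothetical `StudyRecord` structure whose fields summarize what a paper reports; it is not a tool Pedagogy Non Grata is described as using, and the example records are invented.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical summary of what a paper reports (illustrative fields only)."""
    citation: str
    has_control_group: bool
    n_participants: int
    # True if means, SDs, and group sizes allow a Cohen's d or Hedges' g
    reports_effect_size_inputs: bool


def meets_inclusion_criteria(study: StudyRecord, min_n: int = 20) -> bool:
    """Return True only if all three acceptance criteria are satisfied."""
    return (
        study.has_control_group
        and study.n_participants >= min_n
        and study.reports_effect_size_inputs
    )


# Hypothetical records, not the actual papers reviewed
candidates = [
    StudyRecord("Study A (pre-post only)", False, 150, True),
    StudyRecord("Study B (small RCT)", True, 12, True),
    StudyRecord("Study C (RCT)", True, 60, True),
]
accepted = [s.citation for s in candidates if meets_inclusion_criteria(s)]
print(accepted)  # ['Study C (RCT)']
```

A study missing any one criterion (no control group, fewer than 20 participants, or insufficient reporting) is screened out.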


Figure 2. PRISMA Analysis.

Interpreting Study Results:

The first study (Metsala, 2020) calculated effect sizes by subtracting pre-test scores from post-test scores and dividing the difference by the pooled between-student standard deviation. The authors used this pre-post method because there was no control group to compare against. Effect sizes are typically interpreted with Cohen’s guide, shown in Figure 3. However, the Metsala (2020) effect sizes are not experimental, so they should instead be interpreted with the Plonsky (2014) guidelines for non-experimental effect sizes, shown in Figure 4.
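As a rough illustration of the pre-post calculation, the sketch below divides a group's mean gain by a pooled SD. It assumes the pooled SD is the root mean square of the pre-test and post-test SDs, one common choice for a single group measured twice; the exact pooling used by Metsala (2020) is not specified here, and the scores are hypothetical.

```python
import math


def pooled_sd_equal_n(sd_a: float, sd_b: float) -> float:
    """Pool two SDs from the same group (equal n) as the root mean square."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def pre_post_effect_size(pre_mean: float, post_mean: float,
                         pre_sd: float, post_sd: float) -> float:
    """Single-group effect size: (post - pre) / pooled SD.

    With no control group, this reflects change over time, not a causal effect.
    """
    return (post_mean - pre_mean) / pooled_sd_equal_n(pre_sd, post_sd)


# Hypothetical standard scores for one group before and after tutoring
d = pre_post_effect_size(pre_mean=85.0, post_mean=91.0, pre_sd=10.0, post_sd=10.0)
print(round(d, 2))  # 0.6
```

Because such a value also absorbs ordinary growth over the months of instruction, it is not directly comparable to a treatment-versus-control effect size, which is why a separate interpretation scale is used.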

Figure 3. Cohen’s guide to interpreting effect sizes.

Figure 4. Non-experimental effect size interpretation guidelines.

For the second study (Torgesen, 2007), the authors conducted a randomized controlled trial (RCT) with four treatment groups and four control groups. They calculated effect sizes with the following formula: (treatment post-test standard score − control post-test standard score) / standardized test SD. In our opinion, this method likely deflated the study’s results: the norming SD of a standardized test (typically 15 standard-score points) is usually larger than the SD within a sample of struggling readers, and a larger denominator produces a smaller effect size. Cohen’s guide (Figure 3) is the more appropriate benchmark for this study. For the third study (Rashotte, 2001), the authors appear to have calculated effect sizes using Hedges’ g, but based on the difference in gains rather than the difference in post-test scores. Because the treatment group started ahead of the control group, this method may have biased the results slightly in favor of the control group. This study should also likely be interpreted with Cohen’s guide (Figure 3). Pedagogy Non Grata normally verifies the effect size calculations of the studies it analyzes; however, only the third study reported enough data to recalculate its effect sizes, and even then we had to use Cohen’s average, because the original authors did not report sample sizes in sufficient detail.
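The concern about the Torgesen (2007) formula comes down to the denominator. This minimal sketch uses hypothetical numbers: the same 3-point post-test difference is divided once by an assumed norming SD of 15 and once by an assumed, narrower sample SD of 10.

```python
def standardized_difference(treatment_post: float, control_post: float, sd: float) -> float:
    """Post-test difference expressed in units of whichever SD is supplied."""
    return (treatment_post - control_post) / sd


treatment_post, control_post = 90.0, 87.0  # hypothetical mean standard scores

es_norm_sd = standardized_difference(treatment_post, control_post, sd=15.0)    # test norming SD
es_sample_sd = standardized_difference(treatment_post, control_post, sd=10.0)  # assumed sample SD

print(round(es_norm_sd, 2), round(es_sample_sd, 2))  # 0.2 0.3
```

The raw score difference is identical in both cases; only the yardstick changes, and the more restricted the sample's spread relative to the norming population, the more the norm-based effect size understates the difference.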


Synthesis Methodology:

To estimate the magnitude of SpellRead’s effect, we calculated the average effect size across the two experimental studies (Rashotte, 2001; Torgesen, 2007). A weighted average was not calculated because sample sizes were not reported in sufficient detail. Because only two studies were included in the synthesis, confidence intervals were not calculated.
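The synthesis step is an unweighted mean of study-level effect sizes. The sketch below contrasts it with a sample-size-weighted mean, which would require the per-study sample sizes that were not reported. Which study-level values enter the average (here, the Torgesen, 2007 mean of .11 and the re-analysed Rashotte, 2001 mean of .75) is an assumption made for illustration.

```python
from statistics import mean


def unweighted_mean_effect(effect_sizes: list[float]) -> float:
    """Every study counts equally, regardless of how many students it had."""
    return mean(effect_sizes)


def n_weighted_mean_effect(effect_sizes: list[float], sample_sizes: list[int]) -> float:
    """Larger studies count more; requires a sample size for every study."""
    if len(effect_sizes) != len(sample_sizes):
        raise ValueError("need exactly one sample size per effect size")
    return sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / sum(sample_sizes)


# Study-level means reported in this review (choice of inputs is an assumption)
study_effects = [0.11, 0.75]  # Torgesen (2007), Rashotte (2001) re-analysis
print(round(unweighted_mean_effect(study_effects), 2))  # 0.43
```

With only two studies, a single large or small result moves the unweighted mean a long way, which is one reason the synthesis stops short of confidence intervals.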

Studies Included:


Metsala, 2020

This case study examined 137 struggling or dyslexic readers enrolled in the SpellRead tutoring program. On average, participants were 10 years old and received 120 hours of instruction over 6.5 months. Results were measured with the standardized Woodcock-Johnson reading tests. The authors found a mean effect size of .61 (excluding standard scores), with a 95% confidence interval of [.34, .88]. Outcomes across moderator variables are shown in Figure 5.


Figure 5. Metsala 2020 Effect Size Results.

Torgesen, 2007

Participants in this study were randomly selected from a pool of 729 grade 3 and grade 5 students, and each was placed in one of four treatment groups (SpellRead, Wilson Word Reading System, Failure Free Reading, or Corrective Reading) or in the control group. Participants received 90 hours of instruction over approximately 20 weeks. Progress was evaluated with multiple standardized tests, including the TOWRE, AIMSweb, GRADE, and Woodcock-Johnson. The authors found a mean effect size of .11, with a 95% confidence interval of [.02, .19]; mean outcomes across moderator variables are shown in Figure 6. The authors also calculated effect sizes for results one year after the original intervention, shown in Figure 7. For these longitudinal results, the original authors found a mean effect size of .11, with a 95% confidence interval of [.02, .19]. Although these effect sizes were negligible on average, they suggest that the gains were maintained one year later.


Figure 6. Torgesen, 2007 Effect Size Results.

Figure 7. Torgesen, 2007 Longitudinal Results.

Rashotte, 2001

This study randomly assigned students to one of two groups. Students in the first group received 35 hours of SpellRead instruction in groups of three to five over eight weeks; students in the second group received regular classroom instruction. Results were measured with the Woodcock-Johnson, GORT, and CTOPP standardized tests. The authors reported their own effect sizes, which are shown in Figure 8. They state that they used Hedges’ g, and they appear to have calculated it as g = (treatment gains − control gains) / weighted pooled SD. This approach can reduce the impact of non-equivalent groups when the treatment group starts well ahead of or behind the control group. However, it risks biasing the results in favor of the lower-scoring group, which has more room to improve. In this case, the formula likely favored the control group, because the treatment group started ahead of the control group.
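To make the difference between the two approaches concrete, here is a minimal sketch with hypothetical scores. It assumes the "weighted pooled SD" is the usual sample-size-weighted pooled SD of the two groups' post-test scores and applies the standard small-sample correction that distinguishes Hedges' g from Cohen's d; whether Rashotte et al. (2001) pooled these exact SDs is not stated in their description.

```python
import math


def pooled_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Sample-size-weighted pooled SD of two independent groups."""
    return math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))


def hedges_correction(n_t: int, n_c: int) -> float:
    """Approximate small-sample correction factor J used to turn d into g."""
    return 1 - 3 / (4 * (n_t + n_c) - 9)


def gain_score_g(pre_t, post_t, pre_c, post_c, sd_t, sd_c, n_t, n_c) -> float:
    """g = (treatment gain - control gain) / pooled SD, times J."""
    diff = (post_t - pre_t) - (post_c - pre_c)
    return diff / pooled_sd(sd_t, n_t, sd_c, n_c) * hedges_correction(n_t, n_c)


def post_test_g(post_t, post_c, sd_t, sd_c, n_t, n_c) -> float:
    """g = (treatment post-test - control post-test) / pooled SD, times J."""
    return (post_t - post_c) / pooled_sd(sd_t, n_t, sd_c, n_c) * hedges_correction(n_t, n_c)


# Hypothetical data in which the treatment group starts 10 points ahead
args = dict(sd_t=10.0, sd_c=10.0, n_t=20, n_c=20)
g_gain = gain_score_g(pre_t=90, post_t=100, pre_c=80, post_c=86, **args)
g_post = post_test_g(post_t=100, post_c=86, **args)
print(round(g_gain, 2), round(g_post, 2))  # 0.39 1.37
```

The post-test version still carries the 10-point head start the treatment group had at baseline, which is exactly the non-equivalence the gain-score approach tries to remove; the sketch only shows how far the two numbers can diverge, not which one is correct.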


Pedagogy Non Grata typically calculates effect sizes as d = (treatment post-test − control post-test) / pooled SD. However, the original effect sizes could not be fully verified, because the authors did not report the sample size for each subgroup of the treatment and control groups. They did report both mean scores and standard deviations, so to provide at least partial verification of the Rashotte (2001) results, we calculated Cohen’s average: (treatment post-test − control post-test) / mean standard deviation. This is essentially an unweighted effect size; it does not account for sample size or for differences between the groups’ standard deviations. The results of this analysis can be seen in Figure 9. Using the original authors’ effect size method, the mean effect size was 1.10; using our re-analysis, it was .75. Both results suggest a strong magnitude of effect for the SpellRead program.
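The gap between a full pooled-SD d and Cohen’s average can be shown with a short sketch on hypothetical summary statistics. It implements Cohen’s average as described here (dividing by the simple mean of the two SDs); note that some sources instead average the two variances, which gives a slightly different denominator.

```python
import math


def cohens_average(mean_t: float, mean_c: float, sd_t: float, sd_c: float) -> float:
    """(treatment post - control post) / arithmetic mean of the two SDs; needs no n."""
    return (mean_t - mean_c) / ((sd_t + sd_c) / 2)


def pooled_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c) -> float:
    """(treatment post - control post) / sample-size-weighted pooled SD."""
    pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled


# Hypothetical post-test summary statistics
print(round(cohens_average(95, 88, 12, 8), 2))              # 0.7
print(round(pooled_d(95, 88, 12, 8, n_t=30, n_c=10), 2))    # 0.63
print(round(pooled_d(95, 88, 10, 10, n_t=30, n_c=10), 2))   # 0.7 (equal SDs)
```

When the larger SD belongs to the larger group, the pooled SD is pulled toward it and d falls below Cohen’s average; with equal SDs the two coincide regardless of group sizes.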

Figure 8. Rashotte, 2001 Results.

Figure 9. Rashotte, 2001 Re-Analysis.

Synthesis Results:

Figure 10. SpellRead Mean Results.

Qualitative Grade: 9/10

The program contains the following essential types of instruction: explicit, individualized, systematic phonics, phonemic awareness, vocabulary, fluency, spelling, and comprehension.

Final Grade: A

Two experimental studies showed a mean effect size above .40 on standardized tests.

The program is research based. However, there are no mean experimental effect sizes showing results above .30.


Written by: Nathaniel Hansford

Last edited on 2023-09-28




Evidence for ESSA. (n.d.). SpellRead. Johns Hopkins University.


IES. (n.d.). SpellRead. What Works Clearinghouse.

Metsala, J., & David, M. (2020). Improving English reading fluency and comprehension for children with reading fluency disabilities. Wiley. https://doi.org/10.1002/dys.1695

Torgesen, J., Schirm, A., Castner, L., Vartivarian, S., Mansfield, W., Myers, D., Stancavage, F., Durno, D., Javorsky, R., & Haan, C. (2007). Volume II: Closing the reading gap: Findings from a randomized trial of four reading interventions for striving readers. University of Florida Language Institute.


Rashotte, C. A., MacPhee, K., & Torgesen, J. K. (2001). The Effectiveness of a Group Reading Instruction Program with Poor Readers in Multiple Grades. Learning Disability Quarterly, 24(2), 119-134.
