The Implications of Fluency Form Effects in Organisational Policy and Planning
Oral reading fluency is increasingly relied upon to assess literacy in the primary grades. It is typically measured in ‘correct words per minute’ (cwpm), which captures how quickly and accurately a student can read a passage: an assessor listens to a student read a passage aloud for one minute, then subtracts any skipped or incorrect words from the total number of words the student read.
Oral reading fluency assessments are useful for a variety of reasons. First, they are fast and easy to administer: an invigilator can test one pupil every two to three minutes. Second, fluency scores are highly correlated with a range of other outcomes of interest, including reading comprehension (which can be more challenging and time-consuming to measure). Third, they can be compared against fluency outcomes in other contexts, since cwpm is often used to evaluate programmes and education systems (although, as we will see below, these comparisons should be approached with caution). Fourth, and perhaps most importantly, they are tangible and easy to understand. Policymakers, researchers, teachers, parents, and pupils can easily envision what a score means in real life: reading 3 words per minute is very slow, while reading 200 words per minute is very fast.
Fluency assessments are also playing a significant role in efforts to contextualise learning outcomes across countries. Patrinos and Angrist (2018) use oral reading fluency, along with reading comprehension, to calculate harmonised learning outcomes for an expanded group of countries, including developing countries less likely to administer assessments like TIMSS or PISA. These efforts are all the more relevant given that Ministries of Education (in Kenya and Liberia, for instance) are increasingly using fluency benchmarks to establish literacy targets.
Passage Fluency and Form Effects
Policymakers and operators make important decisions every day based on fluency data (for example, the Tusome programme was evaluated in part using reading fluency, and these results informed the expansion of the programme in Kenyan government schools). There is an underlying assumption that different forms (i.e. passages) of oral reading fluency assessments are equivalent. This assumption allows us to compare oral reading fluency outcomes across contexts or in the same context at different points in time, even if different forms were used. But a recent body of literature on form effects suggests otherwise. This, in turn, calls into question efforts to contextualise fluency outcomes and measure progress over time.
Chaparro et al. (2017) explore passage equivalency across three different Primary 2 passages administered to 157 pupils. They find large and significant form effects (that is, pupils score differently on different passages due to variations across the passages themselves), with differences as large as 22 cwpm. In other words, a pupil given two different grade-level passages might read 25 cwpm on the first passage and 47 cwpm on the second! Francis et al. (2008) find similar results in their evaluation of form effects across six different passages, with variation as large as 26 cwpm across passages of the same difficulty.
There are ways to create equated scaled scores (ESS). DIBELS (p. 89), for example, explains how ESS are generated to account for form effects across different passages. But DIBELS ESS (which are on a scale of 400, with a standard deviation of 40) are far less ‘tangible’ than correct words per minute, which can be easily understood and visualised by a range of stakeholders. In addition, calculating ESS requires pupils to complete multiple passages, which increases the time cost to administer these assessments at scale.
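To illustrate the general idea behind equating (and only the idea; DIBELS uses its own, more sophisticated procedure), a classical ‘mean-sigma’ linear equating maps scores on one form onto the scale of another so that the two forms share a mean and standard deviation. The function and the sample scores below are hypothetical:

```python
import statistics

def linear_equate(scores_x, scores_y):
    """Mean-sigma linear equating: return a function mapping a score on
    form X onto the scale of form Y, so that the two forms have matching
    means and standard deviations.

    A simplified sketch of one classical equating method; it is not the
    DIBELS equated-scaled-score procedure.
    """
    mx, sx = statistics.mean(scores_x), statistics.stdev(scores_x)
    my, sy = statistics.mean(scores_y), statistics.stdev(scores_y)
    return lambda x: my + (sy / sx) * (x - mx)

# Hypothetical cwpm scores for the same pupils on two passages:
form_a = [25, 40, 55, 70, 85]
form_b = [47, 60, 73, 86, 99]
to_b = linear_equate(form_a, form_b)
print(to_b(25))  # a form-A score of 25 maps to roughly 47 on form B
```

After equating, a pupil's 25 cwpm on the harder form and 47 cwpm on the easier form land on the same scale, which is precisely what makes cross-form comparisons defensible.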
Fluency Assessments at NewGlobe
At NewGlobe, all primary teachers administer bi-termly fluency assessments to evaluate oral reading fluency in their classroom. Teachers use a different passage during each exam cycle. Fluency outcomes are used at a system level to appropriately level textbooks and learning materials. Outcomes are also communicated to parents and pupils on termly report cards. In this way, fluency assessments play a foundational role in organisational decision-making and also occupy a central role in school culture and parent engagement.
If form effects persist in a NewGlobe context (as predicted by research in other settings), this has important implications for organisational policy and also for teachers and families who interpret and respond to the results. Form effects might contribute to inaccurate decision-making on course levelling and instructional design. If we rely on a single measurement of fluency, that might be upwardly or downwardly biased depending on the difficulty of the passage. In addition, parents and teachers might interpret fluctuations in fluency scores as negative growth. A standard linear model predicts incremental growth over time. This would no longer be valid if form effects result in variations as large as 20–30 cwpm for a single pupil reading different passages. In practical terms, this means that a parent might see that their child is reading 47 words per minute at term 1 midterm exams, and 32 words per minute at term 1 end term exams. How are they to interpret these results?
Measuring Form Effects at NewGlobe
To better understand form effects in our own context, we set out to collect oral reading fluency data across three different contexts: two government school partnership programmes in two states in Nigeria, and one private community school programme in Kenya. In all three contexts, NewGlobe field officers assessed pupils in Primary 4, Primary 5, and Primary 6 using four DIBELS passages: two Primary 2-level passages (the standard passage level when the same passage is used to assess pupils from multiple grades) and two grade-level passages. We used these data to determine whether form effects emerge when pupils read two different passages of the same reading level.
We find form effects similar in magnitude to those reported by Chaparro et al. (2017) and Francis et al. (2008). Between the two common Primary 2 passages, and between the two grade-level passages, we find roughly the same gap: approximately 21 cwpm. Nor were these differences programme-specific: we find similar differences by passage across all three contexts, with observed mean differences ranging from 11 cwpm to 22 cwpm.
Why Do These Results Matter?
What are the implications of these findings, both at NewGlobe and for other educational organisations using fluency data to communicate with stakeholders and make decisions?
First, we must be mindful of form effects when framing and communicating the results of fluency assessments to parents and teachers. Parents and teachers understandably expect incremental improvement in a child’s oral reading fluency over time, so the ‘noisiness’ of these estimates might cause confusion and concern if a child’s score drops unexpectedly. Appropriate framing must be provided to parents and teachers to ensure that they are equipped to interpret, discuss, and respond to the results of fluency assessments.
Second, we should be cautious not to over-interpret oral reading fluency outcomes, especially those collected at a single point in time or those collected at the baseline and end-line of an evaluation. In both cases, these estimates can be noisy, and reporting raw results or raw differences can be misleading. An aggregate of frequent fluency assessments over time (i.e. bi-termly fluency assessments) will produce a much more accurate and reliable estimate. In addition, if possible we should assess fluency at any given time using two or more passages in order to produce a more accurate average score. Finally, equated scaled scores can be used to control for form effects when interpreting and reporting the outcomes.
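As a sketch of this aggregation idea, the helper below summarises repeated cwpm measurements (bi-termly assessments, or multiple passages at one sitting) as a mean plus a range rather than a single point estimate. The function name and the ±1 standard deviation band are our assumptions for illustration, not an official benchmark:

```python
import statistics

def summarise_fluency(scores):
    """Summarise repeated cwpm measurements as a mean with a range.

    Illustrative sketch: the band of one sample standard deviation
    around the mean is an assumption, not an official reporting rule.
    """
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": round(mean, 1),
            "low": round(mean - spread, 1),
            "high": round(mean + spread, 1)}

# The pupil from the earlier example: 47 cwpm at midterm, 32 at end term.
print(summarise_fluency([47, 32]))
```

Reporting “roughly 29–50 cwpm, averaging about 40” tells a parent a far more honest story than two apparently contradictory point scores.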
Finally, special attention must be paid to passage selection (especially if multiple fluency assessments are used during a single evaluation or to measure fluency over time). The criteria for selection cannot simply be that the passages are at the same grade level. Experts must carefully analyse sentence structures, vocabulary, and length to ensure that the passages are as similar as possible, in order to reduce form effects across passages. In addition, multiple forms should be robustly piloted to determine which forms link most closely.
Our findings, considered alongside the literature on form effects, paint a clear and consistent narrative. Fluency varies widely across different passages, even passages at the same grade level. Given the extent to which fluency scores are used at various levels of the decision-making process, we must invest in framing these results for consumers of the data, including policymakers, school leaders, teachers, parents, and pupils themselves. This framing must challenge the traditional notion of incremental growth, and emphasise a more holistic interpretation of oral reading fluency that takes into consideration fluctuations in performance on a given assessment. For instance, we should present both fluency results and fluency benchmarks as ranges, rather than point estimates, in order to account for the ‘noisiness’ of fluency data. We should also further investigate ways to mitigate these form effects, whether through the use of multiple passages during testing, more frequent testing, more robust test equating, or more careful selection of passages. In this way, we can continue to invest in and rely on fluency assessments as an important and accessible indicator of literacy performance.
About the author: Tim Sullivan, Director of Learning Innovation at NewGlobe