The Implications of Fluency Form Effects in Organisational Policy and Planning

Oral reading fluency is increasingly relied upon to assess literacy in primary grades. Oral reading fluency is typically measured using ‘correct words per minute’ (cwpm), which assesses how quickly and accurately a student can read a passage. In fluency assessments, an assessor listens to a student read a passage aloud for one minute, and subtracts any skipped or incorrect words from the total number of words read by the student.
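The cwpm calculation described above can be sketched in a few lines of Python. This is an illustration of the scoring rule only; the function name and figures are hypothetical, not taken from any official assessment toolkit.

```python
# Illustrative cwpm (correct words per minute) calculation, assuming
# a one-minute timed reading. Names and figures are hypothetical.

def cwpm(words_read: int, errors: int, seconds: float = 60.0) -> float:
    """Correct words per minute: skipped or incorrect words are
    subtracted from the total read, then scaled to one minute."""
    correct = max(words_read - errors, 0)
    return correct * 60.0 / seconds

# A pupil reads 50 words in one minute, with 3 skipped or incorrect:
print(cwpm(50, 3))  # 47.0
```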

Oral reading fluency assessments are useful for a variety of reasons. First, they are fast and easy to administer: an invigilator can test one pupil every two to three minutes. Second, fluency scores are highly correlated with a range of other outcomes of interest, including reading comprehension (which can be more challenging and time-consuming to measure). Third, they can be compared against fluency outcomes in other contexts, since cwpm is often used to evaluate programmes and education systems (although, as we will see below, these comparisons should be approached with caution). Fourth, and perhaps most importantly, they are tangible and easy to understand. Policymakers, researchers, teachers, parents, and pupils can easily envision what a score means in real life: reading 3 words per minute is very slow, while reading 200 words per minute is very fast.

Fluency assessments are also playing a significant role in efforts to contextualise learning outcomes across countries. Patrinos and Angrist (2018) use oral reading fluency, along with reading comprehension, to calculate harmonised learning outcomes for an expanded group of countries, including developing countries less likely to administer assessments like TIMSS or PISA. These efforts are all the more relevant given that Ministries of Education (in Kenya and Liberia, for instance) are increasingly using fluency benchmarks to establish literacy targets.

Passage Fluency and Form Effects

Chaparro et al. (2017) explore passage equivalency across three different Primary 2 passages administered to 157 pupils. They find large and significant form effects (that is, pupils score differently on different passages because of variations across the passages themselves). These differences are as large as 22 cwpm. In other words, a pupil might read 25 cwpm on one grade-level passage and 47 cwpm on another! Francis et al. (2008) find similar results in their evaluation of form effects across six different passages, with variation as large as 26 cwpm across passages of the same difficulty.

There are ways to create equated scaled scores (ESS). DIBELS (p. 89), for example, explains how ESS are generated to account for form effects across different passages. But DIBELS ESS (which are on a scale of 400, with a standard deviation of 40) are far less ‘tangible’ than correct words per minute, which can be easily understood and visualised by a range of stakeholders. In addition, calculating ESS requires pupils to complete multiple passages, which increases the time cost to administer these assessments at scale.
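As a rough illustration of the idea behind equated scaled scores (not the actual DIBELS procedure, which is more sophisticated), a simple linear equating step standardises each raw score against its passage's norms and maps it onto a common reporting scale. The per-passage means and standard deviations below are hypothetical:

```python
# Rough illustration of linear score equating onto a common scale
# (mean 400, SD 40, echoing the DIBELS-style scale mentioned above).
# The per-passage means and SDs are hypothetical; real equating is
# built from calibration data and more sophisticated models.

def equated_scaled_score(raw_cwpm: float, passage_mean: float,
                         passage_sd: float, scale_mean: float = 400.0,
                         scale_sd: float = 40.0) -> float:
    """Standardise a raw cwpm score against its passage's norms,
    then map the z-score onto the common reporting scale."""
    z = (raw_cwpm - passage_mean) / passage_sd
    return scale_mean + scale_sd * z

# The same pupil reads 47 cwpm on an easier passage and 32 cwpm on a
# harder one; after equating, the two scores coincide:
print(round(equated_scaled_score(47, passage_mean=45, passage_sd=15), 1))  # 405.3
print(round(equated_scaled_score(32, passage_mean=30, passage_sd=15), 1))  # 405.3
```

The design intuition: once each passage's difficulty is baked into its own norms, differences that remain reflect the pupil rather than the form.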

Fluency Assessments at NewGlobe

If form effects persist in a NewGlobe context (as research in other settings predicts), this has important implications for organisational policy, as well as for the teachers and families who interpret and respond to the results. Form effects might contribute to inaccurate decision-making on course levelling and instructional design: a single measurement of fluency might be upwardly or downwardly biased depending on the difficulty of the passage. In addition, parents and teachers might interpret fluctuations in fluency scores as negative growth. A standard linear model predicts incremental growth over time, but this expectation no longer holds if form effects produce variations as large as 20–30 cwpm for a single pupil reading different passages. In practical terms, a parent might see that their child read 47 words per minute at the term 1 midterm exams and 32 words per minute at the term 1 end-term exams. How are they to interpret these results?

Measuring Form Effects at NewGlobe

We find form effects of a magnitude similar to those reported by Chaparro et al. (2017) and Francis et al. (2008). Between the common Primary 2 passages, and between the grade-level passages, we find roughly the same difference: approximately 21 cwpm. These differences were by no means programme-specific; we find similar differences by passage across all three contexts, with observed mean differences ranging from 11 cwpm to 22 cwpm.

Why Do These Results Matter?

First, we must be mindful of form effects when framing and communicating the results of fluency assessments to parents and teachers. Parents and teachers understandably expect incremental improvement in a child’s oral reading fluency over time, and the ‘noisiness’ of these estimates might cause confusion and concern if a child’s score drops unexpectedly. Appropriate framing must be provided to ensure that parents and teachers are equipped to interpret, discuss, and respond to the results of fluency assessments.

Second, we should be cautious not to over-interpret oral reading fluency outcomes, especially those collected at a single point in time or only at the baseline and end-line of an evaluation. In both cases, these estimates can be noisy, and reporting raw results or raw differences can be misleading. An aggregate of frequent fluency assessments over time (e.g. bi-termly fluency assessments) will produce a much more accurate and reliable estimate. In addition, where possible, we should assess fluency at any given time using two or more passages in order to produce a more accurate average score. Finally, equated scaled scores can be used to control for form effects when interpreting and reporting outcomes.
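The averaging strategies above can be sketched simply. All scores in this example are hypothetical:

```python
# Sketch of damping form effects by averaging: across passages at one
# sitting, and across repeated assessments over time. Scores are
# hypothetical.
from statistics import mean

def fluency_estimate(scores_by_passage):
    """Average a pupil's cwpm across two or more passages read at the
    same sitting, for a less form-dependent estimate."""
    return mean(scores_by_passage)

# One pupil, two grade-level passages at the same assessment:
print(fluency_estimate([47, 32]))  # 39.5

# Aggregating bi-termly assessments smooths the growth trajectory
# relative to any single noisy measurement:
print([fluency_estimate(s) for s in [[25, 33], [30, 38], [36, 44]]])  # [29, 34, 40]
```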

Finally, special attention must be paid to passage selection, especially if multiple fluency assessments are used during a single evaluation or to measure fluency over time. The criteria for selection cannot simply be that the passages are at the same grade level. Experts must carefully analyse sentence structure, vocabulary, and length to ensure that the passages are as similar as possible, reducing form effects across passages. In addition, multiple forms should be robustly piloted to determine which forms link most closely.

Our findings, considered alongside the literature on form effects, paint a clear and consistent narrative: fluency varies widely across different passages, even passages at the same grade level. Given the extent to which fluency scores are used at various levels of the decision-making process, we must invest in framing these results for consumers of the data, including policymakers, school leaders, teachers, parents, and pupils themselves. This framing must challenge the traditional notion of incremental growth and emphasise a more holistic interpretation of oral reading fluency, one that takes into consideration fluctuations in performance on a given assessment. For instance, we should present both fluency results and fluency benchmarks as ranges, rather than point estimates, in order to account for the ‘noisiness’ of fluency data. We should also further investigate ways to mitigate these form effects, whether through the use of multiple passages during testing, more frequent testing, more robust test equating, or more careful selection of passages. In this way, we can continue to invest in and rely on fluency assessments as an important and accessible indicator of literacy performance.
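One way to operationalise range-based reporting is to attach a form-effect margin to each point estimate. The ±11 cwpm margin below is purely illustrative (informed by the smaller of the mean differences reported above), not an established benchmark:

```python
# Illustrative range-based reporting: attach a form-effect margin to a
# point estimate instead of reporting the raw number alone. The margin
# of 11 cwpm is hypothetical, chosen only for demonstration.

def fluency_range(point_estimate: int, margin: int = 11) -> tuple:
    """Return a (low, high) cwpm band around a single measurement."""
    low = max(point_estimate - margin, 0)
    return (low, point_estimate + margin)

# A midterm score of 47 and an end-term score of 32 produce
# overlapping bands, signalling noise rather than regression:
print(fluency_range(47))  # (36, 58)
print(fluency_range(32))  # (21, 43)
```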

About the author: Tim Sullivan, Director of Learning Innovation at NewGlobe


Talking Education is a Medium Publication all about progress towards achieving Sustainable Development Goal 4: Education for All.