
Too unreliable: Assessing a teacher's value

Linda Darling-Hammond

This piece is extracted from the New York Times' Room for Debate, featuring the voices of eight authors on the merit of evaluating teachers based on value-added tests. The authors include Linda Darling-Hammond; math teacher Vern Williams; Kevin Carey, from the Education Sector; Lance T. Izumi, Pacific Research Institute; Education Trust's Amy Wilkins; author Diane Ravitch; Marcus Winters of the Manhattan Institute; and Jesse Rothstein, from U.C. Berkeley.

Darling-Hammond's piece is below. The full discussion can be found on the New York Times web site.

 

Too Unreliable

Teacher evaluation was a fly-by operation when I was a high school English teacher 30 years ago, and it has improved little in most districts since. So I understand why there is such enthusiasm for evaluating teachers based on their students' test score gains, now that such data are available.

Unfortunately, as useful as new value-added assessments are for large-scale research, studies repeatedly show that these measures are highly unstable for individual teachers. Among teachers who rank lowest in one year, fewer than a third remain at the bottom the next year, while just as many move to the top half. The top rankings are equally unstable. In fact, less than 20 percent of the variance in teachers' effectiveness ratings is predicted by their ratings the year before. This is why the National Research Council has said that this evaluation system "should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable."

The reasons are simple. Test score gains are caused by many variables in addition to the teacher: students' learning and language background, attendance, supports at home, previous and current teachers, tutors, curriculum materials, class sizes and other school resources. Out-of-school time matters too. Summer learning loss accounts for more than half the achievement differential between high- and low-income students. Thus, researchers have found that the very same teacher looks more "effective" when she is teaching more advantaged students -- and less effective when she teaches more students who are low-income, are new English learners, or have special education needs.

Tragically, evaluating and rewarding teachers primarily on the basis of state test score gains creates disincentives for teachers to take on struggling students, just as accountability systems that rate doctors on their patients' mortality rates have caused surgeons to turn away patients who are very ill. While scores may play a role in teacher evaluation, they need to be viewed in context, along with other evidence of the teacher's practice.

Better systems exist -- like the career ladder evaluations in Denver and Rochester, the Teacher Advancement Program and the rigorous performance assessments used for National Board Certification, all of which link evidence of student learning to what teachers do in teaching curriculum to specific students. These systems also help teachers improve their practice -- accomplishing what evaluation, ultimately, should be designed to do.