Whether or not to publish rankings will inevitably become more of an issue in states that receive a waiver from some of the most onerous requirements of the federal No Child Left Behind law. As a condition for receiving a waiver, states will be required to establish teacher evaluation systems that take into account “student growth” in every school and district. California is still considering applying for such a waiver.
So far, the Los Angeles Times rankings — which the newspaper devised on its own with the help of a Rand Corporation researcher — are the only ones to have been published in California.
As a result of requests from multiple media organizations under New York’s Freedom of Information Law, and a failed lawsuit filed by unions to block their release, New York City schools released “Teacher Data Reports” on February 24th for 18,000 4th through 8th grade teachers. Included were so-called “performance rankings” for each teacher.
The release marked a 180-degree reversal of written assurances given by senior school officials in 2008, when teachers agreed to participate in the “value-added” ranking system, that the rankings would not be released.
But even some vigorous proponents of linking teacher evaluations to test scores are unhappy about making rankings public, especially those based only on test scores.
Last month, Bill Gates came out strongly against the practice.
“Publicly ranking teachers by name will not help them get better at their jobs or improve student learning,” Gates wrote in an op-ed article in the Washington Post. “On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.
Developing a systematic way to help teachers get better is the most powerful idea in education today. The surest way to weaken it is to twist it into a capricious exercise in public shaming. Let’s focus on creating a personnel system that truly helps teachers improve.
His foundation has invested heavily in its Measures of Effective Teaching project to come up with comprehensive evaluation systems, which in addition to test scores are supposed to include other factors, such as videotapes of teachers in the classroom, surveys of students to get feedback on their teachers, and surveys of teachers about working conditions in their schools — none of which were present in the New York City evaluations.
Gates wrote the article just days before the release of the rankings in New York, which were based on the still-controversial “value-added” methodology, which is supposed to predict student performance after taking into account a range of student and other characteristics. In an op-ed piece in the New York Post, Schools Chancellor Dennis Walcott also attempted to head off the misuse of the rankings by the media.
Teacher Data Reports were created primarily as a tool to help teachers improve, and not to be used in isolation.
I’m deeply concerned that some of our hardworking teachers might be denigrated in the media based on this information. That would be inexcusable. Ultimately, each news organization will make its own choices about how to proceed, and this may result in teacher names appearing in the paper or on media websites.
Although we can’t control how reporters use this information, we will work hard to make sure parents and the public understand how to interpret the Teacher Data Reports.
I hope news organizations will report on the data responsibly and treat our teachers with respect.
Walcott went on to say that:
The most important thing you should know is that these reports don’t tell the full story about a teacher’s performance. They provide one important perspective on how well teachers were doing their most important job — helping students learn — using a method called “value-added” that has been found to predict a teacher’s future success better than any other technique.
But the reports were never intended to judge a teacher’s overall success in the classroom. No single measure can do that — whether it’s value-added data, the results of a classroom observation or anything else.
Most, but not all, of the media heeded Walcott’s entreaties. Within 24 hours the New York Post had singled out the “worst teachers” in New York, focusing specifically on 37-year-old Pascale Mauclair, a 7th grade English as a Second Language teacher in Queens — just the public shaming that Gates had warned against.
To write their story, reporters raced to Mauclair’s parents’ home to try to find her. They then reached Mauclair at her home in what one online post described as a “private housing development,” and rang her doorbell repeatedly. She declined to talk to the reporters, and according to some online reports had to call the police twice to have the reporters removed.
An article appeared the next day with the headline “They’re doing zero, zilch, zippo, for students” focusing on the 16 teachers with the lowest value-added rankings on the list. Referring to Mauclair, the first sentence read, “When it comes to teaching math, she’s a zero.” This was followed by a Sunday story that described Mauclair as “the city’s worst teacher,” along with a photograph taken from a school yearbook.
The newspaper’s characterization, based on her lowly value-added ranking, was diametrically opposed to how her principal viewed her actual performance. “I would put my own children in her class,” the principal said.
As Stanford education professor Linda Darling-Hammond, recently appointed to the California Commission on Teacher Credentialing by Gov. Brown, explained in an article in Education Week:
Mauclair is an experienced and much-admired English-as-a-second-language teacher. She works with new immigrant students who do not yet speak English at one of the city’s strongest elementary schools. Her school, PS 11, received an A from the city’s rating system and is led by one of the city’s most respected principals, Anna Efkarpides, who declares Mauclair an excellent teacher.
Discrepancies like these, between what the rankings purportedly show and how parents and colleagues experience a particular teacher, bear on whether the rankings are reliable measures of teachers’ performance, which in turn should be weighed when deciding whether they should be published in the first place.
The rankings were designed “to show how much progress individual teachers helped students make in reading and math over the course of a year,” explained New York school administrators on the district’s website. After taking into account a range of characteristics about the student and the school, the value-added methodology is supposed to be able to predict how a particular student should do the following year.
The New York rankings did take into account a number of student characteristics, such as their racial or ethnic background, English learner status, whether they had attended summer school, been retained, or were in special education. The methodology also took into account a number of “classroom characteristics” such as the percentage of low-income students, the number of absences, the number of students retained a grade, and the percentage of students new to the school.
But a paper published in late February and written by several of California’s leading education researchers, including Stanford’s Darling-Hammond and her colleague Ed Haertel, contributed to the already substantial critiques of the methodology:
Current research suggests that value-added ratings are not sufficiently reliable or valid to support high-stakes, individual-level decisions about teachers. Other tools for teacher evaluation have shown greater success in measuring and improving teaching, especially those that examine teachers’ practices in relation to professional standards.
As Darling-Hammond and her fellow researchers noted in the Phi Delta Kappan article, a teacher’s effectiveness is determined by numerous school and non-school factors that a “value-added” analysis typically doesn’t, or can’t, take into account.
Some of California’s leading supporters of linking how students do in the classroom to teacher evaluations expressed doubts about media publication of rankings like those in New York and Los Angeles. Last November, EdVoice, a Sacramento-based advocacy organization, along with several others, filed suit against the Los Angeles Unified School District to force it to implement teacher evaluations tied to measures of student academic “growth.” The lawsuit alleges the district is violating state law (the little-known 1971 Stull Act) by not tying teacher evaluations to measures of “student growth.”
But Bill Lucia, EdVoice’s executive director, said just publishing how teachers ranked based on the test scores of their students represents “an incomplete measure” of a teacher’s abilities. “It is certainly not clear what the purpose would be to provide one data element in an evaluation,” he said.
So far, the message from the Obama administration as to whether this practice is appropriate or helpful to the cause of improving teacher effectiveness remains fuzzy.
When the Los Angeles Times released its rankings, also using a value-added methodology that is supposed to take into account the influence of a student’s background on his or her academic performance, Secretary of Education Arne Duncan applauded the move, saying of teachers, “What’s there to hide?”
“The truth is always hard to swallow, but it can only make us better, stronger and smarter,” he asserted in a speech in Little Rock, Ark., shortly after the Los Angeles release. “That’s what accountability is all about — facing the truth and taking responsibility.”
He backtracked later, saying that teacher evaluations should be based on “multiple measures” but, nonetheless, how the data was released was a “local decision.” He has yet to take a clear position against the publication of value-added rankings.
But Marshall Smith, former senior counselor to Duncan and undersecretary of education during the Clinton administration, said, “If you want teacher evaluation, it should be aimed at getting teachers to improve, not to embarrass people. It should be to give them support. If you have a teacher who is a real disaster in a classroom, and the principal hasn’t noticed, then that principal needs to be fired.”
Smith said the rankings should be part of a teacher’s formal evaluation. That way, districts would not be allowed to release them, because the rankings would be in teachers’ personnel files.
Michelle Rhee, the former chancellor of the Washington D.C. public schools, has made new teacher evaluations a major part of the reform agenda of her Sacramento-based StudentsFirst organization. Rhee believes that 50 percent of a teacher’s evaluation should be based on student test scores, using a “value-added” methodology.
In San Jose last week, she argued for having report cards for teachers “so parents can understand the strengths and possible weaknesses of their child’s teachers.”
But she disagreed with making public just one piece of the evaluation, as was done in New York. Like Smith, she said the goal should be to help teachers to improve.
A teacher should be evaluated based on multiple measures, and value-added gains by their children are one important aspect to a teacher’s evaluation, but it doesn’t necessarily make sense to make public just one piece of the puzzle without giving parents and the public a broader context for the performance of that teacher.
Accountability is important, and teachers should want to know how effective they are in getting gains in student performance. That information is important to have and to use. If we frame how it is going to be used punitively, it distracts us from the conversation that we really need to be having, that it is very valuable information for people to have to become better professionals.
In New York, the teacher ratings system continues to stir controversy — and complex responses.
“These ratings should never have been made public,” said Diana Agosta, who teaches English as a Second Language at the Pablo Neruda Academy, a small public high school in the Bronx. She did not get a rating herself, because only 3rd through 8th graders were subjected to this purported pilot study. Agosta, who also has a Ph.D. in anthropology, said that the rankings were “very, very flawed.”
She has some empathy for Pascale Mauclair, the teacher singled out by the New York Post, who was also an ESL teacher.
She noted that immigrant students are supposed to take state tests — which in turn determine the rankings their teachers will get — within a year of coming to the United States, when they may not even have been literate in their home country. Another complication is that students can take the test in Spanish, but students from any other language background have to take it in English, using a glossary of words in their own language. She said New York’s ranking system doesn’t take into account these and numerous other factors.
But she also said that if a way were devised to come up with “reliable” teacher ratings, they should be made public. “As a parent, I want to know as much as possible about my child’s school and teachers,” she said. But “it can’t just be something administrators have come up with just to have a number” to rank teachers.
Going deeper
What are value-added ratings? Here is an explanation from the Hechinger Report:
Value-added models use complex mathematics to predict how well a student can be expected to perform on an end-of-the-year test based on several characteristics, such as the student’s attendance and past performance on tests. Teachers with students who take standardized math and English tests (usually fewer than half of the total number of teachers in a district) are held accountable for getting students to reach this mark. If a teacher’s students, on average, fall short of their predicted test-scores, the teacher is generally labeled ineffective, whereas if they do as well as or better than anticipated, the teacher is deemed effective or highly effective.
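The prediction-and-comparison logic the Hechinger Report describes can be illustrated with a toy calculation. This is a deliberately simplified sketch, not the model New York City or Los Angeles actually used: it predicts each student’s end-of-year score from the prior year’s score alone with a least-squares fit, then averages the prediction errors by teacher. All scores, teacher labels, and the three-student class sizes are invented for illustration.

```python
import numpy as np

# Invented data: three teachers (A, B, C), three students each,
# with matching spreads of prior-year scores.
prior = np.array([60., 75., 90., 60., 75., 90., 60., 75., 90.])
teacher = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

# Invented end-of-year scores: teacher B's students beat the
# average gain, teacher C's fall short of it.
actual = np.array([63., 78., 93., 68., 83., 98., 57., 72., 87.])

# Step 1: least-squares line predicting this year's score from
# last year's (real models add many more covariates, such as
# attendance, English learner status, and classroom makeup).
X = np.column_stack([np.ones_like(prior), prior])
coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
predicted = X @ coef

# Step 2: a teacher's "value added" is the average amount by which
# her students beat (or missed) their predicted scores.
residual = actual - predicted
va = {t: residual[teacher == t].mean() for t in ("A", "B", "C")}
for t, v in va.items():
    print(f"Teacher {t}: value-added = {v:+.1f} points")
# Teacher A: +0.3, Teacher B: +5.3, Teacher C: -5.7
```

Note that the labels are entirely relative: teacher A’s students all gained three points, yet A lands barely above zero because value-added grades each teacher against the average gain across all classrooms, which is the sense in which critics say a low ranking is not the same as “doing zero, zilch, zippo” for students.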
Evaluating Teacher Evaluation, by Linda Darling-Hammond, Audrey Amrein-Beardsley, Edward Haertel, and Jesse Rothstein, Phi Delta Kappan, March 2012.