Thursday, August 03, 2006

The New Test Scores Are Here! The New Test Scores Are Here!

Update 9:45 A.M., Friday, August 4th: I see that Gubernatorial candidate John Dendahl has used the Test Scores Card, slamming Big Bill Richardson for the high failure percentage of New Mexico schools. Typical, of course, and obviously if the failure percentage was lower Big Bill would be using large chunks of his millions in campaign funds to gloat about it. I'm not naive enough to think test scores won't be used for political purposes. I just wish some public official or news source would go beyond the microscopically shallow scores bad/scores awful/scores not so awful "analysis", and take a deeper look at the numbers.

The following meandering ditty makes a feeble attempt to do some of that. You'll notice I don't state a strong opinion on whether standardized testing is worthwhile, who is to blame for low scores, or if No Child Left Behind is the mental spawn of an educational Satan or merely a good accountability tool. While I have definite opinions on those subjects (hell, I'm a K-12 teacher), there's plenty of polemics on those topics elsewhere.

Perhaps my naivete is my gut feeling that getting a little deeper into the numbers here might lead to a burgeoning, more thorough dialogue on the subject. I have a dreamy mental picture of Big Bill and John Dendahl at a televised debate arguing about standard deviations and "lower bound confidence intervals". Okay, that's too dreamy...but we number lover guys have dreams, too. Aw heck, let's get on with the ditty.

Standardized Testing.
Where to start? It's one of those overarching, overwhelming, oversomething topics that gets boiled down to far too overly simple in the media. And I can understand that, to an extent.

Amy Miller has a story on how many APS schools "failed" (83), and she does a good job in pinpointing the subtest area responsible for most of the "failures", that of "students with disabilities". You think I use too many quotation marks and parentheses, well wait until you see an "education" post. She also mentions what is to me the initially most startling fact in the report: only one high school (La Cueva) and one middle school (Desert Ridge) came through unscathed.

That means other way-the-Hell-up-in-the-Heights schools like Eisenhower Middle School didn't pass in all areas, particularly with "students with disabilities". Hoover MS didn't pass. Madison MS didn't pass. Sandia HS didn't pass.

And yes, those facts lead to a certain amount of gleeful laughter by folks living further down the long alluvial fan hill. Especially by teachers at schools whose elevation is not quite 5,200 feet. Which reminds me that even "flighters" escaping to the West Mesa are not immune. LBJ Middle School didn't "pass" either.

I teach at somewhere around 5,000 feet. Why does that matter, you might ask? What does elevation have to do with passing standardized testing? For those to whom the answer is not immediately obvious, I apologize because I'm not going to take the sidetrack to answer that here. I only bring it up: 1. because it's interesting to me from a sociological perspective; 2. it shows one of the approximately 4.5 million intervening variables that get in the way of a truly scientific method for determining the quality of a particular school. I realize 4.5 million is a big number, but I wonder if I'm hyperbolizing.

Which brings me to the test reports themselves. The NM Public Education Department (PED) has a handy-dandy site full of .pdf files for each district (and charter schools). Each school then has its own handy-dandy (and easy to read, btw) single page outlining school status (pass or "Meets AYP", or fail "AYP Not Met"), and below that a statistical breakdown for the school and each subgroup.

Now I admit I'm a bit of a numbers guy. Not too knowledgeable, but really into trying to figure out what the data means. For instance, some of you know that I play this real geeky baseball game called Strat-o-Matic. Again, I don't want to sidetrack, but Strat is a game that, through highly complicated cards and dice rolls, tries to correctly simulate the yearly stats for real baseball players going back to Honus Wagner and such. It's D&D for baseball nerds.

I only bring this up as a prelude to the admission that I can spend hours going through the numbers on these PED reports. No, that's not necessarily a good thing, but if you are at all interested in really understanding this whole "standardized testing" thing you should do the same.

As an alleged "professional educator" who has been forced to sit through several staff meetings on details of such reports, I might have a bit of a leg-up here. But, as a public service, I will spare you from attending those incredibly boring meetings and explain a few things on the reports themselves. Yes, that means you gotta actually look at the reports.

I'm assuming those who don't want to look at them stopped reading this post hours ago. Amy Miller at the Journal has the same exact problem regarding the attention span of readers, and I totally understand. I also promise two things: 1. I won't cover everything; 2. I won't give you the company line BS definitions for some of the terms/numbers.

The Meeting Goals Yes/No Part

Actually this top section is pretty straightforward. The results are stated for the entire school population, then broken down into eight sub-groups, ranging from "Caucasian" to "Economically Disadvantaged". Let's focus on the one sub-group of "students with disabilities" because it's the one that caused so many schools to "fail". First, "students with disabilities" means Special Education students. For reasons of obfuscatory political correctness the test reports say "students with disabilities". Stuff like that drives me crazy.

Second, when we say Special Education students here we're not talking about ALL Special Education students. SpEd kids are grouped (whether teachers and administrators care to admit it) into four groups: A, B, C, D. Group "A" SpEd kids need the least help from SpEd teachers, with the needs increasing as the alphabet continues. "D" kids need the most support, with some of them being in Special Education classes all day long.

In terms of standardized tests, "students with disabilities" refers ONLY to SpEd kids from the "C" and "D" levels. One sidenote: In New Mexico "Gifted" is considered part of Special Education (that's not true in most states). Ergo, "Gifted" is a "disability". I cannot count with the most powerful computer in the world how many boring meetings and private discussions I've had on this subject over my 13 years of teaching "Gifted" students. I get depressed just thinking about it. The important thing to keep in mind in terms of standardized tests is that almost all "Gifted" kids are in levels "A" or "B" in the SpEd letter hierarchy. So gifted kids' test scores don't count with the "students with disabilities".

So by the time you whittle the "students with disabilities" subgroup down to only "C" and "D" level kids you are talking about a pretty small percentage of the total school population. And it is this very small percentage that has caused many APS schools to "fail" in this year's report. Not that we're holding it against these SpEd kids, or anything. Really, we're not, and you merely insinuating that disgusts me. Seriously, I hope no teachers resent these "students with disabilities", but given human nature it's inevitable that some will. I guess I'm just a negative thinker that way. But let's move on.

In some cases, "students with disabilities" wasn't the "problem", such as Rio Grande HS where the overall school population didn't meet proficiency goals. I count up 27 APS schools as failing either Reading/Math or both to this extent, and many of those were schools like Sierra Alternative and New Futures where students have bigger issues than whether they know how to use a semicolon. If you go through the list, notice how many of these schools are in the South Valley. Harrison MS, Pajarito Elem., Ernie Pyle MS, etc. Then there is Polk MS where a staggeringly small 9.9% of the student body met proficiency in Math. Which gets us to the actual numbers/stats.

The Numbers/Statistics Part

As prelude, let me acknowledge that I am not a professional statistician. My only Stat class was in a graduate program at Evergreen State College in Olympia, Washington in which most of the class refused to learn standard deviation because they were convinced it and other statistical concepts were conspiratorial inventions of large multi-national corporations and other evil-doers. Nevertheless, the teachers were really good, and even I understood it and a few other things by the end of it. Still, I'm nowhere near an expert here, just someone who digs looking into the numbers.

Just at random let's take a line from the statistical analysis for Reginald Chavez Elementary. I pick them because I never taught there, know nothing about the place, don't even know where it is. Now, you've probably figured out that I can't figure out a simple way to copy/paste the .pdf files into this overly long blog entry, so you won't be surprised when I simply retype the line from the report thusly:

Subject: Reading
Group: Hispanic
Number Tested: 123
Percent Proficient: 38.21
"Annual AYP Goal": 45.00
"Lower Bound Confidence Interval for AYP Goal": 34.98

So what the heck does this mean (and yes, I realize many Babble readers had far more Stat work during college than me and are ridiculing me for being so pedantic, condescending and so darn teacherly in general)? Well, simply put, in terms of specified Hispanic students Reginald Chavez Elementary "passed" the standardized test in Reading.

More interestingly, the reason it passed is not because a sufficient percentage of its Hispanic students were "proficient" (and by the way, I'm not 100% sure, but I'm pretty sure "proficient" meant "at grade level" in the subject...this gets dinked with every year it seems). The school and its Hispanic population "passed" because the percentage of students demonstrating "proficiency" was slightly higher than the required percentage minus a statistical margin of error (going by the numbers above, a bit more than two standard deviations) based on the number of students in that sub-group tested.

Okay, now we're in two camps: the "WTF?" camp and the "Duh!" camp. The "WTF?" folks didn't quite get that last sentence because it was either horribly written (a given) or they didn't have that Stat class during college. The "Duh!" folks probably left many paragraphs ago, so let's focus on the "WTF?" folks.

What that gobbledy-gook above means is that the required percentage of students proficient in a sub-group (like "students with disabilities" or "Hispanic") depends on the number of students in that group taking the test. That's because the overall "AYP Goal" (and you'll notice that differs between schools) is based on ALL the students in the school. The smaller the sub-group is, the less statistically sure we are that group should be evaluated identically to a more statistically sufficient number. Through some Statistics mumbo-jumbo, a "lower bound confidence interval" is established below the required percentage (about ten points below it here, which works out to a bit more than two standard deviations for a group of 123), and this lower number becomes the new required percentage. So...45% of Reginald Chavez Elementary Hispanic students don't need to be proficient. Only 34.98% do. Since they had 38.21% proficient...YEAH!, THEY PASS!
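
For the fellow number lovers, here is a minimal sketch of how a bound like 34.98 could be computed. The report doesn't show its formula, so everything about the method here is an assumption: a Wilson score interval around the 45% goal, with z = 2.33 (roughly a one-sided 99% level), happens to land very close to the reported figures. The function name is made up for illustration, and this should be read as an educated guess, not as the PED's documented method.

```python
import math

def wilson_lower_bound(goal_pct, n_tested, z=2.33):
    """Lower end of a Wilson score interval around the goal percentage.

    ASSUMPTION: both the Wilson formula and z=2.33 were chosen because they
    come close to the numbers printed on the report, not because the report
    says this is how the bound is calculated.
    """
    p = goal_pct / 100.0
    z2 = z * z
    center = p + z2 / (2 * n_tested)
    spread = z * math.sqrt(p * (1 - p) / n_tested + z2 / (4 * n_tested ** 2))
    return 100.0 * (center - spread) / (1 + z2 / n_tested)

# Hispanic Reading row quoted above: goal 45.00, 123 students tested.
bound = wilson_lower_bound(45.00, 123)
print(f"computed lower bound: {bound:.2f} (the report lists 34.98)")
print("passes" if 38.21 >= bound else "fails")
```

The point of the sketch is only that the bound depends on two things, the goal and the number of kids tested. Feed it a smaller group and the bar drops further.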

Reginald Chavez Elementary is interesting (and I swear I just picked it for the simple reasons listed above) because not only did the Hispanic Reading score fall into the gap between the "lower bound confidence interval" and the supposedly required percentage, BUT THE ENTIRE SCHOOL DID. To wit, from the report:

Reginald Chavez Elementary Reading: All Students
Number Enrolled in AYP Grades: 184
Number Participated: 182
Number Tested: 136
Percent Proficient: 38.97
Annual AYP Goal: 45.00
"Lower Bound Confidence Interval for AYP Goal": 35.44

Okay, I've introduced a few extra lines here, namely having to do with the number participating and "tested". You'll notice that 182 students participated, but only 136 count as "tested". That's because schools only count the scores of students who attended that school the entire year. This makes sense, but it leads to a very interesting statistical situation, especially in elementary schools.

Elementary schools are smaller than middle and high schools. As such, the statistical "confidence" of a school's score is less. Therefore, the gap between the required percentage and the "lower bound confidence interval" is bigger. Now...add the fact that in Reginald Chavez Elementary's case the number actually counting for statistical analysis is only 136, and that gap balloons to almost 10 percent, EVEN FOR THE ENTIRE SCHOOL.

With gaps that wide, Reginald Chavez fits snugly between the "required" and "lower bound..." and passes with a 38.97%.
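
To see how much school size matters, here is a rough sketch of the gap between the goal and the lower bound for groups of different sizes. It uses the plain textbook standard error of a percentage and a multiplier of z = 2.33; both are assumptions for illustration (the report doesn't state its method), and every group size other than 136 is hypothetical.

```python
import math

def approx_gap_points(goal_pct, n_tested, z=2.33):
    """Approximate gap, in percentage points, between goal and lower bound.

    ASSUMPTION: simple normal approximation, gap = z * standard error of a
    proportion. This is a back-of-the-envelope stand-in, so it will not
    match the report's figures to the decimal.
    """
    p = goal_pct / 100.0
    standard_error = math.sqrt(p * (1 - p) / n_tested)
    return 100.0 * z * standard_error

# 136 is the Reginald Chavez "Number Tested"; the other sizes are made up.
for n in (30, 136, 500, 2000):
    print(f"{n:>5} tested -> gap of about {approx_gap_points(45.0, n):.1f} points")
```

Under these assumptions a group of 136 gets roughly ten points of slack, while a hypothetical 2,000-student high school gets under three. Same goal, very different bar.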

So what you ask? Or, bringing the "Duh!" folks back into the mix, you might be asking who cares? I'll close with some weak-ass attempts to answer both of those questions:

  • The shocking thing about Standardized Testing 2005-06 isn't that so many high schools and middle schools failed, it is that so many elementary schools passed.
  • A big reason elementary schools did better was simply because elementary schools are smaller.
  • Throughout the district scores for "students with disabilities" were atrocious, except for schools with fewer than 30 tested "C" and "D" level SpEd kids. These schools didn't have a yes/no next to this sub-group because there weren't enough tested kids to be statistically accountable. Note that in many cases these scores were atrocious as well; they just didn't lead to a big "NO" next to that sub-group.
  • Interestingly, Amy Miller's story in the Journal quotes APS testing guru Rose-Ann McKernan as saying that a new "assessment" (that means test) is being created for SpEd kids next year. In other words, this year's test sucked for SpEd kids and we're gonna make it easier to pass. This being the case, should we even have to count this year's scores for SpEd kids? And what does this say about the overall usefulness of these tests as accountability measures if we run "beta" versions out that we later find to be too difficult? Or too easy?
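
Putting the pieces together, the yes/no column for a sub-group can be sketched as a tiny decision rule. This is a simplification under stated assumptions: the function name is hypothetical, the cutoff of 30 comes from the bullet above, the "at or above the lower bound" comparison is a guess at how ties are handled, and anything else that might feed a real AYP decision is ignored.

```python
def subgroup_status(n_tested, pct_proficient, lower_bound, min_n=30):
    """Return the yes/no a sub-group would get under this simplified rule.

    ASSUMPTIONS: groups with fewer than min_n tested students get no rating,
    and a group passes when its percent proficient is at or above the
    lower bound printed on the report.
    """
    if n_tested < min_n:
        return "not rated"
    return "Yes" if pct_proficient >= lower_bound else "No"

# Rows quoted earlier in this post for Reginald Chavez Elementary (Reading).
print(subgroup_status(123, 38.21, 34.98))  # Hispanic
print(subgroup_status(136, 38.97, 35.44))  # All Students

# A made-up tiny sub-group: a low score, but too few kids to get a yes/no.
print(subgroup_status(22, 10.0, 25.0))
```

The third call is the situation described in the bullets: an ugly score that never turns into a big "NO" because the group is too small to count.
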

There are easily another 50,000 questions I could pose and then poorly answer on the subject before lunch, but this post has gone on long enough. Also, the Reginald Chavez example above is just one of the literally thousands of interesting little number-crunching nuggets of goodness that can be mined from these test score reports. I strongly suggest that anyone interested arm themselves with massive amounts of caffeine and skepticism and wade through all the APS scores.

I realize that the local media can't take time/space to undertake such a statistical frog-gigging expedition, but anyone who either puts stock in these tests/reports and/or cares about K-12 education should get their mental feet wet for a while checking these out.


Michelle Meaders said...

It reminds me of the way some economic statistics are reported: what matters is how the results compared with expectations, not the results themselves.

Nora said...

Wow! That was really interesting-- thanks for the writeup! I think I need to brush up on my stats, though.

Richard Albury said...

Interesting post.

So, as someone with two kids in elementary school thinking of moving from Tampa to Albuquerque in the next couple of years, how do I go about figuring out if a given school is decent? This given the Catch 22 of trying to find a decent neighborhood.

And how far up the alluvial fan should I live? ;-)

If it helps, Rio Rancho does *not* sound appealing, nor do the Heights, but Nob Hill sounds interesting if a bit full of itself.

Michelle Meaders said...

My opinion about the "quality" of APS schools, which is never mentioned when bashing it: this is a huge school district (5th biggest in the country), with city and suburban, rich and very poor kids. (also kids from homeless, recent immigrant, and teen-parent families) They equalize the operations funding between schools, instead of having triple the spending per capita in the suburbs, like many big cities elsewhere. School performance has more to do with the parents than the schools. So it's unfair to compare such a diverse school district with the suburban-only ones in other places.

frannyzoo said...

All: Thanks for the comments, as well as the other emails I've gotten on this post. Richard...I was going to respond via this comment space, but your question led me to further exploration of the data and that turned into another, more recent, blog post on the subject. I didn't have your email...maybe you can take a look at that other post (and welcome to ABQ, btw).

And Nora: As a Humanities teacher I tend to think I'll never use any stat knowledge, but sometimes it just pops out of the weirdest places.

Richard Albury said...

Many thanks for the response: it was precisely what I was looking for.

Also, many, many years ago - 1970, to be precise - I was a student for six months at MacArthur Elementary: it was a good six months, and we were sorry to leave ABQ.

Now to figure out where to live... somewhere not monochromatic, if you catch my drift. ;-)

stephanie james said...

I really appreciate your post. I have been studying these results since my daughter started kindergarten last year.

Last year, my first concern was that Collet Park hadn't passed. Then I was told that it was only because of the "English as a second language" students.

So this year when it didn't pass again, for the same reason, I've wondered what exactly are all these numbers used for?! What does it mean to me?! Etc., etc.

After reading your post I went back to the AYP results and understand a bit more about what the individual numbers mean but it still doesn't add up to anything useful to me.

One thing popped out to me that I missed before because I didn't understand the numbers. The AYP goal is the same across the board for each category. So, if they expect a lower percentage for the disadvantaged categories why are they measuring them with the same ruler? They already KNOW these kids perform lower. What is this supposed to tell us?

It makes me angry that there is so much money being spent on this nonsense just to end up making parents, students, and teachers feel like they are failing.

Additionally, it seems to me that it is economic suicide for the state to be advertising what seems to be a failing school system.

I am of the opinion that school is what you make of it and nothing more. The community within the public schools is a microcosm of the real world. A more realistic experience than a private institution. So, I am fully behind the public schools. I just wish our government would quit finding senseless ways of spending our education dollars. Our best investment would be in our teachers. Then we wouldn't have to just HOPE our kids get that special teacher that considers their job a labor of love.

Thanks Scot.

frannyzoo said...

Stephanie: Yours is just about the nicest comment I've yet received in the history of 'Burque Babble. Okay, that's not a lot of history, but I just want you to know I really appreciate your kind words and meaningful insights.

I don't know if you saw it (and wonder if tackling another 2,000 paralyzingly intricate words on testing is a good idea), but I have a 2nd post on the subject. It's "More Fun With Test Scores" or something like that.

Good luck in all your research and effort to find the best school for your daughter (it was daughter, right?)...if every parent was as involved and thoughtful in this process it's hard to imagine how much better the quality of public schools would be, imho.