If We Must Use Grades, Let's Make Them Reliable

by Dr. Kathie F. Nunley

My daughter brought home her report card this week. It wasn't a bad report card, but it wasn't the one most parents dream about: the one with all A’s running down the column for 2nd quarter. Hers was a bit of a mix.

"How can you keep getting a B in Spanish?" was my first exclamation.

Yes, now that I’ve had time to ponder my reaction, the teacher and psychologist in me realizes I should have first commented on all the courses in which she excelled and on how proud I was of the effort she was making, etc., etc., before pouncing on her for what I considered the negative aspects of the report card. But sometimes the mother section of my brain overrides the teacher/psychologist section.

"Mom, I'm trying, really!" came her standard reply.

"Kahlia, Spanish is your easiest class. There’s no reason not to be making an A in there. Why do you keep getting B's? I thought we discussed all this last term?"

"Mom, be glad I’m not like Gina" (her best friend). "She got a C, and she speaks Spanish fluently. She never does anything in that class. The only reason she is even passing is that she speaks it fluently, so she aces all the tests. I just forget to do my journals, and that brings my grade down."

"I see you got an A in World Studies. That's great. I appreciate the effort you must have put forth for that." (OK, the reasoning, empathetic psychologist brain kicked in here.)

My daughter's reply: "Well, not really. It’s an easy class. I do all the projects, and they count for most of the grade. But, hey, I got the A."

Grades. They are the end result of a student's journey through a class. But they are more than just a mark on a report card. Grades are the liaison between schools and the American public. Grades are the measure parents and the community outside our walls use to judge whether we are doing our job inside them.

I contend, though, that the subjectivity of our grading system is not just a small flaw in our educational system; it is a gaping wound, oozing forth most of the pus that shocks and disappoints the public as they view their public schools. The community and political arenas assume that school grades are valid and reliable predictors of learning and ability. And the dark secret we educators never talk about is that they are neither reliable nor valid. Would anyone like to defend our educational grading system to the American public?


One of the first things you learn in statistics is that a measuring device has no intrinsic validity. Its validity is determined by its use. In other words, the ACT alone carries no validity; the validity comes from what it is used to measure or predict. Is the ACT a valid predictor of intelligence? Of college success? Of your ability to drive a car? Of how well you can raise pigs? You can see that the ACT's validity would vary wildly across these situations. It may be somewhat valid as a predictor of college success, but not valid at all as a predictor of successful pig farming.

Yet the American public assumes that school grades are a valid predictor of learning and ability. They mistakenly believe that grades measure these things somewhat accurately. What disappoints them most is that they see so many blatant examples to the contrary. They see too many students earning good, or at least passing, grades who appear to have learned very little. They see too many students who are capable of learning and complex thinking yet have failed classes and dropped out of school.


This is not to say that students who earn high marks in school are not learning anything or are not gifted young people. Many, if not most, are. But the system is not reliable.

Reliability is the other measure of a test. Something is reliable if it is consistent: a test is reliable if it gives you a fairly similar score each time. Some simple examples illustrate this. If I put a full bag of flour on my scale every day for a week and it always weighs 9 pounds, then I declare my scale to be reliable. If I’m weighing a 10-pound bag of flour, I may not find my scale very valid, but it is nevertheless reliable. (To be a valid tool for measuring flour, the scale would have to weigh it at its true weight of 10 pounds.) I apply the same thinking to my kitchen oven. It is reliable, but not valid: it consistently heats 25 degrees hotter than I set it. Because it is consistent, this is not a problem; I simply set the temperature 25 degrees lower than I want.

You can easily see how you can have reliability without validity, but you cannot have validity without reliability. If my scale weighs the flour at 7 pounds one day, 9 pounds the next, 10 pounds on day three and 8 pounds on day four, it is not reliable, nor could we consider it a valid tool for measuring bags of flour. So if you don’t have reliability, you surely cannot achieve validity.
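The two scales can be compared in a few lines of Python. This is a simplified sketch, not a formal psychometric analysis: it assumes the spread of repeated readings (population standard deviation) as a rough stand-in for reliability and the gap between the average reading and the true weight as a rough stand-in for validity. The readings come from the flour examples above; the function name describe_scale is only an illustrative choice.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 10  # pounds: the actual weight of the bag of flour

def describe_scale(readings, true_weight=TRUE_WEIGHT):
    """Return (spread, bias) for repeated weighings of the same bag.

    spread: population standard deviation; 0 means perfectly consistent (reliable).
    bias: average reading minus the true weight; 0 means on target on average.
    """
    return pstdev(readings), mean(readings) - true_weight

consistent_scale = [9] * 7       # 9 pounds every day for a week
erratic_scale = [7, 9, 10, 8]    # 7, 9, 10, then 8 pounds

for name, readings in [("consistent", consistent_scale), ("erratic", erratic_scale)]:
    spread, bias = describe_scale(readings)
    print(f"{name}: spread={spread:.2f} lb, bias={bias:+.2f} lb")
```

The consistent scale has zero spread and is always exactly one pound light, so, like the oven, its error can be corrected by a fixed adjustment. The erratic scale has a spread of about 1.12 pounds, and no single correction fixes readings that wander from day to day.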

So which is more important, reliability or validity? Perhaps reliability is, because without that, you lose both.

Are grades in school reliable predictors of learning and ability? Does an “A” always mean a student has learned more than could be expected of most students and has an ability in the top sector of his or her school? Always? The vast majority of the time? Most of the time? Sometimes? Maybe?

Take any department at your school and pull out the students who achieved an “A” this term on their grade report. Now pull out those students who received a “C” on their grade report. Can you put all the students with “A”s in one group and know for sure, without exception, that they are the brightest of the bright? That they all learned more than any of the other students in the English department, or the Science or Math department? Can you say with certainty that those “C” students, without exception, are of lesser quality? Are you assured that they actually know less, learned less and have less ability than any of the “A” students? It would be a rare school that could make this claim. For this is the dark secret that we hesitate to share with the public. Our grading system seriously lacks reliability, and without that we cannot even hope for validity.

Currently, grading is often helter-skelter within departments and between schools, and it can be a bit of luck of the draw. Consider a student who does absolutely all the assigned classwork and homework (possibly by questionable means), completes loads of available extra credit and barely passes the exams. Because Mr. Jones weights tests rather lightly compared to classwork and always offers lots of extra credit and make-up work, this student can end up with a “B” in Mr. Jones’ geometry class. If, however, the same student had happened to be scheduled into Ms. Richards’ class, he would have earned a “D”, because Ms. Richards offers no extra credit and weights exams heavily. Mr. Jones and Ms. Richards teach the same subject in the same school, which issues the same transcripts to the same universities that use high school grades as one of the factors in admission.
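The arithmetic behind this luck of the draw is easy to sketch. The Python below is a hypothetical illustration only: the essay gives no actual scores, weights or grading scale, so the averages (100 on classwork, 61 on exams), the weights, the 3 extra-credit points and the 10-point letter scale are all invented for the example. It applies two different weighting policies to the same student's work.

```python
# Hypothetical illustration: all scores, weights, and cutoffs below are invented.

def letter(score):
    """Map a 0-100 score to a letter grade using an assumed 10-point scale."""
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

def course_grade(classwork, exams, classwork_weight, extra_credit=0):
    """Weighted average of classwork and exam averages, plus any extra-credit points."""
    exam_weight = 1 - classwork_weight
    return classwork * classwork_weight + exams * exam_weight + extra_credit

# The same student: every assignment turned in, exams barely passed.
classwork_avg, exam_avg = 100, 61

# Assumed policies: Mr. Jones weights classwork 60% and gives 3 extra-credit points;
# Ms. Richards weights exams 80% and gives no extra credit.
jones = course_grade(classwork_avg, exam_avg, classwork_weight=0.6, extra_credit=3)
richards = course_grade(classwork_avg, exam_avg, classwork_weight=0.2)

print(f"Mr. Jones:    {jones:.1f} -> {letter(jones)}")
print(f"Ms. Richards: {richards:.1f} -> {letter(richards)}")
```

With these assumed numbers, Mr. Jones' policy produces about 87.4 (a B) and Ms. Richards' produces about 68.8 (a D), even though the student's work is identical.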

The system has been steeped in subjectivity for so long that changing it will take a lot of work, but it is certainly not impossible. We need to start by being honest with ourselves, the public and the politicians about our grading scheme. The system we use to publicly declare a young person's success in a course is extremely subjective and varies widely among teachers, departments, schools and districts.

Once we acknowledge this, we can start to address the problem and look for solutions. We must come up with some type of standardization within our schools for evaluating student performance and then form an operational definition for grades that can be shared with the public. Our goal is to produce a “key” to the grades on a report card. Can you operationally define what an A means at your school? Can you take steps toward improving the reliability of that definition? Given that grades are the most important interface we have with our constituents (students, parents, school boards, colleges), it is critical that we look for ways to make them highly reliable.

Several years ago, I started to address this issue with the Layered Curriculum model of high school instruction. One of the key components of Layered Curriculum is that student grades are indicators of the depth or level of study rather than subjective marks determined by individual teachers. Layered Curriculum classrooms divide the study of a subject into three layers, based on Bloom’s taxonomy: basic knowledge, application of new knowledge to previous knowledge, and critical leadership evaluation in that topic. Grades are then attached to those layers as follows:

C: This student has added to their bank of general knowledge to a level deemed acceptable by the teacher. (Departments may establish standards for demonstrated recall.)

B: This student has added to their bank of general knowledge as above and has also demonstrated his or her ability to apply that knowledge in a different field or compare it to a different arena. The student has shown an ability to use and manipulate the new knowledge in addition to storing it for recall.

A: This student has added to their general knowledge bank and applied that knowledge as above, and was also able to critically evaluate a real-world issue that required combining knowledge with ethics, values, morality and/or a sense of global responsibility.

The idea behind this grading scheme is to operationally define what a grade means by requiring a particular thought process at each layer. Student grades are determined by the complexity of thinking, not just rote knowledge and recall. This brings some standardization to grades and gives parents, institutions and businesses outside our secondary school system a way to interpret them consistently.

Is this a valid measure of learning? That depends on whether you agree that more complex thinking is an indicator of learning and ability and should be valued. It may be valid if we do in fact judge learning by a student's ability to use or generalize new knowledge to other areas and by their ability to debate serious topics, form opinions and make decisions as a leader or adult voter.

You may feel that there are better indicators of learning, or at least additional ones, and you may be correct. But what is most important here is not whether it is valid, but that it can at least be reliable. Once we get reliable, we can start to tackle the next step, validity. But we have to start with something reliable.

The current system is seriously flawed. We must begin the repair with the reliability of grades. Find an operational definition for grades within each department, ideally within each school, and then build your instruction around those definitions.

If you want to use the simple Layered Curriculum model, just start by having teachers break down each instructional unit into basic knowledge, application/manipulation, and critical debate issues.

For their C layer, teachers decide what basic information students need to learn. How can they measure that? What standards will they use to determine successful completion of that C layer?

For their B layer, teachers decide what types of assignments or assignment choices they can offer to allow students to play around with that new learning. Find ways to have students connect new learning to previous knowledge. Interdisciplinary activities work very well in this layer, as do projects, displays and problem-solving labs. Teachers need to establish the criteria or standards for determining mastery of this B layer.

Finally, for their A layer, teachers need to identify current issues pertaining to their topic for which there is research to support more than one view. This is simply a matter of thinking about issues in the news that pertain to this subject and have no clear right or wrong answer. What issues are leaders and voters currently dealing with? Have teachers offer students the opportunity to research these issues and then form an opinion. Establish criteria for mastery of this A layer.

For parents of children in Layered Curriculum high schools, grades now have a predictable meaning. Now a mother can say to her daughter, “I see you have only applied what you learned in Spanish class. Why did you not take the time to involve yourself in a critical thinking issue?” Or, “I see you have gathered quite a bit of basic information and skill in math, but how can we help you take on the application issues in order to bring up your grade?”

At least it is reliable. It is predictable. It is the same across the board. A university looking at a transcript would understand the meaning of an A or a B. A counselor or future employer would know what a person was capable of by looking at a transcript. Once we get some reliability, we can then take on that really sticky issue, validity. Are these valid measures of intellect, ability and learning?

Until then, I will join the other parents and ask my daughter, “Can't you just do some extra credit?”

Kathie F. Nunley is an educational psychologist, author, researcher and speaker living in southern New Hampshire. Developer of the Layered Curriculum® method of instruction, Dr. Nunley has authored several books and articles on teaching in mixed-ability classrooms and other problems facing today's teachers.
Kathie (at) brains.org




Layered Curriculum is a registered trademark created and owned by Dr. Kathie F. Nunley.
Copyright © 1998 - current year by Kathie F. Nunley.
All Rights Reserved.
