If We Must Use Grades, Let's Make Them Reliable
Dr. Kathie F. Nunley
My daughter brought home her report card this week. It wasn't a bad
report card, but it wasn't the one most parents dream about - the
one with all A’s running down the column for 2nd
quarter. Hers was a bit of a mix.
"How can you keep getting a B in Spanish?" was my first exclamation.
Now that I’ve had time to ponder my reaction, the teacher
and psychologist in me realizes I should have first commented
on all the courses that she excelled in and how proud I was of
the effort she was making, etc., etc. before pouncing on her for
what I considered the negative aspects of the report card. But
sometimes my Mother brain section overrides my teacher/psychologist section.
"I'm trying, really!" came her standard reply.
"Spanish is your easiest class. There’s no reason not to
be making an A in there. Why do you keep getting B's? I thought
we discussed all this last term?"
"Be glad I’m not like Gina" (her best friend). "She
got a C and she speaks Spanish fluently. She never
does anything in that class. The only reason she is even passing
is that she speaks it fluently, so she aces all the tests.
I just forget to do my journals and that brings my grade down."
"I see you got an A in World Studies. That's great. I appreciate
the effort you must have put forth for that." (OK, the
reasoning, empathetic psychologist brain kicked in here).
My daughter's reply: "Well, not really. It’s an easy
class. I do all the projects and they count for most of the grade.
But, hey, I got the A."
Grades are the end result of a student's journey through a class.
But they are more than just a mark on a report card. Grades are
the liaison between schools and the American public. Grades are
the measure by which parents and the community outside our walls
assume we are doing our job inside the walls.
I contend that the subjectivity of our grading system, though, is
not just a small flaw in our educational system; it is a gaping
wound, oozing forth most of the pus which shocks and disappoints
the public as they view their public schools. The community and
political arenas assume that school grades are valid and reliable
predictors of learning and ability. And the dark secret we educators
never talk about is that they really are neither reliable nor valid.
Would anyone like to defend our educational grading system to the
One of the first things you learn in statistics is that a measuring
device has no intrinsic validity. The validity is determined by
its use. In other words, an ACT test alone carries no validity.
The validity comes when you look at what it is used to measure or
predict. Is the ACT a valid predictor of intelligence? Is it a valid
predictor of college success? Is the ACT a predictor of your ability
to drive a car? Is it valid in terms of how well you can raise pigs?
You can see that the validity for the ACT would vary wildly in these
situations. The ACT may be somewhat valid in its ability to predict
college success, but not valid at all in its ability to predict
successful pig farming.
The American public assumes that school grades are a valid predictor
of learning and ability. They mistakenly believe that grades measure
these things somewhat accurately. What disappoints them most is
that they see so many blatant examples that indicate otherwise.
They see too many students making good, or at least passing grades
who actually appear to have learned very little. They see too many
students who have the ability to learn and do complex thinking who
have failed classes and dropped out of school.
This is not to say that students who make high marks or grades in school
are not learning anything or are not gifted young people. For many,
if not most, are. But the system is not reliable.
Reliability is the other measure of a test. Something is reliable if it is consistent.
A test is reliable if it gives you a fairly similar score each time.
We can view reliability with some simple examples. If I put a full
bag of flour on my scale every day for a week and it always weighs
9 pounds, then I declare my scale to be reliable. If I’m weighing
a 10-pound bag of flour, I may not find my scale very valid, but
it is, nevertheless, reliable. (To be a valid tool for measuring
flour, the scale would have to measure it at its true weight of
10 pounds.) I use this same thinking with my kitchen oven. It is
reliable, but not valid. It consistently heats 25 degrees hotter
than I set it. Because it is consistent, it is not a problem, as
I know to just set the temperature 25 degrees lower than I want it.
You can easily see how you can have reliability without validity, but
you cannot have validity without reliability. If my scale weighs
the flour at 7 pounds one day, 9 pounds the next day, 10 pounds
on day three and 8 pounds on day four, it is not reliable, nor could
we consider it a valid tool for measuring bags of flour. So if you
don’t have reliability, then you surely cannot achieve validity.
Which is more important, reliability or validity? Perhaps reliability
is, because without that, you lose both.
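The flour-scale reasoning above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it treats the spread of repeated readings as a stand-in for reliability and the average distance from the true weight as a stand-in for validity. The function name `describe` and the constant `TRUE_WEIGHT` are made up for this sketch; the readings are the ones from the flour example.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 10.0  # pounds; the bag's true weight in the flour example


def describe(readings, true_value=TRUE_WEIGHT):
    """Return (spread, bias) for repeated readings of the same bag.

    spread: population standard deviation; 0 means the same reading every time
            (a rough proxy for reliability).
    bias:   mean reading minus the true value; 0 means correct on average
            (a rough proxy for validity in this one use).
    """
    return pstdev(readings), mean(readings) - true_value


consistent_scale = [9, 9, 9, 9, 9, 9, 9]  # always 9 pounds for a week
erratic_scale = [7, 9, 10, 8]             # the four-day example

print("consistent:", describe(consistent_scale))
print("erratic:   ", describe(erratic_scale))
```

The consistent scale shows no spread but is off by a pound, which is reliability without validity. The erratic scale shows a spread of more than a pound, so no single reading from it can be trusted, whatever its average.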
Are grades in school reliable predictors of learning and ability? Does
an “A” always mean a student has learned more than could
be expected of most students and has an ability in the top sector
of his or her school? Always? The vast majority of the time? Most
of the time? Sometimes? Maybe?
Take any department at your school and pull out the students who achieved
an “A” this term on their grade report. Now pull out
those students who received a “C” on their grade report.
Can you put all the students with A's in one group and know
for sure, without exception, that they are the brightest of the
bright? They all learned more than any of the other students in
the English department or the Science or Math department? Can you
say with certainty that those “C” students, without
exception, are of lesser quality? Are you assured that they actually
know less, learned less and have a lesser ability than any of the
“A” students? It would be the rare school that could
make this claim. For this is the dark secret that we hesitate to
share with the public. Our grading system seriously lacks reliability
and without that we cannot even hope for validity.
The assigning of grades is often helter-skelter within departments
and between schools and can often be a bit of luck-of-the-draw.
A student can get a “B” in Mr. Jones’ geometry
class because he did absolutely all the assigned class and homework
(possibly by questionable means), did loads of available extra credit
and barely passed exams. But if Mr. Jones weights tests rather lightly
compared to classwork and always offers lots of extra credit and
make-up work, then this student can end up with a “B”
in geometry. If, however, this same student just happened to have
been scheduled into Ms. Richards’ class, he would have earned
a “D”, as Ms. Richards offers no extra credit
and heavily weights exams. Mr. Jones and Ms. Richards teach the
same subject in the same school, which issues the same transcripts
to the same universities that use high school grades as one of the
factors in acceptance.
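The Mr. Jones and Ms. Richards comparison comes down to simple weighting arithmetic, sketched below in Python. Every number here is hypothetical: the scores, the category weights, the extra-credit points and the 90/80/70/60 letter cutoffs are assumptions chosen only to show how the same work can land on two different grades. The helper names `course_percent` and `letter` are likewise invented for this sketch.

```python
# Hypothetical numbers throughout: the scores, weights, extra credit
# and letter cutoffs are illustrative assumptions, not real data.

def course_percent(classwork, exams, classwork_weight_pct, extra_credit=0):
    """Weighted course percentage; the exam weight is whatever is left of 100."""
    exam_weight_pct = 100 - classwork_weight_pct
    weighted = (classwork * classwork_weight_pct + exams * exam_weight_pct) / 100
    return weighted + extra_credit


def letter(percent):
    """Map a percentage to a letter using assumed 90/80/70/60 cutoffs."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return grade
    return "F"


# The same student in both rooms: all classwork done, exams barely passed.
classwork, exams = 95, 60

# Assumed policy for Mr. Jones: classwork counts 75%, plus 3 extra-credit points.
jones = course_percent(classwork, exams, classwork_weight_pct=75, extra_credit=3)
# Assumed policy for Ms. Richards: exams count 80%, no extra credit.
richards = course_percent(classwork, exams, classwork_weight_pct=20)

print("Mr. Jones:   ", jones, letter(jones))
print("Ms. Richards:", richards, letter(richards))
```

With these assumed policies the identical student earns 89.25 (a B) in one room and 67.0 (a D) in the other, although nothing about what he learned has changed.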
Our system has been steeped in subjectivity for so long that it will
be a lot of work to change it, but certainly not impossible. We
need to start by being honest with ourselves, the public and the
politicians about our grading scheme. The system we use to publicly
declare a young person's success in a course is extremely subjective
and varies widely among teachers, departments, schools and districts.
Once we acknowledge this, we can start to address the problem and look
for solutions. We must come up with some type of standardization
within our schools for evaluating student performance and then form
an operational definition for grades that can be shared with the
public. Our goal is to produce a “key” to the
grades on a report card. Can you operationally define what an A
means at your school? Can you take steps toward improving the reliability
of that definition? Given the fact that grades are the most important
interface we have with our constituents (students, parents, school
boards, colleges), it is critical that we look for ways to make
them highly reliable.
Years ago, I started to address a solution to this issue with the
Layered Curriculum model of high school instruction. One of the
key components to Layered Curriculum is that student grades are
indicators of the depth or level of study rather than subjective
marks determined by individual teachers. Layered Curriculum classrooms
divide the study of a subject into three layers, based on Bloom’s
taxonomy: basic knowledge, application of new knowledge to previous
knowledge, and critical leadership evaluation in that topic. Grades
are now attached to those layers as follows:
C: This student has added to their bank of general knowledge to a level
deemed acceptable by the teacher. (Standards may be established through
departments as to demonstrated recall.)
B: This student added to their bank of general knowledge as above,
plus demonstrated his or her ability to apply that knowledge in
a different field or compare it to a different arena. The student
demonstrated an ability to use and manipulate the new knowledge
in addition to storing it for recall.
A: This student added to their general knowledge bank, and applied
or demonstrated use of that knowledge as above plus was able to
critically evaluate an issue in the real world which required their
ability to combine knowledge with ethics, values, morality and/or
sense of global responsibility.
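The three grade definitions above amount to a small lookup rule, sketched here in Python. This is a minimal sketch, not part of the Layered Curriculum model itself: the function name `layered_grade` and its three yes/no inputs are hypothetical, and it assumes the layers are strictly cumulative, as the "as above, plus" wording suggests. The definitions do not say what falls below a C, so the sketch returns `None` in that case.

```python
def layered_grade(basic, application, critical):
    """Return the grade implied by which layers the student has mastered.

    basic:       met the teacher's standard for general knowledge (C layer)
    application: applied or manipulated that knowledge (B layer)
    critical:    critically evaluated a real-world issue (A layer)

    Assumes each layer builds on the one below it. Returns None when the
    basic layer is not met, since the definitions above stop at C.
    """
    if basic and application and critical:
        return "A"
    if basic and application:
        return "B"
    if basic:
        return "C"
    return None


print(layered_grade(True, False, False))  # knowledge only
print(layered_grade(True, True, False))   # knowledge plus application
print(layered_grade(True, True, True))    # all three layers
```

Whoever reads the report card can run the rule backwards: a B means the first two layers were demonstrated, whichever teacher assigned it.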
The idea of this grading scheme is to operationally define what a grade
means by requiring a particular thought process at each layer. Student
grades are determined by the complexity of thinking, not just rote
knowledge and recall. Now there is some standardization to grades
and a way for them to be consistently interpreted by parents, institutions
and businesses outside of our secondary school system.
Is this a valid measure of learning? That all depends on whether you agree
that more complex thinking is an indicator of learning and ability
and is to be valued. It may be valid if we in fact judge learning
by a student's ability to use or generalize new knowledge to other
areas and by their ability to debate serious topics and form opinions
and make decisions as a leader or adult voter.
You may feel that there are better indicators of learning or at least
additional indicators, and you may be correct. But what is most
important here is not whether or not it is valid, but that it could
at least be reliable. Once we get reliable, then we can start to
tackle the next step, validity. But we have to start with something
Our current system is seriously flawed. We must begin the repair
with the reliability of grades. Find some operational
definition for grades within each department, ideally, within each
school, and then build your teaching instruction around those definitions.
If you want to use the simple Layered Curriculum model, just start
by having teachers break down each instructional unit into basic
knowledge, application/manipulation, and critical debate issues.
For their C layer, teachers decide what basic information students
need to learn. How can they measure that? What standards will they
use to determine successful completion of that C layer?
For their B layer, teachers decide what types of assignments or assignment
choices they can offer to allow students to play around with that
new learning. Find ways to have students connect new learning to
previous knowledge. Some interdisciplinary activities would work very
well in this layer, as do projects, displays and problem solving
labs. Teachers need to establish the criteria or standards as to
how to determine mastery of this B layer.
For their A layer, teachers need to identify current issues pertaining
to their topic for which there is research to support more than
one view. This is simply a matter of thinking about issues in the
news that pertain to this subject where there are no right and
wrong answers. What issues are leaders and voters currently dealing
with? Have teachers offer students the opportunity to research these
issues and then form an opinion. Establish criteria for mastery
of this A layer.
For parents of children in Layered Curriculum high schools, grades now
have predictable meaning. Now the mother can say to the daughter,
“I see you have only applied what you learned in Spanish class.
Why did you not take the time to involve yourself in a critical
thinking issue?” Or, “I see you have gathered quite
a bit of basic information and skill in math, but how can we help
you take on the application issues in order to bring up your grade?”
At least it is reliable. It is predictable. It is the same across the
board. A university looking at a transcript would understand the
meaning of an A or a B. A counselor or future employer would be
able to really know what a person was capable of by looking at a
transcript. Once we get some reliability, we can then take on that
real sticky issue, validity. Are these valid measures of intellect,
ability and learning?
Until then, I will join the other parents and ask my daughter, “Can't
you just do some extra credit?”