How Do We Know?

Unlike purer subjects like physics and mathematics, foreign language education, with its myriad intangibles, is not easy to measure. Yet we, like the Tarot's Fool, stride along the edge of the cliff, confident that the data we use to measure gains in our students is accurate.
Where does this cavalier trust in data come from? Is it warranted? How does the existence of a common assessment based on a textbook justify, as Grant asked, making passage to the next level a bottleneck, forcing really talented kids out of their language study because they currently care nothing about verb conjugations and object pronoun agreement?
Obviously, people who work in assessment at the district level are the ones qualified to answer this question. As a mere classroom teacher with little knowledge of how data works, I appeal to your knowledge and insight. Here is the question again: how do we know that our use of data in foreign language assessment is accurate?



9 thoughts on “How Do We Know?”

  1. I hear what you are saying, Ben, but we have to have data. That’s what drives education now. As a CI teacher, I will be the first person to volunteer to be my site’s chairperson for making district assessments, because at least then the CI voice will be heard (and I don’t want anybody putting conjugation on a district exam).
    We are doing some very dramatic things in terms of grading in that we have created learning scales for every subject at every level. After much debate (one lady wanted a scale for preterite verbs, irregular verbs, direct objects…), we have created 10 learning scales for foreign language that measure kids on their ability to use language, not to conjugate. Points have become obsolete, and now kids are asking what they can do to meet standard. I think I saw something like this on the Georgia foreign language something or other.
    I know that kids acquire language at different rates and we shouldn’t force them to output, but that isn’t realistic in my district and at my site. So Marzano has taken this crazy concept called the Power Law and applied it to education. Essentially, the last assessment on a particular scale carries the highest weight, and as long as there is a positive trend in the learning, the grade increases. Someone can score a 1, then a 2, then a 3, then a 4 in comprehension, for example, and have a 4 at the end of the semester.
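The Power Law described above is a trend projection rather than an average. As a minimal sketch (a least-squares power-curve fit; gradebook software that implements Marzano's formula may differ in detail), it can look like this:

```python
import math

def power_law_trend(scores):
    """Fit a power curve y = a * t**b to a student's scale scores
    (t = 1, 2, ... in assessment order) and return the fitted value
    at the final assessment, so a positive trend lifts the grade."""
    n = len(scores)
    if n == 1:
        return float(scores[0])
    xs = [math.log(t) for t in range(1, n + 1)]
    ys = [math.log(s) for s in scores]
    mx, my = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept in log-log space.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a * n ** b

# A kid who scores 1, 2, 3, 4 over the semester ends at 4.0,
# not at the 2.5 a straight average would give.
print(round(power_law_trend([1, 2, 3, 4]), 2))  # → 4.0
```

Because the fit is anchored to the most recent assessment, a low score early in the semester stops dragging the grade down once the student improves.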
    So the 9 scales in my classroom are comprehension, oral vocabulary, oral grammar, written vocabulary, written grammar, oral fluency, written fluency, free write, and independent practice (for dictados and the 30 minutes of outside homework–an idea I picked up on this blog).
    The results have been good this semester. Most kids are approaching standard (meets standard = 3.0) and see the positive trend in their learning. They have these little trackers to see how much smarter they are getting as they track their ability to use language in each of those 9 areas (it’s exciting to hear kids say, “Look, my vocabulary grade went up”). One result is that kids know they have to get 150 words in 10 minutes for their free write. Guess what they practice for homework… free writes.
    I know no grading system is perfect, especially in our subject area, but like I said before, I’m glad that I had a hand in developing it and that it isn’t aligned to a textbook.

  2. I’ll post anything on the flog to help others, even if it gets bloody. I was kind of thinking about that today. I was doing a Matava story this morning and it got really crazy, with a crack house under the sand in Great Sand Dunes, a girl camping in a Shaggin Wagon (I have no idea what that is), etc. And it occurred to me, the crazier the story got, how bloody the perception of CI must be when teachers let their classes get slightly out of control (i.e. exit the Sick Cans). It’s the old rap on TPRS: “If they’re laughing like that, they can’t be learning anything.” Anyway, Kelly just came in from North High to hang out for three periods. We’ll go with “The Tent” by Matava and One Words for the level ones. It’s so nice to say, as I wait for the tardy bell to ring, how much fun it is to feel real, safe collegiality where there was none before I heard about Blaine and Krashen and Susie. Kelly and I are just going to work together on better teaching. She’s not going to bring any evaluation form with her, and I am going to try my best to communicate the method to her – in this case how to spin a story out of PQA.

  3. O.K. I got a long – but worth the read – response from Drew in response to Grant and Nathan’s desire to know more about what he said above. I will just paste the entire thing here, but also time stamp it for tomorrow in the form of a separate blog entry.
    Reminder of what this is about – we wanted to draw Drew out on a grading system he helped develop based on “10 learning scales for foreign language to measure kids based on their ability to use language, not to conjugate.” Here it is (repeated tomorrow):

    When I start presenting the scales at training sessions, I always start out with: we are bringing back the old check/check-plus/check-minus system. We are communicating to our kids what they need to do to get that check mark, to meet standard. The scale is a 4-point scale, so to speak. A 3.0 means that the kid has met standard, a 2.0 that the kid is approaching standard, and a 4.0 that the kid has exceeded standard. It must be said that I am not teaching for a 4.0; I am teaching for a 3.0. Thus, I am not giving out 4.0s all the time. In fact, I can probably count on both hands the number of 4.0s I’ve given this semester. Yes, there is a 1.0, far below standard; I give very few of those. There is not, however, a 0.0. Regardless of whether a student turns in an assignment, it is our job as teachers to see what they know. Even if it’s just an informal conversation with the kid, we can see where the kid lies on the 0.5–4.0 range.
    My teaching has not changed very much since I started implementing the learning scales. What has changed is the type of work I collect from kids, how I assess their work, and how they interpret my evaluation of their work.
    Let’s take the comprehension scale. The language of the scale is:
    Score 3.0: Student describes the critical or essential elements of the text or audio source (e.g.: main idea, characters, plot). The student exhibits no major errors or omissions.
    Score 2.0: There are no major errors or omissions regarding the simpler details and processes as the student:
      • recognizes or recalls the critical elements such as main idea, character, plot;
      • performs basic processes such as recalling or recognizing accurate statements about critical or essential elements of the text or audio source.
    So a comprehension assignment would be exactly what we all probably do in a CI class: we ask a story or personalize the vocab structures. The twist now is that the students create the 2.0 questions for the quiz. By definition, the 2.0 questions are multiple choice, fill in the blank, or true/false. So if the students can answer 5 true/false questions correctly for that day of CI, then they get a 2.0. To get a 3.0, the students need to be able to describe the story/plot/details/whatever of what was addressed in class that day. All they do is write a few sentences or a paragraph underneath their T/F questions. It’s super fast to grade.
    As the students read a Blaine Ray reader, one of the things we might do as a class is come up with ten 2.0 questions and a couple of 3.0 questions for a chapter.
    We are reading Mi propio auto in Spanish 2. Some 2.0 questions would be:
    True or false: Ben is going to Costa Rica to help people build houses.
    True or false: Ben only goes because he wants a car.
    A 3.0 question would be: Describe Ben’s motive for going to El Salvador for the summer.
    If I am doing my job as a CI teacher, then all of my students should be able to hit that 2.0 for comprehension. Now my grade book is set up in such a way that I can see which students are meeting standard in comprehension, and which students are now my target students based on who isn’t meeting standard. I know that in my 3rd period Spanish 2 class the average for comprehension is 2.9 (my class understands Spanish!) and the lowest score for comprehension is 2.39. So Lauren, with her 2.39, is able to answer all the T/F questions for what she hears and is sometimes able to use her language to partially answer 3.0 questions.
    Students keep track of their grades on a graph and can observe their learning trends. They have the same data that I have and they use it for goal setting and focus on their independent practice.
    The interesting thing is that the 0 and the Super F don’t exist anymore. Regardless of how the kid did at the beginning of the semester, it is the very last assignment that carries the most weight. Lauren, at the beginning of the semester, got a 1.5 on the story “Nicole no sabe aplaudir,” a 2.0 on “Katie hace fila,” and a 2.5 on “Arin compra un ruso.” That is what makes her current comprehension score 2.39. The average of the three is 2.0; the Power Law took her positive learning trend into account and projected her score to be 2.39. She sees herself getting better too as she tracks her progress, which is an ego boost in itself. Think about how that 0 would factor into an average grade. It doesn’t reflect what the kid knows or can do; it just pulls the kid so far down into the D or F range that he doesn’t want to work anymore.
    The comprehension quizzes are still unannounced, and so are the vocabulary quizzes. Everything just happens.
    When we do vocabulary quizzes, they have ten 2.0 questions (cómo se dice) and four 3.0 questions (use items 2, 4, 5, and 8 in a sentence or paragraph). That score goes into written vocab. Any and all words from the semester are fair game.
    The fluency scales are easy to use as well. At 2.0 a teacher can understand what you are saying. At 3.0 a sympathetic native speaker could understand what you are saying.
    For the most part, the scales are not an arbitrary assigning of points. Students know exactly what they have to do to meet standard. How do they exceed standard? I don’t tell them that; they have to come up with it on their own. The language for a 4.0 on every scale reads: “In addition to Score 3.0, in-depth inferences and applications that go beyond what was taught.” Some kids come up with some great vocab words and structures we haven’t learned in class. Some kids are using the two past tenses pretty accurately… It’s pretty cool. If I told them what a 4.0 is, the grade-grubbers would come out of the woodwork.
    All 9 of the scales take their power law grades and average them together to create the overall grade for the course. The average score for my 3rd period is 2.53. Now we have to be arbitrary again, because I have to give a letter grade, not “approaching standard with a 2.53.” I have been translating grades like this: A: 3.0+ to 2.41; B: 2.4 to 1.78; C: 1.76 to 1.14; D: 1.13 to 1.03; F: 1.02 to 0. For semester grades, each range will be shifted up by 0.2 points. Notice how small the D range is.
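The translation step above is just a lookup against cutoffs. A sketch, treating each letter as "at or above its lower cutoff" from the quoted ranges (the quoted ranges leave tiny gaps, e.g. between 2.40 and 2.41, which this assumption papers over):

```python
def letter_grade(score):
    """Translate a 0-4 power-law scale score into a letter grade,
    using the lower cutoffs quoted above (A >= 2.41, B >= 1.78,
    C >= 1.14, D >= 1.03, else F)."""
    cutoffs = [(2.41, "A"), (1.78, "B"), (1.14, "C"), (1.03, "D")]
    for cutoff, letter in cutoffs:
        if score >= cutoff:
            return letter
    return "F"

print(letter_grade(2.53))  # → A (the 3rd-period class average)
```

For semester grades, the same function would be used with every cutoff shifted up by 0.2.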
    Their grade printouts are really cool. They show each of the 9 scales with the assignments listed underneath them, so the kids can see exactly how they have been performing. Now when we have those grade pow-wows and a kid asks, “Why do I have a C?”, I can say, “Well, you’re not meeting standard here, here, and here, but look at you on comprehension. Let’s start focusing your outside study time on vocabulary…”
    We created 12 scales as a district; I use 7 of the ones we created and created 2 additional ones for my CI class. The ones I dropped (pronunciation, directed response, register, and some others) I thought were silly.
    The 9 scales I am using are:
      • oral vocabulary (3.0 = wide range of appropriate vocabulary)
      • oral grammar (3.0 = high frequency of grammatical accuracy)
      • oral fluency (3.0 = a native speaker can understand you)
      • written vocabulary (3.0 = wide range of appropriate vocabulary and 3.0 vocab quizzes; next semester I am adding curriculum vocabulary to use for the vocabulary quizzes)
      • written grammar (3.0 = high frequency of grammatical accuracy)
      • written fluency (3.0 = a native speaker can understand you)
      • comprehension
      • completion (3.0 = student addresses all parts of the prompt adequately)
      • free write (0.5 = 0–50 words, 1.0 = 50–99 words, 2.0 = 100–149 words, 3.0 = 150–175 words, 4.0 = 175+)
      • independent practice (3.0 = student does required work)
    The one essential component is an electronic grade book that has a standards-based grading option with the power law feature built in.
    I’m a weird one who tries new things; it keeps my professional life interesting. I tried TPRS thinking, what the heck, it might work; if not, it’s just Spanish. I tried this scale business thinking the worst I could do was mess up grades. Well, whatever, I’ll just give everyone As; it’s just Spanish.
    Was it worth the change? Ask me in 5 weeks, at semester. So far I am seeing better data, 0 Fs, most kids meeting or approaching standard, and fewer inflated grades (As for kids who don’t deserve them). I’m not giving a kid a grade just for showing up and keeping a seat warm in my class. Now they have to do something.

  4. At our school, the administration hasn’t seen an initiative it didn’t like. I have no idea whether we have had 5, 10, or 15 new initiatives in the past few years. The newest one, among others, is standards-based grading. I know that many schools throughout the country either have moved or are moving to this, so having this forum to learn about and discuss it will be invaluable.
    I just wish that our school would listen to its faculty. We recently took a survey that dealt with climate. In it, the administration asked for comments on climate. Man, did they ever get replies.
    You would assume that the administration would reflect on these answers, work on some team-building, maybe listen to the “troops”… Not a chance. We were rewarded with more initiatives. It reminds me of a bird hunter who hears something in the sky but can’t see anything, so he just aims upward and shoots in many directions. Coincidentally, an overhead jet explodes and sheds some metal. When the metal lands in a field near him, he shouts in astonishment, “Wow! I hit something!”
    Thank goodness for the classroom and working with students. That part is still a blast (no jet-engine pun intended). However, part of working with students is assessment, and the standards-based train is on the track and heading my way. I appreciate being able to ask questions of the professionals on this list.
    Mike W.
