Robert Harrell on Language Assessment Principles

Source: Language Assessment: Principles and Classroom Practices, H. Douglas Brown and Priyanvada Abeywickrama. Pearson Education, 2010.

There are five principles of language assessment:

  1. Practicality
  2. Reliability
  3. Validity
  4. Authenticity
  5. Washback

Practicality includes:

  • Staying within the budget
  • Being doable within time constraints
  • Clarity of directions for administration
  • Appropriate utilization of human resources
  • Not exceeding available material resources
  • Considering time and effort involved in both design and scoring

Reliability includes:

  • Consistency in conditions across two or more administrations
  • Clarity of directions for scoring/evaluation
  • Uniformity of rubrics for scoring/evaluation
  • Consistency of application of rubrics by scorer(s)
  • Lack of ambiguity in items/tasks for the test taker
  • Issues in reliability
    • Student-related reliability issues (e.g. illness, fatigue, anxiety, other physical and psychological factors, test-wiseness)
    • Rater reliability issues (e.g. multiple raters, rater training and experience, fatigue, bias, unclear scoring criteria)
    • Test administration issues (e.g. testing conditions such as noise, lighting, temperature, physical condition of desks and chairs)
    • Test reliability issues (e.g. task design, poorly written test items, an excessive number of test items, time constraints)

Validity (“an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment”) includes:

  • Genuinely measuring what the test proposes to measure
  • Excluding the measurement of irrelevant or “contaminating” variables
  • Relying as much as possible on empirical evidence (performance)
  • Involving performance that samples the test’s criterion (objective)
  • Offering useful, meaningful information about a test-taker’s ability
  • Being supported by a theoretical rationale or argument
  • Evidence of validity
    • Content-related evidence (“It actually samples the subject matter about which conclusions are to be drawn, and … it requires the test-taker to perform the behavior that is being measured,” p. 30. “If you are trying to assess a person’s ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity,” p. 31.)
    • Criterion-related evidence (the extent to which the criterion of the test has actually been reached; this is best demonstrated by comparing test results with the results of another measure of the same criterion)
    • Construct-related evidence (the test taps into the theory, hypothesis, or model that explains observed phenomena. Linguistic constructs include proficiency, communicative competence, and fluency. Before a test’s construct validity can be accepted, the constructs must be defined (e.g. what is oral proficiency?), and the test must then be compared with that definition)
    • Consequential validity (i.e. the impact of the test at both the macro level, on society and educational systems, and the micro level, on individual test-takers)
    • Face validity (whether the test “looks right” to test-takers: a well-constructed format with familiar tasks, tasks that can be completed in the allotted time, clear and uncomplicated items, crystal-clear directions, tasks that have been rehearsed in previous course work, tasks that relate to course work, and a reasonable difficulty level)

Authenticity includes:

  • Language that is as natural as possible
  • Items that are contextualized rather than isolated
  • Meaningful, relevant, interesting topics
  • Thematic organization, such as a story line or episode
  • Tasks that replicate real-world tasks

Beneficial washback includes:

  • Positive influence on what and how teachers teach
  • Positive influence on what and how learners learn
  • Opportunity for learners to prepare adequately
  • Feedback that enhances learners’ language development
  • Being more formative than summative
  • Providing conditions for peak performance by the learner

Note from Robert: I believe the three big issues will be Validity, Authenticity, and Washback. If proficiency and fluency are the objectives of instruction, any test that does not assess those two constructs is not valid. Any test that does not replicate real-world tasks is not authentic. Any test that negatively affects teachers or learners, fails to enhance language development, or is primarily summative does not provide positive washback. Any common assessment needs to be vetted according to these principles of test design and construction. Any assessment that contains discrete grammar items, isolated conjugation tasks, and irrelevant (contaminating) variables is unacceptable. These are the areas I would focus on to make certain that the common assessment is sound.