Robert Harrell on Language Assessment Principles

Source: Language Assessment: Principles and Classroom Practices, H. Douglas Brown and Priyanvada Abeywickrama. Pearson Education, 2010.

There are five principles of language assessment:

Practicality
Reliability
Validity
Authenticity
Washback

Practicality includes:

Staying within the budget
Staying within time constraints
Clarity of directions for administration
Appropriate utilization of human resources
Not exceeding available material resources
Considering time and effort involved in both design and scoring

Reliability includes:

Consistency in conditions across two or more administrations
Clarity of directions for scoring/evaluation
Uniformity of rubrics for scoring/evaluation
Consistency of application of rubrics by scorer(s)
Unambiguity of items/tasks to the test taker
Issues in reliability:
- Student-related reliability issues (e.g. illness, fatigue, anxiety, other physical and psychological factors, test-wiseness)
- Rater reliability issues (e.g. multiple raters, rater training and experience, fatigue, bias, unclear scoring criteria)
- Test administration issues (e.g. testing conditions such as noise, lighting, temperature, physical condition of desks and chairs)
- Test reliability issues (e.g. task design, poorly-written test items, excessive test items, time constraints)
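
As one way to check the "consistency of application of rubrics by scorer(s)" item above, two raters can score the same set of student responses and their agreement can be measured. The sketch below is only an illustration: the source does not prescribe a statistic, so percent agreement and Cohen's kappa are my own choice, and the function names and the sample scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which both raters gave the same rubric score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of responses.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same score
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical score throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) given by two raters to the same eight responses.
scores_rater_a = [3, 4, 2, 4, 3, 1, 4, 2]
scores_rater_b = [3, 4, 2, 3, 3, 1, 4, 3]

print(f"Percent agreement: {percent_agreement(scores_rater_a, scores_rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(scores_rater_a, scores_rater_b):.2f}")
```

A low value would point back to the rater issues listed above (training, fatigue, bias, unclear scoring criteria) rather than to the students.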

Validity (“an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment”) includes:

Genuinely measuring what the test proposes to measure
Excluding the measurement of irrelevant or “contaminating” variables
Relying as much as possible on empirical evidence (performance)
Involving performance that samples the test’s criterion (objective)
Offering useful, meaningful information about a test-taker’s ability
Being supported by a theoretical rationale or argument
Evidence of validity:
- Content-related evidence (“It actually samples the subject matter about which conclusions are to be drawn, and … it requires the test-taker to perform the behavior that is being measured” p. 30; “If you are trying to assess a person’s ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity.” p. 31)
- Criterion-related evidence (The extent to which the criterion of the test has actually been reached. This is best indicated by comparing test performance with other types of assessment)
- Construct-based evidence (The test taps into the theory, hypothesis, or model that explains observed phenomena. Linguistic constructs include proficiency, communicative competence, and fluency. Before a test’s construct validity can be accepted, the constructs have to be defined – e.g. what is oral proficiency? – and the test then has to be compared with that definition)
- Consequential validity (i.e. the impact of the test on both a macro and a micro level)
- Face validity (whether the test “looks right” to the test-taker: a well-constructed format, familiar tasks, tasks that can be completed in the allotted time, items that are clear and uncomplicated, directions that are crystal clear, tasks that have been rehearsed in previous course work, tasks that relate to course work, and a reasonable difficulty level)
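
The criterion-related evidence item above mentions comparing test performance with other types of assessment. A minimal sketch of one such comparison follows. It assumes that a Pearson correlation is an acceptable way to make the comparison (the source does not name a statistic), and the function name and both score lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two score lists of equal length with at least two students.")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined when all scores in a list are identical.")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: the same five students on a common assessment (percent)
# and on a separately rated oral interview (1-4 scale).
common_assessment = [62, 75, 81, 90, 55]
oral_interview = [2, 3, 3, 4, 2]

print(f"Correlation with oral interview: {pearson_r(common_assessment, oral_interview):.2f}")
```

A value near 1 would suggest the two assessments rank students similarly. A value near 0 would suggest the common assessment is measuring something other than what the interview measures, although with only a handful of students any such figure is rough.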

Authenticity includes:

Language that is as natural as possible
Items that are contextualized rather than isolated
Meaningful, relevant, interesting topics
Thematic organization such as through a story line or episode
Tasks that replicate real-world tasks

Beneficial Washback includes:

Positive influence on what and how teachers teach
Positive influence on what and how learners learn
Opportunity for learners to prepare adequately
Feedback that enhances learners’ language development
Being more formative than summative
Providing conditions for peak performance by the learner

Note from Robert: I believe the three big issues will be Validity, Authenticity, and Washback. Any test that does not assess proficiency and fluency is not valid, if those two constructs are the objectives of instruction. Any test that does not replicate real-world tasks is not authentic. Any test that negatively affects teachers or learners, fails to enhance language development, or is primarily summative does not provide positive washback. Any common assessment needs to be vetted according to these principles of test design and construction. Any assessment that contains discrete grammar items, isolated conjugation tasks, or irrelevant (contaminating) variables is unacceptable. These are the areas where I would focus to make certain that the common assessment is sound.