Summary of assessment implications for first feedback to Leadership Team
The following are summary notes on educational assessment, intended for feeding back to the staff at my school. Much of the material comes from the first session of the module on summative assessment, with some reference to formative assessment for comparison. Anyone should feel free to comment in any way.
What is assessment?
Measurement/judgement about 'how learners are getting on'?
Summative and formative assessment are not interchangeable, although there are some formative applications of summative assessment and vice versa. Different purposes call for different ways of assessing.
Summative assessment (the focus of this module)
Grading, levelling, marks, accountability, external data, and reporting back to parents, the county and governors. Its purpose is an end product. It is not for feeding back per se, although levels are used as incentives and so do feed back into children's learning and motivation. In a positive way? Experience in Y5 and Y6 generally suggested that children became more motivated once they were given full knowledge of the levels and of what achieving a higher level entailed.
Reliability: results should be replicable for different students at the same level, for the same student assessed in the same way at a different time, and for the same student taking a parallel test at a different time. All of this is very problematic because of the number of variables; you can't step in the same river twice. No test or assessment of any kind is very reliable. Some subjects lend themselves more to fixed or closed methods of assessment and so are likely to be more reliable. Experience with Y6 maths past papers showed consistent marks for each child. English assessment is notoriously unreliable, especially where criterion-based assessment methods are used. Experience of marking writing in 2009 and 2010 showed discrepancies of 3 levels/grades, regardless of the apparent expertise of the markers.
Validity: the assessment must assess what it purports to assess. For instance, a science assessment should not be assessing the child's ability (or inability) to read. Teaching to the tests would indicate that the tests are assessing something other than the national curriculum requirements (poetry writing, for example, is never a requirement of the tests).
Manageability: any system of assessment will fail if it is not perceived as manageable. Examples include the early years of the National Curriculum (NC), when hundreds of ATs were attempted, and tick sheets, which led to a collapse of purpose. Implications for APP: using it for all pupils was considered unmanageable, so 3 pupils are selected for scrutiny. Is this a use of teacher assessment for learning, or is it for accountability? What are we really supposed to be doing for the other pupils?
Trade-off between the three aspects: increasing reliability decreases validity (it is difficult to assess everything being learned, sample size etc.) and decreases manageability. Increasing validity greatly decreases manageability.
Assessment for Learning: teacher assessment throughout the learning period, through marking, tests, questions, observations and discussions. Purpose: feedback, complicity, engagement, decision making, motivation and pupil influence, leading to improved learning. It does not need to generate a mark, a level, or a reliable, consistent result.
Possible implications for school
Do we believe everyone shares an understanding of what assessment is?
Confusion with record keeping and tracking?
The act of assessment – what is that?
Do we need to make summative assessments so often throughout the year, and is it appropriate to do so?
How can we improve the reliability of summative assessments?
Given the inherent unreliability of any grades/levels generated by assessment, and particularly by criterion-based/descriptive methods, is it appropriate to use these to measure teacher (school, county) performance, e.g. 'must make on average at least 4 points progress'?
Are we clearly defining the purposes of our assessment? Is it always summative? Do we know when we are assessing for formative purposes, and if so, are we attempting to use those assessments in a summative way?
Using formative assessment for summative purposes - issues of reliability