The following are reflections written whilst sitting on the train back from uni, directly after reading 'Techniques for monitoring the comparability of examination standards' (Paul Newton, August 2007). They are therefore not especially ordered thoughts, and should be read as such.
Newton examines the processes in place for maintaining standards and comparability between boards within the same subject; this form of cross-board standardisation is useful, if only on a year-by-year basis. While the process gains some validity through its methodology, with a number of senior examiners making judgements against exemplar papers from other boards, the fact remains that this judgement is made against 'the standard in their own head'.
While Newton recognises that the examiners are 'those who are actually empowered to set standards in their respective boards', the fact that they hold that standard 'in their own head' highlights the human and subjective element of assessment. I am drawn back to the morning's philosophy discussion (8/10/11) with Cathie (an area in which I profess no proficiency), where we discussed the idea that no person can truly know the world, as each experience is individual to that person. This implies that we cannot make a standardised judgement if our perception of that judgement is individual. A cynic might also suggest a Chinese whispers effect as this standard is disseminated down to examiners at ground level.
In addition, Newton recognises how challenging, perhaps impossible, it is to apply one known standard to 'an unfamiliar syllabus, paper and mark scheme.' This highlights the incomparability (or at least the challenge to comparability) between papers at the same level, within the same subject, in the same year. Applying similar methods to comparisons across subject, year, board and tier is highly unrealistic.
Paired comparison method
When I first read Newton's description of this method I instantly related it to how I approach marking coursework and key summative assessments, as I am sure many teachers do. I mark work focusing on successes and improvements before rank ordering students. Often this is initially based on a 'gut reaction' to the quality of each student's performance, before I turn to the assessment criteria and grade boundaries and allocate grades and marks. Here, however, Newton is referring to comparing scripts from two different examinations and judging which script is better. Is this a comparison of each exam board's harshness or leniency in summative marking, or an evaluation of which board asks the questions that get the best from pupils? Am I the only one who wonders this?
Common test method
I find this method intriguing, although it clearly raises a variety of issues around initial assessment, variables and reliability. This is another example of assessment at cross purposes: if a test is specifically designed to assess one thing, how can we make any judgement about its correlation with another assessment topic, format or style? For some reason it made me think of training I had two years ago on Jesson data, which we used to create aspirational targets for our KS4 students. The data presents the idea that a 'student like you' achieved X or, more specifically, illustrates a spread of possible outcomes: 5% of 'students like you' achieved V, 15% achieved W, 50% achieved X, 19% achieved Y, and 10% achieved Z. This suggests that both the median and the modal grade were X, yet a student still had roughly a 50% chance of achieving a different grade altogether!
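To check the arithmetic in the Jesson example, here is a short Python sketch that recovers the mode and median from the quoted percentages. It assumes the grades V to Z run in order from lowest to highest (the text does not say so), and the function names are purely illustrative. Note that the figures as given total 99%, presumably through rounding, so the chance of a grade other than X comes out at just under half rather than exactly 50%.

```python
# Grade distribution quoted for 'students like you' (percentages).
# Assumption: V..Z are listed in order from lowest to highest grade.
distribution = {"V": 5, "W": 15, "X": 50, "Y": 19, "Z": 10}


def mode_grade(dist):
    """Return the grade with the largest share of students."""
    return max(dist, key=dist.get)


def median_grade(dist):
    """Return the first grade at which the cumulative share reaches half the total."""
    total = sum(dist.values())
    running = 0
    for grade, share in dist.items():  # dicts keep insertion order (Python 3.7+)
        running += share
        if running >= total / 2:
            return grade


def chance_of_other_grade(dist, grade):
    """Return the percentage of students who did NOT achieve the given grade."""
    total = sum(dist.values())
    return 100 * (total - dist[grade]) / total


print(sum(distribution.values()))  # 99
print(mode_grade(distribution))  # X
print(median_grade(distribution))  # X
print(round(chance_of_other_grade(distribution, "X"), 1))  # 49.5
```

Dividing by the actual total (99) rather than assuming 100 keeps the result honest about the rounding in the quoted figures.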
I realise I digress here, but it highlights how incomparable predicted grades are from one student to another, let alone the other factors outlined for comparing results.
I realise I am reading Newton (2007) AFTER having read the Goldacre (2010) article. Nevertheless, it offers insight into the complexities of comparability within existing methodologies. These are limited to those widely used at the time, and do not include specific, discrete research into comparability on a wider scale. Nor does it set out with any agenda or answer relating to the ideas and questions posed in the Goldacre article.