October 28, 2012

Contact for a small piece of educational research: survey of views on primary/junior science assessment


Dear Head Teacher,

I am a teacher working in a primary school in England and seeking to make contact with primary schools in other countries in order to carry out a small piece of research.

I am currently undertaking a Master's degree in Educational Assessment (MAEA) at Warwick University. I am gathering information for a dissertation centred on the experiences and views of stakeholders (teachers and pupils) on science assessment.

I have already carried out a smaller piece of research on this subject, based in the school where I work, with interesting results. I used questionnaires and interviews to survey the views of the majority of our pupils (n > 300), staff and parents. I would now like to gain an international perspective by finding out the views of pupils and teachers in schools around the world.

School participation would be through a short online questionnaire (15–20 minutes), which the subjects could complete anonymously. Interviews, if any schools are interested, would be by agreement and could take place through email, wiki sites, phone or web-conferencing tools. Participants would be guaranteed confidentiality and anonymity at this end. The pupil questionnaire is quick, child-friendly and non-threatening in nature and the subsequent use of the data would be benign. No personal data would be obtained or stored and I have prior ethical approval from my university according to BERA (British Educational Research Association) guidelines.

I would ideally be looking for responses from as many pupils aged between 8 and 12 as possible and at least 5 teachers from each school, but any number would be welcome.

Some of the questions I am seeking to answer are:

    • What types of assessment (if any) are carried out in primary science in schools in different countries?

    • What are the views of stakeholders on assessing children's science learning and/or skills up to the age of 12?

    • How do these compare across different countries?

    • Do these relate to national policies?

    • How would pupils like to be assessed and what methods do they think would best show their science ability?

    • What use is made, if any, of technology in assessment?

    • How would teachers most like to assess children's science ability?



    A summary of results would be available on request to all participating schools.

    I would aim to collect data between September 2012 and June 2013.

    Are you in a position to help and would you be interested in participating or allowing some of your pupils and staff to take part in this survey? If so, I would be very grateful and delighted to send you the links to the online version. I also have a printable version for participants who do not have Internet access. My email address is:

    J.A.Nickels@warwick.ac.uk

    I look forward to hearing from you.

    Juliet Nickels


    April 11, 2012

    e–assessment – interesting video

    Writing about web page http://www.teachersmedia.co.uk/videos/e-assessment-where-next

    This is interesting. Good for the e-assessment question. 6 years old now - we should be so much further on.


    http://www.teachersmedia.co.uk/videos/e-assessment-where-next




    August 05, 2011

    Foundation research methods and mini–projects

    Trying to do a research project on the basis of a 5 day summer school is something like trying to captain an oil tanker on the basis of a weekend in a sailing dinghy! The task vastly outweighs the input.


    February 12, 2011

    ARG

    Writing about web page http://www.assessment-reform-group.org/publications.html

    Some of the publications


    :-)


    October 14, 2010

    APP staff meeting 14/10/10

    Notes on APP staff meeting 13/10/10


    Background


    APP was introduced to our school last year primarily in a 'top down' manner (it came from senior management without preamble or discussion among staff) with some dissemination by subject leaders who had had brief training in order to pass on the procedural information to the rest of the staff.


    In this session we were told how we were to continue with using APP, adding reading and writing to the maths we were doing last year. Teachers are to focus on 3 children per subject for their class and acquire a body of evidence for levelling those children. During the meeting we were provided with some written work from a y2 child and instructed to use the APP guidelines to derive a level for that child.


    Issues arising in the meeting


    During the meeting, several issues arose, as bulleted below:


    • We are using the system for mainly summative purposes – that is, we are to derive levels which are to be used for all our data, including 'high stakes' purposes such as judgements of teachers, cohorts and the school

    • Some mention is made of using APP assessments to help plan next steps (although little mention of feedback to pupils themselves), suggesting a confusion between formative and summative purposes

    • Levels must be generated each term

    • All children are to be given a 'teacher assessed' level but evidence need only be collected for 3 in each subject

    • Individual teachers are to make judgements about their pupils with some advice that we should work with year group partners

    • Interpretation of particular statements varied widely, e.g. 'with support' (support to keep going, support because it is work done during lesson, support because the work is done shortly after lesson)

    • In English writing, it is a requirement that 50% of the writing evidence must be in books other than Literacy books

    • Overall judgements about level are dictated by the number of assessment focuses (AFs) which have been ticked, with the required number of AFs stated in the guidance

    • There was no agreement about how much of the box in an AF needed to be ticked to constitute the level or the achievement of the AF

    • Teachers working in small groups or pairs came to significantly different decisions about the level of the work they were looking at (from high 2 to nearly a low 4)


    If the system is mainly for summative purposes, then a serious question arises about the use of this data to judge teachers' performance, since it is very difficult to achieve reliability and comparability by assessing in this way. Individual teachers are required to gather evidence and to perform the assessment themselves. There is no opportunity for teachers to discuss at length the children and the evidence for their levels. Furthermore, the summative assessments for the rest of the cohort will not be backed up by any evidence, but it will be taken as read that if the judgement about 3 children is 'accurate' then the rest can be assumed. It is difficult to see how this is a convincing argument, especially as there is no reliability in a system which requires each teacher to make a summative judgement of work based on criterion referencing. The recent SATs writing problems are a prime example of the likelihood of disagreement between individuals about judgements based on criteria. Is it acceptable or ethical to use data derived in this way to make judgements about staff and to use that in performance management?


    Another question to consider: if we are using APP for summative purposes, why are levels to be derived 3 times a year rather than at the end of the teaching period? It might be suggested that this is to track the progress of the children, as there is no requirement to report data to any outside agencies until the end of the year. Since the children can only cover a certain amount of material before the end of the year, the criteria cannot be fully addressed until that material has been covered.


    Staff found it very difficult to achieve consensus on many aspects of the guidelines, including some fundamental statements as bulleted above. They also failed to agree on a level for the work presented, but this was not addressed as an issue. Staff were subsequently informed that the pupil was 'a secure 3' without any empirical evidence being presented to support that statement.


    It seems that APP should have an 'assessment for learning' purpose – that is, it is intended to be used primarily for formative assessment – in which case it needs to be applied to all the pupils rather than a selected 3. As such it has some obvious usefulness and can be a powerful tool. Criteria within the AFs provide some clarification of next steps in learning and can be helpful in guiding and motivating pupils as to what they hope to achieve. Used in this way, reliability is less of an issue. Problems arise when it is used in a 'high stakes' environment and no acknowledgement is made either of its unavoidable unreliability in determining levels or of the unmanageability of attempting to achieve reliability.


    The images below represent annotation performed during the meeting. We identified evidence for the various AFs in the writing APP guidelines at the level we judged the work to be at.


    [Images: annotated samples of pupil work]




    October 11, 2010

    MAEA

    Writing about web page http://www2.warwick.ac.uk/fac/soc/wie/teaching/masters/assessment/maeablogs/

    (copied from my student blog… the picture of the graph is visible on that one – along with formatting!)



    October 10, 2010

    Exercise from MA in Educational Assessment, Summative Assessment module, session 1


    Responses to Goldacre's article in 'The Guardian' (Aug 21 2010) National: Bad Science: Mystery of the A-level


    Initial reaction to the A-level results


    Firstly – a confession. Saturday morning's session was not the first occasion on which I had considered the A level results issue. Watching the media coverage, my initial reaction was two-fold – derisive and subsequently a tad peevish. Most of us have done A-levels at some time in the past. I sat mine in 1980 at a very well-regarded comprehensive (it was once listed as one of the top state schools in the country), with (at the time) supposedly high-achieving pupils. We would have been dumbfounded at anyone achieving 4-5 A grades at A-level in a year. That would have represented academic ability to a staggering degree worthy of a good deal of awe and even veneration! I think it would have been nigh on physically impossible to have covered the material necessary in the time given, for a start. I was intrigued to see how results had changed over time and came across this graph from the University of Buckingham Centre for Education and Employment Research.


    [Graph: A grades at A-level since 1960]


    I found it on the BBC website in their response to the 'issue'.


    http://www.bbc.co.uk/news/education-11012369


    There are more of them here:


    http://www.bbc.co.uk/news/education-11011564


    More of that later.


    Goldacre's article


    So does Goldacre have a particular argument? He seems to be saying that there is a definite issue in the public consciousness and that reactions to the high % of A grades at A level are polarised between 'getting easier' and 'they are not'. Goldacre writes, 'how do you know?' and proceeds to outline the two positions and present some evidence.


    Goldacre's evidence is:


    Students getting cleverer

    • IQ scores have to be adjusted for the Flynn effect.

    Exams getting easier/students' performance not improving

    • Royal Society of Chemistry research – Five Decade Challenge in 2008 – students' scores on O-level papers from the 60s onwards showed a steady increase over time

    • TDA scores have declined and flattened out indicating no increase in performance ability over time in spite of increased A-level scores

    • Steady decline in scores in undergraduates' basic maths skills, reported from 60 university departments in maths, physics and engineering


    Goldacre presents some arguments which may support either the notion that exams are easier or that pupils may just be performing better for other reasons, such as improved teaching methods, differences in subject choices, changes in exam focus etc. He pleads for research to clarify these issues.



    What evidence would be appropriate – what research required?


    Let's assume, for the purpose of considering how we might address his plea, that Goldacre's presentation of it represents the actual issue. What would constitute evidence for either view and how could it be obtained? Given that the whole area of reliability in assessment is one fraught with problems, it's difficult to see exactly how we could reasonably compare students' performances across the decades in any convincing way. My own cohort of 1980s students are no longer the same people we were – sitting a current A-level in our subject (can one still do Zoology?) wouldn't necessarily show how able we would have been to have sat the same paper back in the day. Similarly, the project to investigate how contemporary students coped with O-level papers of previous decades was not able to address all the variables that might account for the differences in results (though how much more interesting it would have been if the modern pupils had performed better in the old papers!). Is it possible that certain subjects (maths for example) have remained sufficiently static over time and the requirements for national qualifications so consistent, that papers from 1960 onwards could be used in a widespread test on performance with maths students, with levels awarded according to the relevant system of the time? (On a personal note, there must have been some assumed stability when I was revising for my O-levels at the end of the 70s; our practice papers reached back into the 1950s and were still considered relevant!)


    Maybe, but the relevant system of the time presents a problem. In his response letter to the Guardian, KJ Eames stated that there had been a move from 'norm referencing' to 'criterion referencing' in awarding levels. So in analysing changes over time, do we apply one or either or both methods for grading in our investigation? Even the methods for monitoring comparability year on year have changed. For instance, Paul Newton writes (August 2007) that examining board researchers largely stopped using the common test method to monitor comparability in the 1980s because of the likelihood of plausible challenge on the grounds of uncontrolled variables. There is no comparability, therefore, between papers of the millennium and papers of the 70s.


    Is research for evidence to support or to refute either position in this polarised debate even appropriate, however difficult? Possibly, in a general sense – there is some desire to know how the nation's students are performing year on year - but probably not in this case specifically, because there is no like comparison. The A-level of 2010 is fundamentally not the same exam or qualification I was entered for in 1980. Even the way in which students study for it is different – the rise of coursework and the interference of the AS level are two examples.


    That said, there clearly was a change in the 1980s – did we make a big move from norm-referenced levelling to criterion-referenced? That could certainly account for the nature of the graph above – stasis for two decades followed by a steady rise. If we had retained a norm-referenced system would we not have kept our A grades at roughly 8% forever?
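    The arithmetic behind that last question can be sketched in a few lines of Python. The cohorts and thresholds below are entirely hypothetical (nothing to do with real exam data); the point is only to show that under norm referencing the proportion of A grades is pinned by definition, while under criterion referencing a higher-scoring cohort pushes more candidates over a fixed mark.

    ```python
    import random

    random.seed(1)

    def norm_referenced_As(scores, proportion=0.08):
        """Norm referencing: the top `proportion` of the cohort gets an A,
        however well or badly the cohort as a whole performed."""
        cutoff = sorted(scores, reverse=True)[int(len(scores) * proportion)]
        return sum(1 for s in scores if s > cutoff)

    def criterion_referenced_As(scores, threshold=80):
        """Criterion referencing: everyone who clears a fixed mark gets an A."""
        return sum(1 for s in scores if s >= threshold)

    # Two hypothetical cohorts of 1000 candidates; the later one scores
    # higher on average (better teaching, exam focus, whatever the cause).
    cohort_1980 = [min(100, max(0, random.gauss(60, 12))) for _ in range(1000)]
    cohort_2010 = [min(100, max(0, random.gauss(70, 12))) for _ in range(1000)]

    for year, cohort in (("1980", cohort_1980), ("2010", cohort_2010)):
        print(year,
              "norm-referenced A's:", norm_referenced_As(cohort),
              "criterion-referenced A's:", criterion_referenced_As(cohort))
    ```

    With these made-up numbers the norm-referenced count stays pinned at the top 8% for both cohorts, while the criterion-referenced count rises with the cohort's mean – exactly the shape of the graph above: stasis, then a steady climb.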


    How do I feel about the article and the issues?


    I think Goldacre is presenting his perceptions of public opinion which may or may not be true. Perhaps someone should actually investigate whether the public really does hold those opinions. I don't think the actual issue, however, is about the ease of the exams or the relative intelligence or academic ability of the students over time (I'm skeptical about the degree to which this can change in a human population of this size over a few decades, anyway). However, the statistics clearly show it's easier to achieve an A grade at A level for whatever reason. This is what Robert Coe (April 2007) had to say about the 2006 results:


    A level grades achieved in 2006 certainly do correspond to a lower level of general academic ability than the same grades would have done in previous years. Whether or not they are better taught makes no difference to this interpretation; the same grade corresponds to a lower level of general ability.


    If that is true, is it important? It is in this respect: there is an impact on perception of the value of the qualification by the public, the students themselves, further education establishments, employers and previous students. The question arises as to what is the value of an A-level. If 25% of students achieve an A, then how do institutions award those who could demonstrate higher achievement? With an A*? Will there be a continuous devaluation process? A**? If the A-level of today is not the same as that of 1980 then why is it still called an A-level since that implies that it has retained certain characteristics over time? What is it even for as a qualification? Maybe it is the vestige of something which no longer serves its originally intended purpose.


    Refs:


    Coe, Robert (2007) Changes in standards at GCSE and A-Level: Evidence from ALIS and YELLIS. Report for the ONS. CEM Centre, Durham University


    Newton, Paul (2007) Techniques for monitoring the comparability of examination standards. QCA


    Summary of assessment implications for first feedback to Leadership Team

    The following represents some summary notes on educational assessment for the purposes of feeding back to the staff at my school. Much of it is the material from the first session of the module on summative assessment, with some reference to formative assessment for the benefit of comparison. Anyone should feel free to comment in any way.


    Educational Assessment


    What is assessment?


    Measurement/judgement about 'how learners are getting on'?


    Purposes

    • Summative

    • Formative


    These are not interchangeable, although there are some formative applications of summative assessment and vice versa. Different purposes – different ways of assessing.


    Summative assessment (the focus of this module)


    Summative purpose

    Grading, levelling, marks, accountability, external data, reporting back to parents, county, governors. An end product purpose. Not for feeding back per se, although levels are used as incentives and so do feed back into children's learning and motivation. In a positive way? Experience in y5 and y6 generally seemed to indicate that children became more motivated once they were given full knowledge of the levels and what achieving a higher level entailed.


    Issues of:

    • Reliability – results are replicable for different students of the same level, for the same student being assessed in the same way at a different time, or for the same student taking a parallel test at a different time – all very problematic due to the number of variables. You can't step in the same river twice. No tests or assessment of any kind are very reliable. Some lend themselves more to fixed or closed methods of assessment and so are more likely to be more reliable. Experience in y6 past papers in maths – consistency of marks for each child. English assessment is notoriously unreliable, especially in the use of criteria-based assessment methods. Experience of marking of writing 2009 and 2010. Not dependent on the apparent expertise of the markers – discrepancies of 3 levels, grades etc.

    • Validity – the assessment must assess what it purports to assess; for instance, a science assessment should not be assessing the child's ability (or inability) to read. Teaching to the tests would indicate that the tests are assessing something other than the national curriculum requirements. (Poetry writing, for example, is never a requirement for the tests)

    • Manageability – any system of assessment will fail if it is not perceived as manageable. Examples of this include the early years of the NC, where hundreds of ATs were attempted. Tick sheets – collapse of purpose. Implications for APP – use for all pupils considered unmanageable, so 3 selected for scrutiny. Is this use of teacher assessment for learning or is this for accountability? What are we really supposed to be doing for the other pupils?


    The trade-off


    There is a trade-off between the three aspects: an increase in reliability means a decrease in validity (difficulty assessing everything being learned, sampling size etc.) and a decrease in manageability; an increase in validity means a major decrease in manageability.


    Formative assessment


    Assessment for Learning. Teacher assessment throughout learning period – marking, tests, questions, observations, discussions. Purpose – feedback, complicity, engagement, decision making, motivation, pupil influence. Improved learning. Does not need to generate a mark, level, reliable consistent result.


    Possible implications for school


    • Do we believe everyone shares understanding of what assessment is?

    • Confusion with record keeping and tracking?

    • The act of assessment – what is that?

    • Do we need (is it appropriate) to make summative assessments so often throughout the year?

    • How can we improve the reliability of summative assessments?

    • Given the inherent unreliability of any grades/levels generated by assessment, but particularly by criterion/descriptive methods, is it appropriate to use these to measure teacher (school, county) performance, e.g. 'must make on average at least 4 points progress'?

    • Are we clearly defining the purposes of our assessment? Is it always summative? Do we know when we are assessing for formative purposes and if so, are we attempting to use these in a summative way?

    • Using formative assessment for summative purposes - issues of reliability





    October 04, 2010

    First day of MA

    Well I made it to the Induction Evening, against the odds! This is my embarkation on the Master's in Educational Assessment about which I am now quite excited, though trepidatious, of course, that I'll not have time to fit it in around a full-time job in primary education and membership of several gigging bands - and that's not even taking into account my need to watch movies! I say 'against the odds' because I had no communication about the course whatsoever, until a few days ago, in spite of frequent emails on my part asking if the course was still on as I hadn't heard anything and appeared to have disappeared off the system. I'm sure it'll all sort itself out as we go along, but it wouldn't have hurt to have had the reading list during the summer...

