All 3 entries tagged Lewis Beer
July 24, 2023
Can ChatGPT evaluate the quality of insights in a student essay? In January 2023, Daisy Christodoulou published an article (Can ChatGPT mark writing?, No More Marking Blog) exploring this question. She tested ChatGPT using English essays, and found that while its grading and feedback were more or less aligned with her own, the AI was sometimes unable to spot fundamental problems in essays. Christodoulou offers some fascinating insights, but one thing she does not explore in any detail is the relationship between ChatGPT’s qualitative evaluation of an essay and the content of the essay itself.
In this post, I will share the results of my own brief experiment, in which I pasted two film reviews into ChatGPT and asked it to evaluate (and compare) the quality of insights in these reviews. My aim here was to use these texts as substitute ‘essays’, and consider how the AI-generated responses might help or hinder a marker in giving meaningful feedback.
The dialogues quoted below were conducted on 23 March 2023, using the free version of ChatGPT (3.5). (I later repeated the dialogues with both this and ChatGPT 4.0, but found that the responses from 23 March gave more substantial and interesting feedback. I will not discuss those later dialogues in this post.)
Review 1: Peter Bradshaw
To begin with, I asked ChatGPT to ‘Evaluate the quality of insight in the following review of the film Nomadland’, then pasted the text of Peter Bradshaw’s review of Nomadland from The Guardian. I wanted to see how it would respond to this vague request, in which I provided no criteria on which to base an evaluation.
The response did a good job of summarising the review using selected words from the text. For instance, at the end of its evaluation the chatbot said:
[Bradshaw] does offer a mild critique that the film could have been more explicit in its condemnation of the economic forces causing the characters' hardships. However, they still recognize the film's greatness and Chloé Zhao's exceptional filmmaking.
This is clearly pieced together from the following passage in Bradshaw’s review, in which I have highlighted the details used by ChatGPT:
[The film is] brought off with exceptional intelligence and style. Arguably it is not angry enough about the economic forces that are causing all this but it still looks superbly forthright. There is real greatness in Chloé Zhao’s film-making.
The paraphrasing techniques used here – transposing words like ‘exceptional’ and ‘greatness’ into slightly different contexts, and spelling out that ‘all this’ means ‘the characters’ hardships’ – are effective in ‘feedback’ terms, because they seem to indicate an understanding of the writer’s (i.e. student’s) argument. However, Bradshaw says the film is ‘not angry enough’, not that it is ‘not explicit enough’, and this may be a significant error.
In its final verdict on the review, ChatGPT said:
Overall, this review appears to be well-written and insightful, providing a thorough analysis of the film’s themes, characters, and style while also highlighting its strengths and potential weaknesses.
Without my having provided any criteria, the AI assumes that an insightful film review will explore themes, characters, and style, and that it will offer a balanced assessment of the film’s strengths and weaknesses.
More of ChatGPT’s ‘marking criteria’ emerged when I asked it to ‘Identify some ways in which the review could be improved’. It offered some valid points in response to this prompt. For example:
The review could have delved deeper into the film's technical aspects, such as its cinematography, editing, and sound design. While the review mentions the film's style and how it differs from traditional narrative-driven films, it could have gone into more detail about how the film achieves this style and what specific techniques the filmmakers used.
This is arguably an area where Bradshaw’s review falls short. Bradshaw does discuss how Nomadland differs from traditional narrative-driven films, and he does connect this with Zhao’s stylistic choices; in this part of his review, he might have discussed film-making technique in more detail. So the AI’s choice to structure feedback like this (‘While you did talk about x, you could have delved deeper into y...’) could be helpful for a student looking to improve their essay by building on their strengths.
On that note, here is another of ChatGPT’s constructive criticisms:
The review could have included a more critical evaluation of the film. While the review praises the film’s strengths, it could have offered a more balanced assessment that acknowledges both the film’s strengths and its weaknesses.
This picks up on the detail quoted above, from the AI’s initial evaluation, noting that Bradshaw’s critique of the film is ‘mild’. The AI has perhaps noticed that Bradshaw’s more negative language is limited to the very end of his review, and is couched in the word ‘arguably’. Again, if we imagine this as feedback being provided to a student, ChatGPT’s evaluations do a good job of mixing praise and criticism: ‘You balance your argument by acknowledging the film’s weaknesses, but you only do this briefly at the end – you could have included a more critical evaluation.’
Other responses, however, show ChatGPT’s limitations, and would constitute problematic essay feedback. For instance:
The review could have offered more specific examples of how the film explores its themes and characters. While the review mentions some of the film's themes, such as the impact of economic hardship on older Americans, it could have gone into more detail about how the film portrays these themes and how they are relevant to contemporary society.
This is not really a fair critique: Bradshaw does highlight specific examples of how the film explores ‘the impact of economic hardship on older Americans’, and he does allude to contemporary issues such as factory closures, the dominance of Amazon, and the importance of the tourist trade in this part of America:
...looking for seasonal work in bars, restaurants and – in this film – in a gigantic Amazon warehouse in Nevada, which takes the place of the agricultural work searched for by itinerant workers in stories such as The Grapes of Wrath.
Fern, a widow and former substitute teacher in Empire, Nevada – a town wiped off the map by a factory closure – who is forced into piling some possessions into a tatty van and heading off...
At times, the film looks like a tour of a deserted planet, especially when she heads out to the Badlands national park in South Dakota, where there is also tourist-trade work to be had.
ChatGPT also says:
The review could have provided more context for the film's production and reception. For example, the review could have mentioned the awards and critical acclaim that the film has received, or how it fits into Chloé Zhao's broader filmography.
Some of this is fair – the review was published after Nomadland’s Oscar success, so Bradshaw could have mentioned this – but it misses the contextual details Bradshaw includes about the film’s production:
Zhao was even allowed to film inside one of Amazon’s eerie service-industry cathedrals.
The movie is inspired by Jessica Bruder’s 2017 nonfiction book, Nomadland: Surviving America in the Twenty-First Century, and by the radical nomadist and anti-capitalist leader Bob Wells, who appears as himself.
The people she meets on the road are, mostly, real nomads who have vivid presences on screen.
As with the previous criticism, ChatGPT has not acknowledged key details of the review in its initial assessment, so its critique is not balanced: it is like a marker who blames a student for ‘not doing x’ when the student in fact spent several paragraphs on ‘x’. (Human markers sometimes do this, of course.)
Review 2: Beatrice Loayza
I then asked ChatGPT, ‘Is the following review of the film Nomadland more incisive than the previous one?’, and pasted the text of Beatrice Loayza’s review of Nomadland, from Sight & Sound. Again, I deliberately did not provide any assessment criteria. ChatGPT’s answer was ‘yes’, for several reasons – some valid, some less so. First of all, it said, Loayza ‘provides a detailed analysis of the film's themes and cinematography, as well as the performance of Frances McDormand’. This is fair, and picks up on one of the criticisms of Bradshaw cited above (namely his lack of attention to technical aspects). Loayza comments on specific camera techniques, naming the cinematographer and describing the light effects he achieves. She also does more than Bradshaw to explain why McDormand’s performance is so effective.
ChatGPT picks up on another of its own criticisms of Bradshaw by praising Loayza’s critical perspective on the film:
However, the review also criticizes the film's lack of force and clarity in its insights into labor in the 21st century and the exploitation of older Americans. The author points out that the film's depiction of workers exploited by Amazon feels too easygoing and questions the film's liberal naivete in addressing the conditions of the nomadic lifestyle. Overall, the review provides a more nuanced and thoughtful analysis of the film.
This draws upon the following passage in Loayza’s review; again, I have highlighted phrases that ChatGPT seems to have picked up on:
[The film’s] insights into labour in the 21st century, and the exploitation of an older generation of Americans, lack force and clarity. At the very beginning of the film, Fern is employed by Amazon’s CamperForce programme, which provides base wages and free parking space to seasonal workers in their 60s and 70s. In 2020, Amazon doubled its profits during a global pandemic, which makes Zhao’s easygoing depiction of workers exploited by the company feel rather toothless. That the film aims to capture the ways in which a kind of working-class American outsider struggles without fully addressing the conditions of that struggle casts over it the shadow of a questionable liberal naivete.
- In its initial assessment of Bradshaw’s review, ChatGPT noted that his critique of the film was ‘mild’
- In suggesting improvements, it built on this comment by recommending a more balanced approach
- And in drawing a comparison with Loayza’s review, it noted her more substantial version of Bradshaw’s criticism.
At each stage, the AI appears to be drawing upon specific evidence from the texts, rather than just ‘hallucinating’ these evaluative comments.
Elsewhere in its comparison between Bradshaw and Loayza, however, ChatGPT did hallucinate some differences in order to justify its verdict. I will not cite these here, as this post is already very long, but the inaccuracies were of a similar kind to those in the summary of Bradshaw discussed in the previous section.
If these film reviews were formative essays that I had to mark, I could use ChatGPT’s feedback to offer legitimate praise and criticism, suggest improvements, and judge the relative merits of the two essays in relation to each other. However, I would also notice that ChatGPT misses important details in these texts and draws some unfounded contrasts between them.
In the course of this experiment, I tried several variations on the above prompts. Here are some things to note if you want to try a similar experiment yourself:
- I fed the reviews into ChatGPT several times, and in a different order. When I asked it to make a comparative evaluation, it tended to prefer the second review (even if this was Bradshaw’s). When I asked if it could reverse its comparative evaluation (i.e. ‘Can you argue that the other review is more insightful than the first?’), its responses varied: sometimes it doubled down on its first opinion, sometimes it conceded that an alternative opinion could be justified. Again, the reasons given for these opinions ranged from ‘valid’ to ‘hallucinatory’.
- This post demonstrates what ChatGPT is capable of in the hands of a technically ignorant, time-poor amateur like me, but by using the right prompts and follow-up prompts, it would no doubt be possible to collate more credible ‘essay feedback’, and then ask the AI to construct and present this in an appropriate way. Have a look at the other articles and resources linked to on the AI in Education Learning Circle webpage, try an experiment of your own, and share the results in the comments below. In particular, you might think about the learning outcomes and marking criteria specific to your discipline, or your module, and consider how you might train ChatGPT to use these in evaluating a piece of text.
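As a starting point for such an experiment, explicit marking criteria could be embedded in the prompt rather than left to ChatGPT’s implicit assumptions. The sketch below (in Python, with hypothetical criteria of my own invention, using the message format expected by chat-style APIs such as OpenAI’s) only builds the prompt; the actual API call is indicated in a comment.

```python
# Hypothetical sketch: constructing a marking prompt that supplies explicit
# criteria, instead of the vague 'evaluate the quality of insight' requests
# used in the experiment above. The criteria and function name are
# illustrative, not taken from any real module handbook.

def build_marking_messages(essay_text, criteria):
    """Build a chat-message list that embeds marking criteria in a system prompt."""
    criteria_list = "\n".join(f"- {c}" for c in criteria)
    system_prompt = (
        "You are a marker for a university module. Evaluate the essay below "
        "against these criteria only, citing specific passages from the essay "
        "as evidence for every judgement:\n" + criteria_list
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": essay_text},
    ]

# Example criteria, loosely echoing the implicit criteria ChatGPT applied
# in the dialogues above:
criteria = [
    "Analyses themes, characters, and style with specific examples",
    "Offers a balanced assessment of strengths and weaknesses",
    "Provides relevant context for production and reception",
]
messages = build_marking_messages("Full text of the review goes here...", criteria)
# The messages could then be sent via an API client, e.g. (OpenAI Python SDK):
# client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```

Requiring the model to cite passages as evidence is one possible guard against the unfounded criticisms seen above, though, as the dialogues show, it is no guarantee.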
April 11, 2022
As a member of the newly formed ‘Diverse Assessments’ Learning Circle in WIHEA (led by Leda Mirbahai and Isabel Fischer), how have you engaged with this topic during your time in Higher Education, and are there any lessons for the future?
Back in 2015, while teaching on a Shakespeare module in Warwick’s English Department, I worked with several students who chose not to write a standard academic essay, but instead to submit a Creative Project – for instance a painting, an animated film, a screenplay – accompanied by a reflective essay. Students were advised to centre their project on a clear research question, and were also told that the project would be assessed not for its artistic quality but for its achievement ‘as a piece of intellectual exploration’.
None of this was my idea, and frankly it made me nervous. But when I saw the students’ creative outputs and accompanying reflections, and read the feedback my colleagues gave them, I could see the unique benefits of this assessment method. One student, Amy Brandis, produced a 55-minute documentary called Lady Macbeth: From Stratford to Stage. She sourced A/V equipment, enlisted the help of students, staff, and people outside the university, and developed filming and editing skills to a high standard. She also reflected incisively on the intellectual exploration entailed by this process. Amy has since pursued a career as a writer, film-maker, and script consultant. For students like her, the Creative Project opened up types of intellectual discourse, and levels of insight, that fall outside the scope of a ‘standard’ essay. It also enabled students to explore topics in a way that fed into their broader development goals: not just academic development, but also personal and professional.
Five years later, compiling the Student Innovation at Warwick report during my time in Warwick Enterprise, I spoke to over 120 colleagues across 33 departments about the myriad ways in which Warwick students can ‘engage with innovation’. One way is through innovative pedagogy, and especially through non-standard modes of assessment that encourage students to be creative and take risks. Not only in the English Department, but also in Engineering, Psychology, History, and many others, academics model the risk-positive mindset they want to foster: it’s no mean feat to overcome the administrative and pedagogical challenges of creating new assessment schemes, getting them approved, and introducing them to curious but anxious students. In my current role, supporting Warwick’s involvement in the EUTOPIA Alliance, I see this spirit of pedagogic experimentation playing out on an even larger scale. It’s wonderful that there is so much interesting work being done in this area, but more than a little overwhelming.
The WIHEA-funded project we will carry out in the ‘Diverse Assessments’ Learning Circle, between now and the end of 2022, is an opportunity to explore these issues more deeply, and with a sense of focus that makes the exploration feasible. The Learning Circle members are now pooling their knowledge and experiences to understand the current ‘state of play’ regarding diverse assessments at Warwick. We look forward to consulting a wider range of staff and students to understand the benefits and challenges of diversifying assessment schemes, and to help shape Warwick’s future vision regarding diverse assessments.
There’s still room for more people to join our circle! For further information, please contact the co-leads: Leda Mirbahai, Warwick Medical School (WMS) (Leda.Mirbahai@warwick.ac.uk) and Isabel Fischer, Warwick Business School (WBS) (Isabel.Fischer@wbs.ac.uk).
Link to blog one: Interested in diverse assessments?
Link to blog three: A Student Perspective on Assessment Diversity and Strategy
March 14, 2022
Interested in diverse assessments? Join our learning circle for an exciting grant-funded project on the future of assessments.
Authors: Isabel Fischer; Leda Mirbahai; Lewis Beer; Dawn Collins; Peter Fossey; Celine Martin; Natasha Nakariakov; Pula Prakash; Farrah Vogel-Javeri
We have recently created an interdisciplinary learning circle which aims to optimise the learning opportunities for Warwick students. We want to ensure that teaching and learning opportunities are inclusive and cater for our diverse student community. Assessment and feedback are critical stages in the learning process. Using diverse assessments will ensure that students are not unfairly disadvantaged or advantaged by a specific form of assessment. It is worth noting, though, that what counts as a diverse assessment in one faculty may not necessarily be seen as one in another. Using diverse assessments therefore comes with its own challenges and barriers, such as:
- Diversification without sufficient opportunity for students to practise and become familiar with new and different forms of assessment, which disadvantages groups of students who may be less familiar with certain styles of assessment
- The resources and time needed for this familiarisation, which in turn reduce uptake of, and engagement with, more innovative assessment approaches by faculty
Therefore, the aim of this learning circle is to capture both staff and student experiences of diverse assessments and to involve students, staff, and other stakeholders in shaping the future of assessments. Furthermore, the learning circle aims to develop practical recommendations on overcoming some of the challenges associated with the use of diverse assessments, which will significantly benefit the community.
To achieve this we need your support!
If we have not yet convinced you to join our learning circle, then read on:
Vision: Our vision is to foster an inclusive environment where assessments are designed and developed in partnership with students, staff, and external stakeholders, to effectively promote learning, valuing students’ uniqueness and considering their future employment(s) and wellbeing, as well as the social and environmental responsibility and sustainability of the wider community.
Mission. Our mission is to:
- Gather existing data on practices around use of diverse and inclusive assessments, including Warwick staff and student experiences
- Develop shared understanding of principles and practice of diverse assessment
- Develop an evaluative framework for measuring the success of diversified assessment strategies at module, year, and course level
- Capture student and staff views on diverse assessments
To help us achieve our mission we have successfully applied for, and been awarded, WIHEA funding which will enable us to capture staff and student experiences of some of the diverse and innovative assessment approaches used in different disciplines, addressing some of the key questions around: 1) perceptions of diverse assessments from a staff and student perspective, 2) practical tips for successful application of the assessment method and marking, 3) communicating assessed skills and requirements, 4) overcoming challenges. To achieve this, we will conduct interviews with staff and students and will share resources, including but not limited to examples of assessments, marking and feedback rubrics, and assessment briefs.
For further information on joining the learning circle please contact the co-leads: Leda Mirbahai, Warwick Medical School (WMS) (Leda.Mirbahai@warwick.ac.uk) and Isabel Fischer, Warwick Business School (WBS) (Isabel.Fischer@wbs.ac.uk)
Stay tuned for further updates and blogs on our initiative, such as: Creative Projects and the ‘state of play’ in diverse assessments – Lewis Beer