Using AI for Formative Feedback: Current Challenges, Reflections, and Future Investigation
By Matthew Voice, Applied Linguistics at the University of Warwick
One strand of the WIHEA’s working group for AI in education has focused on the role of AI in formative feedback. As part of this strand, I have been experimenting with feeding my own writing to a range of generative AI tools (ChatGPT, Google Bard, and Microsoft Bing) to learn more about the sorts of feedback they provide.
The accompanying presentation documents my observations during this process. Some issues, such as the propensity of AI to ‘hallucinate’ sources, are well-documented concerns with current models. As discourse on student use of AI begins to make its way into the classroom, these challenges might provide a basis for critical discussion around the accuracy and quality of the feedback produced by language models, and the need for students to review any outputs produced by LLMs.
Other common issues present different challenges for students using LLMs to elicit formative feedback. For instance, the prompt protocol in the presentation revealed a tendency for AI to provide contradictory advice when its suggestions are queried, leading to a confusing stance on whether or not an issue raised actually constitutes a point for improvement within the source text. When tasked with rewriting prompt material for improvement, LLMs consistently misconstrued (and therefore omitted) some of the nuances of my original review, changing key elements of the original argumentation without acknowledgement. The potential challenges for student users which arise from these tendencies are discussed in more detail in the presentation’s notes.
In addition to giving some indication of the potential role of LLMs in formative feedback, this task has also prompted me to reflect on the way I approach and understand generative AI as an educator. Going forward, I want to suggest two points of reflection for future tasks used to generate and model LLM output in pedagogical contexts. Firstly: is the task a reasonable one? Using LLMs ethically requires using my own writing as a basis for prompt material, but my choice to use published work means that the text in question had already been re-drafted and edited to a publishable standard. What improvements were the LLMs supposed to find, at this point? In future, I would be interested to try eliciting LLM feedback on work in progress as a point of comparison.
Secondly: is the task realistic, i.e. does it accurately reflect the way students use and engage with AI independently? The review in my presentation, for example, presupposes that the process of prompting an LLM for improvements to pre-written text is comparable to student use of these programmes. But how accurate is this assumption? In the Department of Applied Linguistics, our in-progress Univoice project sees student researchers interviewing their peers about their academic process. Data from this project might provide clearer insight into the ways students employ AI in their learning and writing, providing a stronger basis for future critical investigation of the strengths and limitations in AI’s capacity as a tool for feedback.
This is blog 14 in our diverse assessment series. The two most recent previous blogs can be found here:
- Assessments: Capturing Lived Experience and Shaping the Future
- Building knowledge on the pedagogy of using generative AI in the classroom and in assessments