The OECD’s landmark report on generative AI in education provides the most comprehensive evidence yet on when AI helps students learn and when it merely helps them perform. For Australia’s VET sector, the implications for assessment integrity, trainer productivity, and the future of competency-based education are profound.
The Most Important Finding in the Report
The OECD Digital Education Outlook 2026, released in early 2026 under the title Exploring Effective Uses of Generative AI in Education, is the most comprehensive international synthesis of evidence on what generative AI actually does to learning. It draws on randomised controlled trials, neuroscience research, large-scale field experiments, design studies, and expert analysis from across OECD member countries. It runs to 13 chapters and more than 250 pages. And its central finding can be stated in a single sentence: generative AI can dramatically improve student performance while simultaneously undermining student learning.
This is not a theoretical concern. A field experiment in Türkiye involving high school mathematics students found that access to GPT-4 improved practice performance by 48 per cent using a standard chatbot interface, and by 127 per cent using a tutoring version specifically designed to support learning. But when access was removed and students sat closed-book exams, those who had used the standard interface performed 17 per cent worse than students who had never used AI at all. A neuroscience study across five US universities found that only 12 per cent of students who wrote essays with ChatGPT assistance could recall specific content from their own essays afterwards, compared to 89 per cent of students who wrote alone or with a search engine. Brain imaging confirmed lower neural connectivity and reduced cognitive involvement in the AI-assisted group.
For the vocational education and training sector, where the entire credentialing system rests on the principle that a qualification represents demonstrated competency, not just demonstrated output, these findings are not peripheral. They are foundational. They tell us that a learner who uses generative AI to produce competent-looking assessment evidence may genuinely not possess the knowledge or skills that the evidence appears to demonstrate. And they tell us that the difference between AI-enhanced performance and AI-undermined learning depends entirely on how the AI is designed, deployed, and integrated into the pedagogical process.
This article examines the OECD report’s key findings, translates them into the specific context of Australia’s VET system, and identifies what RTOs must do to ensure that generative AI strengthens rather than erodes the integrity of competency-based education.
1. The Evidence: When AI Helps and When It Harms
The OECD report synthesises evidence across multiple experimental designs and educational contexts to establish a clear pattern. When generative AI provides direct answers, completed solutions, or finished outputs, student performance on the immediate task improves, often substantially. But learning, defined as the durable acquisition of knowledge and skills that can be demonstrated independently, does not improve and may actively decline. The report identifies this as a fundamental misalignment between task performance and genuine learning.
The mechanism is what researchers describe as cognitive offloading. When a student uses a general-purpose chatbot to answer a question, generate an essay, or solve a problem, the cognitive effort that would normally drive learning, including diagnosis, evaluation, iteration, and reflection, is transferred to the AI system. The student receives a competent output but has not performed the mental work required to develop the underpinning knowledge or skill. Chinese studies of essay revision found that students interacting with human experts followed a structured help-seeking process: diagnosing what they needed, asking for help, evaluating the response, iterating, and implementing. Students interacting with a general-purpose LLM frequently skipped the diagnosis, evaluation, and iteration stages entirely, going directly from question to implementation. Researchers termed this pattern metacognitive laziness.
However, the evidence is not uniformly negative. The OECD report identifies circumstances where AI demonstrably improves both performance and learning. The critical variable is how the AI is designed and used. Purpose-built educational AI tools that employ structured tutoring strategies, such as Socratic questioning that guides learners through reasoning rather than providing answers, show more promise than general-purpose chatbots. The Türkiye experiment found that while the standard GPT-4 interface undermined learning, a tutoring version designed to support the learning process produced performance gains of 127 per cent during practice while preserving learning when access was removed. Collaborative learning studies found that AI acting as an information hub, a personalised materials generator, or a peer contributor in group tasks produced small-to-medium improvements in subject learning and large improvements in critical thinking and teamwork.
The following table consolidates the key experimental evidence from the OECD report, showing the consistent pattern: performance gains are common, but learning gains depend on design.
| Study / Source | Performance Finding | Learning Finding |
| --- | --- | --- |
| Türkiye field experiment (Bastani et al., 2024) | High school students using GPT-4 improved maths practice performance by 48% with the standard interface and 127% with a tutoring version designed to support learning | When GPT-4 access was removed, students who had used the standard interface performed 17% worse on closed-book exams than students who had never used AI; only the tutoring version preserved learning gains |
| US neuroscience study (Kosmyna et al., 2025) | Students across five US universities wrote essays under three conditions: alone, with a search engine, or with ChatGPT; the ChatGPT group produced well-rated essays | Only 12% of the ChatGPT group could recall specific content from their own essays afterwards, compared to 89% in the other groups; brain imaging showed lower neural connectivity and reduced cognitive involvement |
| Chinese essay revision studies (Fan et al., 2025; Chen et al., 2025) | Students using a general-purpose LLM to revise essays achieved the highest task performance scores among all groups tested | Knowledge gains did not improve; students using GenAI performed fewer metacognitive tasks, especially evaluation and orientation; some skipped diagnosis and iteration stages entirely, demonstrating what researchers termed “metacognitive laziness” |
| England teacher productivity (cited in editorial) | Secondary science teachers in England using AI for lesson and resource planning achieved a 31% reduction in time spent on these tasks | Time savings were substantial and measurable, representing one of the clearest productivity gains documented in the report; however, the long-term effects on teaching quality and teacher skill maintenance remain under investigation |
| AI tutoring support (cited in editorial) | Low-experience tutors using AI support achieved a 9 percentage point increase in student pass rates; more experienced tutors saw smaller gains | AI support appears to have the greatest impact where human expertise is thinnest, suggesting a potential equalising effect; however, the risk of skill atrophy among tutors who rely heavily on AI support is flagged as a concern |
| Collaborative learning studies (Chapter 4) | GenAI supports collaborative learning in four roles: information hub, personalised materials generator, teacher feedback provider, and peer contributor in group tasks | Some studies found small-to-medium improvements in subject learning and large improvements in critical thinking and teamwork; evidence is still limited, but the direction is positive |
The Performance-Learning Gap
The OECD report’s central finding is that generative AI can improve task performance while undermining genuine learning. Students produce better outputs but develop weaker skills.
The Türkiye experiment quantifies this: 48% better practice performance with AI, but 17% worse exam performance when AI is removed. For VET, where qualifications certify competency rather than output quality, this gap is not an academic concern.
It is an assessment integrity crisis waiting to happen.
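The figures above mix two kinds of change: relative percentage changes (48%, 127%, 17%) and a percentage-point change (the 9-point rise in pass rates). The following minimal Python sketch shows how each is applied. The baseline score of 40 and the baseline pass rate of 60% are hypothetical values chosen only for illustration; they do not come from the OECD report or the underlying studies, and the function names are likewise illustrative.

```python
def relative_change(baseline: float, pct: float) -> float:
    """Apply a relative percentage change to a baseline value."""
    return round(baseline * (1 + pct / 100), 1)


def point_change(baseline_rate: float, points: float) -> float:
    """Apply a percentage-point change to a rate already expressed in percent."""
    return round(baseline_rate + points, 1)


# Hypothetical control-group score out of 100 (assumption, not report data).
BASELINE_SCORE = 40.0

# Relative changes reported for the Türkiye experiment, applied to the baseline.
practice_standard = relative_change(BASELINE_SCORE, 48)   # 59.2
practice_tutor = relative_change(BASELINE_SCORE, 127)     # 90.8
exam_standard = relative_change(BASELINE_SCORE, -17)      # 33.2

# Hypothetical baseline pass rate in percent (assumption, not report data).
BASELINE_PASS_RATE = 60.0

# A 9 percentage point gain is added to the rate, not multiplied.
pass_rate_with_ai = point_change(BASELINE_PASS_RATE, 9)   # 69.0
```

Under these assumed baselines, a learner group that looks far stronger in practice (59.2 against 40) ends up weaker in the closed-book exam (33.2 against 40), which is the performance-learning gap in numerical form.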
2. The Scale of Adoption: Students and Teachers Are Already Using AI
The OECD report provides the most detailed international picture yet of how extensively students and teachers are using generative AI. The adoption data make clear that generative AI is not a future challenge for education systems. It is a present reality that is already shaping how learners engage with their training and how teachers design and deliver instruction.
Student adoption has moved from marginal to mainstream since ChatGPT’s launch in 2022. In Estonia, 90 per cent of upper secondary students reported using AI tools for study in 2024. In Germany, 94 per cent of higher education students were using AI in 2025, with 65 per cent doing so daily or weekly. In the United States, 68 per cent of teenagers aged 15 to 17 reported using AI chatbots by 2025, up from approximately 25 to 33 per cent in 2023. Across a seven-country European survey of 12-to-17-year-olds, 48 per cent had used ChatGPT in 2024, with almost half instructed to do so by their teachers. Australia, where ChatGPT usage as a share of internet users is among the highest in OECD countries, is firmly within this trend.
The purposes for which students use AI are overwhelmingly oriented toward convenience rather than deep learning. The most common uses are obtaining information, getting explanations of terms and concepts, and generating ideas. Nearly one-third of European students reported using AI to provide complete solutions to tasks. Only 20 per cent reported using AI for self-regulatory functions such as structuring personalised learning plans or tracking progress. Students are, in the OECD’s framing, using AI primarily for cognitive support and production support rather than as a genuine learning tool.
Teacher adoption is substantial and growing. The OECD’s TALIS 2024 survey found that 36 per cent of lower secondary teachers across OECD countries had used AI for work-related tasks in the previous 12 months, with enormous variation across countries: approximately 75 per cent in Singapore compared to fewer than 20 per cent in France and Japan. Teachers are primarily using AI for tasks such as learning about topics, summarising content, and supporting lesson planning. The report documents a 31 per cent reduction in time spent on lesson and resource planning among secondary science teachers in England who used AI tools, representing one of the most clearly quantified productivity gains in the evidence base.
3. What This Means for VET: Five Critical Implications
The OECD report is written for education systems globally, with a primary focus on school and higher education contexts. It does not specifically address vocational education and training. But its findings are, if anything, more consequential for VET than for any other sector of education. VET’s entire credentialing model is built on the principle that a qualification certifies competency: the demonstrated ability to perform to workplace standards, supported by valid, sufficient, authentic, and current evidence. If generative AI can produce evidence that appears competent without the learner actually being competent, the foundation of VET’s value proposition is at risk.
The following table maps the OECD’s key findings to five critical VET domains, showing the evidence, why it matters for VET specifically, and what RTOs should do in response.
| VET Domain | OECD Evidence | Why It Matters for VET | What RTOs Should Do |
| --- | --- | --- | --- |
| Assessment integrity | Students using general-purpose AI can produce competent-looking work without genuine understanding; the Türkiye experiment shows 17% worse performance once AI is removed | Competency-based assessment that requires observed demonstration, workplace performance, and competency conversations is the system’s primary defence against AI-generated evidence that masks genuine skill gaps | RTOs must design AI-resilient assessment that requires learners to demonstrate competence in conditions where AI cannot substitute for genuine capability; observation, practical tasks, and oral questioning are essential |
| Trainer and assessor productivity | Secondary science teachers in England saved 31% of planning time using AI; low-experience tutors saw 9 percentage point student pass rate gains with AI support | VET trainers can use AI for lesson planning, resource development, formative feedback generation, and administrative tasks, freeing time for the high-value human interactions that drive competency development | The productivity gains are real, but must be balanced against the risk of skill atrophy; trainers who outsource too much to AI may gradually lose the pedagogical expertise that makes human-led assessment irreplaceable |
| Learner AI literacy | 94% of German higher education students use AI; 68% of US 15-to-17-year-olds use AI chatbots; VET learners arrive with established AI habits | VET has a responsibility to develop learners’ ability to use AI effectively in their future workplaces, including understanding AI limitations, hallucination risks, and the difference between AI-assisted performance and genuine competence | AI literacy should be embedded across qualifications as a foundation skill, not treated as a standalone unit; every qualification should address how AI is used and misused in the relevant industry context |
| Teacher-AI teaming | The OECD identifies three paradigms: replacement, complementarity, and augmentation; augmentation through collaborative engagement produces the best outcomes | VET’s competency-based model is naturally suited to the augmentation paradigm: AI handles routine tasks while trainers focus on complex judgement, workplace observation, and learner support that requires human expertise | RTOs should frame AI as augmenting, not replacing, trainer capability; professional development should focus on how to use AI effectively while maintaining the assessment and pedagogical skills that define professional practice |
| Cognitive offloading risk | Neuroscience evidence shows only 12% of ChatGPT-assisted writers could recall content from their own essays, vs 89% of those writing alone or with a search engine; researchers identify “metacognitive laziness” as a systematic risk | VET learners who use AI to generate assessment responses without genuine cognitive engagement may pass tasks without developing the underpinning knowledge or skills the qualification represents | Assessment design must require learners to demonstrate the cognitive process, not just the output; competency conversations, reflective journals, and observed practical tasks are essential evidence types that AI cannot substitute |
4. Assessment Design in an AI World: The VET Advantage
The OECD report’s most urgent message for education systems is that assessment must be redesigned to account for AI’s ability to produce competent outputs without competent understanding. Higher education institutions are grappling with this challenge across written assignments, research papers, and examination formats that were designed in a pre-AI world. VET, by contrast, is structurally better positioned than any other education sector to maintain assessment integrity in the age of generative AI, precisely because competency-based assessment was never primarily about written outputs in the first place.
The assessment methods that are most resistant to AI manipulation are the methods that VET has always relied on: direct observation of workplace performance, practical demonstration of skills under realistic conditions, competency conversations where assessors probe understanding through follow-up questioning, third-party workplace reports from supervisors who observe the learner performing real tasks, and portfolio evidence that is collected over time and verified for authenticity. These methods require the learner to demonstrate competence in person, in real time, under conditions that generative AI cannot replicate or substitute.
This is VET’s structural advantage. But it is only an advantage if RTOs actively design their assessment systems to leverage it. An RTO that relies heavily on written assessment tasks, knowledge-based questions with text-based responses, and take-home assignments is as vulnerable to AI-generated evidence as any university. An RTO that designs assessment around observation, practical demonstration, competency conversations, and workplace-based evidence is building on the strongest possible foundation for assessment integrity in an AI-disrupted environment.
The OECD report’s recommendation that education systems should shift from evaluating output to evaluating process aligns precisely with VET’s competency-based model. The Standards for RTOs 2025, with their emphasis on assessment system operation, pre-validation of tools, and systematic application of the Rules of Evidence, provide the regulatory framework within which this shift can be made. The challenge is ensuring that every RTO translates this framework into an assessment practice that is genuinely AI-resilient.
5. Teacher-AI Teaming: The Augmentation Model for VET Trainers
The OECD report proposes a conceptual framework for how teachers and AI can work together, identifying three paradigms: replacement, complementarity, and augmentation. Replacement, where AI takes over tasks previously performed by teachers, risks loss of professional skills and teacher-student interaction. Complementarity, where human judgement and machine efficiency are paired, is better. But the most effective approach, the report argues, is augmentation through collaborative engagement: an iterative process in which teachers and AI critique and refine each other’s outputs, preserving professional judgement while leveraging AI’s capacity for efficiency and scale.
For VET trainers and assessors, the augmentation model is a natural fit. AI can draft lesson plans, generate formative quiz questions, summarise learner progress data, prepare initial assessment mapping reviews, and automate administrative tasks such as scheduling and reporting. The trainer’s irreplaceable contribution is professional judgement: conducting competency conversations that probe genuine understanding, observing workplace performance and interpreting contextual cues, making holistic competency determinations that account for the complexity of real-world professional practice, and building the relationships with learners that motivate engagement and support development.
The OECD data on teacher productivity supports this model. The 31 per cent time saving on lesson and resource planning documented among secondary science teachers in England is a significant efficiency gain that, if replicated in VET contexts, could free trainers to spend more time on the high-value assessment and learner support activities that drive quality outcomes. The finding that AI support produced a 9-percentage-point increase in student pass rates for low-experience tutors, with smaller gains for experienced tutors, suggests that AI can be particularly valuable for newer trainers who are still developing their pedagogical expertise. But the report also warns that overreliance on AI for core teaching tasks risks eroding the professional skills that make expert trainers effective. The balance between productivity gains and skill maintenance is a strategic challenge that every RTO must manage deliberately.
Conclusion: The Challenge for VET Is Not Whether to Use AI but How
The OECD Digital Education Outlook 2026 makes clear that generative AI is not a future possibility for education. It is a present reality. Students are using it. Teachers are using it. It is improving task performance, saving time, and creating genuine productivity gains. It is also, in specific and well-documented circumstances, undermining the cognitive processes that produce genuine learning, reducing metacognitive engagement, and creating a gap between what learners can produce with AI and what they actually know and can do without it.
For VET, this gap is the defining challenge. A qualification that certifies competency must certify genuine competency, not AI-assisted performance. An assessment system that satisfies the Rules of Evidence must produce evidence that is valid, sufficient, authentic, and current, not evidence that was generated or substantially shaped by a technology that the learner may not have access to in the workplace. A training system that serves employers must produce graduates who can actually perform, not graduates who performed well in training because they had AI support that will not follow them into the job.
The OECD report provides both the warning and the pathway. The warning is that general-purpose AI used as a shortcut degrades learning. The pathway is that purpose-built educational AI, designed with strong pedagogy, structured tutoring strategies, and teacher oversight, can enhance both performance and learning. VET’s competency-based model, with its emphasis on observed performance, practical demonstration, and human judgement in assessment, is structurally well-positioned to navigate this transition. But structure alone is not enough. RTOs must design AI-resilient assessments, invest in trainer AI literacy, adopt the augmentation model for teacher-AI teaming, and embed learner AI literacy across every qualification. The sector that has always assessed what people can do, not just what they can write, has the strongest foundation for maintaining educational integrity in the age of generative AI. The question is whether it will build on that foundation deliberately.
Summary: What RTOs Should Take from the OECD Report
- General-purpose AI improves performance but can undermine learning; the gap is documented across multiple experimental designs and educational contexts.
- Assessment integrity is the primary risk: learners can produce competent-looking evidence without genuine competency; RTOs must design AI-resilient assessments.
- VET’s competency-based model, with observation, practical demonstration, and competency conversations, is structurally stronger than written-output models.
- Trainer productivity gains from AI are real (a 31% time saving on lesson and resource planning) but must be balanced against skill atrophy risk from overreliance.
- The augmentation model (teacher-AI teaming with human oversight preserved) is the recommended approach; replacement and full automation are cautioned against.
- Student AI adoption is mainstream: 90% of upper secondary students in Estonia and 94% of higher education students in Germany report using AI; VET learners arrive with established AI habits that assessment must account for.
- AI literacy must be embedded as a foundation skill across all VET qualifications.
- Purpose-built educational AI tools outperform general-purpose chatbots for learning; RTOs should advocate for and invest in education-specific AI solutions.
References and Further Reading
- OECD (2026). OECD Digital Education Outlook 2026: Exploring Effective Uses of Generative AI in Education. OECD Publishing, Paris. https://doi.org/10.1787/062a7394-en
- Bastani, H. et al. (2024). Generative AI Can Harm Learning. The Wharton School Research Paper. http://dx.doi.org/10.2139/ssrn.4895486
- Chen, W. et al. (2025). Help-Seeking Patterns in Human and AI Interactions During Essay Revision. Cited in OECD (2026).
- Fan, Y. et al. (2025). Metacognitive Processes in GenAI-Assisted Learning. Cited in OECD (2026).
- Kosmyna, N. et al. (2025). Neural Correlates of AI-Assisted Writing. Cited in OECD (2026).
- Liu, Y., Huang, J. and Wang, H. (2025). Who on Earth Is Using Generative AI? Global Trends and Shifts in 2025. World Bank Group Policy Research Working Paper.
- OECD (2025). TALIS 2024: Teaching and Learning International Survey. OECD Publishing, Paris.
