Exploring the regulatory, pedagogical, and ethical boundaries of AI in assessment, and why the Rules of Evidence demand human judgement that no algorithm can replicate
The Promise and the Illusion
Generative artificial intelligence can now produce written assessment responses that are grammatically polished, structurally coherent, and substantively plausible across a remarkable range of vocational subject areas. A well-crafted prompt can generate a workplace health and safety risk assessment, a client care plan, a business marketing strategy, or a project management report that, on the surface, reads as competent work. For some observers, this raises an enticing question: if AI can produce work that looks competent, could AI also assess whether a person is competent?
The answer, grounded in both regulatory requirements and pedagogical principles, is no. Not because AI lacks processing power or sophistication, but because competency-based assessment in Australia’s vocational education and training system is fundamentally about something that AI cannot observe, verify, or judge: whether a real human being can perform real work, in real workplace conditions, to the standard expected by the industry. The ability to generate text that resembles a competent response is not the same as the ability to demonstrate competence. And the ability to pattern-match an output against assessment criteria is not the same as the professional judgement required to determine whether a learner has genuinely met those criteria across sufficient occasions, in authentic contexts, with evidence that is current and truly their own.
I have watched assessment practice evolve through multiple cycles of reform, each reinforcing the same foundational truth: the quality of assessment depends on the quality of human judgement applied to it. The emergence of generative AI makes this truth more important, not less. This article examines why AI cannot replace competency-based assessment, what the regulatory and pedagogical boundaries are, and how RTOs should be designing assessment systems that are resilient to AI disruption while harnessing AI’s genuine strengths as a support tool.
1. The Regulator Has Drawn the Line
1.1 ASQA’s Position: Human-in-the-Loop Is Non-Negotiable
ASQA’s [Artificial Intelligence Transparency Statement](https://www.asqa.gov.au/about-us/reporting-and-accountability/artificial-intelligence-ai-transparency-statement) provides the clearest available signal of where the regulator stands on AI and decision-making. The Statement confirms that ASQA does not currently employ AI capability in service delivery, including regulatory decision-making, and that any final decisions or actions are made by a human, described as “human-in-the-loop,” to maintain accountability and accuracy. This is not a tentative position. It is a deliberate operational principle adopted by the national VET regulator for its own most consequential work.
The precedent this sets for RTOs is direct and powerful. If Australia’s VET regulator has concluded that AI cannot be trusted to make regulatory decisions about providers, it follows logically and practically that RTOs should not be entrusting AI to make assessment decisions about learners. The stakes are comparable. A regulatory decision about an RTO’s compliance affects the provider’s registration and reputation. An assessment decision about a learner’s competence affects their qualification, their career, and in fields such as healthcare, aged care, construction, and community services, the safety and well-being of the people they will serve. In both cases, the decision-maker must be a qualified human who can exercise professional judgement, be held accountable for the outcome, and explain the reasoning behind the decision.
1.2 Academic Integrity as a Regulatory Risk Priority
ASQA’s 2024-25 Regulatory Risk Priorities explicitly identify academic cheating, including the use of artificial intelligence tools, contract cheating, and plagiarism, as a key threat to assessment integrity in VET. The ASQA IQ publications from March and April 2025 emphasise that academic integrity, contract cheating, and the misuse of AI are central risks that the regulator is actively monitoring and expects providers to manage.
Critically, ASQA’s approach is not to ban AI but to insist that RTOs focus on producing authentic, valid, and reliable evidence of competence. The regulator views AI primarily as a risk to the authenticity and sufficiency of assessment evidence if it is unmanaged, and expects providers to adapt their assessment design and verification practices accordingly. Sector guidance and specialist commentary reinforce that what ASQA wants to see is evidence that is valid, sufficient, authentic, and current, and that unmanaged use of AI in assessment threatens the very foundation of competency-based training and risks drawing unwanted attention from the regulator.
This regulatory stance makes two things clear. First, AI is not being prohibited. It is being positioned as a tool that must be governed. Second, the burden is on RTOs to demonstrate that their assessment systems produce genuine evidence of learner competence, regardless of whether AI was involved in the learning process, the evidence production, or both. The regulator will judge the quality of the evidence, not the technology used to create it.
> **Regulatory Principle:** ASQA is not seeking to ban AI. It is seeking authentic, valid, and reliable evidence of competence. The regulator itself applies human-in-the-loop to all its decisions; RTOs should apply the same principle to all assessment decisions. AI may assist. Humans must decide.
2. The Rules of Evidence: Why Human Judgement Is Structurally Required
Competency-based assessment in Australia’s VET system is anchored in the Rules of Evidence and the Principles of Assessment. These are not guidelines or suggestions. They are regulatory requirements that every RTO must satisfy, and they create structural demands for human judgement that AI, in its current and foreseeable forms, cannot meet independently.
The Rules of Evidence require that all assessment evidence be valid, sufficient, authentic, and current. The Principles of Assessment require that assessment be fair, flexible, valid, and reliable. Together, these requirements demand that assessors interpret context, adapt to learner needs, balance multiple evidence sources, maintain consistency over time and across assessors, and exercise the kind of nuanced, relational, and situational judgement that distinguishes professional assessment from mechanical checking.
The following table maps each Rule of Evidence to the human judgement it demands and the fundamental limitation AI faces in meeting that demand.
| Rule of Evidence | What It Requires of the Assessor | Why AI Cannot Satisfy This Alone |
| --- | --- | --- |
| Valid | Evidence must directly match the performance criteria and assessment requirements of the unit of competency, requiring the assessor to interpret alignment between what was demonstrated and what the unit demands | AI can pattern-match text against criteria, but cannot observe physical performance, interpret contextual workplace application, or judge whether demonstrated behaviour genuinely meets the intent of the performance criteria |
| Sufficient | There must be enough evidence across multiple occasions and contexts to confirm competence, requiring the assessor to judge when the threshold has been reached | AI cannot determine whether a portfolio of evidence is sufficient without understanding the breadth of workplace contexts, the depth of skill demonstrated, and whether the learner can transfer competence to unfamiliar situations |
| Authentic | The evidence must genuinely belong to the candidate and represent their own work, skill, and knowledge, requiring the assessor to verify authorship and ownership | AI cannot independently verify that a person performed the work, wrote the response, or demonstrated the behaviour. AI-generated text is, by definition, not authentic evidence of learner competence |
| Current | Evidence must reflect the learner’s current skill level and be relevant to contemporary workplace practice, requiring the assessor to evaluate recency and relevance | AI cannot assess whether evidence is current without access to the learner’s recent workplace context, industry developments, and the evolving standards of practice in the relevant occupation |
The common thread across all four Rules is that they require the assessor to do something that AI structurally cannot: independently verify the relationship between the evidence, the person, and the workplace context. AI can process text. It can compare patterns. It can flag anomalies. But it cannot observe a learner performing a task in a real workplace, ask probing follow-up questions based on the assessor’s professional instinct that something does not quite add up, verify through direct interaction that the learner genuinely understands and can apply what they have submitted, or make the professional judgement call that the totality of evidence is sufficient to warrant a determination of competence.
The Principles of Assessment reinforce this further. Fairness requires the assessor to consider individual learner circumstances, including cultural background, disability, language proficiency, and personal context, in a way that maintains standards while respecting the learner as a person. Flexibility requires the assessor to adapt methods and contexts without compromising the assessment outcome. Reliability requires consistency of judgement across assessors and occasions. None of these principles can be satisfied by an algorithm operating in isolation, because they each depend on the assessor’s capacity to perceive, interpret, and respond to the unique characteristics of each learner and each assessment situation.
3. Competence Is Performed, Not Generated: The Pedagogical Case
3.1 Competence Is Performance-Based, Contextual, and Embodied
Competence in VET is defined as the ability to apply knowledge and skills to complete work activities to the standard expected in the workplace. This definition is deliberately broad because it must capture the full range of what competent workplace performance involves: task skills, task management skills, contingency management skills, and job and role environment skills. A competent aged care worker does not merely know how to assist with personal care. They can do it safely, respectfully, and adaptively in the unpredictable conditions of a real care environment, with real human beings who have individual needs, preferences, and responses.
Many units of competency require demonstration of performance across a number of occasions and across a number of contexts, supplemented by evidence from third-party workplace reports, direct observation, and structured questioning. These evidence-gathering methods rely on human observation and interpretation of behaviour, not just the assessment of written artefacts that could, in theory, be uploaded into an AI system for analysis. An AI model cannot currently observe real-time performance, interpret body language and non-verbal communication, evaluate safety behaviours in unpredictable physical environments, or detect the subtle cues of professional judgement, teamwork, and situational awareness that distinguish competent performance from a rehearsed or superficial demonstration.
This is not a temporary technical limitation that will be resolved with the next generation of AI models. It is a structural mismatch between what AI does, which is process digital information, and what competency-based assessment requires, which is the observation and evaluation of human performance in physical, social, and professional contexts that are inherently variable, embodied, and relational.
3.2 Judgement, Trust, and Ethical Reasoning
Competency-based assessment frequently includes dimensions that go beyond technical skill to encompass professional conduct, ethical reasoning, communication quality, respect for cultural diversity, adherence to codes of practice, and the exercise of sound professional judgement under uncertainty. This is especially true in community services, healthcare, education, aged care, disability support, and leadership qualifications, where the consequences of incompetent or unethical practice fall directly on vulnerable people.
Evidence of these dimensions typically emerges over time through a combination of observation, questioning, workplace reports, and the assessor’s accumulating understanding of the learner’s character, consistency, and professional development. An assessor who has observed a learner across multiple clinical placements, for example, develops an informed view of whether that learner has genuinely internalised the values and practices of their profession, or is merely producing the right words in the right order. This is a form of professional knowledge that is built through relationships, experience, and direct interaction. It is, by its nature, human.
Research and guidance on AI in education consistently warn that AI cannot reliably model context-dependent ethics, power dynamics, or cultural nuance. These are precisely the domains where human judgement, lived experience, and relational trust are essential. The meaning of evidence, whether a learner has truly internalised safety culture, ethical practice, or professional responsibility, cannot be read off a text sample or quiz output alone. It requires an assessor who can see, question, probe, and ultimately make a professional determination based on the totality of what they have observed.
> **The Core Distinction:** AI can generate text that resembles a competent response. It cannot demonstrate competence. AI can compare a written output against assessment criteria. It cannot determine whether a human being can perform the work in the workplace to the standard the industry expects. This is not a limitation of current AI. It is a structural mismatch between what AI does and what competency-based assessment requires.
4. Signals from Higher Education: AI Reshapes Assessment, but Does Not Replace the Assessor
While TEQSA regulates higher education rather than VET, the assessment and AI stance emerging from Australia’s university sector provides useful signals about where the broader educational consensus is heading. TEQSA’s resource on enacting assessment reform in a time of artificial intelligence frames AI as making some traditional take-home assessment tasks trivially solvable by machines, and argues that assessment design must change structurally. The emphasis is on authentic tasks, on supervised or integrated assessment where generative AI use is irrelevant to the learning outcome, and on assessment methods that test what AI cannot do. AI is treated as an environmental constraint that assessment must be designed around, not as an assessor or a replacement for human judgement.
The University of Sydney’s AI assessment policy, implemented during 2024-25, illustrates how this plays out in practice. The University adopted a “two-lane” model: AI-capable tasks where students are permitted and even encouraged to use AI tools, and AI-restricted tasks where AI use is prohibited, and assessment is conducted under supervised conditions. Even in the AI-capable lane, students are required to demonstrate critical thinking and original analysis, and must acknowledge and explain how AI tools were used. The policy explicitly excludes supervised examinations and in-semester tests from AI use, recognising that some forms of assessment must remain under conditions that ensure the evidence produced is authentically the student’s own.
TEQSA’s broader commentary highlights a transition from a primarily educative stance on AI toward a more regulatory-led framework by 2026, driven by escalating concerns about assessment integrity and equity. The direction of travel across both VET and higher education is convergent: AI is being embraced as a tool for learning and productivity, but regulators, institutions, and assessment experts are drawing an increasingly firm line on using AI to replace human assessment judgement. The consensus is “AI assist, human decide.”
For RTOs, the higher education experience offers both a warning and a model. The warning is that assessment systems designed before generative AI existed are fundamentally vulnerable to integrity breaches if not redesigned. The model is that the solution lies not in banning AI or in deploying AI detectors, which are unreliable and produce false positives, but in redesigning assessment so that the evidence gathered inherently requires human performance, human interaction, and human judgement to produce and to evaluate.
5. Drawing the Line: Where AI Can Assist and Where It Must Stop
Acknowledging that AI cannot replace competency-based assessment does not mean that AI has no role in the assessment ecosystem. The opposite is true. AI can be a powerful tool for improving the efficiency, consistency, and quality of assessment processes, provided its use is governed, bounded, and transparent. The challenge for RTOs is to draw clear, defensible, and operationally practical boundaries between what AI can do and what must remain exclusively human.
Based on current regulatory expectations, sector guidance, and pedagogical principles, the following boundaries represent a responsible framework for RTOs.
| AI-Assist Zones (Permitted with Governance) | Human-Only Zones (AI Prohibited or Strictly Limited) |
| --- | --- |
| Drafting non-graded practice tasks, learning materials, and formative feedback for learners to build skills before the summative assessment | Making final determinations of competence, issuing results statements, or recording assessment outcomes in student management systems |
| Analysing data sets for patterns in learner engagement, item performance, or completion trends, with humans interpreting and acting on the insights | Substituting for direct observation of workplace performance, practical demonstrations, or safety-critical skill assessments |
| Supporting administrative checks such as cross-referencing AVETMISS data fields, formatting compliance documents, or generating draft reports under human supervision | Replacing structured competency conversations, oral questioning, or professional dialogue between assessor and learner |
| Assisting with assessment tool development by generating initial drafts of scenarios, case studies, or question banks for assessors to review, adapt, and validate | Deciding RPL outcomes without an assessor interrogating evidence for authenticity, depth of experience, and workplace context |
| Providing AI-powered study support tools that help learners practise, self-assess, and identify knowledge gaps before formal assessment | Generating summative assessment evidence on behalf of the learner, or evaluating summative evidence without qualified human review |
These boundaries are not arbitrary. They align directly with ASQA’s own operational principle of keeping all regulatory decisions human-made, with the Rules of Evidence’s demand for authentic, sufficient, valid, and current evidence, and with the pedagogical reality that competence is demonstrated through human performance, not generated by machine output. RTOs that codify these boundaries in their training and assessment strategies, assessment tool conditions, and AI policies will be well positioned both for regulatory compliance and for the kind of assessment quality that produces genuinely competent graduates.
6. Designing AI-Resilient Competency Assessment
Rather than attempting to detect AI-generated content after the fact, a strategy that is unreliable and creates adversarial dynamics between RTOs and learners, the more effective and sustainable approach is to design assessment systems that are inherently resilient to AI disruption. This means building assessments around evidence types and methods that require human performance and human interaction to produce, making AI-generated content largely irrelevant to the assessment outcome.
6.1 Triangulation of Evidence
The single most powerful strategy for AI-resilient assessment is evidence triangulation: requiring competence to be demonstrated through multiple, complementary evidence methods that, taken together, make it extremely difficult for AI to substitute for genuine learner performance. For example, a written assessment response can be complemented by an observed workplace demonstration of the same skills, followed by a structured oral questioning session in which the assessor probes the learner’s understanding, reasoning, and ability to apply their knowledge to unfamiliar scenarios. Each evidence source validates and enriches the others, creating a multi-dimensional picture of competence that no single AI-generated artefact can replicate.
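Triangulation can also be expressed as a simple structural check before any evidence portfolio reaches a human assessor for determination. The sketch below is a hypothetical illustration under assumed method names (`written_response`, `direct_observation`, `oral_questioning`); real evidence requirements come from the unit of competency, and the check only gates whether the portfolio is ready for human judgement, never the judgement itself.

```python
# Illustrative sketch: a minimal triangulation gate that treats a portfolio
# as ready for assessor review only when multiple complementary evidence
# methods are present. Method names are assumptions for illustration.

REQUIRED_METHODS = {"written_response", "direct_observation", "oral_questioning"}

def evidence_is_triangulated(portfolio: list[dict]) -> bool:
    """True only when every required evidence method appears in the portfolio."""
    methods_present = {item["method"] for item in portfolio}
    return REQUIRED_METHODS.issubset(methods_present)

portfolio = [
    {"method": "written_response", "task": "WHS risk assessment"},
    {"method": "direct_observation", "task": "workplace demonstration"},
]
# Missing oral questioning: not yet ready for a human determination
assert evidence_is_triangulated(portfolio) is False

portfolio.append({"method": "oral_questioning", "task": "competency conversation"})
assert evidence_is_triangulated(portfolio) is True
```

The value of the gate is that a single artefact, AI-generated or otherwise, can never satisfy it: the written response only counts alongside observed performance and a live conversation.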
6.2 Observation and Workplace-Based Assessment
Direct observation of learner performance, whether in the workplace, in a simulated environment, or during practical demonstrations, remains the gold standard for competency-based assessment precisely because it requires the learner to physically perform the work and the assessor to witness and evaluate that performance in real time. Observation is inherently AI-resistant. No generative AI model can perform a clinical procedure, operate heavy machinery, manage a difficult client interaction, or demonstrate safe food handling on behalf of a learner. RTOs should be increasing, not decreasing, the proportion of assessment that relies on direct observation, particularly in qualifications where safety, client welfare, or regulatory compliance depend on the learner’s demonstrated physical and interpersonal competence.
6.3 Competency Conversations and Oral Assessment
Structured competency conversations, sometimes called professional conversations or oral assessments, are among the most effective tools available for verifying the authenticity and depth of a learner’s competence. In a competency conversation, the assessor engages the learner in a dialogue about their work, their reasoning, their decision-making processes, and their understanding of the underpinning knowledge and skills. The conversation is responsive and adaptive: the assessor follows the learner’s responses, probes areas of uncertainty, asks for examples from the learner’s own experience, and evaluates not just what the learner says but how they say it, whether they can think on their feet, and whether their responses are consistent with the other evidence in their portfolio.
A learner who has genuinely developed competence will navigate a competency conversation with confidence, specificity, and the ability to connect their knowledge to their practice. A learner who has relied on AI to generate their written evidence will struggle to explain, elaborate, or defend that evidence in a live, interactive dialogue. Competency conversations are therefore both a verification tool and a pedagogical tool: they encourage learners to genuinely engage with their learning, knowing that they will need to articulate and defend their understanding in their own words.
6.4 Scenario-Based and Integrated Assessment
Assessment tasks that present learners with complex, realistic workplace scenarios requiring them to apply multiple skills and knowledge areas simultaneously are inherently more resistant to AI substitution than isolated knowledge questions or single-task activities. When a learner must analyse a scenario, identify relevant issues, propose and justify a course of action, demonstrate the practical implementation of their plan, and reflect on the outcome, the assessment captures a breadth and depth of competence that a single AI-generated response cannot replicate. Integrated assessment that clusters multiple units around a realistic workplace project or case study further strengthens this approach, as it requires the learner to demonstrate the connections between different areas of competence in a way that mirrors actual workplace performance.
7. A Future-Ready Stance: Embrace AI, Protect the Assessor
The argument of this article is not that AI should be feared, avoided, or excluded from the VET assessment ecosystem. It is that AI must be placed in its proper role: as a powerful tool for efficiency, support, and learning enhancement, operating within a governance framework that preserves the irreplaceable role of the human assessor in making competency determinations.
RTOs should be embracing AI where it genuinely adds value. AI can help assessors manage administrative workload, generate formative practice activities, analyse learner engagement patterns, and support the development of high-quality assessment resources. AI can help learners build knowledge and practise skills more effectively before they enter the summative assessment environment. These are real benefits that should be pursued.
At the same time, RTOs must codify in policy and in their training and assessment strategies that machines can never replace the assessor’s professional judgement, the trust relationship between assessor and learner, or the ethical responsibility for determining whether a person is competent to enter a profession, an industry, or a role where their performance will affect the safety, wellbeing, and rights of others. This is not a conservative or defensive position. It is a principled one, grounded in what competency-based assessment is designed to achieve and what the evidence, both regulatory and pedagogical, tells us about the limits of what AI can do.
The Standards for RTOs 2025, ASQA’s regulatory risk priorities, ASQA’s own AI Transparency Statement, the evolving stance of TEQSA and leading universities, and the fundamental architecture of the Rules of Evidence and Principles of Assessment all point in the same direction. AI will transform how we learn, how we prepare for assessment, how we administer assessment systems, and how we analyse outcomes. It will not, and should not, replace the human being who looks a learner in the eye, evaluates the totality of their evidence, and makes the professional call: competent, or not yet competent.
**Summary: What RTOs Should Do Now**

1. Review all assessment tools to ensure AI conditions are explicitly stated for each task.
2. Increase the proportion of assessments that rely on observation, competency conversations, and workplace-based evidence.
3. Design assessment around evidence triangulation so that no single artefact, AI-generated or otherwise, can determine competence.
4. Update validation checklists to include AI-related integrity considerations.
5. Train assessors in AI awareness, integrity verification strategies, and competency conversation techniques.
6. Codify in TAS and policy that final competency determinations are exclusively human decisions.
7. Communicate clearly to learners what AI use is permitted, restricted, or prohibited in assessment.
8. Treat AI as a tool for assessment quality and efficiency, not as a substitute for the assessor.
References and Further Reading
ASQA (2025). Standards for RTOs 2025. https://www.asqa.gov.au/rtos/2025-standards-rtos
ASQA (2025). Artificial Intelligence (AI) Transparency Statement. https://www.asqa.gov.au/about-us/reporting-and-accountability/artificial-intelligence-ai-transparency-statement
ASQA (2025). ASQA IQ March 2025: Academic Integrity. https://www.asqa.gov.au/news-events/news/asqa-iq-march-2025
ASQA (2025). ASQA IQ April 2025. https://www.asqa.gov.au/news-events/news/asqa-iq-april-2025
ASQA (2025). ASQA IQ October 2025: Regulatory Risk Priorities. https://www.asqa.gov.au/news-events/news/asqa-iq-october-2025
AICP (2025). AI in RTOs: Safeguarding Compliance and Assessment Integrity. https://aicp.edu.au
CAQA Resources (2025). Maintaining Integrity in the AI Era: New Imperatives for Vocational Assessment. https://caqaresources.com.au
TEQSA (2025). Enacting Assessment Reform in a Time of Artificial Intelligence. https://www.teqsa.gov.au
Total VET Training Resources (2025). ASQA Rules of Evidence. https://totalvettrainingresources.com.au
University of Sydney (2024). AI Assessment Policy. https://www.sydney.edu.au
