Some Musings on the Challenges of Incorporating LLMs in Contemporary Education

The integration of Large Language Models into education represents a fundamental disruption of the teacher-student knowledge hierarchy, the assessment paradigm, and the very definition of learning itself. We’re trying to educate students for a world where AI is their constant companion, yet our pedagogical frameworks were built for a world where knowledge scarcity and complexity were the main challenges. This creates five interconnected challenges that strike at the heart of what it means to teach and learn in the age of AI.

  1. The inversion of the teacher-student knowledge hierarchy

An uncomfortable truth confronts today’s classrooms: many students understand GenAI and LLMs better than their teachers. The recent Higher Education Policy Institute (HEPI) report documented that a staggering 92% of university students have used AI tools for coursework; meanwhile, professors are reportedly much more resistant to adopting these tools beyond experimentation. This inverts the traditional knowledge hierarchy. Educational theory has long warned against models where teachers deposit knowledge into passive students, but LLMs have shattered this framework unexpectedly—students now possess technical fluency their teachers might lack. Moreover, it is not only the content teachers try to impart to their students, but the entire pedagogical paradigm many teachers hold onto, that may be dismissed as outdated by those students. This knowledge asymmetry creates a crisis of instructional authority.

This inversion generates three compounding problems. First, teachers cannot guide students in responsible LLM use without themselves understanding the tools’ capabilities, limitations, and mechanisms. Second, students develop critical blind spots—they may know how to use LLMs but lack frameworks for determining what they need to master themselves, when they should not rely on AI, and how AI outputs might mislead them. Third, professional development initiatives perpetually race to catch up with accelerating technological development. By the time institutional training addresses one generation of AI tools, students are already experimenting with the next, widening rather than closing the expertise gap. The solution requires reconceptualizing teacher expertise itself—positioning educators as critical and experienced coaches for technology adaptation rather than technical experts or knowledge gatekeepers. Teaching can then focus on the why and when of AI use rather than the how; indeed, the how may be becoming ever less relevant, because engaging with AI requires less and less technical expertise.

  2. Cultivating critical thinking and evaluation

LLMs present outputs with remarkable confidence—even when spectacularly wrong. This creates automation bias, the tendency to trust machine-generated information over human judgment. In education, this bias proves particularly dangerous because learning requires active, critical evaluation of polished answers. Moreover, such biases may be harder to detect in the educational realm, because the conclusions of the vast majority of students’ written assignments seldom find their way into practical applications and implementations, where mistakes would surface more quickly and be more likely to have tangible, dire consequences. Perhaps these limited consequences of mishandling LLMs in educational contexts, coupled with outdated assessment policies and a minimal likelihood of detection, are why LLM misuse among students is so difficult to detect, let alone ameliorate in an effort to instill critical thinking skills.

Adapting answers directly from LLMs, students regularly submit work containing fabricated citations, invented statistics, and confident assertions about nonexistent research. They may wrongly assume that the AI’s authoritative tone reflects accuracy. Having grown up trusting search engines and crowd-sourced knowledge platforms, students treat LLMs as a natural extension of these tools—smarter search engines. They don’t intuitively grasp that LLMs are prediction machines, not knowledge repositories, generating what sounds right rather than what is right.

Teaching critical evaluation of LLM outputs requires developing new literacy skills beyond traditional source evaluation frameworks. Students need conceptual understanding of how these systems work—not deep technical knowledge, but sufficient grasp to recognize them as sophisticated autocomplete systems, not thinking machines. They must learn to identify the confidence trap, where LLMs present uncertain information with the same assured tone as verified facts, never acknowledging doubt or limitation. They need to recognize hallucination patterns, understanding that LLMs are particularly unreliable with recent events, numerical data, precise citations, and specialized domain knowledge. Most critically, students must grasp the echo chamber effect—that LLMs reflect their training data’s biases and gaps, reproducing conventional wisdom rather than generating breakthrough thinking. This represents a more complex challenge than earlier digital literacy initiatives around, for instance, website credibility, because unreliable websites appear unreliable while confident AI responses appear authoritative, masking their probabilistic nature. The most promising pedagogical approach reframes LLMs as “first draft generators” requiring verification rather than “answer engines” to be trusted, positioning the technology as a thinking partner demanding supervision rather than outsourced intelligence to be accepted uncritically.

  3. Preserving cognitive diversity

LLMs trained on vast amounts of text typically generate content that is competent, coherent, yet utterly conventional. This creates a subtle threat to education: the homogenization of thought. Breakthrough innovations rarely emerge from the middle of the bell curve; revolutionary research and entrepreneurial ideas often seem impractical initially, emerging from cognitive diversity and unconventional thinking. LLMs, by design, cannot produce this kind of out-of-the-box thinking because they optimize for reproducing existing patterns. When people see polished examples early in their creative process, they unconsciously anchor to those examples. The readily available answers from LLMs therefore flood students with prepared solutions, short-circuiting creativity and, with it, the production process that generates original insights.

This challenge extends beyond individual creativity to collective intelligence. In group work, LLMs can accelerate consensus but eliminate the creative friction that produces breakthrough ideas. When teams immediately turn to AI to resolve disagreements or generate initial directions, they lose that productive conflict before it can yield novel solutions. In our experience, students working with AI assistance produce remarkably similar outputs—all sensible, all citing established frameworks, all deeply uninspiring—while students working without AI generate messier, more varied ideas, including genuinely novel approaches. The goal, then, is not to eliminate LLM use but to ensure that students develop their own cognitive style and quality thresholds first, using AI to enhance rather than substitute for their own critical and creative thinking.

  4. Understanding and mitigating biases in AI systems

LLMs are not neutral—they are shaped by training data, fine-tuning processes, and content policies that embed particular worldviews. This creates a subtle challenge: students may not realize they receive filtered, censored, or biased information. When students use AI to analyze ethical dilemmas in international contexts or explore politically sensitive topics, responses are often noticeably cautious, or sometimes reflect mainstream or censored opinions. The challenge deepens with systemic bias embedded in training data and model design, of which we may be far less aware. Research has documented systematic biases in LLM outputs regarding gender, race, geography, and cultural perspectives. When students ask about successful leaders, they receive predominantly Western, male examples; when they request case studies, they get disproportionately American contexts; when they explore historical events, they encounter narratives reflecting dominant rather than marginalized perspectives. These patterns reinforce existing knowledge gaps and insidious biases rather than correcting them, and students rarely recognize them.

Critical scholarship has long examined how power operates through discourse, and LLMs represent a new form of epistemic power students must learn to recognize, interrogate, and indeed challenge. Teaching students to detect AI censorship and bias requires developing algorithmic skepticism through strategies such as comparing multiple AI tools to observe output variations, or checking AI-generated content against primary sources to identify what gets smoothed over. An effective pedagogical exercise involves having students pose the same question not only to different LLMs but also to human experts from different backgrounds. This can reveal variations in responses and teach students important lessons about the limitations of LLMs.

  5. Assessing learning in the age of AI

Perhaps the most pressing challenge is assessment. Traditional assignments—essays, problem sets, reports—can now be completed by LLMs in seconds. The natural institutional response has been to double down on AI detection tools, but this creates the “proctoring paradox”: we assess students under artificial conditions that bear no resemblance to how they will actually work. In real workplaces, professionals have constant access to AI tools, with the majority of knowledge workers now leveraging AI assistants regularly. We train students for careers where AI augmentation is the norm, yet assess them as if it constitutes cheating. The deeper problem is that traditional assessments measure what students produce, not how they think; if an essay’s quality does not reveal whether the student learned anything, perhaps the essay never measured what mattered. Assessment theory argues that effective assessment should support learning through constructive alignment with learning objectives and content. Several innovative approaches are emerging: process-focused assessment evaluates decision-making rather than final products; authentic tasks require knowledge only the student possesses from their own context and experience; collaborative evaluation asks group members to assess one another’s intellectual contributions; and meta-cognitive reflection requires students to analyze their own learning process.

Final thoughts: how to move forward

The above five challenges are not isolated problems but interconnected symptoms of a deeper tension: education is fundamentally an endeavor premised on developing individual judgment, creativity, and wisdom, while LLMs are computational tools optimized for pattern recognition and reproduction. The question is whether, and if so how, we can implement LLMs in education in ways that enhance rather than diminish human learning. If we fail to address this question, we risk producing graduates seemingly fluent in AI use but incapable of independent thought. Our goal is to help students become sophisticated users of powerful tools and, even more importantly, to cultivate their meta-skills—the ability to learn, adapt, and maintain critical judgment. In the emerging AI-empowered workplace, these meta-skills are more essential than ever.

Shuai Yuan & Stefan T. Mol[1]

Amsterdam Business School, The University of Amsterdam

[1] Note: The conceptualization and ideas contained within this article are ours, yet AI assistants helped with writing the (very rough) first draft. We evaluated, modified, fine-tuned, and approved the final product.
