According to Jandrić (2019) in “The Post-digital Challenge of Critical Media Literacy,” Artificial Intelligence (AI) systems can exhibit unpredictable and biased behaviors that their creators and researchers cannot directly control. Jandrić argues that AI systems do not merely reproduce or embed existing biases found in their training data but can also recombine these biases in novel ways to generate new, unforeseen ones. More recently, Dictionary.com named “hallucinate” its 2023 Word of the Year, defining it as “(of artificial intelligence) to produce false information contrary to the intent of the user and present it as if true and factual” (Dictionary.com, 2023). This highlights the challenge of developing AI systems that are free from problematic biases.
Because the emergent and opaque nature of AI can introduce new biases that are difficult for developers to predict or mitigate, Jandrić emphasizes the need for critical media literacy to help users understand the limitations and potential biases of AI systems, rather than blindly trusting their outputs. A study by Athaluri et al. (2023) of the frequency of hallucinations in 50 research proposals drafted by ChatGPT found that 28 of the 178 references ChatGPT cited did not exist. In a similar study of ChatGPT-generated medical articles, 46% of the 117 references were fabricated, and only 7% were authentic and accurate (Bhattacharyya et al., 2023).
The potential for hallucinations in AI-powered applications can have detrimental consequences for student learning, assessment, and overall educational outcomes. Inaccurate or fabricated information generated by these systems can reinforce misconceptions, spread misinformation, and undermine the development of critical thinking skills among community college students (Ramaswamy, 2024). As such, it is crucial for educators to understand the root causes of this problem and adopt effective strategies to mitigate the risks of AI hallucinations in their classrooms.
Hallucinations refer to the generation of content by AI systems that is nonsensical, factually incorrect, or unfaithful to the provided source material. This issue poses a serious challenge as these models are incorporated into decision-making processes and learning tools used in community college classrooms. According to CNET’s AI glossary, a hallucination can include generative AI producing answers that are incorrect but stated with confidence as if correct. An example would be an AI chatbot responding that “Leonardo da Vinci painted the Mona Lisa in 1815,” roughly 300 years after it was actually painted (CNET, 2024). The reasons for these hallucinations are not entirely known, but they are likely related to biases in the training data, limitations of the language models, and a lack of contextual understanding (Google Cloud, 2024; Testlio, 2024). To build more trustworthy generative AI applications for educational use, businesses and developers should favor their own corporate data and vetted third-party data sets, implement transparency and explainability measures, and incorporate human oversight and feedback loops (Ramaswamy, 2024).
Despite these efforts, mistakes will still occur, and educators must approach AI tools with a critical eye, demanding transparency, seeking explainability, and staying alert to potential biases (Ramaswamy, 2024). By understanding the risks of hallucinations and adopting effective mitigation strategies, educators can use AI-powered applications in their classrooms more reliably, supporting student learning and the development of critical thinking skills.
Understanding the origins of hallucinations in large language models
In general, there are three types of hallucinations that can occur when interacting with large language models (LLMs) like ChatGPT, Bing, and Google’s Bard.
- Contradictions. LLMs sometimes provide responses that contradict each other or themselves, unable to reconcile inconsistencies in their training data.
- False facts. LLMs can fabricate information, citing made-up sources and statistics, and fail fact-checking exercises. This raises concerns about the potential impact on issues like election disinformation.
- Lack of nuance and context. While LLMs can generate seemingly coherent responses, they can lack the domain knowledge and contextual understanding needed to provide accurate, nuanced information, as demonstrated when an LLM stumbles over simple math word problems, such as questions about telling time.
These hallucinations underscore significant limitations in current LLM technology, especially in educational settings where factual accuracy is paramount (EdTech Evolved, 2023).
Hallucinations in AI systems can arise from a variety of factors, many of which are particularly relevant in the context of community college education. One of the primary contributors is the quality and breadth of the training data used to develop these models (Lacy, 2023). If the data is insufficient, biased, or lacks diversity, the AI may struggle to grasp the nuances and complexities of language, leading to the generation of inaccurate or fabricated information. This is especially problematic in community college settings, where the content and language used can be highly specialized and context-dependent, reflecting the diverse backgrounds and needs of the student population.
Another key factor is the inherent limitations of the underlying machine learning algorithms. LLMs are trained to predict the most likely sequence of words based on statistical patterns in the training data, rather than possessing true reasoning or comprehension capabilities (Lacy, 2023). This can result in the model generating plausible-sounding but factually incorrect responses, particularly when faced with unfamiliar or ambiguous inputs, such as those that may arise in the community college classroom.
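To see what “predicting the most likely sequence of words” means in practice, the short Python sketch below shows how a model turns raw scores for candidate next words into probabilities, and how a lower “temperature” setting (a concept that reappears later in this article) concentrates probability on the most likely word. The vocabulary and scores are invented for illustration and are not drawn from any real model.

```python
import math

# Hypothetical raw scores ("logits") a model might assign to candidate next words
# after the prompt "Leonardo da Vinci painted the ..." -- the values are made up.
logits = {"Mona": 4.2, "Last": 3.1, "Sistine": 1.5, "Starry": 0.3}

def next_word_probabilities(logits, temperature=1.0):
    """Convert raw scores into probabilities using a temperature-scaled softmax."""
    scaled = {word: score / temperature for word, score in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    return {word: math.exp(s) / total for word, s in scaled.items()}

for t in (1.0, 0.5):
    probs = next_word_probabilities(logits, temperature=t)
    print(f"temperature={t}:",
          {w: round(p, 3) for w, p in sorted(probs.items(), key=lambda x: -x[1])})
# Lower temperature pushes more probability onto the top-scoring word, which is
# why it tends to produce more conservative, less "creative" output.
```

The point of the sketch is that the model is choosing statistically likely words, not checking facts, which is why fluent output can still be wrong.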
Addressing the challenge of hallucinations in AI systems requires a multifaceted approach, drawing from various techniques and strategies. As a community college educator, you can use the following methods to reduce the risk of hallucinations in your classroom.
- Prompt engineering and constrained outputs
- Crafting prompts that provide clear instructions and limit the possible outputs can guide the AI system to generate more reliable and relevant responses (Awa-abuon, 2023; G, 2023); a brief code sketch following this list shows one way to phrase such a constrained prompt through an API.
- Example: Avoid a broad prompt like, “Summarize the causes of the Civil War.” Instead, provide a more specific prompt such as, “Write a four-paragraph summary explaining the key political, economic, and social factors that led to the American Civil War (1861–1865).”
- Structuring prompts with specific requirements, such as word count, formatting, or the inclusion of particular topics, can help the AI system understand the level of detail and accuracy expected in the response.
- Using prompts that require the AI to synthesize information from multiple sources, rather than relying on a single input, can encourage the model to generate more well-rounded and reliable responses.
- Break down complex topics into smaller, more manageable prompts. This can help the LLM focus on specific details and reduce the risk of factual errors.
- Example: Instead of asking the LLM to “Summarize the American Civil War,” ask a series of focused prompts like, “Explain the key political differences between the Northern and Southern states leading up to the Civil War” and “Describe the economic factors that contributed to the outbreak of the Civil War.”
- Frame prompts as questions that require the LLM to analyze and synthesize information. This can encourage a more thoughtful response compared to simple summaries.
- Example: Instead of asking for a “List of causes of the French Revolution,” ask it to “Analyze the social and economic factors that contributed to the French Revolution.”
- Data augmentation and model regularization
- Incorporating a diverse range of high-quality educational resources, such as community college textbooks, academic journal articles, and industry-specific case studies, into the AI’s training data can improve its performance and reduce the risk of hallucinations (Lacy, 2023; G, 2023).
- If you’re using an AI-powered tool to generate explanations for a community college biology course, you could work with your department to create a dataset that includes widely used textbooks, online biology tutorials, and scientific papers covering topics like cellular biology, genetics, and evolutionary theory.
- Applying data augmentation, in which you generate additional training data through paraphrasing or backtranslation, can help the AI model generalize and handle a wider range of inputs (Lacy, 2023).
- In addition to using community college-specific datasets, consider collaborating with colleagues from other institutions to create a broader and more diverse pool of training data. A broader pool helps the LLM generalize and handle a wider range of subjects and prompts.
- Example: Community college instructors from various science departments could collaborate to create a shared dataset of science textbooks, lab manuals, and research papers. This would benefit all science courses using AI-powered learning tools.
- Human-in-the-loop validation
- Involving subject matter experts, such as experienced community college instructors in your department, in the review and validation of AI-generated content can help identify and address instances of hallucinations or other undesirable behaviors (G, 2023; Conway, 2023).
- Relying solely on AI-generated content without human review and validation can lead to the propagation of inaccurate information. Implementing human oversight and feedback loops can help catch and correct hallucinations before they are used in the classroom.
- If you’re using an AI system to generate exam questions for a literature course, you could have seasoned English instructors review the questions to ensure they accurately reflect the course material, are appropriately challenging for the student level, and align with the learning objectives.
- Establishing a regular review process where subject matter experts provide feedback on the AI’s performance can help continuously improve the reliability and trustworthiness of the system.
- Develop a system for students to provide feedback on the AI’s outputs. This feedback can be used to identify areas where the LLM struggles and inform future improvements.
- Example: Integrate a brief survey after students use an AI-powered tool for practice problems. The survey can ask students to rate the clarity and accuracy of the AI’s explanations and identify any factual inconsistencies.
- Benchmarking and monitoring
- Developing a set of standardized assessments or quizzes that cover a range of topics and learning objectives can help you measure the AI system’s accuracy and reliability in your college classroom (G, 2023); a short code sketch after this list shows a simple way to automate such a spot check.
- For a community college physics course, you could create a bank of questions that evaluate the AI’s ability to provide accurate explanations of key concepts, such as Newton’s laws of motion, energy conservation, and electromagnetism.
- Regularly assessing the AI’s performance using these standardized assessments can help you identify areas where the system struggles and make informed decisions about its continued use in your classroom.
- Example: Create a short quiz (3-5 questions) testing the LLM’s understanding of a specific event, like the American Civil War. Administer the quiz after the LLM generates a summary, and analyze its performance to identify factual errors or knowledge gaps.
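To make the prompt-engineering guidance above concrete, here is a minimal Python sketch using the OpenAI Python library (the same idea applies to other providers). The model name, system rules, word limit, and prompt wording are illustrative assumptions to adapt to your own course and to whatever tool your campus actually supports; this is a sketch, not a prescription.

```python
# Minimal sketch of a constrained, specific prompt sent through the OpenAI
# Python library (pip install openai). The model name and prompt text are
# illustrative assumptions; adapt them to the tool your campus uses.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

system_rules = (
    "You are a teaching assistant for a community college U.S. history course. "
    "Answer only from well-established historical facts. If you are not sure, "
    "say you are not sure instead of guessing. Do not invent sources."
)

user_prompt = (
    "Write a four-paragraph summary (about 400 words) explaining the key "
    "political, economic, and social factors that led to the outbreak of the "
    "American Civil War in 1861. Label each paragraph with the factor it covers."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",   # assumed model name; substitute your own
    temperature=0.2,       # lower temperature = more conservative output
    messages=[
        {"role": "system", "content": system_rules},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```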
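The benchmarking idea can be automated in an equally simple way. The sketch below, under the same assumptions as the previous example, runs a short answer-key quiz through the model, flags replies that miss expected keywords, and writes the results to a CSV file that an instructor review or a student feedback survey could extend. The questions, keywords, model name, and file name are hypothetical placeholders, and keyword matching is only a first pass before subject-matter-expert review.

```python
# Sketch of a simple benchmark "spot check": run a short answer-key quiz
# through the model, flag answers that miss expected keywords, and log the
# results for human review. All questions and keywords are placeholders.
import csv
from openai import OpenAI

client = OpenAI()

quiz = [
    {"question": "In what year did the American Civil War begin?",
     "expected_keywords": ["1861"]},
    {"question": "Who painted the Mona Lisa, and in roughly what period?",
     "expected_keywords": ["Leonardo", "da Vinci"]},
]

results = []
for item in quiz:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        temperature=0,
        messages=[{"role": "user", "content": item["question"]}],
    ).choices[0].message.content
    missing = [kw for kw in item["expected_keywords"] if kw.lower() not in reply.lower()]
    results.append({"question": item["question"], "answer": reply,
                    "flagged": bool(missing)})

# Write the results to a CSV that instructors (or student feedback) can annotate.
with open("ai_spot_check.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "answer", "flagged"])
    writer.writeheader()
    writer.writerows(results)

print(sum(r["flagged"] for r in results), "of", len(results), "answers need human review")
```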
By implementing a combination of these strategies, community college educators can help ensure the AI-powered tools they use generate accurate, reliable, and trustworthy information that supports student learning and critical thinking.
Specific applications
The key is to provide the AI model with clear instructions and constraints that guide it toward reliable, well-grounded responses. Incorporating the following prompting techniques can help mitigate the risks of AI hallucinations in educational and other high-stakes applications. A widely shared Reddit post recommends six such techniques, claiming they can reduce hallucinations by up to 20 percent (Reddit, 2023).
- Temperature. Directing the AI program to adjust the “temperature” to a lower value can reduce the likelihood of it generating overly creative or speculative responses that may be inaccurate.
Example: When asking an AI, “When did Leonardo da Vinci paint the Mona Lisa?”, adding an instruction like, “Respond to the question using a temperature setting of 0.5 to reduce the likelihood of hallucinations,” can encourage the model to provide a more conservative, fact-based answer instead of confidently stating an incorrect date. (In API-based tools, temperature is set directly as a request parameter rather than in the prompt text; see the code sketch at the end of this list.)
- Role assignment. Explicitly assigning the AI to a specific role or persona, such as an expert on a topic, can help constrain the model’s responses to be more aligned with that expertise.
Example: Use a narrative approach like, “You are an expert historian on the Renaissance period. Using your extensive knowledge, provide an accurate response to the following question about Leonardo da Vinci.”
- Specificity. Providing the AI with detailed and specific prompts, rather than open-ended ones, can limit the range of possible outputs and reduce the chances of hallucinations.
Example: Direct the inquiry with, “Provide a detailed, step-by-step explanation of how to execute the Python code you generate, ensuring it is fully functional.”
- Content grounding. Instructing the AI to base its responses only on information that can be directly attributed to reliable sources, like Wikipedia or academic publications, helps ensure the content is factual.
Example: Incorporate a narrative such as, “Respond to this query using only information that can be directly attributed to reliable sources like Wikipedia or academic publications.”
- Instructional dos and don’ts. Giving the AI clear guidelines on what types of responses are acceptable (e.g., no factually incorrect information) and what are not, can steer it away from generating hallucinations.
Example: Include an instruction such as, “Do not generate any content that is factually incorrect or unsubstantiated. Strictly adhere to providing truthful and well-researched information.”
- Prompt chaining. Breaking down a task into multiple steps, where the AI must first research authoritative sources before synthesizing a final response, can improve the reliability of the output.
Example: Lay out the steps in order, such as, “First, research the most authoritative sources on this topic. Then, synthesize that information into a concise and accurate response to the question.”
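For instructors or campus technologists who work with these models through an API rather than a chat window, several of the techniques above, notably temperature, content grounding, and prompt chaining, can be applied directly in code. The sketch below is a minimal, assumed example using the OpenAI Python library: the model name, the source excerpt, and the ask() helper are placeholders, and the temperature is passed as a request parameter rather than written into the prompt.

```python
# Minimal sketch of prompt chaining with a grounded source excerpt and a low
# temperature setting, using the OpenAI Python library. The model name, the
# excerpt, and the ask() helper are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def ask(prompt, temperature=0.2):
    """Send a single prompt and return the model's reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",      # assumed model name; substitute your own
        temperature=temperature,  # set as an API parameter, not in the prompt text
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Content grounding: a short excerpt from a vetted course source (placeholder text).
excerpt = "The Mona Lisa was painted by Leonardo da Vinci, circa 1503-1519."

# Step 1: extract only what the source actually says.
facts = ask(
    "Using only the excerpt below, list the facts it states about the Mona Lisa. "
    "If something is not in the excerpt, do not add it.\n\nExcerpt: " + excerpt
)

# Step 2: synthesize an answer constrained to those extracted facts.
answer = ask(
    "Using only these facts, answer the question 'When did Leonardo da Vinci "
    "paint the Mona Lisa?' If the facts do not say, reply that the source does "
    "not specify.\n\nFacts:\n" + facts
)
print(answer)
```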
Conclusion
The challenge of hallucinations in AI systems is a complex and multifaceted issue that requires a comprehensive approach, particularly in the context of community college education. As an educator, you play a crucial role in mitigating the risks of hallucinations and ensuring that the AI-powered tools used in your classroom are reliable and trustworthy. By leveraging a range of strategies, from prompt engineering to ethical deployment, you can work to build more reliable and trustworthy language models that benefit your community college students and the broader educational community.
Robert Blanck, MA, has 35 years’ experience in teaching and administration within the fields of education and business. Blanck earned the Lean Six Sigma Black Belt designation for industrial quality control.
David E. Balch, PhD, is a professor at Rio Hondo College and has published articles in the areas of ethics, humor, distance education, and AI.
Note: This article was the result of collaboration between the human authors and the AI programs ChatGPT, Copilot, and Meta AI.
References
AI programs used in preparation of this article were ChatGPT (https://chat.openai.com/), Copilot (https://copilot.microsoft.com/), and Meta AI (https://www.meta.ai/)
Athaluri, S. A., Manthena, S. V., Kesapragada, V. S. R. K. M., Yarlagadda, V., Dave, T., & Duddumpudi, R. T. S. (2023). Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References. Cureus, 15(4), e37432. https://doi.org/10.7759/cureus.37432
Awa-abuon, J. (2023). How to Reduce AI Hallucination With These 6 Prompting Techniques. https://www.makeuseof.com/how-to-reduce-ai-hallucination/
Bhattacharyya, M., Miller, V. M., Bhattacharyya, D., Miller, L. E., & Miller, V. (2023). High rates of fabricated and inaccurate references in ChatGPT-generated medical content. Cureus, 15(5).
CNET. (2024, April 11). ChatGPT Glossary: 42 AI Terms That Everyone Should Know. https://www.cnet.com/tech/computing/chatgpt-glossary-42-ai-terms-that-everyone-should-know/
Conway, A. (2023). AI can hallucinate, and here’s one way I experienced it. https://www.xda-developers.com/ai-can-hallucinate-and-heres-one-way-i-experienced-it/
Dictionary.com. (2023, December 12). Dictionary.com’s 2023 Word of the year is… Dictionary.com. https://content.dictionary.com/word-of-the-year-2023/
DeGeurin, M. (2023). Ready or not, AI is in our schools. https://www.popsci.com/technology/ai-in-schools/
EdTech Evolved. (2023). AI in Education: The Problem with Hallucinations. https://www.esparklearning.com/blog/ai-in-education-the-problem-with-hallucinations/
G, V. (2023). 4 Ways to Prevent AI Hallucinations. https://www.makeuseof.com/prevent-ai-hallucination/
Google Cloud. (2024, March 30). What are AI hallucinations? https://cloud.google.com/discover/what-are-ai-hallucinations
Jandrić, P. (2019). The post-digital challenge of critical media literacy. Postdigital Science and Education, 1(2), 202-205. https://petarjandric.com/images/pdf/Jandric_JCML.pdf
Lacy, L. (2023). Hallucinations: Why AI Makes Stuff Up and What’s Being Done About It.
Ramaswamy, S. (2024, April 11). Hallucinations are the bane of AI-driven insights. Here’s what search can teach us about trustworthy responses, according to Snowflake’s CEO. Fortune. https://fortune.com/2024/04/11/hallucinations-ai-search-trust-responses-snowflake-ceo-sridhar-ramaswamy/
Reddit. (2023, August 10). A simple prompting technique to reduce hallucinations by up to 20%. r/ChatGPTPromptGenius. https://www.reddit.com/r/ChatGPTPromptGenius/comments/15nhlo1/a_simple_prompting_technique_to_reduce
Testlio. (2024, February 23). Preventing Hallucinations in AI Apps with Human-in-the-Loop Testing. https://testlio.com/blog/hitl-ai-hallucinations/