Understanding and Mitigation Strategies for Large Language Model (LLM) Hallucinations in HR Chatbots
DOI: https://doi.org/10.22399/ijcesen.2471
Keywords: Large Language Models, LLM, Hallucination, RAG, Retrieval Augmented Generation
Abstract
Large language models (LLMs) are widely used in enterprise workflows, particularly in human resources (HR) and internal communication chatbots. Although they improve efficiency and shorten turnaround times, their tendency to hallucinate, that is, to generate plausible but factually incorrect information, remains a significant concern. This paper provides a comprehensive review of the problem and of the solutions studied to date. It begins by defining and evaluating the causes and types of hallucinations particular to HR applications. It then explores industry use cases and implements mitigation measures such as retrieval-augmented generation (RAG), confidence rating, abstention mechanisms, prompt engineering, domain-specific fine-tuning, and post-generation fact-checking. Using available empirical data, the review assesses the limitations, scalability, and effectiveness of these methods. Important research gaps are identified, including the absence of HR-specific hallucination benchmarks, difficulties in uncertainty estimation, and the need for continuous integration of domain knowledge. Aiming toward reliable and well-grounded AI systems for HR and corporate support, the article concludes with practical directions for future research and development.
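To make the grounding-plus-abstention pattern named in the abstract concrete, the minimal sketch below illustrates how an HR chatbot might restrict answers to retrieved policy text and abstain when retrieval confidence is low. It is not taken from the paper: all names (retrieve, answer_hr_question, HR_POLICY_SNIPPETS, min_score) are hypothetical, and a simple token-overlap score stands in for a real retriever and LLM call.

```python
# Illustrative sketch only (not the paper's implementation): retrieval-augmented
# grounding with a simple abstention threshold for an HR chatbot. All names and
# thresholds are hypothetical; token overlap stands in for a real retriever.

from typing import List, Tuple

HR_POLICY_SNIPPETS = [
    "Employees accrue 1.5 vacation days per month, capped at 30 days per year.",
    "Parental leave is 16 weeks paid, available after 6 months of employment.",
    "Remote work requires manager approval and is limited to 3 days per week.",
]

def score(query: str, doc: str) -> float:
    """Token-overlap score between query and document (stand-in for a real retriever)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query: str, k: int = 2) -> List[Tuple[float, str]]:
    """Return the top-k policy snippets with their retrieval scores."""
    ranked = sorted(((score(query, s), s) for s in HR_POLICY_SNIPPETS), reverse=True)
    return ranked[:k]

def answer_hr_question(query: str, min_score: float = 0.2) -> str:
    """Ground the answer in retrieved policy text; abstain when retrieval is weak."""
    hits = retrieve(query)
    if not hits or hits[0][0] < min_score:
        # Abstention path: deferring to HR is safer than a hallucinated policy answer.
        return "I'm not confident I have the relevant policy. Please contact HR directly."
    context = "\n".join(snippet for _, snippet in hits)
    # In a real system this grounded prompt would be sent to an LLM; here it is returned.
    return (
        "Answer using ONLY the policy excerpts below; say 'not covered' otherwise.\n"
        f"Policy excerpts:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(answer_hr_question("How many vacation days do employees get per year?"))
    print(answer_hr_question("What is the company's stance on cryptocurrency payroll?"))
```

In practice the token-overlap retriever and the returned prompt would be replaced by an embedding-based retriever and an actual model call, but the abstention threshold captures the design choice the abstract describes: a grounded HR assistant should defer rather than guess when its knowledge base does not cover the question.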
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.