Accuracy of Artificial Intelligence Chatbots Versus Endodontic Residents in Regenerative Endodontic Treatment Knowledge
DOI:
https://doi.org/10.58600/eurjther3013
Keywords:
artificial intelligence, ChatGPT, dental education, endodontics, regenerative endodontic treatment
Abstract
Objective: This cross-sectional study aimed to compare the accuracy of artificial intelligence (AI)-based chatbots and endodontic residents regarding their knowledge of regenerative endodontic treatment (RET).
Methods: A 30-item true/false questionnaire was completed by 128 endodontic residents (1st–3rd year). The same questions were posed to ChatGPT-4o (OpenAI) and Gemini Advanced (Google DeepMind) in independent chat sessions. Each question was asked twice daily for 10 consecutive days to obtain repeated responses from each AI model. Accuracy rates and group differences were analyzed using the Kruskal–Wallis test with Bonferroni adjustment; agreement was assessed with Fleiss’ Kappa.
Results: AI models demonstrated higher accuracy rates than residents overall (ChatGPT-4o = 82.3%, Gemini Advanced = 83.7%, Residents = 76.3%). Gemini scored significantly higher than all resident groups, while ChatGPT-4o exceeded first-year residents. Residents showed weak consistency (κ = 0.190–0.319), whereas ChatGPT-4o demonstrated substantial agreement (κ = 0.689).
Conclusion: AI-based chatbots may provide useful information in RET and could serve as supplementary tools in dental education. However, their role in clinically related decision-making requires further validation.
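The agreement statistic named in the Methods, Fleiss' Kappa, can be illustrated with a short sketch. The abstract does not say which software the authors used, so the pure-Python function below is only an assumed, minimal implementation of the standard formula, and the toy response counts are hypothetical, not study data. Each row is one true/false question and each column counts how many of the repeated responses chose "true" or "false".

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    ratings that placed item i in category j.

    Every item must have the same total number of ratings.
    """
    n_items = len(counts)
    n_ratings = sum(counts[0])
    if n_items == 0 or n_ratings < 2:
        raise ValueError("need at least one item and two ratings per item")
    if any(sum(row) != n_ratings for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_ratings)
        for j in range(n_categories)
    ]

    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items

    # Agreement expected by chance alone.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical data: 4 questions, 20 repeated responses each
    # (columns: answered "true", answered "false").
    toy_counts = [
        [20, 0],
        [18, 2],
        [3, 17],
        [11, 9],
    ]
    print(f"Fleiss' kappa (toy data): {fleiss_kappa(toy_counts):.3f}")
```

Under this layout, 20 responses per question corresponds to the protocol of two queries per day over 10 days. A kappa near 1 indicates that repeated responses almost always agree, while a value near 0 indicates agreement no better than chance.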
License
Copyright (c) 2026 Merve Gökyar, İdil Özden, Hesna Sazak Öveçoğlu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.