Assessing Language Models in Dental Education for Accuracy and Consistency
Objectives: The aim of this study was to evaluate and compare the performance of several language models in the context of dental education.

Methods: Four language models were evaluated: ChatGPT 3.5 (OpenAI), ChatGPT 4 (OpenAI), Claude 3 (Anthropic), and an open model from Mistral AI. The procedure sheets of the CNEOC (Collège National des Enseignants en Odontologie Conservatrice) were used to design a first set of 428 multiple-choice questions (MCQs). The questions were answered by two experts and classified by category (emergencies, restorative procedures, disinfection, etc.). Accuracy was assessed by comparing each model's responses with the experts' responses. Consistency was assessed through robustness, i.e., the ability to provide identical responses to paraphrased questions. Context understanding was also evaluated, based on each model's ability to assign the appropriate category to each question. Finally, the lexicon of endodontic terms of the AAE (American Association of Endodontists) was used to create a second set of 539 questions, for which accuracy was assessed as the ability to predict the correct term from its definition.

Results: The more advanced models (Claude 3, ChatGPT 4) demonstrated significantly higher accuracy on the MCQs than the simpler models (ChatGPT 3.5, Mistral AI), but robustness was low for all models. Performance on term definition and on context understanding was high for all models except Mistral AI.

Conclusions: While advanced language models demonstrate high accuracy and potential for dental education, their limited robustness warrants caution in educational use. To improve performance, future studies should explore the integration of additional resources.
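Note: The abstract does not specify how the two MCQ metrics were computed. As a rough illustration only, the Python sketch below shows one plausible way to score accuracy against an expert answer key and robustness across paraphrase pairs; all function names, variables, and toy data are hypothetical assumptions, not taken from the study.

# Hypothetical sketch of the two MCQ metrics described in the Methods.
# Names and data structures are illustrative assumptions, not the study's code.

def score_accuracy(model_answers: list[str], expert_key: list[str]) -> float:
    """Fraction of MCQs where the model's choice matches the expert consensus."""
    assert len(model_answers) == len(expert_key)
    correct = sum(m == e for m, e in zip(model_answers, expert_key))
    return correct / len(expert_key)

def score_robustness(original: list[str], paraphrased: list[str]) -> float:
    """Fraction of questions answered identically under paraphrasing."""
    assert len(original) == len(paraphrased)
    identical = sum(o == p for o, p in zip(original, paraphrased))
    return identical / len(original)

if __name__ == "__main__":
    # Toy data for illustration (not from the study).
    key = ["A", "C", "B", "D"]
    answers = ["A", "C", "D", "D"]
    rephrased = ["A", "B", "D", "D"]
    print(f"accuracy:   {score_accuracy(answers, key):.2f}")        # 0.75
    print(f"robustness: {score_robustness(answers, rephrased):.2f}")  # 0.75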
2024 Continental European and Scandinavian Divisions Meeting (Geneva, Switzerland)
e-Oral Health Network