A Machine Learning Model for Orthodontic Extraction/Nonextraction Decision
Objectives: The purpose of this study was to create a robust and generalizable machine learning (ML) algorithm able to predict the extraction/non-extraction decision in a racially and ethnically diverse sample.

Methods: Data were gathered from the records of 393 patients (200 non-extraction and 193 extraction) from a racially and ethnically diverse population. Four ML models (logistic regression [LR], random forest [RF], support vector machine [SVM], and neural network [NN]) were trained on a training set (70% of samples) and then tested on the remaining 30%. The discriminative performance of each model was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). The proportion of correct extraction/non-extraction decisions was also calculated.

Results: The LR, SVM, and NN models performed best, with ROC AUCs of 91.0%, 92.5%, and 92.3%, respectively. The overall proportion of correct decisions was 82%, 76%, 83%, and 81% for the LR, RF, SVM, and NN models, respectively. The features most helpful to the ML algorithms in making their decisions were maxillary crowding/spacing, L1-NB (mm), U1-NA (mm), PFH:AFH, and SN-MP (°), although many other features also contributed substantially.

Conclusions: ML models can predict the extraction decision in a racially and ethnically diverse patient population with a high degree of accuracy and precision. Crowding, sagittal, and vertical characteristics all featured prominently in the hierarchy of features most influential to the ML decision-making process.
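The two evaluation measures named in the Methods (ROC AUC and the proportion of correct decisions on a 70/30 split) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the stratified form of the split, the 0.5 decision threshold used to count a "correct decision", the label coding (1 = extraction), and the function names stratified_split, roc_auc, and proportion_correct are all hypothetical choices made for illustration, and the toy scores are invented. The AUC is written in its rank (Mann-Whitney) form, which equals the area under the empirical ROC curve computed with the trapezoidal rule.

```python
# Hypothetical sketch, not the study's code. Assumptions: stratified 70/30 split,
# labels coded 1 = extraction / 0 = non-extraction, 0.5 threshold for "correct decision".
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test), shuffling each class separately
    so both sets keep roughly the original extraction/non-extraction balance."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case (label 1)
    scores higher than a randomly chosen negative case (label 0); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def proportion_correct(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of cases whose thresholded prediction matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


if __name__ == "__main__":
    # 393 labels mirroring the study's class counts (1 = extraction).
    labels = [1] * 193 + [0] * 200
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # 275 118

    # Invented model scores for four test cases.
    y_test = [1, 1, 0, 0]
    p_extract = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(y_test, p_extract))             # 0.75
    print(proportion_correct(y_test, p_extract))  # 0.5
```

Because AUC is threshold-free while the proportion of correct decisions depends on a chosen cutoff, the two measures can rank models differently. In practice a library routine such as scikit-learn's roc_auc_score would normally be used; the pairwise loop is written out here only to make the definition explicit.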