Methods: The fluoride varnish (FV) randomized clinical trial (FVRCT; Weintraub et al., 2006) enrolled 376 children aged 6-44 months and randomized them to 1 of 3 groups. Risk prediction models used the 115 preschoolers who received no FV and had follow-up examinations at 1-2 years. ZIP code-level contextual measures came from the US Census and the California Center for Health Workforce Studies (dentist supply and geographic distribution). ECC was defined as defs > 0 (Drury et al., 1999). The risk models were ensemble/hybrid classification and regression tree (CART)-logit models developed with 10-fold cross-validation (10CV), and the concordance (c) index assessed model fit. Differential misclassification costs emphasized correctly classifying ECC rather than weighting false positives and false negatives equally. Premodeling knowledge discovery and data mining (KDD) used a relative cost threshold of < 0.95 to eliminate poor candidate variables. Customized, automated KDD macros tested 51 combinations of model settings (decision and terminal node sizes, splitting rule, priors) to develop the risk prediction model.
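The 10CV and c-index steps can be sketched in plain Python. This is a minimal illustration, not the authors' CART-logit software: it assumes a hypothetical `fit` callable that trains any model on k-1 folds and returns a scoring function for the held-out fold, pools the out-of-fold risks, and computes c as the share of (ECC, no-ECC) pairs ranked correctly, which for a binary outcome equals the area under the ROC curve. All function names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_risk(fit: Callable[[List[Row], List[int]], Predictor],
                         X: Sequence[Row], y: Sequence[int],
                         k: int = 10, seed: int = 0) -> List[float]:
    """Out-of-fold risk for every child.

    `fit` is a hypothetical trainer: given the rows and labels of k-1 folds,
    it returns a function that scores one held-out row.
    """
    risk = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            risk[i] = predict(X[i])  # each child is scored only by a model that never saw it
    return risk


def c_index(y: Sequence[int], risk: Sequence[float]) -> float:
    """Share of (ECC, no-ECC) pairs where the ECC child has higher risk; ties count 0.5."""
    cases = [r for yi, r in zip(y, risk) if yi == 1]
    controls = [r for yi, r in zip(y, risk) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one ECC case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in controls:
            score += 1.0 if rc > rn else 0.5 if rc == rn else 0.0
    return score / (len(cases) * len(controls))
```

For example, with outcomes [0, 0, 1, 1] and risks [0.1, 0.4, 0.35, 0.8], three of the four case/non-case pairs are ordered correctly, giving c = 0.75.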
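One possible reading of the premodeling screen is sketched below, under stated assumptions: each candidate variable is scored by the relative cost of its best single split (misclassification cost after the split divided by the cost of the unsplit node), and a variable is kept only when that ratio is below 0.95. The one-split "stump", the use of raw class counts in place of CART's configurable priors, and all names (including the toy variables) are hypothetical simplifications, not the study's KDD macros.

```python
from typing import Dict, List, Sequence


def node_cost(labels: Sequence[int], cost_fn: float, cost_fp: float) -> float:
    """Cost of giving every child in a node the cheaper single label.

    Labeling the node 'no ECC' costs cost_fn per ECC case (1); labeling it
    'ECC' costs cost_fp per non-case (0). Raw counts are used, i.e. priors
    proportional to the data (a simplification of CART's configurable priors).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return min(n_pos * cost_fn, n_neg * cost_fp)


def stump_relative_cost(x: Sequence[float], y: Sequence[int],
                        cost_fn: float = 1.0, cost_fp: float = 1.0) -> float:
    """Cost of the best single split on x divided by the unsplit (root) cost."""
    root = node_cost(y, cost_fn, cost_fp)
    if root == 0:
        return 1.0  # a pure node cannot be improved by splitting
    best = root
    for cut in sorted(set(x))[:-1]:  # every cut that leaves both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        best = min(best, node_cost(left, cost_fn, cost_fp)
                   + node_cost(right, cost_fn, cost_fp))
    return best / root


def screen_predictors(candidates: Dict[str, Sequence[float]], y: Sequence[int],
                      threshold: float = 0.95, cost_fn: float = 1.0,
                      cost_fp: float = 1.0) -> List[str]:
    """Hypothetical premodeling screen: keep variables whose relative cost < threshold."""
    return [name for name, x in candidates.items()
            if stump_relative_cost(x, y, cost_fn, cost_fp) < threshold]


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    candidates = {
        "ms_level": [1, 2, 3, 4, 5, 6, 7, 8],   # hypothetical: separates ECC perfectly
        "noise_var": [1, 2, 1, 2, 1, 2, 1, 2],  # hypothetical: unrelated to ECC
    }
    print(screen_predictors(candidates, y))
```

In the toy data, the variable that separates cases perfectly has a relative cost of 0.0 and is kept, while the uninformative one scores 1.0 and is dropped.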
Results: Premodeling eliminated 24 of 76 candidate predictors. Final CART models included main and surrogate predictors such as age, mutans streptococci and lactobacilli levels, evening snacking frequency, parental dental health opinions, % of the population aged 0-5, % Asian population, subjective socioeconomic status ladder, and dentist-to-population ratio. CART-logit 10CV prediction characteristics were sensitivity (Sn) = 69.4%, specificity (Sp) = 81.8%, and c = 0.87 with 1:1 misclassification costs, versus Sn = 75.5%, Sp = 59.1%, and c = 0.84 with 2:1 misclassification costs.
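The sensitivity/specificity trade-off between the 1:1 and 2:1 settings can be illustrated with a cost-based decision rule. This sketch assumes that 2:1 means a missed ECC case costs twice a false alarm and that a classifier outputs a risk p; calling ECC whenever p exceeds cost_fp / (cost_fp + cost_fn) minimizes expected cost, which lowers the cut-off from 0.5 to 1/3. It is a simplified stand-in, not the study's CART-logit procedure, and the toy outcomes and risks are invented for illustration.

```python
from typing import Sequence, Tuple


def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Risk cut-off that minimizes expected misclassification cost.

    Calling ECC costs (1 - p) * cost_fp in expectation; not calling it costs
    p * cost_fn. ECC is the cheaper call when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def sens_spec(y: Sequence[int], risk: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when ECC is predicted for risk > threshold."""
    pred = [r > threshold for r in risk]
    tp = sum(1 for yi, p in zip(y, pred) if yi == 1 and p)
    fn = sum(1 for yi, p in zip(y, pred) if yi == 1 and not p)
    tn = sum(1 for yi, p in zip(y, pred) if yi == 0 and not p)
    fp = sum(1 for yi, p in zip(y, pred) if yi == 0 and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    y = [1, 1, 1, 1, 0, 0, 0, 0]                        # toy outcomes, 1 = ECC
    risk = [0.9, 0.7, 0.45, 0.2, 0.6, 0.4, 0.35, 0.1]   # invented risks
    for cost_fn, cost_fp in [(1, 1), (2, 1)]:           # 1:1, then 2:1 (missed ECC costs double)
        t = cost_threshold(cost_fn, cost_fp)
        print(f"{cost_fn}:{cost_fp}  cut-off={t:.3f}  (Sn, Sp)={sens_spec(y, risk, t)}")
```

On the toy data, the 1:1 cut-off gives Sn = 0.5 and Sp = 0.75, while the 2:1 cut-off gives Sn = 0.75 and Sp = 0.25: the same direction of change as in the reported results, though the magnitudes here mean nothing.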
Conclusions: KDD models that include contextual (neighborhood) factors such as provider supply can improve ECC prediction. Differential misclassification costs can increase sensitivity (at the expense of specificity), so fewer children with ECC would go undiagnosed.
Support: US DHHS NIH/NIDCR & NCMHD U54DE14251.