IADR Abstract Archives

Early Childhood Caries Prediction with Knowledge Discovery Data Mining Tools

Objective: (1) To predict early childhood caries (ECC) among preschool children in the UCSF fluoride varnish (FV) randomized controlled trial (FVRCT) who received no FV, supplementing individual measures with contextual measures such as dentist distribution and neighborhood characteristics; and (2) to assess whether contextual measures predict ECC above and beyond individual measures.

Methods: The FVRCT (Weintraub et al., 2006) enrolled 376 children aged 6-44 months and randomized them to 1 of 3 groups. Risk prediction models used the 115 preschoolers who received no FV and had follow-up examinations at 1-2 years. ZIP code-level contextual measures came from the US Census and the California Center for Health Workforce Studies (dentist supply and geographic distribution). ECC was defined as defs>0 (Drury et al., 1999). Ensemble/hybrid classification and regression tree (CART)-logit models with 10-fold cross-validation (10CV) were used to develop the models. The concordance (c) index assessed model performance (discrimination). Differential misclassification costs emphasized correctly classifying children with ECC, rather than weighting false positives and false negatives equally. Premodeling knowledge discovery data mining (KDD) screening with a relative cost threshold of <0.95 eliminated poor candidate variables. Customized automated KDD macros tried 51 combinations of model settings (decision and terminal node size, splitting rule, priors) to develop the risk prediction model.
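For a binary outcome such as ECC, the c index is the proportion of (case, non-case) pairs in which the child with ECC received the higher predicted risk, with ties counted as one half; it equals the area under the ROC curve. The sketch below is a minimal illustration, not the authors' KDD macros: the function name `c_index` and the toy risks are hypothetical, and the risks are assumed to be out-of-fold predictions from a 10CV loop.

```python
from itertools import product


def c_index(risk, outcome):
    """Concordance (c) index for a binary outcome (1 = ECC, 0 = no ECC).

    Every (case, non-case) pair is compared: the pair is concordant when the
    case has the higher predicted risk, and tied risks count as one half.
    For a binary outcome this equals the area under the ROC curve.
    """
    cases = [r for r, y in zip(risk, outcome) if y == 1]
    non_cases = [r for r, y in zip(risk, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for r_case, r_non in product(cases, non_cases):
        if r_case > r_non:
            score += 1.0  # concordant pair
        elif r_case == r_non:
            score += 0.5  # tied risks
    return score / (len(cases) * len(non_cases))


# Hypothetical out-of-fold (10CV) predicted risks and outcomes (not study data).
risk = [0.9, 0.7, 0.4, 0.6, 0.2, 0.4]
outcome = [1, 1, 1, 0, 0, 0]
print(round(c_index(risk, outcome), 3))  # 0.833 (7.5 of 9 pairs concordant)
```

A c of 0.5 corresponds to chance ranking and 1.0 to perfect separation of children with and without ECC.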

Results: Premodeling eliminated 24 of 76 candidate predictors. Final CART models included main and surrogate predictors such as age, mutans streptococci and lactobacilli levels, evening snacking frequency, parental opinions about dental health, % of population aged 0-5, % Asian population, subjective socioeconomic status ladder, and dentist-to-population ratio. With 1:1 misclassification costs, CART-logit 10CV prediction characteristics were sensitivity (Sn)=69.4% and specificity (Sp)=81.8% with c=0.87; with 2:1 misclassification costs, they were Sn=75.5% and Sp=59.1% with c=0.84.
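Sensitivity is the share of children with ECC whom the model flags, and specificity is the share of children without ECC whom it clears. The minimal sketch below computes both from a confusion matrix; the function name `sens_spec` and the nine toy predictions are hypothetical and do not reproduce the study's figures.

```python
def sens_spec(predicted, observed):
    """Return (sensitivity, specificity) for 0/1 labels, where 1 = ECC."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # ECC, flagged
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # ECC, missed
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # no ECC, cleared
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # no ECC, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cross-validated predictions for 9 children (not study data).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 0]
sn, sp = sens_spec(predicted, observed)
print(f"Sn={sn:.1%}, Sp={sp:.1%}")  # Sn=75.0%, Sp=80.0%
```

Unlike the c index, Sn and Sp depend on the classification rule, which is why they shift when misclassification costs change.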

Conclusions: KDD models that include contextual (neighborhood) factors such as provider supply can help predict ECC better. Differential misclassification costs can improve sensitivity (at the expense of specificity), so that fewer children with ECC go undiagnosed.
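The sensitivity/specificity trade-off can be illustrated with the textbook expected-cost rule: predict ECC when p × cost_fn > (1 − p) × cost_fp, i.e. when p > cost_fp / (cost_fp + cost_fn). This is a simplified, hypothetical sketch, not the authors' procedure: CART applies costs through node class assignment and priors rather than a single risk cut-off, "2:1" is assumed here to mean that missing an ECC case costs twice as much as a false alarm, and all names and data are illustrative.

```python
def cost_threshold(cost_fn, cost_fp):
    """Risk cut-off that minimizes expected misclassification cost.

    Predicting ECC has expected cost (1 - p) * cost_fp; predicting no ECC has
    expected cost p * cost_fn. Predict ECC when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(probs, cost_fn=1.0, cost_fp=1.0):
    """Label each child 1 (ECC) or 0 using the cost-based cut-off."""
    t = cost_threshold(cost_fn, cost_fp)
    return [1 if p > t else 0 for p in probs]


def sens_spec(predicted, observed):
    """Return (sensitivity, specificity) for 0/1 labels, where 1 = ECC."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    n_pos = sum(1 for _, y in pairs if y == 1)
    return tp / n_pos, tn / (len(pairs) - n_pos)


# Hypothetical predicted ECC risks and observed outcomes (not study data).
probs = [0.9, 0.6, 0.4, 0.35, 0.45, 0.3, 0.2, 0.1]
obs = [1, 1, 1, 1, 0, 0, 0, 0]

for cost_fn in (1.0, 2.0):  # 1:1 vs. 2:1 (missed ECC twice as costly)
    sn, sp = sens_spec(classify(probs, cost_fn=cost_fn), obs)
    print(f"{cost_fn:.0f}:1  threshold={cost_threshold(cost_fn, 1.0):.3f}  "
          f"Sn={sn:.0%}  Sp={sp:.0%}")
```

Raising the cost of a missed case lowers the cut-off from 0.5 to 1/3, so more children are flagged: sensitivity rises and specificity falls, the same direction of change reported above.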

Support: US DHHS NIH/NIDCR & NCMHD U54DE14251.


Division: IADR General Session
Meeting: 2006 IADR General Session (Brisbane, Australia)
Location: Brisbane, Australia
Year: 2006
Final Presentation ID: 515
Abstract Category(s): Behavioral Sciences/Health Services Research
Authors
  • Gansky, Stuart A.  ( University of California - San Francisco, San Francisco, CA, USA )
  • Cheng, Nancy F.  ( University of California - San Francisco, San Francisco, CA, USA )
  • Shain, Sara G.  ( University of California - San Francisco, San Francisco, CA, USA )
  • Weintraub, Jane A.  ( University of California - San Francisco, San Francisco, CA, USA )
  • Ramos-Gomez, Francisco  ( University of California - San Francisco, San Francisco, CA, USA )
Session Information
  Oral Session
  Keynote Address and Prevalence, Risk and Correlates of Oral Health Conditions
  06/29/2006