Predicting Oral Cancer Risk Using Machine Learning
Objectives: Oral cancer screening is most effective when targeted at high-risk individuals. This study aims to develop a machine learning-based platform to predict the risk of oral cancer and oral potentially malignant disorders (OPMDs).
Methods: Three calibrated dentists prospectively performed visual oral examination (VOE) on 1467 participants of a community-based screening program. Each individual's status was defined as positive or negative for oral cancer/OPMDs, and positive cases underwent histologic confirmation of epithelial dysplasia (ED) or squamous cell carcinoma (SCC). The follow-up status of participants who screened negative was monitored via state-linked electronic health records. Information on demographic, habitual, lifestyle, and familial risk factors was obtained, and expired carbon monoxide levels (in ppm) were assessed using a monitor. Input features (n=40) and histologic diagnoses were used to train 12 machine learning algorithms, with a random 80:20 train-test split applied to the data during development. Recursive feature elimination with 10-fold cross-validation was used for feature selection, while the synthetic minority oversampling technique with edited nearest neighbors was implemented to correct class imbalance. Internal validation was conducted on the held-out 20% of the data, and model outputs were compared using McNemar's test to select the optimal model. Performance metrics included recall, specificity, and F1-score.
Results: Suspicious lesions and confirmed ED/SCC were identified in 4.50% (n=66) and 1.64% (n=24) of participants, respectively. The AdaBoost (F1: 0.98±0.02, accuracy: 0.99±0.03) and k-nearest-neighbors (kNN) (F1: 0.99±0.01, accuracy: 0.99±0.01) classifiers outperformed the other algorithms. Upon internal validation, the AdaBoost model (accuracy: 0.94, recall: 0.75, specificity: 0.95) performed significantly better than the kNN model (accuracy: 0.85, recall: 0.75, specificity: 0.85) (p<0.001) and was comparable to the on-site status classification provided by the trained examiners for oral cancer and OPMDs following VOE (specificity and accuracy: 0.91) (p=0.839). The models were deployed as web-based tools available at https://oral-cancer-risk-predictor-hku.herokuapp.com.
Conclusions: Machine learning predicted oral cancer risk with performance comparable to that of trained examiners and may be applied to identify ‘at-risk populations’ in opportunistic and organized screening.
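The evaluation steps named in the Methods (recall, specificity, and F1-score, plus McNemar's test for comparing two models on the same held-out cases) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `screening_metrics` and `mcnemar_exact`, the choice of the exact binomial form of McNemar's test, and all counts and labels below are assumptions invented for illustration, not the study's data.

```python
from math import comb


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recall, specificity, F1-score and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


def mcnemar_exact(y_true, pred_a, pred_b) -> float:
    """Two-sided exact McNemar p-value for two classifiers scored on the same cases.

    Only discordant pairs matter: cases where exactly one model is correct.
    Under the null hypothesis each discordant case is a fair coin flip.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree in correctness
    k = min(b, c)
    # Binomial(n, 0.5) probability of k or fewer, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical held-out set (invented counts, not from the abstract).
print(screening_metrics(tp=8, fp=5, tn=85, fn=2))

# Hypothetical labels: model A is always right, model B errs on six cases.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 0, 1, 0]
model_b = [0, 1, 1, 0, 1, 1, 1, 0]
print(mcnemar_exact(y_true, model_a, model_b))
```

With few positive cases, as in a screening cohort, accuracy is dominated by the negatives, which is why recall and specificity are worth reporting alongside it. The exact test is used here because discordant counts on a small held-out set can be too low for the chi-square approximation.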
Meeting: 2022 IADR/APR General Session (Virtual)
Year: 2022
Final Presentation ID: 1635
Abstract Category: e-Oral Health Network
Authors
Adeoye, John (University of Hong Kong, Hong Kong, Hong Kong)
Alkandari, Abdulrahman (University of Hong Kong, Hong Kong, Hong Kong)
Zhu, Wang-yong (University of Hong Kong, Hong Kong, Hong Kong)
Zheng, Li-wu (University of Hong Kong, Hong Kong, Hong Kong)
Thomson, Peter (James Cook University, Cairns, Queensland, Australia)
Choi, Siu-wai (University of Hong Kong, Hong Kong, Hong Kong)
Su, Yuxiong (University of Hong Kong, Hong Kong, Hong Kong)
Financial Interest Disclosure: NONE
SESSION INFORMATION
Interactive Talk Session
e-Oral Health Network I
Saturday, 06/25/2022, 02:00PM - 03:30PM