Pilot study using machine learning to improve estimation of physical abuse prevalence.


DOI: 10.1016/j.chiabu.2024.106681


BACKGROUND: International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes have been shown to underestimate physical abuse prevalence. Machine learning models are capable of efficiently processing a wide variety of data and may provide better estimates of abuse.

OBJECTIVE: To achieve proof of concept applying machine learning to identify codes associated with abuse.

PARTICIPANTS AND SETTING: Children [age limit missing] years, presenting to the emergency department with an injury or abuse-specific ICD-10-CM code and evaluated by the child protection team (CPT) from 2016 to 2020 at a large Midwestern children's hospital.

METHODS: The Pediatric Health Information System (PHIS) and the CPT administrative databases were used to identify the study sample and the injury and abuse-specific ICD-10-CM codes. Subjects were divided into abused and non-abused groups based on the CPT's evaluation. A LASSO logistic regression model was constructed using ICD-10-CM codes and patient age to identify children likely to be diagnosed by the CPT as abused. Performance was evaluated using repeated cross-validation (CV) and the receiver operating characteristic (ROC) curve.
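
The modelling approach described in METHODS can be illustrated with a minimal sketch, assuming scikit-learn and entirely synthetic data. The indicator encoding of diagnosis codes, the age range, the penalty strength `C`, and the number of folds and repeats are illustrative assumptions, not the authors' actual settings or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): one 0/1 indicator column per
# hypothetical diagnosis code, plus patient age in years (arbitrary range).
n_patients, n_codes = 400, 30
codes = rng.integers(0, 2, size=(n_patients, n_codes))
age = rng.uniform(0.0, 5.0, size=n_patients)
X = np.column_stack([codes, age])

# Hypothetical outcome: only three codes and age carry signal, so the L1
# penalty has irrelevant features it can shrink to exactly zero.
logit = 1.5 * codes[:, 0] + 1.0 * codes[:, 1] - 1.2 * codes[:, 2] - 0.4 * age
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# LASSO (L1-penalised) logistic regression; C is an assumed value and would
# normally be tuned, for example inside a nested cross-validation loop.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)

# Repeated stratified k-fold CV, scored by area under the ROC curve (AUC).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_cv_auc = auc_scores.mean()

# Refit on all data to see which features keep a non-zero coefficient.
model.fit(X, y)
coef = model[-1].coef_.ravel()
n_selected = int(np.count_nonzero(coef))

print(f"mean CV AUC: {mean_cv_auc:.2f}")
print(f"features with non-zero coefficient: {n_selected} of {coef.size}")
```

Features whose coefficients remain non-zero after the L1 penalty are the ones the model treats as associated with the outcome, which is the sense in which such a model can "identify codes associated with abuse".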

RESULTS: We identified 2028 patients evaluated by the CPT, 512 of whom were diagnosed as abused. Using diagnosis codes and patient age, our model accurately identified patients with confirmed physical abuse (PA) (mean CV AUC = 0.87). Performance was lower for patients without existing ICD codes for abuse (mean CV AUC = 0.81).

CONCLUSIONS: We built a model that employs injury ICD-10-CM codes and age to improve the accuracy of distinguishing abusive from non-abusive injuries. This pilot modelling endeavor is a stepping stone towards improving population-level estimates of abuse.

Journal Title

Child abuse & neglect




MeSH Keywords

Child; Humans; Physical Abuse; Pilot Projects; Prevalence; Child Abuse; Machine Learning


Child abuse; Child maltreatment; International classification of diseases codes; Machine learning; Pediatric health information system; Physical abuse.

