DOI: 10.2196/47592; PMCID: PMC10394604


BACKGROUND: Although prior research has identified multiple risk factors for diabetic ketoacidosis (DKA), clinicians still lack clinic-ready models to predict dangerous and costly episodes of DKA. We asked whether deep learning, specifically a long short-term memory (LSTM) model, could accurately predict the 180-day risk of DKA-related hospitalization for youth with type 1 diabetes (T1D).

OBJECTIVE: We aimed to describe the development of an LSTM model to predict the 180-day risk of DKA-related hospitalization for youth with T1D.

METHODS: We used 17 consecutive calendar quarters of clinical data (January 10, 2016, to March 18, 2020) for 1745 youths aged 8 to 18 years with T1D from a pediatric diabetes clinic network in the Midwestern United States. The input data included demographics, discrete clinical observations (laboratory results, vital signs, anthropometric measures, diagnosis, and procedure codes), medications, visit counts by type of encounter, number of historic DKA episodes, number of days since last DKA admission, patient-reported outcomes (answers to clinic intake questions), and data features derived from diabetes- and nondiabetes-related clinical notes via natural language processing. We trained the model using input data from quarters 1 to 7 (n=1377), validated it using input from quarters 3 to 9 in a partial out-of-sample (OOS-P; n=1505) cohort, and further validated it in a full out-of-sample (OOS-F; n=354) cohort with input from quarters 10 to 15.

RESULTS: DKA admissions occurred at a rate of 5% per 180 days in both out-of-sample cohorts. In the OOS-P and OOS-F cohorts, respectively, the median age was 13.7 (IQR 11.3-15.8) years and 13.1 (IQR 10.7-15.5) years; median glycated hemoglobin levels at enrollment were 8.6% (IQR 7.6%-9.8%) and 8.1% (IQR 6.9%-9.5%); and 14.15% (213/1505) and 12.7% (45/354) had prior DKA admissions (after the T1D diagnosis). Recall for the top-ranked 5% of youth with T1D was 33% (26/80) and 50% (9/18), respectively. For lists rank ordered by the probability of hospitalization, precision increased from 33% to 56% to 100% for positions 1 to 80, 1 to 25, and 1 to 10 in the OOS-P cohort and from 50% to 60% to 80% for positions 1 to 18, 1 to 10, and 1 to 5 in the OOS-F cohort, respectively.
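
The rank-ordered precision and recall figures above can be illustrated with a minimal Python sketch. This is not the authors' code. The function names and the synthetic outcome list are assumptions made for illustration. Only the cohort size (354), the number of admissions (18), and the 9/18 count come from the abstract; the 4-of-5 and 6-of-10 counts are inferred from the reported 80% and 60%, and the exact positions of admissions within each window are invented.

```python
from typing import Sequence


def precision_at_k(ranked_outcomes: Sequence[int], k: int) -> float:
    """Share of the top-k ranked youths who had a DKA admission (1 = admitted)."""
    if not 0 < k <= len(ranked_outcomes):
        raise ValueError("k must be between 1 and the list length")
    return sum(ranked_outcomes[:k]) / k


def recall_at_k(ranked_outcomes: Sequence[int], k: int) -> float:
    """Share of all DKA admissions that fall within the top-k ranked youths."""
    total = sum(ranked_outcomes)
    if total == 0:
        raise ValueError("no positive outcomes in the list")
    if not 0 < k <= len(ranked_outcomes):
        raise ValueError("k must be between 1 and the list length")
    return sum(ranked_outcomes[:k]) / total


# Synthetic list ordered by descending predicted risk (hypothetical data).
# Window counts mirror the OOS-F figures: 4 of the top 5, 6 of the top 10,
# and 9 of the top 18 admitted, with 18 admissions among 354 youths overall.
ranked = (
    [1, 1, 1, 1, 0]                 # positions 1-5
    + [1, 1, 0, 0, 0]               # positions 6-10
    + [1, 1, 1, 0, 0, 0, 0, 0]      # positions 11-18
    + [1] * 9 + [0] * 327           # positions 19-354
)

for k in (18, 10, 5):
    print(f"precision@{k} = {precision_at_k(ranked, k):.0%}")
print(f"recall@18 = {recall_at_k(ranked, 18):.0%}")
```

When the list length k equals the number of admissions in the cohort (18 here), precision and recall share the same numerator and denominator, which is why both are reported as 50% (9/18) for the top-ranked 5% of the OOS-F cohort.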

CONCLUSIONS: The proposed LSTM model for predicting 180-day DKA-related hospitalization was valid in this sample. Future research should evaluate model validity in multiple populations and settings to account for health inequities that may be present in different segments of the population (eg, racially or socioeconomically diverse cohorts). Rank ordering youth by probability of DKA-related hospitalization will allow clinics to identify the most at-risk youth. Clinics may then create and evaluate novel preventive interventions based on available resources.

Journal Title

JMIR Diabetes



Keywords

AI; DKA; LSTM; NLP; RNN; T1D; artificial intelligence; deep learning; diabetic ketoacidosis; long short-term memory; machine learning; natural language processing; recurrent neural network; type 1 diabetes


This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Diabetes, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.
