Table 1 Characteristics of reviewed studies

From: The role of machine learning algorithms in detection of gestational diabetes; a narrative review of current evidence

Authors & Year

Study Design

Objective/Purpose

Population & Size

Machine learning algorithm

Data Sources

Outcome Measured

Key Findings

Gabriel Cubillos et al. (2023) [19]

Retrospective study

To develop machine learning (ML) models for the early prediction of GDM using widely available variables, facilitating early intervention

Registry records from 1,611 pregnancies

Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), Decision Trees (DT), Support Vector Machines (SVMs), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Extra Trees (ET), Balanced Random Forest (BRF), and Gradient Boosting (GB) as implemented in Extreme Gradient Boosting (XGB) and Light Gradient Boosting Machine (LGBM)

Pregnancy registry of patients attending the Obstetrics and Fetal Medicine Unit of the Hospital Parroquial de San Bernardo, Santiago, Chile between 2019 and 2022

Maternal weight, BMI, Age, 1TFG, Chronic hypertension, Gravidity, Parity, Insulin resistance, hypothyroidism, vaginal deliveries, abortion.

The study predicted GDM in the early stages of pregnancy from routine examinations, developed and optimized twelve ML models and their hyperparameters to achieve the highest prediction performance, and proposed a novel data augmentation method that allowed various models to reach excellent GDM prediction results.

Jesús et al. (2023) [20]

Prospective cohort study

To develop an AI-based prediction model for risk of developing GDM among pregnant women in Mexico.

860 women (430 with GDM and 430 without GDM)

Medición Integrada para la Detección Oportuna (MIDO) AI model for predicting gestational diabetes (MIDO GDM) - Multiple Artificial Neural Network (ANN) Algorithms

The Cuido Mi Embarazo (CME) study, which collected data from 1709 pregnant women in Mexico between April 2019 and May 2021.

Age, Pregestational BMI, Parity, Family history of diabetes mellitus, Family history of hypertension, History of hypertension, Gestational week, Enrolment BMI, Random capillary glucose at study enrollment, and fasting plasma glucose measured between the 24th and 28th week of pregnancy (first OGTT measurement)

The artificial neural network used to build this model achieved a high level of accuracy (70.3%) and sensitivity (83.3%) for identifying women at high risk of developing GDM. This AI-based model was set to be applied throughout Mexico to improve the timing and quality of GDM interventions.

Kang BS et al. (2023) [21]

Retrospective cohort study

To compare the performances of light gradient boosting machine (LGBM) and extreme gradient boosting (XGBoost) algorithms, with a full set of variables in predicting gestational diabetes mellitus (GDM)

34,387 women (nulliparous and multiparous)

Gradient boosting machine algorithms (LGBM & XGBoost)

Perinatal database for women who delivered between January 2009 and December 2020 at seven hospitals in four areas of South Korea

Performances of LGBM and XGBoost across the whole data set, nulliparity, and multiparity cohorts, at four different stages (baseline, E0, E1, and M1)

GDM was diagnosed in 3,103 pregnancies (9.02%) in the entire cohort. XGBoost outperformed LGBM in most cohorts and at most time points, except at the E1 stage.

Yi-xin Li et al. (2023) [22]

Cohort study

To use machine learning (ML) algorithms to study data gathered throughout the first trimester in order to predict GDM.

4799 and 2795 women in their first trimester

Extreme gradient boosting (XGBoost)

Pregnant women from the Xinhua Hospital Chongming branch (XHCM) and the Shanghai Pudong New Area People’s Hospital (SPNPH) formed the independent cohorts

Pre-pregnancy BMI and maternal abdominal circumference at pregnancy initiation, and FPG and HbA1c at the end of the first trimester

The model predicted GDM with moderate performance at pregnancy initiation and good-to-excellent performance at the end of the first trimester in the XHCM cohort. The trained XGBoost showed moderate performance in the SPNPH cohort

Jenny Yang et al. (2022) [23]

Retrospective cohort study

To introduce a machine learning-based stratification system for identifying patients at risk of exhibiting high blood glucose levels

1148 pregnant women with GDM at Oxford University Hospital and 709 from Royal Berkshire Hospital

Linear and non-linear tree-based regression models, including XGBoost; evaluated by MSE, R², and MAE

Pregnant women with GDM managed at the OUH and subscribed to the GDm-Health system between 30 April 2018 and 4 May 2021; also GDm-Health data from 709 pregnancies at the Royal Berkshire Hospital

Blood glucose checks 4–6 times daily, BMI

The study outlined and demonstrated a straightforward method for implementing proportionate care delivery based on features already existing in GDM clinics

Jie Zhang et al. (2022) [24]

 

To predict gestational diabetes mellitus using cascade and ensemble learning algorithms

1000 training samples and 85-dimensional features

Logistic regression, Lasso-logistic regression, Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost)

Data set commissioned by Beijing Qingwutong Health Technology Company and published in the Tianchi Big Data Competition held by Alibaba

Physical indicators such as age, height, weight, BMI, and cholesterol indicators; the other 55 features are genetic

On the data set used in this work, the proposed prediction model achieved an accuracy of 80.3%, a precision of 74.6%, and a recall of 79.3%.

Lauren D Liao et al. (2022) [25]

Population-based cohort study

To investigate whether clinical data at varied stages of pregnancy can predict GDM treatment modality, and to predict the risk of pharmacologic treatment beyond medical nutrition therapy (MNT)

30,474 pregnant women with GDM

Transparent and ensemble machine learning prediction methods: LASSO regression and a super learner combining classification and regression tree, LASSO regression, random forest, and extreme gradient boosting algorithms

Pregnant women with GDM who delivered at Kaiser Permanente Northern California between 2007 and 2017 (KPNC Pregnancy Glucose Tolerance and GDM Registry)

Responsiveness to MNT, then to OHA and with insulin

Clinical data demonstrated reasonably high predictability for GDM treatment modality at the time of GDM diagnosis and high predictability at 1 week after GDM diagnosis

Mukkesh Kumar et al. (2022) [26]

Cohort study

To evaluate the predictive ability of existing UK NICE guidelines for assessing GDM using machine learning

909 pregnant women

CatBoost gradient boosting algorithm, and the Shapley feature attribution framework

GUSTO (Growing Up in Singapore Towards healthy Outcomes), a prospective multi-ethnic mother–offspring cohort of pregnant women recruited at 7–11 weeks of gestational age.

Mean arterial blood pressure in first trimester, age, ethnicity and previous history of GDM

UK NICE guidelines were insufficient to assess GDM risk in Asian women. The non-invasive predictive model developed in this study outperformed the current state-of-the-art machine learning models to predict GDM

Mukkesh Kumar et al. (2022) [27]

Prospective (preconception) cohort study

To build a preconception-based GDM predictor to enable early intervention. To also assess the associations of top predictors with GDM and adverse birth outcomes

1032 women planning pregnancy, from multi-ethnic groups (Chinese, Malay, Indian, or any combination of these three ethnicities), recruited from the KK Women’s and Children’s Hospital (KKH) and the community between February 2015 and October 2017

Evolutionary algorithm-based automated machine learning (AutoML) using TPOT, with the SHAP framework

Mother–child dyads were followed for 7 years, with longitudinal phenotypic data collected across multiple health domains.

Demographics, medical/obstetric history, physical measures, blood-derived markers, lifestyle factors and antenatal OGTT

The study devised a population-based predictive care solution to assess the preconception risk of developing GDM in Asian women

Yuhan Du et al. (2022) [28]

Randomized Clinical Trial

To apply machine learning to develop a clinical decision support system (CDSS) that predicts the risk of GDM in a high-risk group of women with overweight and obesity

1,139 pregnant women (186 with GDM) from eastern China

Random Forest model and Logistic Regression model

The study was conducted at three primary women and child health care centres and a university-affiliated hospital.

Pre-pregnancy BMI, abdominal circumference in the first trimester, age, PCOS, gravidity

The research developed a simple model to predict the risk of GDM in the first trimester using a machine learning algorithm, without blood test indices

Li-Li Wei et al. (2021) [29]

Retrospective study

To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy

1625 pregnant women who attended medical institutions for an antenatal examination in the Qingdao area of China from November 2017 to August 2018

Random Forest regression algorithm

Face-to-face questionnaire survey of participants and review of pregnancy-related medical records to obtain indicators related to GDM

BMI, Pregnancy weight, Blood group, Blood pressure, comorbidities

The variables of body weight at birth and mother’s weight were identified to be strongly predictive of GDM in all models. Other variables (e.g., colpomycosis, kidney disease, number of births by the mother, regular menstruation, blood type, and hepatitis) that consistently ranked in the top 20 most influential factors were also found to be linked to GDM in this study

Yang-Ting Wu et al. (2021) [30]

Retrospective study

To establish effective models to predict early GDM.

16 819 cases in the training data set, and 15 371 cases in the testing data set

Logistic regression (LR), K-nearest neighbor (KNN), Support vector machine (SVM), and Deep neural network (DNN)

2017 Obstetrical electronic medical record data from the International Peace Maternal and Child Health Hospital, Shanghai Jiao Tong University School of Medicine

Advanced maternal age, body mass index (BMI), and family history of diabetes, Blood pressure, Parity

A clinically cost-effective 7-variable LR model was developed. The relationship of GDM with thyroxine and BMI was also investigated in the Chinese population.

de Freitas et al. (2020) [31]

Case–control study

To investigate the use of attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy to analyse spectrochemical information using chemometric methods for accurate and low-cost GDM detection.

50 women with GDM and singleton pregnancies at a gestational age of 12 to 38 weeks, and 50 healthy pregnant controls, at a Reference Obstetrics and Gynecology Hospital between January and October 2018.

Chemometric approaches, including feature selection algorithms associated with discriminant analysis, such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Support Vector Machines (SVM)

Pregnant women at a Reference Obstetrics and Gynecology Hospital between January and October 2018.

Age, BMI by classification (suitable, low weight, overweight, obesity), marital status and parity, Previous mode of delivery, GDM history, family history of GDM, History of disease in pregnancy

The Fourier-transform infrared (FTIR) spectra of blood plasma samples taken from pregnant women with GDM can rapidly distinguish diabetic cohorts from healthy pregnant women. Using the Genetic Algorithm Linear Discriminant Analysis (GA-LDA) method, GDM could be distinguished from healthy pregnant controls with 100% accuracy, sensitivity, and specificity in an external test set.

Yunzhen Ye et al. (2020) [32]

Retrospective cohort study

To use machine learning methods to predict GDM and compare their performance with that of logistic regressions.

22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM

GDBT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest

Obstetrics and Gynecology Hospital of Fudan University in China from 2013 to 2017.

Primary outcome was GDM. Secondary outcomes included adverse pregnancy outcomes, including cesarean delivery for any reason, preeclampsia, macrosomia, IUGR, preterm birth (≤ 34 gestational weeks), neonatal asphyxia, and perinatal death.

This study found that several machine learning methods did not outperform logistic regression in predicting GDM. A model with cutoff points for risk stratification of GDM was also developed.

Jingyuan Wang et al. (2021) [33]

Prospective cohort study

To develop and verify an early prediction model of gestational diabetes mellitus (GDM) using a machine learning algorithm

2811 pregnant women in eastern China, from 2017 to 2019

Logistic Regression (LR), Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM)

Dataset was derived from a cohort of pregnant women in Qingdao between November 2017 and December 2019

Socio-demographic characteristics and medical history, including age (identified from the identity card), height, pre-pregnancy body weight, family history of diabetes, gravidity, parity, multiple birth (yes/no), and pregnancy complications, as well as laboratory test results, including hemoglobin (Hb), urine ketones (U-Ket), fasting plasma glucose (FPG), triglyceride (TG), total cholesterol (TC), and high-density lipoprotein (HDL)

The study constructed a New-Stacking model, which achieved the best performance in specificity, accuracy, and AUC, whereas the SVM model achieved the best performance in sensitivity