Please use this identifier to cite or link to this item: http://hdl.handle.net/11189/9664
Title: Detecting the most critical clinical variables of COVID-19 breakthrough infection in vaccinated persons using machine learning
Authors: Daramola, Olawande 
Kavu, Tatenda Duncan 
Kotze, Maritha J. 
Kamati, Oiva 
Emjedi, Zaakiyah 
Kabaso, Boniface 
Moser, Thomas 
Stroetmann, Karl 
Fwemba, Isaac 
Daramola, Fisayo 
Nyirenda, Martha 
Van Rensburg, Susan J. 
Nyasulu, Peter S. 
Marnewick, Jeanine L.
Keywords: Machine learning; vaccination; COVID-19; breakthrough infection; Pfizer vaccine; J&J vaccine; Explainable AI; XGBoost; deep multilayer perceptron; logistic regression
Issue Date: 2023
Publisher: SAGE PUBLICATIONS LTD (England)
Source: Daramola O. et al. 2023. Detecting the most critical clinical variables of COVID-19 breakthrough infection in vaccinated persons using machine learning. Digital Health, 9:1-23. [https://doi.org/10.1177/20552076231207593]
Journal: Digital Health 
Abstract: Background: COVID-19 vaccines offer different levels of immune protection but do not provide 100% protection. Vaccinated persons with pre-existing comorbidities may be at an increased risk of SARS-CoV-2 breakthrough infection or reinfection. The aim of this study is to identify the critical variables associated with a higher probability of SARS-CoV-2 breakthrough infection using machine learning.
Methods: A dataset comprising symptoms and feedback from 257 persons, of whom 203 were vaccinated and 54 unvaccinated, was used for the investigation. Three machine learning algorithms - Deep Multilayer Perceptron (Deep MLP), XGBoost, and Logistic Regression - were trained on the original (imbalanced) dataset and on balanced datasets created using the Random Oversampling Technique (ROT) and the Synthetic Minority Oversampling Technique (SMOTE). We compared the performance of the classification algorithms when only the features highly correlated with breakthrough infection were used and when all features in the dataset were used.
Results: When highly correlated features were used as predictors, with Random Oversampling to address data imbalance, the XGBoost classifier had the best performance (F1 = 0.96; accuracy = 0.96; AUC = 0.98; G-Mean = 0.98; MCC = 0.88). The Deep MLP had the second-best performance (F1 = 0.94; accuracy = 0.94; AUC = 0.92; G-Mean = 0.70; MCC = 0.42), while Logistic Regression was less accurate (F1 = 0.89; accuracy = 0.88; AUC = 0.89; G-Mean = 0.89; MCC = 0.68). We also used Shapley Additive Explanations (SHAP) to investigate the interpretability of the models. We found that body temperature, total cholesterol, glucose level, blood pressure, waist circumference, body weight, body mass index (BMI), haemoglobin level, and physical activity per week are the most critical variables indicating a higher risk of breakthrough infection.
Conclusion: These results, evident from our unique data source derived from apparently healthy volunteers with cardiovascular risk factors, follow the expected pattern of positive or negative correlations previously reported in the literature. This information strengthens the body of knowledge currently applied in public health guidelines and may also be used by medical practitioners in the future to reduce the risk of SARS-CoV-2 breakthrough infection.
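Illustrative note (not part of the published record): the Methods and Results above refer to random oversampling and to the imbalance-aware metrics G-Mean and MCC. The sketch below shows one common way these are computed for a binary problem, using only the Python standard library. It is not the authors' code; the function names, the toy labels, and the definition of G-Mean as the geometric mean of sensitivity and specificity are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def g_mean(tp, tn, fp, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data, invented for illustration: 8 majority rows, 2 minority rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)

# A toy prediction that finds only one of the two minority cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
toy_g_mean = g_mean(tp, tn, fp, fn)
toy_mcc = mcc(tp, tn, fp, fn)
```

In this toy case accuracy is 0.9 while G-Mean and MCC are noticeably lower, which is why metrics of this kind are often reported alongside accuracy on imbalanced data. In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into evaluation.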
URI: http://hdl.handle.net/11189/9664
ISSN: 2055-2076
DOI: https://doi.org/10.1177/20552076231207593
Appears in Collections: FID - Journal Articles (DHET subsidised)

Files in This Item:
File: Detecting_the_most_critical_clinical_variables.pdf
Size: 2.38 MB
Format: Adobe PDF

Items in Digital Knowledge are protected by copyright, with all rights reserved, unless otherwise indicated.