Date de publication

13 novembre 2022


Marathe, G., & al.



Depression is common in the human immunodeficiency virus (HIV)-hepatitis C virus (HCV) co-infected population. Demographic, behavioural, and clinical data collected in research settings may be of help in identifying those at risk for clinical depression. We aimed to predict the presence of depressive symptoms indicative of a risk of depression and identify important classification predictors using supervised machine learning.


We used data from the Canadian Co-infection Cohort, a multicentre prospective cohort, and its associated sub-study on Food Security (FS). The Center for Epidemiologic Studies Depression Scale-10 (CES-D-10) was administered in the FS sub-study; participants were classified as being at risk for clinical depression if scores ≥ 10. We developed two random forest algorithms using the training data (80%) and tenfold cross validation to predict the CES-D-10 classes—1. Full algorithm with all candidate predictors (137 predictors) and 2. Reduced algorithm using a subset of predictors based on expert opinion (46 predictors). We evaluated the algorithm performances in the testing data using area under the receiver operating characteristic curves (AUC) and generated predictor importance plots.


We included 1,934 FS sub-study visits from 717 participants who were predominantly male (73%), white (76%), unemployed (73%), and high school educated (52%). At the first visit, median age was 49 years (IQR:43–54) and 53% reported presence of depressive symptoms with CES-D-10 scores ≥ 10. The full algorithm had an AUC of 0.82 (95% CI:0.78–0.86) and the reduced algorithm of 0.76 (95% CI:0.71–0.81). Employment, HIV clinical stage, revenue source, body mass index, and education were the five most important predictors.


We developed a prediction algorithm that could be instrumental in identifying individuals at risk for depression in the HIV-HCV co-infected population in research settings. Development of such machine learning algorithms using research data with rich predictor information can be useful for retrospective analyses of unanswered questions regarding impact of depressive symptoms on clinical and patient-centred outcomes among vulnerable populations.