Objective To develop and compare the predictive performance of five machine learning models for adverse postoperative outcomes in cardiac surgery patients, and to identify key decision factors through SHapley Additive exPlanations (SHAP) interpretability analysis. Methods Perioperative data comprising 88 variables (demographic information and preoperative, intraoperative, and postoperative indicators) were retrospectively collected from adult patients who underwent cardiac surgery at the First Affiliated Hospital of Xinjiang Medical University in 2023. Adverse postoperative outcomes were defined as acute kidney injury and/or in-hospital mortality during the postoperative hospitalization period. Patients were divided into an adverse outcome group and a favorable outcome group according to whether an adverse postoperative outcome occurred. After feature variables were screened using least absolute shrinkage and selection operator (LASSO) regression, five machine learning models were constructed: eXtreme gradient boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), light gradient boosting machine (LightGBM), and generalized linear model (GLM). The dataset was randomly divided into a training set and a test set at a 7 : 3 ratio using stratified sampling, with postoperative outcome as the stratification factor. Model performance was evaluated using receiver operating characteristic curves, decision curve analysis, and F1 score. The SHAP method was applied to analyze feature contributions. Results A total of 639 patients were included, comprising 395 males and 244 females, with a median age of 62 (55, 69) years. The adverse outcome group consisted of 191 patients and the favorable outcome group of 448 patients, giving an adverse postoperative outcome incidence of 29.9%. Univariate analysis showed no significant differences between the two groups for any variables (P>0.05). 
LASSO regression selected 16 feature variables (including cardiopulmonary bypass support time, blood glucose on postoperative day 3, creatine kinase-MB isoenzyme, and systemic inflammatory response index [SIRI]), and five machine learning models (GLM, RF, GBM, LightGBM, XGBoost) were constructed. The XGBoost model exhibited the best predictive performance on both the training set (n=447) and the test set (n=192), with area under the curve values of 0.761 [95%CI (0.719, 0.800)] and 0.759 [95%CI (0.692, 0.818)], respectively. In the test set, it also significantly outperformed the other models in positive predictive value and balanced accuracy. Decision curve analysis further confirmed its clinical utility across various risk thresholds. SHAP analysis indicated that cardiopulmonary bypass support time, blood glucose on postoperative day 3, creatine kinase-MB isoenzyme, and inflammatory markers (SIRI, NLR, CAR) contributed most to the prediction. Conclusion The XGBoost model effectively predicts adverse postoperative outcomes in cardiac surgery patients. Clinically, attention should be focused on cardiopulmonary bypass support time, postoperative blood glucose control, and monitoring of inflammatory levels to improve patient prognosis.
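The evaluation metrics named in the abstract above (positive predictive value, balanced accuracy, F1 score, and the net benefit underlying decision curve analysis) can all be derived from confusion-matrix counts. The following is a minimal Python sketch of those standard definitions. The function names and the example counts are hypothetical, chosen for illustration only; they are not taken from the study and do not reproduce its results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return PPV, balanced accuracy, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"ppv": ppv, "balanced_accuracy": balanced_accuracy, "f1": f1}


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given risk threshold (decision curve analysis).

    False positives are weighted by the odds of the threshold probability.
    """
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical counts for illustration only (not results from the study).
metrics = classification_metrics(tp=30, fp=20, tn=120, fn=30)
nb = net_benefit(tp=30, fp=20, n=200, threshold=0.3)
print(metrics, nb)
```

Evaluating `net_benefit` over a grid of thresholds and plotting it against the "treat all" and "treat none" strategies gives a decision curve.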
Objective To develop and validate a machine learning model based on preoperative clinical characteristics, laboratory indices, and radiological features for the non-invasive prediction of spread through air spaces (STAS) in patients with early-stage lung adenocarcinoma. Methods Preoperative data from patients with early-stage lung adenocarcinoma who underwent surgical resection at Northern Jiangsu People's Hospital between January 2020 and August 2025 were retrospectively collected. The data included clinical characteristics, laboratory indices, and radiological features. Patients were divided into a STAS-positive group and a STAS-negative group based on postoperative pathological findings. The dataset was randomly split into a training set and a testing set at a 7 : 3 ratio. Feature variables were selected using the maximum relevance and minimum redundancy (mRMR) algorithm and least absolute shrinkage and selection operator (LASSO) regression. Five machine learning models were constructed: logistic regression (LR), random forest (RF), support vector machine (SVM), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA). The Shapley additive explanations (SHAP) method was employed to interpret the optimal prediction model. Results A total of 377 patients were included, comprising 177 males (46.9%) and 200 females (53.1%), with a mean age of (63.31±9.73) years. There were 261 patients in the training set and 116 patients in the testing set. In the training set, statistically significant differences were observed between the STAS-positive group (n=130) and the STAS-negative group (n=131) across multiple features, including age, sex, neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), clinical T stage, and maximum solid component diameter (P<0.05). 
A final set of 10 feature variables was selected by combining mRMR and LASSO regression, and five machine learning models (LR, RF, SVM, LightGBM, XGBoost) were developed. The XGBoost model demonstrated superior predictive performance in both the training and testing sets, achieving AUCs of 0.947 [95%CI (0.920, 0.975)] and 0.943 [95%CI (0.894, 0.993)], respectively, the highest among the five models in the testing set. DCA indicated that the XGBoost model provided a high net clinical benefit across a wide range of threshold probabilities. SHAP analysis revealed that the vessel convergence sign, clinical T stage, age, consolidation-to-tumor ratio (CTR), and MLR were the features with the highest contributions to STAS prediction. Conclusion The XGBoost model effectively predicts preoperative STAS status in early-stage lung adenocarcinoma, exhibiting excellent discriminative performance and good clinical interpretability. Key predictors such as the vessel convergence sign, clinical T stage, age, and CTR provide a crucial reference for preoperative risk assessment and the individualized selection of surgical strategies, ultimately benefiting patients.
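The AUC values with 95% confidence intervals reported above can be illustrated with a short Python sketch. The abstract does not state how the intervals were computed; the percentile bootstrap used here is an assumption and only one common option. The function names and the toy labels and scores are hypothetical and are not data from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data for illustration only (1 = STAS-positive, 0 = STAS-negative).
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores)
print(toy_auc, toy_ci)
```

On the toy data, 14 of the 16 positive-negative pairs are ranked correctly, so the AUC is 0.875. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients but would call for a rank-based formulation on larger datasets.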