A Comparative Study of Bayesian-Optimized Machine Learning Models for Differentiated Thyroid Cancer Recurrence Prediction
Abstract
Differentiated thyroid cancer (DTC) is the most prevalent type of thyroid malignancy, and predicting its recurrence remains a clinical challenge. This study addresses the growing need for reliable predictive tools by applying machine learning techniques enhanced with Bayesian optimization. Early detection of recurrence risk can significantly improve patient management and outcomes. The study develops a comparative framework using four supervised classifiers: Logistic Regression, extreme gradient boosting (XGBoost), CatBoost, and light gradient boosting machine (LightGBM), applied to a clinical dataset of DTC patients. Each model is trained and evaluated both before and after hyperparameter tuning via Bayesian optimization. Model performance is assessed using accuracy, recall, precision, and the area under the receiver operating characteristic (ROC) curve (AUC). The optimized XGBoost model achieved the best performance, with recall, precision, and accuracy of 0.97, 0.99, and 0.9870, respectively, and an AUC of 0.9737, clearly outperforming its default counterpart. In contrast, CatBoost showed a slight performance drop after optimization, while Logistic Regression and LightGBM exhibited no significant change. The results show that the benefit of Bayesian optimization depends on the algorithm: it can substantially enhance performance for some models while leaving others unchanged or slightly worse. This study highlights the potential of optimization techniques to boost the predictive power of machine learning models in the medical field, particularly for recurrence prediction in differentiated thyroid cancer.
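The four evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the study's evaluation code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values rather than data from the DTC dataset. AUC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Dict[str, float]:
    """Accuracy, recall, precision and AUC for binary labels (1 = recurrence)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "auc": roc_auc(y_true, y_score),
    }


if __name__ == "__main__":
    # Toy values for illustration only.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]
    print(classification_metrics(labels, scores))
```

Applying the same function to a model's held-out predictions before and after tuning would give the kind of side-by-side comparison the abstract describes. Recall is the metric most sensitive to missed recurrences, which is why it is reported separately from accuracy.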