A Proposed Method Based on Logistic Regression and Cluster Analysis in Selecting Influential Variables for Kidney Failure Patients

Section: Article
Published: Nov 30, 2025
Pages: 87-95

Abstract

This research studies kidney failure by analyzing its relationship with a set of independent variables. To this end, a method is proposed for reducing the number of independent variables used in binary logistic regression. The method applies cluster analysis to the independent variables together with the dependent variable, with the aim of improving the accuracy of the model. The proposed method was applied to a sample of 142 individuals to study the relationship between the response variable (kidney failure vs. no kidney failure) and independent variables such as gender, age, smoking, urea, creatinine, and calcium. The results showed that the proposed method reduced the number of independent variables and produced a model that classifies the data with high accuracy. The resulting model retained the two most influential variables, urea and creatinine, and achieved a correct classification rate of 94.4%. The proposed method therefore proved effective in reducing the number of variables while accurately classifying kidney failure data.
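The abstract does not specify the clustering settings (distance measure, linkage, number of clusters) or the fitting procedure, so the following Python sketch is only one plausible reading of the selection idea, not the authors' implementation. It assumes that the variables themselves, including the binary response, are clustered by average linkage on the distance 1 - |r| (r being the Pearson correlation), that the predictors sharing the response's cluster are retained, and that a binary logistic regression on those predictors is then scored by its correct classification rate. The data set `TOY_DATA`, all helper names, the choice k = 2, and the gradient-descent fit (used here in place of the Newton/IRLS maximum-likelihood routines that statistical packages usually employ) are hypothetical and purely illustrative.

```python
import math

# Hypothetical sketch: cluster variables together with the response,
# keep the predictors in the response's cluster, then fit a logistic model.

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cluster_variables(data, k):
    """Average-linkage agglomerative clustering of variables (not cases),
    using 1 - |r| as the distance; merging stops when k clusters remain."""
    names = list(data)
    dist = {(a, b): 1 - abs(pearson(data[a], data[b])) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # j > i, so deleting j keeps index i valid
        del clusters[j]
    return clusters

def select_with_response(data, response, k):
    """Return the predictors that fall in the same cluster as the response."""
    for cluster in cluster_variables(data, k):
        if response in cluster:
            return [v for v in cluster if v != response]
    return []

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def standardize(col):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    return [(v - m) / sd for v in col]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def classification_rate(w, X, y, cutoff=0.5):
    """Share of cases whose predicted class (1 if p >= cutoff) matches y."""
    hits = sum(
        (sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) >= cutoff) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)

# Synthetic toy data (NOT the study's 142 patients): 4 controls, 4 cases.
# Age and calcium are deliberately built to be uncorrelated with the response.
TOY_DATA = {
    "kidney_failure": [0, 0, 0, 0, 1, 1, 1, 1],
    "age":            [40, 60, 40, 60, 40, 60, 40, 60],
    "urea":           [30, 35, 32, 38, 120, 110, 130, 125],
    "creatinine":     [0.8, 1.0, 0.9, 1.1, 4.0, 3.5, 5.0, 4.2],
    "calcium":        [8.5, 10.5, 8.5, 10.5, 8.5, 10.5, 8.5, 10.5],
}

selected = select_with_response(TOY_DATA, "kidney_failure", k=2)
X = list(zip(*(standardize(TOY_DATA[v]) for v in selected)))
y = TOY_DATA["kidney_failure"]
w = fit_logistic(X, y)
rate = classification_rate(w, X, y)
print(selected, f"classification rate = {rate:.1%}")
```

On this synthetic data the response falls in one cluster with urea and creatinine, while age and calcium form the other cluster, so only urea and creatinine enter the logistic model. A rate computed on the same data used for fitting tends to be optimistic; a hold-out sample or cross-validation would give a fairer estimate.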

How to Cite

Hameed, S. B., & Taher, M. M. (2025). A Proposed Method Based on Logistic Regression and Cluster Analysis in Selecting Influential Variables for Kidney Failure Patients. Iraqi Journal of Statistical Sciences, 22(2), 87–95. https://doi.org/10.33899/iqjoss.v22i2.54080