A Proposed Method Based on Logistic Regression and Cluster Analysis in Selecting Influential Variables for Kidney Failure Patients

Suhaib Bashar Hameed; Mahmood M Taher

doi:10.33899/iqjoss.v22i2.54080

Section: Article

Issue

Vol. 22 No. 2 (2025): Volume 22 Issue 2

Published

Nov 30, 2025

Pages

87-95

Abstract

This research studies kidney failure by analyzing its relationship with a set of independent variables. To this end, a method is proposed that reduces the number of independent variables used in binary logistic regression. The method merges the independent variables with the dependent variable using cluster analysis, with the aim of improving the accuracy of the model. The proposed method was applied to a sample of 142 individuals to study the relationship between the response variable (kidney failure or no kidney failure) and independent variables such as gender, age, smoking, urea, creatine, and calcium. The results showed that the method reduced the number of independent variables and produced a model that classifies the data with high accuracy: the resulting model retained the two most influential variables, urea and creatine, and achieved a correct classification rate of 94.4%.
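The abstract does not state which distance measure or linkage the cluster analysis uses, so the following is only a minimal sketch of one possible reading of the idea: cluster the variables together with the response, keep the predictors that fall into the response's cluster, and fit a binary logistic regression on those. The distance `1 - |r|`, the single-linkage rule, the `threshold` value, the gradient-descent fit, and all function names are assumptions made for illustration. The data are synthetic, generated inside the script, and are not the study's data.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def predictors_clustered_with_response(predictors, response, threshold=0.5):
    """Single-linkage clustering of variables with distance 1 - |r| (assumed).
    Returns the predictor names that share a cluster with the response "y"."""
    data = dict(predictors, y=response)
    clusters = [{name} for name in data]

    def dist(c1, c2):
        return min(1 - abs(pearson(data[a], data[b])) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min((dist(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        if d > threshold:      # stop merging once clusters are too dissimilar
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    for c in clusters:
        if "y" in c:
            return sorted(c - {"y"})

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression by batch gradient descent.
    Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Synthetic illustration only: two informative predictors, two pure-noise ones.
random.seed(0)
n = 200
latent = [random.gauss(0, 1) for _ in range(n)]
y = [1 if v > 0 else 0 for v in latent]
predictors = {
    "urea": [v + random.gauss(0, 0.5) for v in latent],
    "creatine": [v + random.gauss(0, 0.5) for v in latent],
    "age": [random.gauss(0, 1) for _ in range(n)],
    "calcium": [random.gauss(0, 1) for _ in range(n)],
}

selected = predictors_clustered_with_response(predictors, y)
X = [[predictors[name][i] for name in selected] for i in range(n)]
w = fit_logistic(X, y)
pred = [1 if w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)) > 0 else 0
        for row in X]
accuracy = sum(p == t for p, t in zip(pred, y)) / n
print(selected, round(accuracy, 3))
```

On real data the predictors would normally be standardized before gradient descent, and the classification rate would be judged on held-out observations rather than on the training sample as done here.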

Authors

Suhaib Bashar Hameed

Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq

Mahmood M Taher

Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq

Identifiers

https://doi.org/10.33899/iqjoss.v22i2.54080

How to Cite

Hameed, S. B., & Taher, M. M. (2025). A Proposed Method Based on Logistic Regression and Cluster Analysis in Selecting Influential Variables for Kidney Failure Patients. Iraqi Journal of Statistical Sciences, 22(2), 87–95. https://doi.org/10.33899/iqjoss.v22i2.54080