A Proposed Method Based on Logistic Regression and Cluster Analysis in Selecting Influential Variables for Kidney Failure Patients
Abstract
This research studies kidney failure by analyzing its relationship with a set of independent variables. To this end, a method is proposed that reduces the number of independent variables used in binary logistic regression. The method combines the independent variables with the dependent variable in a cluster analysis, with the aim of improving the accuracy of the model. The proposed method was applied to a sample of 142 individuals to study the relationship between the response variable (kidney failure versus no kidney failure) and independent variables such as gender, age, smoking, urea, creatine, and calcium. The method reduced the number of independent variables and produced a model that classifies the data with high accuracy: the resulting model retained the two most influential variables, urea and creatine, and achieved a correct classification rate of 94.4%.
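The procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the distance measure, or the stopping rule, so the sketch assumes average-linkage clustering of the variables with distance 1 - |correlation| and a hypothetical cutoff of 0.5. The data are synthetic (only the sample size of 142 and the variable names come from the abstract), and the function names `cluster_variables` and `fit_logistic` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 142  # sample size from the abstract; every value below is synthetic

y = np.repeat([0, 1], n // 2)  # hypothetical response: 1 = kidney failure
predictors = {
    "gender": rng.integers(0, 2, n).astype(float),
    "age": rng.normal(50, 12, n),
    "smoking": rng.integers(0, 2, n).astype(float),
    "urea": 30 + 60 * y + rng.normal(0, 15, n),      # assumed to depend on y
    "creatine": 1 + 4 * y + rng.normal(0, 1, n),     # assumed to depend on y
    "calcium": rng.normal(9, 0.6, n),
}
names = list(predictors)
X = np.column_stack([predictors[k] for k in names])


def cluster_variables(data, max_dist=0.5):
    """Average-linkage clustering of columns with distance 1 - |corr|.

    Merging stops once the closest pair of clusters is farther apart
    than max_dist (an assumed cutoff, not taken from the paper).
    """
    dist = 1.0 - np.abs(np.corrcoef(data, rowvar=False))
    clusters = [[j] for j in range(data.shape[1])]
    while len(clusters) > 1:
        pairs = [
            (dist[np.ix_(clusters[a], clusters[b])].mean(), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        d, a, b = min(pairs)
        if d > max_dist:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters


def fit_logistic(Z, target, n_iter=2000, lr=0.1):
    """Binary logistic regression fitted by plain gradient descent."""
    Zb = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(Zb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))
        w -= lr * Zb.T @ (p - target) / len(target)
    return w


# Step 1: cluster the response (column 0) together with the predictors.
clusters = cluster_variables(np.column_stack([y, X]))

# Step 2: keep only the predictors that share a cluster with the response.
response_cluster = next(c for c in clusters if 0 in c)
selected = [names[j - 1] for j in sorted(response_cluster) if j != 0]

# Step 3: fit the reduced logistic model on the standardized selection.
cols = [names.index(k) for k in selected]
Z = (X[:, cols] - X[:, cols].mean(axis=0)) / X[:, cols].std(axis=0)
w = fit_logistic(Z, y)
pred = (np.column_stack([np.ones(n), Z]) @ w > 0).astype(int)
accuracy = float(np.mean(pred == y))

print("selected predictors:", selected)
print("in-sample classification rate:", round(accuracy, 3))
```

The classification rate printed here is computed on the same synthetic data used for fitting, so it says nothing about the 94.4% reported in the abstract; it only shows how such a rate would be obtained once the reduced set of predictors is fixed.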





