Analyzing the Impact of Various Criteria on Blood Pressure Through General Linear Model by Box-Cox Transformations
Abstract
Blood pressure is a key indicator of cardiovascular health, reflecting the force exerted by blood against arterial walls. Data transformations are essential tools in statistical analysis for satisfying the assumptions required by linear models, such as normality, linearity, and homoscedasticity, when these assumptions are violated. They are particularly helpful for correcting distortions in data structure and enhancing the validity of General Linear Models (GLMs). In this study, we applied the Box-Cox transformation, a member of the family of power transformations, to improve a linear regression model. Our objective was to identify the most appropriate power transformation for enhancing model performance and interpretability, using statistical criteria evaluated on blood test datasets collected from Azadi Hospital in Duhok. These evaluation criteria are essential for the accurate interpretation of data, as they assess the model's quality and reliability.
The criteria used were adjusted R-squared, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), the F-statistic, Maximum Likelihood Estimation (MLE), Root Mean Square Error (RMSE), and the Shapiro–Wilk test. These measures also guided the selection of the most appropriate GLM. We conclude that different values of λ optimize different aspects of model performance: one value balances good model fit (F-statistic), another gives the most accurate predictions (lowest error), and another improves the likelihood and residual normality. A computational algorithm was proposed to estimate the optimal power parameter λ, and the results for each criterion were discussed and compared. Based on the adjusted R-squared criterion, an optimal λ value was identified, indicating a strong model fit.
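To make the idea of searching for a power parameter concrete, the following is a minimal sketch of a grid search over λ. It is not the algorithm proposed in the study. It assumes a single predictor, a strictly positive response, and only two of the listed criteria (adjusted R-squared and the Box-Cox profile log-likelihood). The function names, the λ grid, and the synthetic data are hypothetical and chosen purely for illustration. None of it comes from the Azadi Hospital datasets.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive responses")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def evaluate_lambda(x, y, lam):
    """Fit z = a + b*x on the transformed response and score the fit."""
    n, p = len(y), 1  # p = number of predictors (one in this sketch)
    z = box_cox(y, lam)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sst = sum((zi - mean_z) ** 2 for zi in z)
    # Adjusted R-squared, computed on the transformed scale.
    adj_r2 = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    # Profile log-likelihood up to an additive constant; the second term
    # is the Jacobian of the transform, which makes values comparable
    # across different lambdas.
    loglik = -0.5 * n * math.log(sse / n) + (lam - 1.0) * sum(
        math.log(v) for v in y
    )
    return {"lambda": lam, "adj_r2": adj_r2, "loglik": loglik}


def grid_search(x, y, lambdas, criterion="loglik"):
    """Return the best-scoring row and the full table of results."""
    table = [evaluate_lambda(x, y, lam) for lam in lambdas]
    best = max(table, key=lambda row: row[criterion])
    return best, table


# Hypothetical synthetic data: log(y) is linear in x plus small fixed noise.
x_demo = list(range(1, 13))
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(12)]
y_demo = [math.exp(0.5 + 0.2 * xi + e) for xi, e in zip(x_demo, noise)]

best, table = grid_search(x_demo, y_demo, [-1.0, -0.5, 0.0, 0.5, 1.0])
for row in table:
    print(f"lambda={row['lambda']:+.1f}  adj_R2={row['adj_r2']:.4f}  "
          f"loglik={row['loglik']:.3f}")
print("selected lambda:", best["lambda"])
```

Passing `criterion="adj_r2"` to `grid_search` selects λ by adjusted R-squared instead. Because that quantity is computed on a different response scale for each λ, rankings obtained with it can differ from the likelihood-based ranking, which is consistent with the observation that different criteria favor different λ values.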





