An insight from machine learning perspective for COVID-19 survival prediction in Malaysia based on demographic factor

Nur Fatin Azwin A. Talib, Siti Meriam Zahari, Mahayaudin M. Mansor, Sumayyah Dzulkifly, Noryanti Nasir, S. Sarifah Radiah Shariff, Nurakmal Ahmad Mustaffa

Article ID: 9877
Vol 9, Issue 1, 2025


Abstract


Malaysia reported its first imported COVID-19 case, which was also the country’s first confirmed positive case, on 23 January 2020. The case came from a group of eight close contacts in Johor. The COVID-19 pandemic has significantly affected the global health landscape, with mortality and survival being the critical outcomes of interest. This study aims to predict COVID-19 survival in Malaysia using machine learning approaches based on demographic factors. The dataset comprises demographic information on 2,151,315 COVID-19 patients recorded between 2020 and 2022, including nationality, region, age group, gender, medical history, vaccine brand, and the number of vaccine doses received. Four machine learning algorithms, namely Logistic Regression, Naïve Bayes, Support Vector Machine, and Artificial Neural Network (ANN), were employed to assess the relationship between demographic factors and COVID-19 survival. To evaluate model performance, two configurations of the dataset were compared: the original imbalanced data and a balanced version obtained by down-sampling. The results indicate that the balanced (down-sampled) dataset outperforms the imbalanced dataset in terms of overall accuracy, sensitivity, specificity, precision, and Area Under the Curve (AUC). The ANN classifier exhibited the highest performance, with a specificity of 95.2% on the balanced dataset. The model excels at accurately identifying survivors, thereby minimizing false mortality predictions, and is selected as the best model for predicting COVID-19 survival. Its capacity to process large sample sizes, combined with its numerous interconnected nodes, enables it to identify complex patterns and extract meaningful insights from diverse data such as demographic factors. Additionally, the optimization of parameters, including the number of layers, learning rate, and activation functions, contributed significantly to its superior accuracy.
The study identifies chronic disease, male gender, and age 45 and above as notable factors associated with lower survival rates among COVID-19 patients. The findings underscore the importance of completing the vaccination series by obtaining at least the second dose, as the first dose alone may not offer sufficient protection. In conclusion, this study achieves its objectives by identifying the optimal dataset configuration and predictive model for forecasting COVID-19 survival based on demographic factors. The ANN could serve as a benchmark classifier, offering a valuable tool to predict survival, promote vaccination, and optimize the general healthcare system during a pandemic outbreak. The study not only contributes to the theoretical understanding of effective COVID-19 prediction but also emphasizes the practical implications of integrating advanced machine learning techniques into pandemic management strategies. Future research can build upon these findings by exploring additional machine learning techniques and considering geographical and environmental factors to further enhance the accuracy of long-term predictions.
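
The abstract's comparison rests on two mechanical steps: balancing the classes by down-sampling and scoring each classifier from its confusion matrix. The following is a minimal sketch of both steps, not the authors' code. The field name `died`, the function names, and the choice of mortality as the positive class (so that specificity measures correctly identified survivors, as the abstract's wording suggests) are all assumptions made for illustration. AUC is left out because it requires predicted scores rather than hard labels.

```python
import random


def downsample(records, label_key="died", seed=0):
    """Randomly discard records from the larger class until every class
    has as many records as the smallest one (random down-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def _safe_div(num, den):
    # Return NaN instead of raising when a metric is undefined.
    return num / den if den else float("nan")


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision for binary labels.
    Assumption: positive=1 encodes death, so specificity is the share of
    true survivors that were predicted to survive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "precision": _safe_div(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy data only: 90 survivors and 10 deaths, purely illustrative.
    toy = [{"died": 0} for _ in range(90)] + [{"died": 1} for _ in range(10)]
    balanced = downsample(toy)
    print(len(balanced), sum(r["died"] for r in balanced))
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
```

Down-sampling discards majority-class records, which is affordable with a dataset of this size. On an imbalanced dataset a classifier can reach high accuracy while rarely predicting the minority class, which is why sensitivity and specificity are reported alongside accuracy.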

Keywords


COVID-19; public health; machine learning; vaccination; prediction model; demographic factors



DOI: https://doi.org/10.24294/jipd9877


Copyright (c) 2025 Nur Fatin Azwin A. Talib, Siti Meriam Zahari, Mahayaudin M. Mansor, Sumayyah Dzulkifly, Noryanti Nasir, S. Sarifah Radiah Shariff, Nurakmal Ahmad Mustaffa

License URL: https://creativecommons.org/licenses/by/4.0/

This site is licensed under a Creative Commons Attribution 4.0 International License.