A systematic review of algorithm auditing processes to assess bias and risks in AI systems

Vusumzi Funda

Article ID: 11489
Vol 9, Issue 2, 2025

Abstract


The expanding adoption of artificial intelligence systems across high-impact sectors has intensified concerns about inherent bias and discrimination, prompting calls for greater transparency and accountability. Algorithm auditing has emerged as a pivotal method for assessing fairness and mitigating risks in applied machine learning models. This systematic literature review analyzes contemporary techniques for auditing the biases of black-box AI systems that go beyond traditional software testing approaches. An extensive search across technology, law, and social science publications identified 22 recent studies exemplifying innovations in quantitative benchmarking, model inspection, adversarial evaluation, and participatory engagement, situated in applied contexts such as clinical prediction, lending decisions, and employment screening. A rigorous analysis revealed considerable limitations in current approaches, including predominantly technical orientations divorced from lived realities, a lack of transparent deliberation over values, overwhelming reliance on one-shot assessments, scant participation by affected communities, and limited corrective action taken in response to audit findings. At the same time, emerging directions such as subsidiarity analyses, human-centered tools, and corrective programming offer templates for advancing auditing as an embedded socio-technical practice that supports context-specific translation of audit signals into governance actions. Substantial innovation is still needed to institutionalize continuous, holistic, and participatory auditing capabilities that steward equitable algorithm development rather than act as detached arbiters.
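To make two of the reviewed technique families concrete, the sketch below illustrates quantitative benchmarking (group fairness gaps) and a simple adversarial evaluation (counterfactual attribute flips) against a model reachable only through a predict() function, mirroring the black-box setting the review addresses. This is a minimal Python illustration under stated assumptions, not a method drawn from any of the reviewed studies: the toy model, synthetic data, and metric choices (demographic parity, equal opportunity, counterfactual flip rate) are hypothetical.

import numpy as np

def demographic_parity_gap(y_pred, group):
    # Absolute difference in positive-prediction rates between the two groups.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    # Absolute difference in true-positive rates between the two groups.
    tpr0 = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr1 = y_pred[(group == 1) & (y_true == 1)].mean()
    return abs(tpr0 - tpr1)

def counterfactual_flip_rate(predict, X, group_col):
    # Fraction of individuals whose prediction changes when only the
    # protected attribute is flipped -- a simple adversarial probe that
    # needs no access to the model's internals.
    X_flipped = X.copy()
    X_flipped[:, group_col] = 1 - X_flipped[:, group_col]
    return (predict(X) != predict(X_flipped)).mean()

# Hypothetical black-box model and synthetic audit data (assumptions).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 5)).astype(float)  # column 0 = protected attribute
y_true = rng.integers(0, 2, size=1000)
predict = lambda X: (X[:, 1:].sum(axis=1) + 2 * X[:, 0] >= 4).astype(int)  # deliberately biased

y_pred = predict(X)
group = X[:, 0].astype(int)
print("demographic parity gap  :", demographic_parity_gap(y_pred, group))
print("equal opportunity gap   :", equal_opportunity_gap(y_true, y_pred, group))
print("counterfactual flip rate:", counterfactual_flip_rate(predict, X, 0))

In a real audit, predict() would wrap a deployed system's API and the evaluation set would come from the application domain (for example, lending or hiring records). Note that a script of this kind is exactly the one-shot assessment the review identifies as a limitation; it becomes useful only when run continuously and interpreted with affected communities in context.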


Keywords


algorithm auditing; AI bias; machine learning fairness; algorithmic accountability; technical assessments; participatory auditing


DOI: https://doi.org/10.24294/jipd11489



Copyright (c) 2025 Author(s)

This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).