European Journal of Emerging Artificial Intelligence (EJEAI)

Open Access Journal
Publication Frequency: 2 issues per year
Peer Reviewed & International Journal

ARTICLE

OPTIMIZING SHAP EXPLANATIONS: A COST-EFFECTIVE DATA SAMPLING METHOD FOR ENHANCED INTERPRETABILITY

1 School of Data Science, The Chinese University of Hong Kong, Hong Kong
2 Department of Computer Engineering, Politecnico di Milano, Italy


Abstract

The proliferation of complex machine learning (ML) models in critical domains such as healthcare, finance, and real estate has underscored the urgent need for Explainable Artificial Intelligence (XAI) [2, 3, 4, 31, 43, 45]. While these "black-box" models often achieve superior predictive performance, their inherent lack of transparency hinders trust, accountability, and the ability to effectively debug and refine them. SHapley Additive exPlanations (SHAP) is a widely recognized model-agnostic XAI method that provides detailed insights into individual feature contributions to model predictions, robustly grounded in cooperative game theory [39, 58, 61]. However, a significant drawback of SHAP, particularly when applied to large datasets or computationally intensive models, is its substantial computational overhead, often rendering its application impractical in resource-constrained or real-time operational environments [60, 64]. This article proposes and rigorously investigates a data-efficient strategy for achieving SHAP interpretability by leveraging intelligent data reduction techniques. Specifically, we explore the application of Slovin's formula, a statistical sampling technique traditionally employed in survey research, as a low-cost heuristic for data reduction. Unlike more complex feature selection or dimensionality reduction methods, Slovin's formula requires minimal prior statistical knowledge of the dataset's properties, offering a straightforward, accessible, and efficient alternative for subsampling without extensive preprocessing. Through controlled experiments on synthetically generated datasets, we demonstrate that by judiciously sampling a representative subset of the original data, SHAP explanations can be generated with significantly reduced computational cost while maintaining a high degree of fidelity to the explanations derived from the full dataset. 
Our findings highlight a U-shaped trade-off in SHAP value stability: mid-ranked features tend to remain stable under subsampling, whereas features with extreme (very low or very high) importance exhibit larger fluctuations. Furthermore, we observe that categorical features and features with non-skewed distributions generally remain more robust, while highly skewed target distributions introduce increased variability. Crucially, the effectiveness and reliability of Slovin's formula diminish when the subsample-to-sample ratio falls below a critical threshold of approximately 5%. This empirical evaluation underscores the potential of our cost-effective approach to democratize access to advanced interpretability, enabling faster model insights, improved debugging, and broader, more sustainable deployment of transparent AI systems across domains.
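The sampling step described above rests on Slovin's formula, n = N / (1 + N·e²), where N is the population (dataset) size and e the tolerated margin of error. A minimal sketch of how such a subsampling step could look in Python follows; the function names and the choice of simple random sampling are our own illustrative assumptions, not the authors' published code:

```python
import math
import random


def slovin_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole sample."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))


def subsample(rows: list, margin_of_error: float = 0.05, seed: int = 0) -> list:
    """Draw a simple random subsample whose size is set by Slovin's formula."""
    n = slovin_sample_size(len(rows), margin_of_error)
    rng = random.Random(seed)  # fixed seed for reproducible explanations
    return rng.sample(rows, n)


# Example: a 10,000-row dataset at e = 0.05 reduces to 385 rows.
subset = subsample(list(range(10_000)))
```

The resulting subset would then be handed to a SHAP explainer in place of the full dataset (for example, as the background/reference data). Note the abstract's caveat: when the subsample falls below roughly 5% of the original data, the resulting explanations become unreliable.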


Keywords

SHapley Additive exPlanations (SHAP), Explainable Artificial Intelligence (XAI), Slovin's formula, data sampling, model interpretability, computational cost

References

[1] A. M. Abdullahi. 2023. The challenges of advancing inclusive education: the case of Somalia's higher education. Journal of Law and Sustainable Development, 11, 2, e422–e422.

[2] A. A. Adeniran, A. P. Onebunne, and P. William. 2024. Explainable AI (XAI) in healthcare: enhancing trust and transparency in critical decision-making. World Journal of Advanced Research and Reviews, 23, 2647–2658.

[3] Q. An, S. Rahman, J. Zhou, and J. J. Kang. 2023. A comprehensive review on machine learning in healthcare industry: classification, restrictions, opportunities and challenges. Sensors, 23, 9, 4178.

[4] Z. Asimiyu. 2024. Balancing explainable AI and security: machine learning for IoT, finance, and real estate. Preprint (2024).

[5] S. Athey and G. W. Imbens. 2019. Machine learning methods that economists should know about. Annual Review of Economics, 11, 1, 685–725.


How to Cite

OPTIMIZING SHAP EXPLANATIONS: A COST-EFFECTIVE DATA SAMPLING METHOD FOR ENHANCED INTERPRETABILITY. (2024). European Journal of Emerging Artificial Intelligence, 1(01), 71-89. https://parthenonfrontiers.com/index.php/ejeai/article/view/49
