LEVERAGING ANALOGIES FOR AI EXPLAINABILITY: ENHANCING LAYPERSON UNDERSTANDING IN AI-ASSISTED DECISION MAKING
Keywords:
Explainable AI (XAI), Analogical Reasoning, Human-AI Collaboration, Layperson Understanding
Abstract
The integration of Artificial Intelligence (AI) into critical decision-making processes necessitates transparent and understandable explanations, especially for non-expert users (laypeople). Traditional Explainable AI (XAI) methods often present technical details that remain inscrutable to a lay audience; this article investigates the potential of analogy-based explanations to bridge this knowledge gap. We present a two-part empirical study. Study I focuses on the generation and qualitative assessment of analogy-based explanations by non-expert crowd workers, establishing a systematic framework for evaluating their quality across dimensions such as structural correspondence, relational similarity, and familiarity. Our findings highlight the subjective nature of analogy quality and the potential of crowdsourcing for generating diverse explanations. Study II evaluates the practical effectiveness of these analogy-based explanations in a high-stakes medical diagnosis task (skin cancer detection). Surprisingly, quantitative results showed no significant improvement in understanding or appropriate reliance with analogy-based explanations compared to detailed concept-level explanations. Qualitative feedback, however, revealed that users found analogies helpful when they perceived a strong connection to a familiar source domain and when the analogies were presented on demand. Although explanations, including analogies, increased perceived cognitive load and decision-making time, our analysis points to the crucial roles of human intuition and perceived plausibility in shaping user behavior. This research contributes actionable insights for designing human-centered XAI, emphasizing that analogies must be personalized and carefully crafted to truly enhance layperson understanding and foster appropriate reliance in AI-assisted decision-making.