[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[3] C. Li, P. Zheng, Y. Yin, B. Wang, and L. Wang, “Deep reinforcement learning in smart manufacturing: A review and prospects,” CIRP Journal of Manufacturing Science and Technology, vol. 40, pp. 75–101, 2023.
[4] A. Perera and P. Kamalaruban, “Applications of reinforcement learning in energy systems,” Renewable and Sustainable Energy Reviews, vol. 137, p. 110618, 2021.
[5] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[6] A. Esteso, D. Peidro, J. Mula, and M. Díaz-Madroñero, “Reinforcement learning applied to production planning and control,” International Journal of Production Research, vol. 61, no. 16, pp. 5772–5789, 2023.
[7] R. Nian, J. Liu, and B. Huang, “A review on reinforcement learning: Introduction and applications in industrial process control,” Computers & Chemical Engineering, vol. 139, p. 106886, 2020.
[8] R. N. Boute, J. Gijsbrechts, W. Van Jaarsveld, and N. Vanvuchelen, “Deep reinforcement learning for inventory control: A roadmap,” European Journal of Operational Research, vol. 298, no. 2, pp. 401–412, 2022.
[9] C. Blum and A. Roli, “Metaheuristics in combinatorial optimization: Overview and conceptual comparison,” ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 268–308, 2003.
[10] Y. Li, “Deep reinforcement learning: An overview,” arXiv preprint arXiv:1701.07274, 2017.
[11] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.
[12] C. D. Hubbs, C. Li, N. V. Sahinidis, I. E. Grossmann, and J. M. Wassick, “A deep reinforcement learning approach for chemical production scheduling,” Computers & Chemical Engineering, vol. 141, p. 106982, 2020.
[13] D. Shi, W. Fan, Y. Xiao, T. Lin, and C. Xing, “Intelligent scheduling of discrete automated production line via deep reinforcement learning,” International Journal of Production Research, vol. 58, no. 11, pp. 3362–3380, 2020.
[14] F. Guo, Y. Li, A. Liu, and Z. Liu, “A reinforcement learning method to scheduling problem of steel production process,” in Journal of Physics: Conference Series, vol. 1486, no. 7. IOP Publishing, 2020, p. 072035.
[15] M. Mowbray, D. Zhang, and E. A. D. R. Chanona, “Distributional reinforcement learning for scheduling of chemical production processes,” arXiv preprint arXiv:2203.00636, 2022.
[16] N. N. Sultana, H. Meisheri, V. Baniwal, S. Nath, B. Ravindran, and H. Khadilkar, “Reinforcement learning for multi-product multi-node inventory management in supply chains,” arXiv preprint arXiv:2006.04037, 2020.
[17] B. J. De Moor, J. Gijsbrechts, and R. N. Boute, “Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management,” European Journal of Operational Research, vol. 301, no. 2, pp. 535–545, 2022.
[18] M. Khirwar, K. S. Gurumoorthy, A. A. Jain, and S. Manchenahally, “Cooperative multi-agent reinforcement learning for inventory management,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2023, pp. 619–634.
[19] R. Leluc, E. Kadoche, A. Bertoncello, and S. Gourvenec, “MARLIM: Multi-agent reinforcement learning for inventory management,” arXiv preprint arXiv:2308.01649, 2023.
[20] O. Ogunfowora and H. Najjaran, “Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization,” Journal of Manufacturing Systems, vol. 70, pp. 244–263, 2023.
[21] N. Yousefi, S. Tsianikas, and D. W. Coit, “Reinforcement learning for dynamic condition-based maintenance of a system with individually repairable components,” Quality Engineering, vol. 32, no. 3, pp. 388–408, 2020.
[22] ——, “Dynamic maintenance model for a repairable multi-component system using deep reinforcement learning,” Quality Engineering, vol. 34, no. 1, pp. 16–35, 2022.
[23] P. Andrade, C. Silva, B. Ribeiro, and B. F. Santos, “Aircraft maintenance check scheduling using reinforcement learning,” Aerospace, vol. 8, no. 4, p. 113, 2021.
[24] J. Thomas, M. P. Hernandez, A. K. Parlikad, and R. Piechocki, “Network maintenance planning via multi-agent reinforcement learning,” in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2289–2295.
[25] Z. J. Viharos and R. Jakab, “Reinforcement learning for statistical process control in manufacturing,” Measurement, vol. 182, p. 109616, 2021.
[26] A. Kuhnle, M. C. May, L. Schäfer, and G. Lanza, “Explainable reinforcement learning in production control of job shop manufacturing system,” International Journal of Production Research, vol. 60, no. 19, pp. 5812–5834, 2022.
[27] M. Mowbray, R. Smith, E. A. Del Rio-Chanona, and D. Zhang, “Using process data to generate an optimal control policy via apprenticeship and reinforcement learning,” AIChE Journal, vol. 67, no. 9, p. e17306, 2021.
[28] Y. Li, J. Du, and W. Jiang, “Reinforcement learning for process control with application in semiconductor manufacturing,” IISE Transactions, pp. 1–15, 2023.
[29] D. Azuatalam, W.-L. Lee, F. de Nijs, and A. Liebman, “Reinforcement learning for whole-building HVAC control and demand response,” Energy and AI, vol. 2, p. 100020, 2020.
[30] D. Jang, L. Spangher, M. Khattar, U. Agwan, and C. Spanos, “Using meta reinforcement learning to bridge the gap between simulation and experiment in energy demand response,” in Proceedings of the Twelfth ACM International Conference on Future Energy Systems, 2021, pp. 483–487.
[31] M. Ahrarinouri, M. Rastegar, and A. R. Seifi, “Multiagent reinforcement learning for energy management in residential buildings,” IEEE Transactions on Industrial Informatics, vol. 17, no. 1, pp. 659–666, 2020.
[32] R. Lu, R. Bai, Z. Luo, J. Jiang, M. Sun, and H.-T. Zhang, “Deep reinforcement learning-based demand response for smart facilities energy management,” IEEE Transactions on Industrial Electronics, vol. 69, no. 8, pp. 8554–8565, 2021.
[33] R. Lu, Y.-C. Li, Y. Li, J. Jiang, and Y. Ding, “Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management,” Applied Energy, vol. 276, p. 115473, 2020.
[34] X. Zhang, R. Lu, J. Jiang, S. H. Hong, and W. S. Song, “Testbed implementation of reinforcement learning-based demand response energy management system,” Applied Energy, vol. 297, p. 117131, 2021.
[35] T. A. Nakabi and P. Toivanen, “Deep reinforcement learning for energy management in a microgrid with flexible demand,” Sustainable Energy, Grids and Networks, vol. 25, p. 100413, 2021.
[36] R. Hu and A. Kwasinski, “Energy management for microgrids using a reinforcement learning algorithm,” in 2021 IEEE Green Energy and Smart Systems Conference (IGESSC). IEEE, 2021, pp. 1–6.
[37] B. Zhang, Z. Chen, and A. M. Ghias, “Deep reinforcement learning-based energy management strategy for a microgrid with flexible loads,” in 2023 International Conference on Power Energy Systems and Applications (ICoPESA). IEEE, 2023, pp. 187–191.
[38] W. Zhang, H. Qiao, X. Xu, J. Chen, J. Xiao, K. Zhang, Y. Long, and Y. Zuo, “Energy management in microgrid based on deep reinforcement learning with expert knowledge,” in International Workshop on Automation, Control, and Communication Engineering (IWACCE 2022), vol. 12492. SPIE, 2022, pp. 275–284.
[39] A. Shojaeighadikolaei, A. Ghasemi, A. G. Bardas, R. Ahmadi, and M. Hashemi, “Weather-aware data-driven microgrid energy management using deep reinforcement learning,” in 2021 North American Power Symposium (NAPS). IEEE, 2021, pp. 1–6.
[40] Y. Du and F. Li, “Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1066–1076, 2019.
[41] T. Yang, L. Zhao, W. Li, and A. Y. Zomaya, “Reinforcement learning in sustainable energy and electric systems: A survey,” Annual Reviews in Control, vol. 49, pp. 145–163, 2020.
[42] D. Cao, W. Hu, J. Zhao, G. Zhang, B. Zhang, Z. Liu, Z. Chen, and F. Blaabjerg, “Reinforcement learning and its applications in modern power and energy systems: A review,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1029–1042, 2020.
[43] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, “Reinforcement learning for selective key applications in power systems: Recent advances and future challenges,” IEEE Transactions on Smart Grid, vol. 13, no. 4, pp. 2935–2958, 2022.
[44] K. Sivamayil, E. Rajasekar, B. Aljafari, S. Nikolovski, S. Vairavasundaram, and I. Vairavasundaram, “A systematic study on reinforcement learning based applications,” Energies, vol. 16, no. 3, p. 1512, 2023.
[45] X. Zhong, Z. Zhang, R. Zhang, and C. Zhang, “End-to-end deep reinforcement learning control for HVAC systems in office buildings,” Designs, vol. 6, no. 3, p. 52, 2022.
[46] S. Sierla, H. Ihasalo, and V. Vyatkin, “A review of reinforcement learning applications to control of heating, ventilation and air conditioning systems,” Energies, vol. 15, no. 10, p. 3526, 2022.
[47] H.-Y. Liu, B. Balaji, S. Gao, R. Gupta, and D. Hong, “Safe HVAC control via batch reinforcement learning,” in 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 2022, pp. 181–192.
[48] X. Yuan, Y. Pan, J. Yang, W. Wang, and Z. Huang, “Study on the application of reinforcement learning in the operation optimization of HVAC system,” in Building Simulation, vol. 14. Springer, 2021, pp. 75–87.
[49] M. Biemann, F. Scheller, X. Liu, and L. Huang, “Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control,” Applied Energy, vol. 298, p. 117164, 2021.
[50] D. Zhou, R. Jia, and H. Yao, “Robotic arm motion planning based on curriculum reinforcement learning,” in 2021 6th International Conference on Control and Robotics Engineering (ICCRE). IEEE, 2021, pp. 44–49.
[51] T. Yu and Q. Chang, “Reinforcement learning based user-guided motion planning for human-robot collaboration,” arXiv preprint arXiv:2207.00492, 2022.
[52] Y. Cao, S. Wang, X. Zheng, W. Ma, X. Xie, and L. Liu, “Reinforcement learning with prior policy guidance for motion planning of dual-arm free-floating space robot,” Aerospace Science and Technology, vol. 136, p. 108098, 2023.
[53] M. Schuck, J. Brüdigam, A. Capone, S. Sosnowski, and S. Hirche, “Dext-gen: Dexterous grasping in sparse reward environments with full orientation control,” arXiv preprint arXiv:2206.13966, 2022.
[54] S. Joshi, S. Kumra, and F. Sahin, “Robotic grasping using deep reinforcement learning,” in 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE). IEEE, 2020, pp. 1461–1466.
[55] D. Wang, H. Deng, and Z. Pan, “MRCDRL: Multi-robot coordination with deep reinforcement learning,” Neurocomputing, vol. 406, pp. 68–76, 2020.
[56] X. Lan, Y. Qiao, and B. Lee, “Towards pick and place multi robot coordination using multi-agent deep reinforcement learning,” in 2021 7th International Conference on Automation, Robotics and Applications (ICARA). IEEE, 2021, pp. 85–89.