
Integrating Local and Global Frequency Attention for Multi-Teacher Knowledge Distillation

Publisher: United Societies of Science (USS)

Authors: Yao Zhidi (Hosei University); Du Mengxin (Instrumentation Technology and Economy Institute); Cheng Xin (Hosei University); Zhang Zhiqiang (Southwest University of Science and Technology); Yu Wenxin (Sichuan Civil-military Integration Institute and Fujiang Laboratory)

Open Access

Abstract:

Knowledge distillation, particularly in multi-teacher settings, presents significant challenges in effectively transferring knowledge from multiple complex models to a more compact student model. Traditional approaches often fall short in capturing the full spectrum of useful information. In this paper, we propose a novel method that integrates local and global frequency attention mechanisms to enhance the multi-teacher knowledge distillation process. By simultaneously addressing both fine-grained local details and broad global patterns, our approach improves the student model's ability to assimilate and generalize from the diverse knowledge provided by multiple teachers. Experimental evaluations on standard benchmarks demonstrate that our method consistently outperforms existing multi-teacher distillation techniques, achieving superior accuracy and robustness. Our results suggest that incorporating frequency-based attention mechanisms can significantly advance the effectiveness of knowledge distillation in multi-teacher scenarios, offering new insights and techniques for model compression and transfer learning.
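
To make the idea concrete, the sketch below shows one plausible way to combine a local and a global frequency-attention branch inside a multi-teacher distillation loss, written in PyTorch. Everything here is an illustrative assumption rather than the paper's method: the module names (FrequencyAttention, LocalGlobalFreqAttention), the learnable Fourier-domain gate, the window size, the temperature and weighting parameters, and the simple averaging over teachers are all stand-ins for whatever the authors actually use.

```python
# A minimal, illustrative sketch of the abstract's idea; the paper's
# actual architecture and loss may differ in every detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyAttention(nn.Module):
    """Gates feature maps in the Fourier domain with a learnable mask."""
    def __init__(self, channels: int, size: int):
        super().__init__()
        # One gate per channel and rFFT frequency bin (H x W//2+1 bins);
        # zero init means every frequency initially passes with weight 0.5.
        self.gate = nn.Parameter(torch.zeros(channels, size, size // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.rfft2(x, norm="ortho")        # spatial -> frequency
        freq = freq * torch.sigmoid(self.gate)         # attend per frequency
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")

class LocalGlobalFreqAttention(nn.Module):
    """Global branch attends over the full map (broad patterns); the local
    branch applies the same mechanism per window (fine-grained detail)."""
    def __init__(self, channels: int, size: int, window: int = 4):
        super().__init__()
        assert size % window == 0
        self.window = window
        self.global_attn = FrequencyAttention(channels, size)
        self.local_attn = FrequencyAttention(channels, window)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_attn(x)
        b, c, h, w = x.shape
        k = self.window
        # Split into non-overlapping k x k windows, attend, reassemble.
        p = x.unfold(2, k, k).unfold(3, k, k)          # (b, c, nh, nw, k, k)
        nh, nw = p.shape[2], p.shape[3]
        p = p.permute(0, 2, 3, 1, 4, 5).reshape(b * nh * nw, c, k, k)
        p = self.local_attn(p)
        p = p.reshape(b, nh, nw, c, k, k).permute(0, 3, 1, 4, 2, 5)
        loc = p.reshape(b, c, h, w)
        return g + loc                                 # fuse both branches

def multi_teacher_kd_loss(student_feat, teacher_feats, student_logits,
                          teacher_logits, attn, T=4.0, alpha=0.5):
    """Feature distillation through the frequency-attention module plus
    softened-logit KD, averaged over teachers (the simplest fusion rule)."""
    s = attn(student_feat)
    # Teachers are frozen: detach so gradients flow only to the student.
    feat_loss = sum(F.mse_loss(s, attn(t).detach())
                    for t in teacher_feats) / len(teacher_feats)
    kd_loss = sum(F.kl_div(F.log_softmax(student_logits / T, dim=1),
                           F.softmax(t / T, dim=1),
                           reduction="batchmean") * T * T
                  for t in teacher_logits) / len(teacher_logits)
    return alpha * feat_loss + (1 - alpha) * kd_loss

if __name__ == "__main__":
    x = torch.randn(2, 64, 8, 8)                       # dummy student features
    attn = LocalGlobalFreqAttention(channels=64, size=8, window=4)
    teachers = [torch.randn(2, 64, 8, 8) for _ in range(3)]
    s_logits = torch.randn(2, 10)
    t_logits = [torch.randn(2, 10) for _ in range(3)]
    print(multi_teacher_kd_loss(x, teachers, s_logits, t_logits, attn).item())
```

Plain averaging over teachers is only the simplest fusion choice; a weighted combination (for example, confidence- or attention-weighted) would be a natural refinement consistent with the abstract's multi-teacher framing.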

Keywords: knowledge distillation, frequency attention mechanisms, model compression, deep learning

Published in: IEEE Transactions on Antennas and Propagation (Volume: 71, Issue: 4, April 2023)

Page(s): 2908 - 2921

DOI: 10.1109/TAP.2023.3240032
