Integrating Local and Global Frequency Attention for Multi-Teacher Knowledge Distillation
Session: Poster Session
Type: Poster Presentation
Abstract:
Knowledge distillation, particularly in multi-teacher settings, presents significant challenges in effectively transferring knowledge from multiple complex models to a more compact student model. Traditional approaches often fall short in capturing the full spectrum of useful information. In this paper, we propose a novel method that integrates local and global frequency attention mechanisms to enhance the multi-teacher knowledge distillation process. By simultaneously addressing both fine-grained local details and broad global patterns, our approach improves the student model's ability to assimilate and generalize from the diverse knowledge provided by multiple teachers. Experimental evaluations on standard benchmarks demonstrate that our method consistently outperforms existing multi-teacher distillation techniques, achieving superior accuracy and robustness. Our results suggest that incorporating frequency-based attention mechanisms can significantly advance the effectiveness of knowledge distillation in multi-teacher scenarios, offering new insights and techniques for model compression and transfer learning.
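The abstract does not give implementation details, but the core idea, re-weighting features with attention derived from both patch-level (local) and full-map (global) frequency content before matching the student to several teachers, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration; all module names, shapes, and the simple magnitude-spectrum attention are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrequencyAttention(nn.Module):
    """Attention weights derived from the magnitude spectrum of a feature map.

    Illustrative only: a 1x1 conv over the FFT magnitude produces per-location
    weights that re-scale the input features.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.fft2(x, norm="ortho")        # spectrum over spatial dims
        attn = torch.sigmoid(self.proj(freq.abs()))   # weights from magnitudes
        return x * attn                               # frequency-guided re-weighting


def distillation_loss(student_feat, teacher_feats, local_attn, global_attn, patch=4):
    """Average feature-matching loss over teachers, combining local and global branches.

    Assumes spatial dims are divisible by `patch`; teacher_feats is a list of
    feature maps with the same shape as student_feat.
    """
    # Global branch: attention computed over the whole feature map (broad patterns).
    s_global = global_attn(student_feat)

    # Local branch: attention computed on non-overlapping patches (fine-grained detail).
    b, c, h, w = student_feat.shape
    patches = F.unfold(student_feat, kernel_size=patch, stride=patch)   # (b, c*p*p, L)
    patches = patches.transpose(1, 2).reshape(-1, c, patch, patch)      # (b*L, c, p, p)
    s_local = local_attn(patches).reshape(b, -1, c * patch * patch).transpose(1, 2)
    s_local = F.fold(s_local, output_size=(h, w), kernel_size=patch, stride=patch)

    # Combine branches and average the feature-matching loss across all teachers.
    s_combined = s_global + s_local
    losses = [F.mse_loss(s_combined, t) for t in teacher_feats]
    return torch.stack(losses).mean()


# Example usage (shapes are illustrative):
# student_feat = torch.randn(8, 64, 32, 32)
# teacher_feats = [torch.randn(8, 64, 32, 32) for _ in range(3)]
# loss = distillation_loss(student_feat, teacher_feats,
#                          FrequencyAttention(64), FrequencyAttention(64))
```

The split into a patch-level and a full-map branch mirrors the abstract's claim that fine-grained local details and broad global patterns are attended to simultaneously; the distillation loss is simply averaged over teachers here, whereas the paper may weight teachers differently.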
Keywords:
knowledge distillation, frequency attention mechanisms, model compression, deep learning