Integrating Local and Global Frequency Attention for Multi-Teacher Knowledge Distillation
Session: Poster Session
Type: Poster Presentation
Abstract:
Knowledge distillation, particularly in multi-teacher settings, presents significant challenges in effectively transferring knowledge from multiple complex models to a more compact student model. Traditional approaches often fall short in capturing the full spectrum of useful information. In this paper, we propose a novel method that integrates local and global frequency attention mechanisms to enhance the multi-teacher knowledge distillation process. By simultaneously addressing both fine-grained local details and broad global patterns, our approach improves the student model's ability to assimilate and generalize from the diverse knowledge provided by multiple teachers. Experimental evaluations on standard benchmarks demonstrate that our method consistently outperforms existing multi-teacher distillation techniques, achieving superior accuracy and robustness. Our results suggest that incorporating frequency-based attention mechanisms can significantly advance the effectiveness of knowledge distillation in multi-teacher scenarios, offering new insights and techniques for model compression and transfer learning.
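The abstract does not give implementation details, but the core idea, re-weighting features with attention derived from both patch-level (local) and full-map (global) frequency content before matching the student to several teachers, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration; all module names, shapes, and the simple magnitude-spectrum attention are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrequencyAttention(nn.Module):
    """Attention weights derived from the magnitude spectrum of a feature map.

    Illustrative only: a 1x1 conv over the FFT magnitude produces per-location
    weights that re-scale the input features.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.fft2(x, norm="ortho")        # spectrum over spatial dims
        attn = torch.sigmoid(self.proj(freq.abs()))   # weights from magnitudes
        return x * attn                               # frequency-guided re-weighting


def distillation_loss(student_feat, teacher_feats, local_attn, global_attn, patch=4):
    """Average feature-matching loss over teachers, combining local and global branches.

    Assumes spatial dims are divisible by `patch`; teacher_feats is a list of
    feature maps with the same shape as student_feat.
    """
    # Global branch: attention computed over the whole feature map (broad patterns).
    s_global = global_attn(student_feat)

    # Local branch: attention computed on non-overlapping patches (fine-grained detail).
    b, c, h, w = student_feat.shape
    patches = F.unfold(student_feat, kernel_size=patch, stride=patch)   # (b, c*p*p, L)
    patches = patches.transpose(1, 2).reshape(-1, c, patch, patch)      # (b*L, c, p, p)
    s_local = local_attn(patches).reshape(b, -1, c * patch * patch).transpose(1, 2)
    s_local = F.fold(s_local, output_size=(h, w), kernel_size=patch, stride=patch)

    # Combine branches and average the feature-matching loss across all teachers.
    s_combined = s_global + s_local
    losses = [F.mse_loss(s_combined, t) for t in teacher_feats]
    return torch.stack(losses).mean()


# Example usage (shapes are illustrative):
# student_feat = torch.randn(8, 64, 32, 32)
# teacher_feats = [torch.randn(8, 64, 32, 32) for _ in range(3)]
# loss = distillation_loss(student_feat, teacher_feats,
#                          FrequencyAttention(64), FrequencyAttention(64))
```

The split into a patch-level and a full-map branch mirrors the abstract's claim that fine-grained local details and broad global patterns are attended to simultaneously; the distillation loss is simply averaged over teachers here, whereas the paper may weight teachers differently.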
Keywords:
knowledge distillation, frequency attention mechanisms, model compression, deep learning