wfz666/ICML26-attention-sink: Are attention sinks necessary in diffusion transformers? Code for dynamic sink detection and causal suppression experiments in SD3/SDXL.
