MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By aligning intermediate diffusion featu...