Efficient Training with Denoised Neural Weights
Yifan Gong1,2     Zheng Zhan2     Yanyu Li1,2     Yerlan Idelbayev1    Andrey Zharkov1    Kfir Aberman1    Sergey Tulyakov1    Yanzhi Wang2    Jian Ren1
1Snap Inc.    2Northeastern University   
Framework overview of our weight generator design. The standard diffusion process turns an image into noise in the forward pass and recovers a clean image from pure noise in the reverse pass. Our weight generator is instead designed to turn noise into weight initializations for efficient training. Given the text condition and block index, the weight generator predicts the corresponding weight values.
Presentation Video
The presentation video covers our motivation, framework design, and visualization results.
Abstract
Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. Choosing how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome such limitations, this work takes a novel step towards building a weight generator that synthesizes neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example because model weights spanning a wide range of concepts are easy to collect. Specifically, we first collect a dataset of various image editing concepts and their corresponding trained weights, which are later used to train the weight generator. To address the different characteristics among layers and the substantial number of weights to be predicted, we divide the weights into equal-sized blocks and assign each block an index. A diffusion model is then trained on this dataset, conditioned on both the text description of the concept and the block index. By initializing the image translation model with the denoised weights predicted by our diffusion model, training requires only 43.3 seconds. Compared to training from scratch (i.e., Pix2pix), we achieve a 15× training-time acceleration for a new concept while obtaining even better image generation quality.
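To make the block-wise formulation concrete, the sketch below flattens a model's parameters into equal-sized, indexed blocks and writes denoised blocks back as an initialization. The helper names, block size, and zero-padding scheme are illustrative assumptions, not the exact implementation from the paper.

import torch

def weights_to_blocks(model: torch.nn.Module, block_size: int = 16384):
    """Flatten all parameters and split them into equal-sized, indexed blocks.
    The last block is zero-padded; block_size is an assumed value."""
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    pad = (-flat.numel()) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, block_size)            # [num_blocks, block_size]
    indices = torch.arange(blocks.shape[0])       # one index per block
    return blocks, indices

def blocks_to_weights(model: torch.nn.Module, blocks: torch.Tensor):
    """Write denoised blocks back into the model to serve as its initialization."""
    flat = blocks.flatten()
    offset = 0
    for p in model.parameters():
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n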
Model Architecture Overview
The UNet weight generator. The weight generator is composed of 1-D ResBlocks and 1-D Transformer blocks. The block embedding emb_n is combined with the time-step embedding emb_t and leveraged in each ResBlock.
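Below is a minimal sketch of how a 1-D ResBlock could inject the combined embeddings, assuming the block embedding emb_n is simply added to the time-step embedding emb_t before a linear projection; the layer widths, normalization choices, and additive combination are assumptions for illustration, not the exact architecture.

import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """1-D residual block conditioned on the sum of the time-step embedding
    (emb_t) and the block embedding (emb_n). Assumes channels divisible by 8."""

    def __init__(self, channels: int, emb_dim: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.emb_proj = nn.Linear(emb_dim, channels)   # projects emb_t + emb_n
        self.norm1 = nn.GroupNorm(8, channels)
        self.norm2 = nn.GroupNorm(8, channels)
        self.act = nn.SiLU()

    def forward(self, x, emb_t, emb_n):
        # Combine the time-step and block embeddings before injecting them.
        emb = self.emb_proj(self.act(emb_t + emb_n))[..., None]   # [B, C, 1]
        h = self.conv1(self.act(self.norm1(x)))
        h = h + emb                                                # condition the features
        h = self.conv2(self.act(self.norm2(h)))
        return x + h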
Qualitative Comparisons on Various Tasks
Quantitative Results
FID and time consumption comparison. FID is calculated between the images generated by GAN-based approaches and diffusion models. The reported FID is averaged across the different concepts in the test prompt dataset.

FID comparison between our method and baseline methods over the course of training, evaluated on the test dataset for different concepts/styles.

Left: Ablation study on block size for weight division. Right: Ablation study on weight grouping.
Generated Images and Trained Weights Dataset
To effectively train a weight generator that produces weight initializations for GAN models across various concepts, we need a large-scale dataset of ground-truth weight values for different concepts. Collecting such a dataset first requires a large-scale prompt dataset. Using the concepts/styles in the prompt dataset, we employ diffusion models to generate a substantial collection of images representative of each target concept. The images for each concept/style are then used to train GANs, yielding the ground-truth GAN weights.
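As a rough sketch of this collection pipeline, the loop below pairs each concept prompt with trained GAN weights; generate_images and train_gan are hypothetical placeholders for a text-to-image diffusion sampler and the GAN training recipe, and weights_to_blocks refers to the block split sketched earlier.

def collect_weight_dataset(prompt_dataset, generate_images, train_gan, images_per_concept=1000):
    """Collect (concept, GAN weight blocks) pairs for training the weight generator.
    generate_images and train_gan are assumed callables, not the authors' exact code."""
    weight_dataset = []
    for concept in prompt_dataset:                          # e.g., "watercolor style", "pencil sketch"
        images = generate_images(concept, images_per_concept)   # sample images with a diffusion model
        gan = train_gan(images)                                  # train an image-to-image GAN on them
        blocks, indices = weights_to_blocks(gan)                 # block-wise weight representation
        weight_dataset.append({"concept": concept, "blocks": blocks, "indices": indices})
    return weight_dataset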
Examples of collected text prompts of concepts/styles, generated with ChatGPT-3.5 and Vicuna.

References

[1] Image-to-Image Translation with Conditional Adversarial Networks

[2] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

[3] Zero-shot Image-to-Image Translation

BibTeX
@article{gong20242,
  title={Efficient Training with Denoised Neural Weights},
  author={Gong, Yifan and Zhan, Zheng and Li, Yanyu and Idelbayev, Yerlan and Zharkov, Andrey and Aberman, Kfir and Tulyakov, Sergey and Wang, Yanzhi and Ren, Jian},
  journal={arXiv preprint arXiv:2407.11966},
  year={2024}
}