Stable Diffusion

Large self-supervised learning (SSL) models have become very popular recently. One popular model trained with a denoising objective of this kind is Stable Diffusion, a latent diffusion model: a forward process gradually adds Gaussian noise to the image latents, and a denoising network (a UNet) is trained to remove that noise. This latent text-to-image model runs with an 860M-parameter UNet and a 123M-parameter text encoder.
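The noising step can be illustrated with a minimal, framework-free sketch. It assumes the standard DDPM closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with a linear beta schedule; the function names and the schedule values are illustrative choices, not taken from the Stable Diffusion code base, and a flat list of floats stands in for an image latent.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule; needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the share of signal variance left at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps  # eps is what the denoising network is trained to predict

abar = alpha_bars(linear_betas(1000))
latent = [0.5, -1.0, 2.0]  # hypothetical stand-in for an image latent
noisy, target = add_noise(latent, 500, abar, random.Random(0))
```

The denoising network sees `noisy` and the step index and is trained to regress `target`; at sampling time the learned denoiser is applied repeatedly, starting from pure noise.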

GitHub - CompVis/stable-diffusion: A latent text-to-image diffusion model

High-Resolution Image Synthesis with Latent Diffusion Models

Although the Stable Diffusion model has demonstrated great capability in image generation, it is expensive to run. This project will look at how Knowledge Distillation techniques, combined with pruning and fixed-point quantization, can reduce the runtime of these models.

Project overview

In particular, this project will look at:

Run and evaluate existing Stable Diffusion models and construct a Knowledge Distillation pipeline for the model.
Try to build a smaller student model through:
- Pruning
- Layer skipping
Try to further reduce the model's inference cost through fixed-point quantization.
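To make two of these compression steps concrete, here is a small sketch of unstructured magnitude pruning and of signed fixed-point quantization. Both are generic textbook variants under assumed settings (global magnitude threshold, 8-bit words with 4 fractional bits, saturation on overflow); the project may well end up using different schemes, and the function names are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def to_fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize x to a signed fixed-point integer code with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = round(x * scale)  # Python's round() rounds exact ties to the even integer
    return max(lo, min(hi, code))  # saturate instead of wrapping around

def from_fixed_point(code, frac_bits=4):
    """Map an integer code back to the real value it represents."""
    return code / (1 << frac_bits)

pruned = magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5)
codes = [to_fixed_point(w) for w in pruned]
restored = [from_fixed_point(c) for c in codes]
```

With 4 fractional bits the representable step is 1/16, so 0.3 maps to the integer code 5 and is restored as 0.3125; values outside [-8, 7.9375] saturate at the ends of the range.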

Knowledge Distillation

General KD

Distilling the Knowledge in a Neural Network

The earliest widely cited KD paper, by Hinton et al. The method is old, but it is useful as a citation.
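As a reminder of what this method computes, the sketch below implements the classic soft-target loss for a classifier: the student matches the teacher's temperature-softened distribution (scaled by T^2) and is also trained on the true label. The temperature and the weighting `alpha` are assumed values, and this is the classification setting of the paper; how best to carry it over to a diffusion UNet is an open design question for this project.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) on soft targets + (1 - alpha) * CE on the label."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard

loss = kd_loss([1.0, 2.0, 2.9], [1.0, 2.0, 3.0], label=2)
```

The T^2 factor keeps the gradient magnitude of the soft term comparable to the hard term when the temperature is changed.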

FitNets: Hints for Thin Deep Nets

An older paper co-authored by Bengio.
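The core idea of FitNets is a hint loss: an intermediate "guided" layer of the thin student is pushed, through a small learned regressor, to reproduce an intermediate "hint" layer of the teacher. The sketch below uses a plain linear regressor on feature vectors for brevity (the paper uses a convolutional regressor on feature maps); all names and numbers are illustrative.

```python
def linear(weight_rows, x):
    """Matrix-vector product: maps a student feature vector to the teacher's width."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]

def hint_loss(teacher_feat, student_feat, regressor_weights):
    """0.5 * squared L2 distance between the teacher hint and the regressed student feature."""
    projected = linear(regressor_weights, student_feat)
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_feat, projected))

teacher_hint = [1.0, 2.0, 3.0]   # hypothetical 3-dim teacher activation
student_guided = [1.0, 1.0]      # hypothetical 2-dim student activation
regressor = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]  # 3x2 map, learned jointly in practice
loss = hint_loss(teacher_hint, student_guided, regressor)
```

In the paper this loss is used as a first training stage for the lower part of the student, followed by standard KD on the outputs.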