Vision transformers have become the backbone of choice for a wide range of vision tasks.