The motivation comes from this early paper by Madry et al., which demonstrated that model capacity plays an important role in adversarial robustness.

Towards Deep Learning Models Resistant to Adversarial Attacks

Also, in ICLR 2019, it was pointed out that a classifier learns from both good features (features strongly correlated with the labels) and fragile features (features that are only weakly correlated with them). This creates a tension between clean accuracy and robust accuracy.

Robustness May Be at Odds with Accuracy

Meanwhile, we see that an ensemble of networks tends to show better robustness: the ensemble is effectively a larger network with built-in redundancy, so it becomes harder for an attacker to effectively attack all of the fragile features at once.

Ensemble Adversarial Training: Attacks and Defenses

Neural Architecture Dilation for Adversarial Robustness

This paper then shows an interesting phenomenon: a main network with another network sitting in parallel (the parallel network is known as the dilation network) can maintain high accuracy on both clean and adversarial datasets.
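
To make the formulation concrete, here is a minimal PyTorch sketch of the parallel arrangement. The class name `DilatedClassifier` and the combination rule are assumptions: the paper searches the dilation architecture via NAS and combines the networks in a more elaborate way, while here we simply sum the two logit vectors.

```python
import torch
import torch.nn as nn


class DilatedClassifier(nn.Module):
    """Backbone f_b with a dilation network f_d in parallel.

    Sketch only: we assume both networks map the same input to
    logits of the same shape, and that their logits are summed.
    """

    def __init__(self, backbone: nn.Module, dilation: nn.Module):
        super().__init__()
        self.backbone = backbone  # f_b: the main network
        self.dilation = dilation  # f_d: the parallel dilation network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both networks see the same input; their logits are combined.
        return self.backbone(x) + self.dilation(x)
```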

It is then interesting to understand the limits of such an architectural formulation; for now, let's set the NAS search space aside.

If we use an ensemble for the main network ($N$ models for the backbone $f_b$) and an ensemble for the dilation network ($M$ models for the dilation network $f_d$), it is interesting to understand the relationships between their different capacity requirements. A sketch of this combined architecture follows.
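
Extending the sketch above, the ensemble variant might look as follows. This is hypothetical: averaging the $N$ backbone logits and the $M$ dilation logits before summing is just one plausible combination rule, chosen to mirror the single-model case.

```python
import torch
import torch.nn as nn


class EnsembleDilatedClassifier(nn.Module):
    """N backbone models (f_b ensemble) and M dilation models (f_d ensemble).

    Hypothetical sketch: each ensemble's logits are averaged, then
    the two averaged logit vectors are summed, as in the single-model case.
    """

    def __init__(self, backbones: list[nn.Module], dilations: list[nn.Module]):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)  # N models for f_b
        self.dilations = nn.ModuleList(dilations)  # M models for f_d

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average within each ensemble, then combine across the two roles.
        fb = torch.stack([m(x) for m in self.backbones]).mean(dim=0)
        fd = torch.stack([m(x) for m in self.dilations]).mean(dim=0)
        return fb + fd
```

With this in hand, the question becomes how robustness and clean accuracy vary as we trade off $N$ against $M$ for a fixed total capacity budget.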