Fast Neural Network Performance Estimator

Prior work has demonstrated that a good training speed estimator (TSE), combined with classic optimization techniques such as Bayesian optimization or evolutionary algorithms, can help us find optimal network architectures.

Speedy Performance Estimation for Neural Architecture Search

The idea of the TSE estimator is the following: sum the mini-batch training losses accumulated over the first $T$ epochs of training,

$$ \mathrm{TSE} = \sum_{t=1}^{T} \sum_{b=1}^{B} \ell\big(f_{\theta_{t,b}}(x_b), y_b\big) $$

where $B$ is the number of mini-batches per epoch and $f_{\theta_{t,b}}$ is the network after the $b$-th update of epoch $t$.

Notice that in TSE the loss is summed across all $B$ mini-batches, which means every point in the dataset is traversed once per epoch.
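As a concrete illustration, here is a minimal PyTorch-style sketch of how TSE could be accumulated during a short training run (the model, loader, optimizer, loss function, and the choice of $T$ are placeholders for this example, not the paper's exact setup):

```python
def training_speed_estimate(model, train_loader, optimizer, loss_fn, T=3):
    """Sum the mini-batch training losses over the first T epochs.

    A lower TSE (i.e. faster-decreasing training loss) is taken as a
    proxy for better final performance, so candidate architectures can
    be ranked after only T epochs instead of full training.
    """
    tse = 0.0
    for epoch in range(T):
        for xb, yb in train_loader:      # traverses all B mini-batches
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            tse += loss.item()           # accumulate the loss across B
    return tse
```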

Bayesian Optimization for Hyperparameter Tuning

GitHub - rubinxin/TSE


Previous BO approaches use a Gaussian process (GP) surrogate to optimize both global and layer-wise hyperparameters in neural networks. All of these hyperparameters are discrete points (e.g. the number of layers can only be one of {1, 2, 3}).
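For illustration, a GP-based search over such a discrete space might look like the following sketch using scikit-optimize (the library choice and the `train_and_evaluate` objective are assumptions for this example, not what the prior work used):

```python
from skopt import gp_minimize
from skopt.space import Integer, Categorical

def objective(params):
    n_layers, activation = params
    # Hypothetical helper: trains a network with these discrete
    # hyperparameters and returns its validation error.
    return train_and_evaluate(n_layers=n_layers, activation=activation)

space = [
    Integer(1, 3, name="n_layers"),                  # layers in {1, 2, 3}
    Categorical(["relu", "tanh"], name="activation"),
]

# GP surrogate + acquisition function over the discrete search space.
result = gp_minimize(objective, space, n_calls=20, random_state=0)
```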

This is motivated by the fact that many hyperparameters are better tuned as a function of the training step than as a single constant. Taking the learning rate as an example, we are interested in schedules of the form

$$ \mathrm{lr}(t) = (a_0 t + b_0)(a_1 t + b_1)(a_2 t + b_2) $$
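As a sketch, such a schedule can be implemented as a plain function of the step $t$, so that BO tunes the six coefficients $(a_i, b_i)$ instead of a single scalar learning rate (the example coefficients below are arbitrary, chosen only for illustration):

```python
import numpy as np

def lr_schedule(t, coeffs):
    """Learning rate as a product of linear factors in the step t.

    coeffs is [(a0, b0), (a1, b1), (a2, b2)]; these six scalars are the
    continuous quantities a BO routine would tune rather than one fixed
    learning rate. Note the product can go negative for badly chosen
    coefficients, so a search procedure would need to constrain them.
    """
    lr = 1.0
    for a, b in coeffs:
        lr *= a * t + b
    return lr

# Example: evaluate an arbitrary schedule over 100 steps.
coeffs = [(0.01, 0.1), (-0.005, 1.0), (0.0, 1.0)]
lrs = [lr_schedule(t, coeffs) for t in np.arange(100)]
```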