What is the best way to perform hyperparameter optimization for a PyTorch model? Should I implement e.g. Random Search myself? Use Scikit-Learn? Or is there anything else I am not aware of?
The simplest parameter-free way to do black-box optimisation is random search, and it will explore high-dimensional spaces faster than grid search. There are papers on this (Bergstra & Bengio, 2012, is the usual reference), but the tl;dr is that with random search every trial draws a fresh value on every dimension, while grid search reuses the same few values per axis, so when only a couple of dimensions actually matter, most of the grid is wasted.
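For a PyTorch model, random search is only a few lines, so there's little reason to pull in a framework just for it. Here's a minimal sketch; the `train_and_eval` function and its search space are hypothetical stand-ins for your own training loop, and only the sampling logic is the point:

```python
import random

import torch
import torch.nn as nn

def train_and_eval(lr, hidden, dropout):
    # Hypothetical stand-in: a tiny model trained briefly on dummy data;
    # returns the final loss as a proxy for a real validation score.
    model = nn.Sequential(
        nn.Linear(10, hidden), nn.ReLU(), nn.Dropout(dropout), nn.Linear(hidden, 1)
    )
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = torch.randn(256, 10), torch.randn(256, 1)  # dummy data
    for _ in range(50):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

best = None
for trial in range(20):
    # Key point: every trial samples a fresh value on *every* dimension,
    # unlike grid search, which reuses the same few values per axis.
    cfg = {
        "lr": 10 ** random.uniform(-4, -1),  # log-uniform learning rate
        "hidden": random.randint(16, 256),   # layer width
        "dropout": random.uniform(0.0, 0.5),
    }
    score = train_and_eval(**cfg)
    if best is None or score < best[0]:
        best = (score, cfg)

print("best loss:", best[0], "with config:", best[1])
```

Note the log-uniform draw for the learning rate: for scale-type hyperparameters you usually want to sample the exponent uniformly rather than the raw value.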
Bayesian optimisation has good theoretical guarantees (despite the approximations involved), and implementations like Spearmint can wrap any script you have; it does have hyperparameters of its own, but in practice users rarely need to touch them.

Hyperband got a lot of attention by showing faster convergence than naive Bayesian optimisation. It does this by running different networks for different numbers of iterations, which Bayesian optimisation doesn't support natively. While it is possible to do better with a Bayesian optimisation algorithm that can take this into account, such as FABOLAS, in practice Hyperband is so simple that you're probably better off using it and tuning the search space by hand at intervals.
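If you want Hyperband without implementing the bracket bookkeeping yourself, one option is Optuna's `HyperbandPruner`, which kills under-performing trials early, exactly the "different networks for different numbers of iterations" idea. A minimal sketch; the objective below is a hypothetical stand-in for your actual PyTorch training loop:

```python
import optuna

def objective(trial):
    # Hypothetical search space; replace with your own hyperparameters.
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    loss = 1.0
    for step in range(100):  # your epochs/iterations would go here
        loss *= 1.0 - lr * (1.0 - dropout) * 0.1  # fake training progress
        trial.report(loss, step)        # tell the pruner how this trial is doing
        if trial.should_prune():        # Hyperband decides to stop it early
            raise optuna.TrialPruned()
    return loss

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.HyperbandPruner(
        min_resource=5, max_resource=100, reduction_factor=3
    ),
)
study.optimize(objective, n_trials=30)
print(study.best_params)
```

The `min_resource`/`max_resource`/`reduction_factor` settings control the bracket schedule: how few iterations a trial gets before it can be killed, the full budget a surviving trial receives, and what fraction of trials advance at each rung.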