Kitchen sink optimiser model
A typical loss function for a model is composed of a loss from the data (E) and a loss on the weights, i.e. a regulariser (R).
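As a minimal sketch of this decomposition, the snippet below builds a total loss as E + R together with its gradient. The specific choices are assumptions for illustration only and are not given in the text: E is taken to be the mean squared error of a linear model, R an L2 penalty, and `lam` is a hypothetical regularisation coefficient.

```python
import numpy as np


def data_loss(w, X, y):
    """E: halved mean squared error of a linear model (assumed choice)."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)


def regulariser(w, lam=0.01):
    """R: L2 penalty on the weights; lam is a hypothetical coefficient."""
    return 0.5 * lam * np.sum(w ** 2)


def total_loss(w, X, y, lam=0.01):
    """Total loss = data loss E + regulariser R."""
    return data_loss(w, X, y) + regulariser(w, lam)


def total_gradient(w, X, y, lam=0.01):
    """Gradient of the total loss: grad E + grad R."""
    grad_E = X.T @ (X @ w - y) / len(y)  # gradient of the data term
    grad_R = lam * w                     # gradient of the L2 penalty
    return grad_E + grad_R
```

With other choices of E (for example cross-entropy) or R (for example an L1 penalty) only the two helper functions change; the sum E + R stays the same.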
gradient
velocity
momentum
weight update
The final term is weight decay. We can derive all the optimisers from the values of the constants and the nature of