Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread]

https://t.co/7QIx5CFdfJ

The paper is a good example of lots of elements of good experimental design. They validate their metric by showing lots of variants give consistent results. They tune hyperparamters separately for each condition, check that optimum isn't at the endpoints, and measure sensitivity.
They have separate experiments where the hold fixed # iterations and # epochs, which (as they explain) measure very different things. They avoid confounds, such as batch norm's artificial dependence between batch size and regularization strength.
When the experiments are done carefully enough, the results are remarkably consistent between different datasets and architectures. Qualitatively, MNIST behaves just like ImageNet.
Importantly, they don't find any evidence for a "sharp/flat optima" effect whereby better optimization leads to worse final results. They have a good discussion of experimental artifacts/confounds in past papers where such effects were reported.
The time-to-target-validation is explained purely by optimization considerations. There's a regime where variance dominates, and you get linear speedups w/ batch size. Then there's a regime where curvature dominates and larger batches don't help. As theory would predict.
Incidentally, this paper must have been absurdly expensive, even by Google's standards. Doing careful empirical work on optimizers requires many, many runs of the algorithm. (I think surprising phenomena on ImageNet are often due to the difficulty of running proper experiments.)

More from Machine learning

You May Also Like

The chorus of this song uses the shlokas taken from Sundarkand of Ramayana.

It is a series of Sanskrit shlokas recited by Jambavant to Hanuman to remind Him of his true potential.

1. धीवर प्रसार शौर्य भरा: The brave persevering one, your bravery is taking you forward.


2. उतसारा स्थिरा घम्भीरा: The one who is leaping higher and higher, who is firm and stable and seriously determined.

3. ुग्रामा असामा शौर्या भावा: He is strong, and without an equal in the ability/mentality to fight

4. रौद्रमा नवा भीतिर्मा: His anger will cause new fears in his foes.

5.विजिटरीपुरु धीरधारा, कलोथरा शिखरा कठोरा: This is a complex expression seen only in Indic language poetry. The poet is stating that Shivudu is experiencing the intensity of climbing a tough peak, and likening

it to the feeling in a hard battle, when you see your enemy defeated, and blood flowing like a rivulet. This is classical Veera rasa.

6.कुलकु थारथिलीथा गम्भीरा, जाया विराट वीरा: His rough body itself is like a sharp weapon (because he is determined to win). Hail this complete

hero of the world.

7.विलयगागनथाला भिकारा, गरज्जद्धरा गारा: The hero is destructive in the air/sky as well (because he can leap at an enemy from a great height). He can defeat the enemy (simply) with his fearsome roar of war.