Important paper from Google on large-batch optimization. They run impressively careful experiments measuring the number of iterations needed to reach a target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread]

https://t.co/7QIx5CFdfJ

The paper is a good example of many elements of good experimental design. They validate their metric by showing that many variants give consistent results. They tune hyperparameters separately for each condition, check that the optimum isn't at the endpoints of the search range, and measure sensitivity.
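To make the "tune per condition, check the optimum isn't at the endpoints" step concrete, here is a minimal sketch. It is not the paper's tuning code: `tune_learning_rate`, `toy_cost`, and the grid are hypothetical, and the cost function is a made-up bowl standing in for "steps to reach the target validation error", whose best learning rate is assumed to grow with batch size.

```python
import math
from typing import Callable, Sequence, Tuple


def tune_learning_rate(
    evaluate: Callable[[float], float], grid: Sequence[float]
) -> Tuple[float, bool]:
    """Return the grid point with the lowest cost and whether it sits on the grid edge.

    If the best value is the first or last grid point, the true optimum may lie
    outside the searched range, so the grid should be widened before trusting it.
    """
    scores = [evaluate(lr) for lr in grid]
    best = min(range(len(grid)), key=lambda i: scores[i])
    at_edge = best in (0, len(grid) - 1)
    return grid[best], at_edge


def toy_cost(lr: float, batch_size: int) -> float:
    # Hypothetical stand-in for "steps to reach the target validation error":
    # a log-quadratic bowl whose optimum grows linearly with batch size.
    best_lr = 0.01 * batch_size / 32
    return (math.log(lr) - math.log(best_lr)) ** 2


# Log-spaced grid from 1e-4 to 1e0 in half-decade steps.
GRID = [10 ** (-4 + 0.5 * k) for k in range(9)]

results = {}
for batch_size in (32, 256, 2048):
    # Tune separately for each batch size (each "condition").
    lr, at_edge = tune_learning_rate(lambda x: toy_cost(x, batch_size), GRID)
    results[batch_size] = (lr, at_edge)
    print(f"B={batch_size:5d}  best lr={lr:.4g}  on edge of grid: {at_edge}")
```

In this toy setup the largest batch size's best learning rate lands on the upper edge of the grid, which is exactly the signal that the search range needs widening before the result can be trusted.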
They have separate experiments where they hold fixed the number of iterations and the number of epochs, which (as they explain) measure very different things. They avoid confounds, such as batch norm's artificial dependence between batch size and regularization strength.
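Why a fixed-iteration budget and a fixed-epoch budget measure different things comes down to simple arithmetic: epochs = steps × batch size / dataset size. A minimal sketch, assuming a hypothetical 50,000-example training set (the function names are illustrative, not from the paper):

```python
import math

# Hypothetical training-set size (50,000 examples), chosen only for illustration.
DATASET_SIZE = 50_000


def epochs_for_steps(steps: int, batch_size: int, dataset_size: int = DATASET_SIZE) -> float:
    """Epochs consumed by a fixed step budget: each step sees `batch_size` examples."""
    return steps * batch_size / dataset_size


def steps_for_epochs(epochs: float, batch_size: int, dataset_size: int = DATASET_SIZE) -> int:
    """Optimizer steps available under a fixed epoch budget (rounded up)."""
    return math.ceil(epochs * dataset_size / batch_size)


for batch_size in (128, 1024):
    print(
        f"B={batch_size:5d}: 10,000 steps = {epochs_for_steps(10_000, batch_size):.1f} epochs; "
        f"20 epochs = {steps_for_epochs(20, batch_size)} steps"
    )
```

Under a fixed step budget the B=1024 run sees 8x as much data as the B=128 run; under a fixed epoch budget it gets roughly 8x fewer parameter updates. The first setting asks how well larger batches use extra data, the second how well they cope with fewer updates.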
When the experiments are done carefully enough, the results are remarkably consistent between different datasets and architectures. Qualitatively, MNIST behaves just like ImageNet.
Importantly, they don't find any evidence for a "sharp/flat optima" effect whereby better optimization leads to worse final results. They have a good discussion of experimental artifacts/confounds in past papers where such effects were reported.
The time to reach the target validation error is explained purely by optimization considerations. There's a regime where gradient variance dominates, and you get linear speedups with batch size. Then there's a regime where curvature dominates and larger batches don't help. As theory would predict.
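The two regimes can be illustrated with a toy model. The formula below, steps(B) = S_MIN · (1 + B_STAR / B), is an assumption chosen only to reproduce the qualitative shape described above; it is not the paper's fitted curve, and `S_MIN` and `B_STAR` are hypothetical values.

```python
# Toy model (an assumption, not the paper's fitted curve):
#   steps(B) = S_MIN * (1 + B_STAR / B)
S_MIN = 1_000.0  # hypothetical: steps needed even with an arbitrarily large batch
B_STAR = 512.0   # hypothetical: batch size where the two regimes cross over


def steps_to_target(batch_size: float) -> float:
    """Steps needed to reach the target error under the toy model."""
    return S_MIN * (1.0 + B_STAR / batch_size)


def doubling_speedup(batch_size: float) -> float:
    """Ratio of steps at batch_size / 2 to steps at batch_size (2.0 = perfect linear scaling)."""
    return steps_to_target(batch_size / 2) / steps_to_target(batch_size)


for b in (8, 64, 512, 4096, 16384):
    print(f"B={b:6d}  steps={steps_to_target(b):9.1f}  speedup from doubling={doubling_speedup(b):.2f}")
```

Well below `B_STAR`, doubling the batch nearly halves the step count (the variance-dominated regime); well above it, the step count flattens toward `S_MIN` and extra batch size buys almost nothing (the curvature-dominated regime).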
Incidentally, this paper must have been absurdly expensive, even by Google's standards. Doing careful empirical work on optimizers requires many, many runs of the algorithm. (I think surprising phenomena on ImageNet are often due to the difficulty of running proper experiments.)

More from Machine learning

Starting a new project using #Angular? Here is a list of all the stuff I use to launch my projects as fast as I can.

A THREAD 👇

Have you heard about Monorepo? I created one with all my Angular (and Nest) projects using
https://t.co/aY5llDtXg8.

I can share A LOT of code with it. Ex: Every time I start a new project, I just need to import an Auth lib that I created, and all the Auth-related stuff is set up.

Everyone in the Angular community knows about https://t.co/kDnunQZnxE. It's not the most beautiful component library out there, but it's good and easy to work with.

There's a bunch of state management solutions for Angular, but https://t.co/RJwpn74Qev is by far my favorite.

There's a lot of boilerplate, but you can solve this with the built-in schematics and/or your own schematics.

Are you not using custom schematics yet? Take a look at this:

https://t.co/iLrIaHVafm
https://t.co/3382Tn2k7C

You can automate all the boilerplate of the hundreds of files associated with creating a new feature.

You May Also Like

🌿𝑻𝒉𝒆 𝒔𝒕𝒐𝒓𝒚 𝒐𝒇 𝒂 𝑺𝒕𝒂𝒓 : 𝑫𝒉𝒓𝒖𝒗𝒂 & 𝑽𝒊𝒔𝒉𝒏𝒖

Once upon a time there was a Raja named Uttānapāda, born of Svayambhuva Manu, the 1st man on earth. He had 2 beautiful wives, Suniti & Suruchi, and two sons were born of them, Dhruva & Uttama respectively.
#talesofkrishna https://t.co/E85MTPkF9W


Now Suniti was the daughter of a tribal chief while Suruchi was the daughter of a rich king. Hence Suruchi was always favored the most by the Raja while Suniti was ignored. But while Suniti was gentle & kind-hearted by nature, Suruchi was venomous inside.
#KrishnaLeela


The story is of a time when, ideally, the eldest son of the king became the heir to the throne. Hence the sinhasan of the Raja belonged to Dhruva. This is why Suruchi, the 2nd wife, nourished poison in her heart for Dhruva, as she knew her son would never get the throne.


One day when Dhruva was just 5 years old he went on to sit on his father's lap. Suruchi, the jealous queen, got enraged and shoved him away from Raja as she never wanted Raja to shower Dhruva with his fatherly affection.


Dhruva protested, questioning his stepmother: "Why can't I sit on my own father's lap?" A furious Suruchi berated him, saying "Only God can allow him that privilege. Go ask him."