So you think you know distillation; it's easy, right?

We thought so too with @XiaohuaZhai @__kolesnikov__ @_arohan_ and the amazing @royaleerieme and Larisa Markeeva.

Until we didn't. But now we do again. Hop on for a ride (+the best ever ResNet50?)

🧵👇https://t.co/3SlkXVZcG3

This is not a fancy novel method. It's plain old distillation.

But we investigate it thoroughly, for model compression, via the lens of *function matching*.

We highlight two crucial principles that are often missed: Consistency and Patience. Only both jointly give good results!

0. Intuition: We want the student to replicate _the whole function_ represented by the teacher, everywhere that we expect data in input space.

This is a much stronger view than the commonly used "teacher generates better/more informative labels for the data". See pic above.
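
One way to write the function-matching view down, as a sketch rather than the exact objective from the paper: assume the teacher and student output class distributions $p_t$ and $p_s$, and that $\mathcal{V}$ (our notation, not from the thread) is the distribution of augmented views we expect in input space. The student parameters $\theta$ are then trained to minimize the expected KL divergence between the two functions over those views:

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim \mathcal{V}}
    \Big[ \mathrm{KL}\big(p_t(\cdot \mid x) \,\|\, p_s(\cdot \mid x; \theta)\big) \Big]
  = \mathbb{E}_{x \sim \mathcal{V}}
    \Big[ \sum_{c} p_t(c \mid x) \,\log \frac{p_t(c \mid x)}{p_s(c \mid x; \theta)} \Big]
```

Note that no ground-truth label appears: the loss is zero exactly when the student matches the teacher on every view.
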
1. Consistency: to achieve this, teacher and student need to see the same view (crop) of the image. For example, this means no pre-computed teacher logits! We can generate many more views via mixup.

Other approaches may look good early, but eventually fall behind consistency.
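
To make "consistency" concrete, here is a minimal NumPy sketch, not the actual training code. It assumes toy linear "models" and hypothetical helper names (`random_crop`, `mixup`, `consistent_distillation_loss`). The point is only that one augmented view is created per step and fed to both teacher and student, so teacher outputs cannot be pre-computed.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_divergence(teacher_logits, student_logits):
    # Mean KL(teacher || student) over the batch.
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))

def random_crop(images, size, rng):
    # One random crop location shared by the whole batch (kept simple).
    _, h, w, _ = images.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return images[:, top:top + size, left:left + size, :]

def mixup(images, rng, alpha=1.0):
    # Blend each image with a randomly chosen partner from the same batch.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(images.shape[0])
    return lam * images + (1.0 - lam) * images[perm]

def consistent_distillation_loss(teacher_fn, student_fn, images, size, rng):
    # Build the view ONCE, then show the identical view to both networks.
    view = mixup(random_crop(images, size, rng), rng)
    return kl_divergence(teacher_fn(view), student_fn(view))

def make_linear_model(weights):
    # Toy stand-in for a network: flatten the view, apply one linear layer.
    def model(x):
        return x.reshape(x.shape[0], -1) @ weights
    return model

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 12, 12, 3))            # fake batch of 12x12 RGB images
teacher = make_linear_model(rng.normal(size=(8 * 8 * 3, 5)))
student = make_linear_model(rng.normal(size=(8 * 8 * 3, 5)))

loss = consistent_distillation_loss(teacher, student, images, 8, rng)
self_loss = consistent_distillation_loss(teacher, teacher, images, 8, rng)
```

A student identical to the teacher gets zero loss on any view, which is the sense in which an "overfit" student is the goal here. In a real setup the teacher forward pass would run inside every training step on the freshly augmented batch.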
2. Patience: The function matching task is HARD! We need to train *a lot* longer than typical, and we actually were not able to reach saturation yet. Overfitting does not happen: when function matching, an "overfit" student is great! (Note: w/ pre-computed teacher, we do overfit)

2b. Needing excessively long training may mean the optimizer is struggling. We try more advanced optimization via Shampoo, and get 4x faster convergence.

We believe this setting is a great test-bed for optimizer research: No concern of overfitting, and reducing training error means generalizing better!
3. By distilling a couple large BiT R152x2 models into a ResNet-50, we get a ResNet-50 on ImageNet that gets 82.8% at 224px resolution, and 80.5% at 160px! 😎

No "tricks" just plain distillation, patiently matching functions.
4. Importantly, this simple strategy works on many datasets of various sizes, down to only 1020 training images, where anything else we tried overfit horribly.

Be patient, be consistent, that's it. Eventually, you'll reach or outperform your teacher!
2c. We can't stress patience enough. Multiple strategies, for example initializing the student with a pre-trained model shown here, look promising at first, but eventually plateau and are outperformed by patient, consistent function matching.
5. We have a lot more content. MobileNet students, distilling on "random other" data (shown below), very thorough baselines, a teacher ensemble, and... BiT download statistics!
PS: we are working on releasing a bunch of the models, including the best ones, ... but we're also on vacation. Watch https://t.co/Age8NXgS1D and stay tuned, we're aiming for next week!
