How can we use language supervision to learn better visual representations for robotics?

Introducing Voltron: Language-Driven Representation Learning for Robotics!

Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z

🧵👇 (1/12)


Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.

Yet an underused part of these datasets is the rich natural language annotation accompanying each video. (2/12)

The Voltron framework offers a simple way to use language supervision to shape representation learning, building off of prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).

The secret is *balance* (3/12)

Starting with a masked autoencoder over frames from these video clips, make a choice:

1) Condition on language and improve our ability to reconstruct the scene.

2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)

By trading off *conditioning* and *generation* we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low-level and high-level features captured.

Why is the ability to shape this balance important? (5/12)
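
To make the conditioning/generation tradeoff concrete, here is a minimal sketch of one way such a choice could be made during training. The thread does not say how the tradeoff is implemented, so everything here is an assumption for illustration: the per-example coin flip, the mixing probability `alpha`, and the helper names `choose_objective` and `objective_mix` are all hypothetical and not taken from the Voltron code.

```python
import random


def choose_objective(alpha: float, rng: random.Random) -> str:
    """Pick the objective for one training example.

    `alpha` is a hypothetical mixing probability (an assumption, not a
    documented Voltron parameter): with probability `alpha` we condition
    on language and reconstruct the frame, otherwise we generate language
    from the visual representation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return "condition" if rng.random() < alpha else "generate"


def objective_mix(alpha: float, n: int, seed: int = 0) -> dict:
    """Count how often each objective is chosen over `n` examples."""
    rng = random.Random(seed)
    counts = {"condition": 0, "generate": 0}
    for _ in range(n):
        counts[choose_objective(alpha, rng)] += 1
    return counts


if __name__ == "__main__":
    # alpha = 1.0 always conditions; alpha = 0.0 always generates;
    # values in between mix the two objectives.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, objective_mix(alpha, n=1000))
```

Under this sketch, a single knob moves training between pure language-conditioned reconstruction and pure language generation, which is one plausible reading of "balance" in the thread.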

You May Also Like

Ivor Cummins has been wrong (or lying) almost entirely throughout this pandemic and got paid handsomely for it.

He has been wrong (or lying) so often that it will be nearly impossible for me to track every grift, lie, deceit, manipulation he has pulled. I will use...


... other sources who have been trying to shine a light on this grifter (as I have tried to do, time and again):


Example #1: "Still not seeing Sweden signal versus Denmark really"... There it was (Images attached).
Going from 19 to 80 is an increase of over 300%.

Tweet: https://t.co/36FnYnsRT9


Example #2 - "Yes, I'm comparing the Nordics / No, you cannot compare the Nordics."

I wonder why...

Tweets: https://t.co/XLfoX4rpck / https://t.co/vjE1ctLU5x


Example #3 - "I'm only looking at what makes the data fit in my favour", a.k.a. moving the goalposts.

Tweets: https://t.co/vcDpTu3qyj / https://t.co/CA3N6hC2Lq