1/ I've met so many founders in the last 6 months who confidentially told me their recent war stories about their treatment by many top-tier VC firms. Two sides to every story, but holy moley does it make me sad. I hope the GPs understand that we all talk too. Examples...

2/ Firms (a previous investor) playing hardball & issuing a 3x liquidation preference at tough moments.
3/ Pitting founders against each other during a conflict.
4/ GPs undermining the CEO with their executive team.
5/ Flying secretly to a potential acquirer to blow up an M&A deal.
6/ Accepting onerous acquisition terms that primarily reward themselves & the brand-new executives financially. Founders & employees get next to nothing.
7/ Limiting founder rights significantly after the founder gracefully steps down.
8/ A last-minute, dark & silent exit from the board.
9/ It all makes me appreciative of my board. 🙏
10/ We all went through this long, arduous, & painful journey together. Sometimes it works out, sometimes it doesn't. Be nice!

How can we use language supervision to learn better visual representations for robotics?

Introducing Voltron: Language-Driven Representation Learning for Robotics!

Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z

🧵👇(1/12)


Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.

Yet the rich, natural language annotations that accompany each video are an underused part of these datasets. (2/12)

The Voltron framework offers a simple way to use language supervision to shape representation learning, building off of prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).

The secret is *balance* (3/12)

Starting with a masked autoencoder over frames from these video clips, make a choice (sketched in code below):

1) Condition on language and improve our ability to reconstruct the scene.

2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
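
A rough sketch of what those two objectives could look like in PyTorch. This is my illustration, not the released Voltron code: the module names, dimensions, zeroed-out masked patches (SimMIM-style, rather than a true MAE that drops them), and the pooled per-token language prediction are all simplifying assumptions.

```python
# Illustrative sketch only -- not the official Voltron implementation.
# Assumes pre-patchified frames (B, N, 768) and tokenized captions (B, T).
import torch
import torch.nn as nn

class LanguageBalancedMAE(nn.Module):
    def __init__(self, d_model=384, vocab_size=32000, patch_dim=768):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)
        self.lang_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pixel_head = nn.Linear(d_model, patch_dim)   # reconstruct patches
        self.lang_head = nn.Linear(d_model, vocab_size)   # predict caption tokens

    def conditioning_loss(self, patches, mask, tokens):
        # Choice 1: condition on language, reconstruct the masked scene.
        # Simplification: masked patches are zeroed, not dropped as in a true MAE.
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        x = torch.cat([self.patch_embed(visible), self.lang_embed(tokens)], dim=1)
        h = self.encoder(x)[:, : patches.shape[1]]        # keep visual positions
        return ((self.pixel_head(h) - patches) ** 2)[mask].mean()

    def generation_loss(self, patches, tokens):
        # Choice 2: generate language from the visual representation alone.
        # Simplification: per-token prediction from a pooled feature, rather
        # than a proper autoregressive language decoder.
        h = self.encoder(self.patch_embed(patches)).mean(dim=1)   # (B, d_model)
        logits = self.lang_head(h)                                # (B, vocab)
        per_token = logits.repeat_interleave(tokens.shape[1], dim=0)
        return nn.functional.cross_entropy(per_token, tokens.reshape(-1))
```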

By trading off *conditioning* and *generation*, we show that we can learn 1) better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.
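
One way to read that trade-off is as a single knob between the two objectives. The paper may realize the balance differently (e.g., by alternating objectives across batches); this weighted sum, reusing the hypothetical sketch above, is only an assumed illustration:

```python
# Hypothetical balance knob: alpha -> 1 emphasizes language-conditioned
# reconstruction (lower-level spatial detail); alpha -> 0 emphasizes
# language generation (higher-level semantics).
def voltron_style_loss(model, patches, mask, tokens, alpha=0.5):
    l_cond = model.conditioning_loss(patches, mask, tokens)
    l_gen = model.generation_loss(patches, tokens)
    return alpha * l_cond + (1.0 - alpha) * l_gen

# Tiny smoke test with random data.
model = LanguageBalancedMAE()
patches = torch.randn(2, 196, 768)           # 2 clips, 196 patches each
mask = torch.rand(2, 196) < 0.75             # ~75% masking ratio
tokens = torch.randint(0, 32000, (2, 16))    # 16 caption tokens per clip
voltron_style_loss(model, patches, mask, tokens, alpha=0.5).backward()
```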

Why is the ability to shape this balance important? (5/12)
