Here's the story of ‘the most famous argument’ in reality-TV history.

Via @MeredithBlake https://t.co/ptNe27oIxe

In 1992, during the first season of “The Real World,” cast members Kevin Powell and Julie Gentry got into a heated argument about racism and white privilege.

It became an iconic moment. Here's why: https://t.co/ptNe27oIxe

Some of Powell's roommates, and many viewers at the time, accused him of being overly confrontational.

But viewed in 2021, the era of Black Lives Matter, Powell’s words seem more prescient than anything. https://t.co/ptNe27oIxe

The original “7 strangers” returned to the old loft to film “The Real World: Homecoming.”

The new reunion wasn't all happy.

A cast member walked out during an episode. We explain:
https://t.co/ptNe27oIxe

Here are some behind-the-scenes moments from the first season of MTV’s groundbreaking reality show “The Real World”: https://t.co/URaPz4LPJN

How can we use language supervision to learn better visual representations for robotics?

Introducing Voltron: Language-Driven Representation Learning for Robotics!

Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z

🧵👇 (1/12)


Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.

Yet the rich natural language annotations accompanying each video remain an underused part of these datasets. (2/12)

The Voltron framework offers a simple way to use language supervision to shape representation learning, building on prior work in representations for robotics such as MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).

The secret is *balance* (3/12)

Starting with a masked autoencoder over frames from these video clips, we make a choice between two objectives:

1) Condition on language and improve our ability to reconstruct the scene.

2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)

By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low-level and high-level features captured.

Why is the ability to shape this balance important? (5/12)
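
One way to picture the conditioning/generation trade-off is as a per-step choice between the two objectives. The sketch below is illustrative only: the thread does not say how the trade-off is implemented, so the per-step sampling, the mixing probability `alpha`, and the function names `choose_objective` and `objective_mix` are all assumptions, not Voltron's actual API.

```python
import random


def choose_objective(alpha: float, rng: random.Random) -> str:
    """Pick one objective for a single training step.

    `alpha` is a hypothetical mixing probability (an assumption, not taken
    from the thread): the chance of conditioning on language rather than
    generating language.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # rng.random() returns a float in [0, 1), so alpha=1.0 always
    # conditions and alpha=0.0 always generates.
    return "condition" if rng.random() < alpha else "generate"


def objective_mix(alpha: float, steps: int, seed: int = 0) -> dict:
    """Count how often each objective is chosen over `steps` steps."""
    rng = random.Random(seed)
    counts = {"condition": 0, "generate": 0}
    for _ in range(steps):
        counts[choose_objective(alpha, rng)] += 1
    return counts
```

Under these assumptions, `objective_mix(1.0, 100)` selects only language-conditioned reconstruction, `objective_mix(0.0, 100)` selects only language generation, and values in between mix the two.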
