Jamal Khashoggi, a veteran Saudi journalist who Turkish officials say was killed in Istanbul this week after walking into the consulate of Saudi Arabia, has been writing columns for The Post since last year.

Here are some excerpts from his columns:

“When I speak of the fear, intimidation, arrests and public shaming of intellectuals and religious leaders who dare to speak their minds, and then I tell you that I’m from Saudi Arabia, are you surprised?” – Sept. 18, 2017 https://t.co/xt17c9NhJ5
“How can we become more moderate when such extremist views are tolerated? How can we progress as a nation when those offering constructive feedback and (often humorous) dissent are banished?” – Oct. 31, 2017 https://t.co/p3uRtkLUuc
“MBS’s downsizing and relative humbling of the House of Saud is welcome news. But maybe he should learn from the British royal house that has earned true stature, respect and success by trying a little humility himself.” – Feb. 28, 2018 https://t.co/krtJkGCWY2
“At the end of [‘Black Panther’], the young king of Wakanda chooses to use his country’s power to engage with the world for the greater good. Will Crown Prince Mohammed bin Salman .... use his power to bring peace to the world around him?” – April 17, 2018 https://t.co/HxUAAipQdo

How can we use language supervision to learn better visual representations for robotics?

Introducing Voltron: Language-Driven Representation Learning for Robotics!

Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z

🧵👇 (1/12)


Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.

Yet the rich natural language annotations accompanying each video remain an underused part of these datasets. (2/12)

The Voltron framework offers a simple way to use language supervision to shape representation learning, building off of prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).

The secret is *balance* (3/12)

Starting with a masked autoencoder over frames from these video clips, we make a choice:

1) Condition on language and improve our ability to reconstruct the scene.

2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
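One way to picture the choice above is as a per-batch coin flip. This is a minimal sketch, not the Voltron implementation: the weight `alpha`, the function names, and the idea of choosing once per batch are all assumptions made for illustration, since the thread does not say how the two options are mixed.

```python
import random


def choose_objective(alpha: float, rng: random.Random) -> str:
    """Pick one training objective for a batch.

    `alpha` is a hypothetical mixing weight (an assumption, not from the
    thread): the probability of conditioning on language rather than
    generating it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # rng.random() is in [0, 1), so alpha=1.0 always conditions
    # and alpha=0.0 always generates.
    return "condition" if rng.random() < alpha else "generate"


def objective_mix(alpha: float, num_batches: int, seed: int = 0) -> dict:
    """Count how often each objective is picked over `num_batches` batches."""
    rng = random.Random(seed)
    counts = {"condition": 0, "generate": 0}
    for _ in range(num_batches):
        counts[choose_objective(alpha, rng)] += 1
    return counts
```

Under this sketch, `alpha` near 1 spends most batches on language-conditioned reconstruction, and `alpha` near 0 spends most on language generation.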

By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.

Why is the ability to shape this balance important? (5/12)
