How can we use language supervision to learn better visual representations for robotics?
Introducing Voltron: Language-Driven Representation Learning for Robotics!
Paper: https://t.co/gIsRPtSjKz
Models: https://t.co/NOB3cpATYG
Evaluation: https://t.co/aOzQu95J8z
🧵👇(1 / 12)
Videos of humans performing everyday tasks (Something-Something-v2, Ego4D) offer a rich and diverse resource for learning representations for robotic manipulation.
Yet the rich natural language annotations accompanying each video remain an underused part of these datasets. (2/12)
The Voltron framework offers a simple way to use language supervision to shape representation learning, building on prior work in representations for robotics like MVP (https://t.co/Pb0mk9hb4i) and R3M (https://t.co/o2Fkc3fP0e).
The secret is *balance* (3/12)
Starting with a masked autoencoder over frames from these video clips, make a choice:
1) Condition on language and improve our ability to reconstruct the scene.
2) Generate language given the visual representation and improve our ability to describe what's happening. (4/12)
By trading off *conditioning* and *generation*, we show that we can 1) learn better representations than prior methods, and 2) explicitly shape the balance of low- and high-level features captured.
Why is the ability to shape this balance important? (5/12)
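One way to picture the trade-off between *conditioning* and *generation* is a single mixing weight that decides, per training step, which of the two objectives is used. The sketch below is only an illustration under that assumption: the weight `alpha`, the function names, and the per-step coin flip are hypothetical and are not taken from the paper or the released models.

```python
import random


def choose_objective(alpha: float, rng: random.Random) -> str:
    """Return 'condition' with probability alpha, otherwise 'generate'.

    alpha is a hypothetical mixing weight: 1.0 means always condition the
    reconstruction on language, 0.0 means always generate language.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return "condition" if rng.random() < alpha else "generate"


def objective_mix(alpha: float, num_steps: int, seed: int = 0) -> dict:
    """Count how often each objective is picked over num_steps steps."""
    rng = random.Random(seed)
    counts = {"condition": 0, "generate": 0}
    for _ in range(num_steps):
        counts[choose_objective(alpha, rng)] += 1
    return counts


if __name__ == "__main__":
    # Sweep the hypothetical weight to see how the balance shifts.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, objective_mix(alpha, num_steps=1000))
```

In a real training loop, the returned label would select which loss to compute for that step (language-conditioned reconstruction or language generation); the counting helper here only makes the resulting balance visible.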
![](https://pbs.twimg.com/media/Fp_Pp79agAA36b8.jpg)
You May Also Like
Keep dwelling on this:
Further Examination of the Motif near PRRA Reveals Close Structural Similarity to the SEB Superantigen as well as Sequence Similarities to Neurotoxins and a Viral SAg.
The insertion PRRA together with 7 sequentially preceding residues & succeeding R685 (conserved in β-CoVs) form a motif, Y674QTQTNSPRRAR685, homologous to those of neurotoxins from Ophiophagus (cobra) and Bungarus genera, as well as neurotoxin-like regions from three RABV strains
(20) (Fig. 2D). We further noticed that the same segment bears close similarity to the HIV-1 glycoprotein gp120 SAg motif F164 to V174.
https://t.co/EwwJOSa8RK
In (B), the segment S680PRRAR685 including the PRRA insert and highly conserved cleavage site *R685* is shown in van der Waals representation (black labels) and nearby CDR residues of the TCRVβ domain are labeled in blue/white
https://t.co/BsY8BAIzDa
Sequence Identity %
https://t.co/BsY8BAIzDa
Y674 - QTQTNSPRRA - R685
Similar to neurotoxins from Ophiophagus (cobra) & Bungarus genera & neurotoxin-like regions from three RABV strains
T678 - NSPRRA - R685
Superantigenic core, consistently aligned against bacterial or viral SAgs
![](https://pbs.twimg.com/media/Ew0HvalUYAYijlY.png)
![](https://pbs.twimg.com/media/Ew0MGQlVoAYB2ZO.png)
![](https://pbs.twimg.com/media/Ew0MumbUcAMRHTO.jpg)