Neural Volume Rendering for Dynamic Scenes

NeRF has shown incredible view synthesis results, but it requires multi-view captures for STATIC scenes.

How can we achieve view synthesis for DYNAMIC scenes from a single video? Here is what I learned from several recent efforts.

Instead of presenting Video-NeRF, Nerfies, NR-NeRF, D-NeRF, NeRFlow, NSFF (and many others!) as individual algorithms, here I try to view them from a unifying perspective and understand the pros/cons of the various design choices.

Okay, here we go.

*Background*

NeRF represents the scene as a 5D continuous volumetric scene function that maps the spatial position and viewing direction to color and density. It then projects the colors/densities to form an image with volume rendering.
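
For intuition, here is a minimal NumPy sketch of the volume rendering step for a single ray. It is my own illustration of the standard alpha-compositing quadrature, not code from any of the papers; the name `render_ray` and its arguments are assumptions made for this example.

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite per-sample densities and colors along one ray.

    densities: (N,) non-negative density (sigma) at each sample
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance covered by each sample segment
    """
    densities = np.asarray(densities, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Opacity contributed by each segment.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = transmittance * alpha
    # Expected color of the ray is the weighted sum of sample colors.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

In a real system the densities and colors come from querying the MLP at points sampled along each camera ray.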

Volumetric + Implicit -> Awesome!

*Model*

Building on NeRF, there are two broad types of approaches for extending it to dynamic scenes.

A) 4D (or 6D with views) function.

One direct approach is to include TIME as an additional input to learn a DYNAMIC radiance field.
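
A rough sketch of what "time as an additional input" can look like, assuming a tiny randomly initialized MLP and a NeRF-style positional encoding. All names here (`positional_encoding`, `init_mlp`, `dynamic_field`) and the layer sizes are hypothetical; the actual methods differ in architecture and in how they encode time. The view direction is passed as a 3D unit vector.

```python
import numpy as np

def positional_encoding(v, num_freqs):
    """Map each coordinate c to [c, sin(2^k c), cos(2^k c)] for k < num_freqs."""
    v = np.asarray(v, dtype=float)
    freqs = 2.0 ** np.arange(num_freqs)
    angles = v[..., None] * freqs                      # (..., D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([v, enc.reshape(*v.shape[:-1], -1)], axis=-1)

def init_mlp(rng, sizes):
    """Random weights for a fully connected network (untrained, for illustration)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)                 # ReLU hidden layers
    W, b = params[-1]
    return x @ W + b

def dynamic_field(params, xyz, t, view_dir, num_freqs=4):
    """Query color and density at space-time points (x, y, z, t)."""
    xyzt = np.concatenate([xyz, t], axis=-1)           # (N, 4): time is just one more input
    feats = np.concatenate([positional_encoding(xyzt, num_freqs), view_dir], axis=-1)
    out = mlp(params, feats)
    rgb = 1.0 / (1.0 + np.exp(-out[..., :3]))          # colors in (0, 1)
    sigma = np.maximum(out[..., 3], 0.0)               # non-negative density
    return rgb, sigma
```

With `num_freqs=4` the encoded (x, y, z, t) has 36 features, so the first layer takes 39 inputs once the view direction is appended, e.g. `init_mlp(rng, [39, 32, 4])`.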

e.g., Video-NeRF, NSFF, NeRFlow

B) 3D Template with Deformation.

Inspired by non-rigid reconstruction methods, this type of approach learns a radiance field in a canonical frame (template) and predicts deformation for each frame to account for dynamics over time.
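
The composition can be sketched with two toy functions. In the actual methods both the template and the deformation are MLPs; the closed-form blob and sliding motion below are stand-ins I made up so the example stays runnable, and all function names are hypothetical.

```python
import numpy as np

def canonical_field(xyz):
    """Toy static template: a soft blob of density centered at the origin."""
    xyz = np.asarray(xyz, dtype=float)
    sigma = np.exp(-np.sum(xyz ** 2, axis=-1) / 0.1)
    rgb = np.broadcast_to(np.array([0.8, 0.3, 0.2]), xyz.shape)
    return rgb, sigma

def deformation(xyz, t):
    """Toy deformation from an observed point at time t to the canonical frame.

    The blob is assumed to slide along +x with unit speed, so we undo that motion.
    """
    return np.asarray(xyz, dtype=float) + np.array([-1.0, 0.0, 0.0]) * t

def query_dynamic_scene(xyz, t):
    """Warp the query point into the canonical frame, then read the template."""
    return canonical_field(deformation(xyz, t))
```

The key point is that appearance and geometry live only in the canonical frame; time enters only through the warp.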

e.g., Nerfies, NR-NeRF, D-NeRF

*Deformation Model*

All of these methods use an MLP to encode the deformation field. But how do they differ?

A) INPUT: How to encode the additional time dimension as input?

B) OUTPUT: How to parametrize the deformation field?

A) Input conditioning

One can choose to use EXPLICIT conditioning by treating the frame index t as input.

Alternatively, one can use a learnable LATENT vector for each frame.
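
Here is a small sketch contrasting the two conditioning options. The Fourier features on the normalized frame index and the embedding-table class are my assumptions about a reasonable setup, not the exact choice of any one paper; in practice the latent table would be optimized jointly with the network, whereas here it is only initialized.

```python
import numpy as np

def explicit_time_code(frame_index, num_frames, num_freqs=4):
    """Explicit conditioning: normalized frame index plus Fourier features."""
    t = frame_index / max(num_frames - 1, 1)           # map index to [0, 1]
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    return np.concatenate([[t], np.sin(freqs * t), np.cos(freqs * t)])

class LatentTimeCodes:
    """Latent conditioning: one learnable vector per frame (an embedding table)."""

    def __init__(self, num_frames, dim, rng):
        # Small random init; training would update these rows by gradient descent.
        self.table = rng.normal(0.0, 0.01, size=(num_frames, dim))

    def __call__(self, frame_index):
        return self.table[frame_index]

def deformation_input(xyz, time_code):
    """Concatenate a 3D point with whichever per-frame code is in use."""
    return np.concatenate([np.asarray(xyz, dtype=float), time_code])
```

Either code is simply concatenated with the 3D point before it goes into the deformation MLP, so the rest of the network does not need to know which option was chosen.
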
B) Output parametrization

We can either use the MLP to predict
- dense 3D translation vectors (aka scene flow) or
- dense rigid motion field
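
To make the two output options concrete, here is a per-point sketch. The rigid variant uses a plain axis-angle rotation plus translation via Rodrigues' formula, which is a simplification I chose for illustration and not necessarily the exact parametrization used in the papers. In a full model the MLP would predict `offset`, or `rotvec` and `translation`, for every query point.

```python
import numpy as np

def apply_translation(xyz, offset):
    """Translation output: the MLP predicts a 3D offset for the point."""
    return np.asarray(xyz, dtype=float) + np.asarray(offset, dtype=float)

def rotation_from_axis_angle(rotvec):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-8:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def apply_rigid(xyz, rotvec, translation):
    """Rigid output: the MLP predicts a rotation and a translation for the point."""
    R = rotation_from_axis_angle(rotvec)
    return R @ np.asarray(xyz, dtype=float) + np.asarray(translation, dtype=float)
```

A translation field needs 3 numbers per point, while this rigid variant needs 6; the rigid form can describe a rotating part with one shared transform instead of a different offset at every point.
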
With these design choices in mind, we can mix-n-match to synthesize all the methods.

*Regularization*

Adding the deformation field introduces ambiguities. So we need to make it "well-behaved", e.g., the deformation field should be spatially smooth, temporally smooth, sparse, and avoid contraction and expansion.
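
One way to picture these priors is as finite-difference penalties on the predicted offsets. This is a generic sketch under my own assumptions: `offset_fn(xyz, t)` is a hypothetical callable returning an (N, 3) array of offsets, and each paper defines its own specific losses and weights.

```python
import numpy as np

def deformation_penalties(offset_fn, xyz, t, eps=1e-3):
    """Return scalar penalties that encourage a well-behaved deformation field."""
    xyz = np.asarray(xyz, dtype=float)
    base = offset_fn(xyz, t)                                   # (N, 3)

    # Sparsity: most of the scene should not move (L1 on the offsets).
    sparsity = np.abs(base).mean()

    # Temporal smoothness: offsets should change slowly over time.
    temporal = np.mean((offset_fn(xyz, t + eps) - base) ** 2) / eps ** 2

    # Spatial Jacobian by finite differences: jac[n, i, j] = d offset_i / d x_j.
    jac = np.stack([(offset_fn(xyz + eps * e, t) - base) / eps
                    for e in np.eye(3)], axis=-1)              # (N, 3, 3)
    spatial = np.mean(jac ** 2)

    # Divergence (trace of the Jacobian): nonzero means local expansion or contraction.
    divergence = np.mean(np.trace(jac, axis1=-2, axis2=-1) ** 2)

    return {"sparsity": sparsity, "temporal": temporal,
            "spatial": spatial, "divergence": divergence}
```

During training, a weighted sum of terms like these would be added to the rendering loss.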
