NeRFs & Gaussian Splatting

1. Inverse Rendering

Traditional rendering is slow, & creating 3D assets is hard. Inverse Rendering takes a rendered image & tries to reconstruct the 3D scene.

The input is many images of a static scene, with unchanging lighting. The output is a function that can render the scene from any viewpoint.

1.1 Optimization with SGD

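$$\theta^* = \arg\min_{\theta} \sum_{i} \mathcal{L}\big(R(\theta, p_i),\, I_i\big)$$

Where:

$\theta$ are the scene parameters, $p_i$ is the $i$-th camera position, $R(\theta, p_i)$ is the image rendered from that position, $I_i$ is the corresponding actual image, and $\mathcal{L}$ measures the difference between two images.

We want to minimize the difference between the rendered image and the actual image across all camera positions.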
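As a toy illustration of this optimization loop, the sketch below swaps the real renderer for a hypothetical linear one (`render` multiplies the scene parameters by a per-camera matrix) and uses a mean squared error as the image difference. Both choices, and all the names, are assumptions made for the example rather than part of the method described here.

```python
import numpy as np

def render(theta, camera):
    """Hypothetical stand-in renderer: a linear map from scene parameters to pixels."""
    return camera @ theta

def fit_scene(cameras, images, steps=5000, lr=0.1, seed=0):
    """Recover scene parameters by SGD on the per-image mean squared error."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(cameras[0].shape[1])
    for _ in range(steps):
        i = rng.integers(len(cameras))                 # one random camera per step
        residual = render(theta, cameras[i]) - images[i]
        grad = 2.0 * cameras[i].T @ residual / residual.size
        theta -= lr * grad                             # gradient descent update
    return theta

# Synthetic "captures" of a known scene, so the result can be checked.
rng = np.random.default_rng(42)
true_theta = rng.normal(size=4)
cameras = [rng.normal(size=(8, 4)) for _ in range(5)]
images = [render(true_theta, cam) for cam in cameras]
estimate = fit_scene(cameras, images)
```

Because the synthetic images are noise-free, `estimate` should end up very close to `true_theta`. With a real, non-linear renderer the same loop applies, but the gradient would come from automatic differentiation and the result would depend on the starting point.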

1.2 Initial Conditions

The initial conditions are very important for reaching a good local minimum.

1.3 Volume Rendering

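The equations for volumetric rendering happen to be extremely convenient for gradient descent optimization. We can simplify them here by ignoring scattering, assuming all particles either absorb or emit light:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt$$

Where:

$\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is the camera ray with origin $\mathbf{o}$ and direction $\mathbf{d}$, $t_n$ and $t_f$ are the near and far bounds, $\sigma$ is the density, $\mathbf{c}$ is the radiance emitted at a point towards direction $\mathbf{d}$, and $T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$ is the transmittance.

The directional component allows us to capture view-dependent effects, such as specular highlights and reflections, which are crucial for realistic rendering.

To approximate this integral, we take samples along the ray and sum their contributions by numerical quadrature. We also have $T_i$, which accounts for the cumulative absorption along the ray before sample $i$. So:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i$$

Where:

$T_i = \exp\left(-\sum_{j<i} \sigma_j \delta_j\right)$, $\sigma_i$ and $\mathbf{c}_i$ are the density and radiance at sample $i$, and $\delta_i = t_{i+1} - t_i$ is the distance between adjacent samples.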
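A minimal NumPy sketch of this sampled approximation for a single ray: each sample contributes its color weighted by its own opacity, $1 - e^{-\sigma\delta}$, and by the fraction of light that survives the samples in front of it. The function name `composite_ray` and its arguments are illustrative, and the per-sample densities, colors and spacings are assumed to be given.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Accumulate color along one ray from per-sample density and radiance.

    sigmas: (N,) densities, colors: (N, 3) radiance, deltas: (N,) sample spacings.
    Samples are assumed to be ordered from near to far.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    alphas = 1.0 - np.exp(-sigmas * deltas)           # opacity of each segment
    # Transmittance before sample i: product of (1 - alpha_j) for j < i.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas                          # contribution of each sample
    return weights @ colors, weights
```

Every operation here is differentiable with respect to `sigmas` and `colors`, which is what makes this formulation convenient for gradient descent.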

2. 3D Representations

An explicit representation enumerates the points of the geometry directly (for example a mesh or a point cloud), while an implicit representation defines them through a condition that a function must satisfy.

With $c$-level sets, we can represent a surface as the set of points $\mathbf{x}$ where a function $f(\mathbf{x}) = c$. For example, an implicit surface can be defined as the set of points where $f(\mathbf{x}) = 0$, which allows us to represent complex geometries without explicitly defining each point. A common choice for $f$ is a signed distance function (SDF), which gives the distance from a point to the surface, with the sign indicating whether the point is inside or outside the surface.
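For a concrete case, a sphere has a simple closed-form SDF. The sketch below assumes the common convention that the distance is negative inside the surface and positive outside; the function name is illustrative.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere (negative inside, by assumption)."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(points - center, axis=-1) - radius

# Three query points: the center, a point on the surface, and a point outside.
queries = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
distances = sphere_sdf(queries, center=[0.0, 0.0, 0.0], radius=1.0)
```

The surface itself is the zero level set: the second query point lies exactly on it, so its distance is zero.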

Constructive solid geometry (CSG) represents shapes by Boolean operations (union, intersection, difference) on primitives.
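When the primitives are given as signed distance functions, these Boolean operations are often expressed with `min` and `max`. The sketch below assumes the negative-inside sign convention and uses illustrative names; the combined value has the correct sign everywhere but is in general only a bound on the true distance.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere (negative inside, by assumption)."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(center, float)) - radius)

def union(a, b):
    return min(a, b)          # inside either shape

def intersection(a, b):
    return max(a, b)          # inside both shapes

def difference(a, b):
    return max(a, -b)         # inside a but not inside b

def carved_sphere(p):
    """A unit sphere with a smaller sphere cut out of its side."""
    big = sphere_sdf(p, (0.0, 0.0, 0.0), 1.0)
    small = sphere_sdf(p, (1.0, 0.0, 0.0), 0.5)
    return difference(big, small)
```

For example, `carved_sphere((0.9, 0, 0))` is positive because that point lies inside the removed sphere, while `carved_sphere((0, 0, 0))` stays negative.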

A neural network can represent a continuous function of the entire scene.

3. NeRFs

We have a volumetric cloud, and a set of posed images. To represent the volume, we use a Neural Radiance Field (NeRF), which is a fully connected neural network that takes in a 3D position and a viewing direction, and outputs the density and emitted radiance at that point.
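The input/output contract of such a network can be sketched with a tiny, randomly initialized NumPy model. This is only an interface illustration under stated assumptions: the single hidden layer, the layer sizes, the ReLU and sigmoid activations and all names are choices made for the example, and no training is shown.

```python
import numpy as np

def init_field(hidden=32, seed=0):
    """Random weights for a one-hidden-layer network (sizes are arbitrary)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(scale=0.5, size=(6, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, 4)),
        "b2": np.zeros(4),
    }

def radiance_field(params, position, direction):
    """Map a 3D position and a viewing direction to (density, RGB radiance)."""
    x = np.concatenate([position, direction])               # 6 inputs
    h = np.maximum(0.0, x @ params["w1"] + params["b1"])    # ReLU hidden layer
    out = h @ params["w2"] + params["b2"]
    density = np.maximum(0.0, out[0])                       # density must be non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))                    # sigmoid keeps color in [0, 1]
    return density, rgb

params = init_field()
density, rgb = radiance_field(params, np.array([0.1, 0.2, 0.3]), np.array([0.0, 0.0, 1.0]))
```

Querying this function at every sample along a ray gives the densities and colors that volume rendering composites into a pixel.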

Positional encoding is used to help the network learn high-frequency details.
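One common form of positional encoding maps each input coordinate to sines and cosines at geometrically increasing frequencies before it reaches the network. The sketch below assumes frequencies of $2^k \pi$ and groups all sines before all cosines for each coordinate; the function name, the default number of frequencies and the output ordering are illustrative choices.

```python
import numpy as np

def positional_encoding(p, num_freqs=4):
    """Encode coordinates with sin/cos at frequencies 2^k * pi, k = 0..num_freqs-1.

    p has shape (..., D); the result has shape (..., D * 2 * num_freqs).
    """
    p = np.asarray(p, dtype=float)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (L,)
    angles = p[..., None] * freqs                        # (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (..., D, 2L)
    return enc.reshape(*p.shape[:-1], -1)

encoded = positional_encoding([0.0, 0.0, 0.0], num_freqs=4)
```

A 3D position with four frequencies becomes a 24-dimensional vector, so small changes in position produce large changes in some input features.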

3.1 Representations

4. Gaussian Splatting

Primitive-based representations use rasterization instead of ray tracing. They use a point cloud (unstructured geometry). A point cloud acts as a multi-chart manifold, meaning it can represent complex surfaces without needing a single continuous parameterization. It is invariant to permutations, so we can sort points in screen space. It also has a view-dependent representation, allowing for better handling of view-dependent effects like specular highlights.

4.1 Surface Splatting

  1. Consider oriented points (surfels) as discrete samples of a texture function on a surface.
  2. A Gaussian reconstruction kernel is used to recover a continuous signal.
  3. This is then sampled in screen space.

4.2 Volume Splatting

Instead, we can use oriented ellipsoids as primitives, which can represent volumetric data. This allows for better handling of complex scenes with varying levels of detail, and can capture view-dependent effects more effectively than surface splatting.

To blend points in screen space, use alpha blending (as this is differentiable). This allows us to give each point an opacity value, which can be used to create smooth transitions between points and capture fine details in the scene.

To render:

  1. Splat: compute the shape of the Gaussian after projection. The center is projected as before, but perspective projection is not affine, so the transformation of the shape (the covariance matrix) is approximated using the first-order terms of its Taylor series. This local affine approximation keeps the result Gaussian, since Gaussians are closed under affine transformations.
  2. Sort: globally sort the points by depth.
  3. Blend: alpha composite.
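The three steps above can be sketched for a single pixel as follows. This is a minimal NumPy sketch under several assumptions: the Gaussians are already expressed in camera space, the camera is a pinhole with focal lengths `fx` and `fy`, and all function names are illustrative. The Jacobian of the projection plays the role of the first-order Taylor term.

```python
import numpy as np

def project_gaussian(mean_cam, cov_cam, fx=1.0, fy=1.0):
    """Splat: project a camera-space 3D Gaussian to a 2D screen-space Gaussian."""
    x, y, z = mean_cam
    mean_2d = np.array([fx * x / z, fy * y / z])
    # Jacobian of the perspective projection, evaluated at the Gaussian's center.
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov_2d = J @ cov_cam @ J.T            # affine map of a Gaussian stays Gaussian
    return mean_2d, cov_2d

def render_pixel(pixel, means_cam, covs_cam, colors, opacities, fx=1.0, fy=1.0):
    """Sort by depth, then alpha composite front to back at one pixel."""
    order = np.argsort(means_cam[:, 2])   # Sort: nearest Gaussian first
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        mean_2d, cov_2d = project_gaussian(means_cam[i], covs_cam[i], fx, fy)
        d = pixel - mean_2d
        falloff = np.exp(-0.5 * d @ np.linalg.solve(cov_2d, d))
        alpha = opacities[i] * falloff    # per-point opacity shaped by the Gaussian
        color += transmittance * alpha * colors[i]   # Blend
        transmittance *= 1.0 - alpha
    return color
```

A real rasterizer would process many pixels in parallel and skip Gaussians that do not overlap a pixel; the loop here only shows the order of operations.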

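To optimize a covariance matrix, reparameterize it with a rotation matrix $R$ and a scaling matrix $S$, which are easier to optimize than the covariance matrix directly: $\Sigma = R S S^\top R^\top$. Any choice of $R$ and $S$ then yields a symmetric positive semi-definite $\Sigma$, which unconstrained gradient updates on $\Sigma$ itself would not guarantee.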
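A short sketch of this reparameterization, assuming the rotation is stored as a quaternion in `(w, x, y, z)` order and the scaling as three per-axis factors. The storage format and the function names are assumptions made for the example.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Rotation matrix from a quaternion (w, x, y, z); normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def build_covariance(quaternion, scales):
    """Covariance from rotation and scaling: R S S^T R^T."""
    R = quaternion_to_rotation(quaternion)
    S = np.diag(np.asarray(scales, dtype=float))
    return R @ S @ S.T @ R.T

# A 90-degree rotation about the z axis swaps the x and y extents.
half = np.sqrt(0.5)
cov = build_covariance([half, 0.0, 0.0, half], [1.0, 2.0, 3.0])
```

Whatever values the optimizer assigns to the quaternion and the scales, the resulting matrix is symmetric with non-negative eigenvalues, so it always describes a valid ellipsoid.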

To splat:

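  1. Given a point cloud $\{P_k\}$ and a point $Q$ on the surface.
  2. Create a local parameterization for the neighbours of $Q$.
  3. Each 3D point $P_k$ is associated with a local 2D coordinate $\mathbf{u}_k$, and $Q$ with $\mathbf{u}$.
  4. A continuous surface function is then $f_c(\mathbf{u}) = \sum_k w_k\, r_k(\mathbf{u} - \mathbf{u}_k)$, where $w_k$ is a weight and $r_k$ is a Gaussian kernel.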
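The last step, a weighted sum of Gaussian kernels centered on the neighbours' local 2D coordinates, can be sketched as below. For simplicity the sketch assumes the same isotropic kernel for every point, with a hypothetical width parameter `sigma`; the names are illustrative.

```python
import numpy as np

def surface_function(u, coords, weights, sigma=1.0):
    """Evaluate the weighted sum of Gaussian kernels at local coordinate u.

    u: (2,) query coordinate, coords: (K, 2) neighbour coordinates,
    weights: (K,) per-point weights, sigma: shared kernel width (assumed).
    """
    u = np.asarray(u, dtype=float)
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sq_dist = np.sum((u - coords) ** 2, axis=1)
    kernels = np.exp(-0.5 * sq_dist / sigma**2)   # Gaussian kernel per neighbour
    return float(weights @ kernels)
```

Evaluated exactly at a sample's own coordinate, that sample's kernel contributes its full weight, and the contribution decays smoothly with distance.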