A Field Guide To Gaussian Splatting
Gaussian splatting holds a lot of promise for 3D recreation and spatial storytelling. It’s faster and more photorealistic than photogrammetry, and much easier to process and interact with than neural radiance fields — giving journalists and readers the best of both worlds.
These advantages are due to the novel way that splats reconstruct 3D scenes. In a splat, people, places, and objects are made up of a point cloud defined by gaussian functions. Each gaussian function is essentially a 2D disc assigned to a point in 3D space with attributes for orientation, color, transparency, and size. When viewed in aggregate, these coalesce into a 3D scene that can very accurately represent certain things that other volumetric captures cannot: reflection, transparency, fine detail and the qualities of the light in a scene.
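The per-gaussian attributes described above can be sketched as a simple data structure. This is an illustrative simplification, not the actual storage layout of any splatting tool (real splat files, such as PLY exports, also carry data like spherical-harmonic color coefficients):

```python
from dataclasses import dataclass

# A minimal sketch of the attributes each gaussian carries, as described
# above. Field names here are illustrative, not a real file format.
@dataclass
class Gaussian:
    position: tuple[float, float, float]          # point in 3D space
    rotation: tuple[float, float, float, float]   # orientation (quaternion)
    scale: tuple[float, float, float]             # size along each axis
    color: tuple[float, float, float]             # RGB
    opacity: float                                # transparency, 0.0-1.0

# A "scene" is simply a large collection of these primitives, which
# coalesce into a 3D reconstruction when rendered together.
scene = [
    Gaussian((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.1, 0.1, 0.01),
             (0.8, 0.2, 0.2), 0.9),
]
```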
In this guide, we give an overview of the practical takeaways we learned exploring gaussian splatting for spatial journalism. We tested a variety of capture and processing techniques across a range of hardware and software, and found solutions on both desktop and mobile devices. We also assessed splatting against the benchmarks we typically associate with photogrammetry, the current standard for 3D recreation.
Capture Basics
Capturing data for gaussian splats is very similar to capture for photogrammetry and NeRF, and R&D has actually created splats from previous datasets used to make photogrammetry models and NeRF renders. If you have experience with 3D capture, you’ll understand how to capture a gaussian splat relatively quickly.
Capture starts with a camera or a phone, and can be photo or video based. In general, you want to ensure that you have enough coverage, that is, enough views from different angles to fully reconstruct the scene, person, or object; and you also want to ensure enough overlap, or adequate similarity between frames to allow the reconstruction to build.

A.J. Chavar
Some of the cameras we used to capture photos and videos for splats.
Gear Selection
Gaussian splats can be created from video or still photographs taken with nearly any camera; however, certain settings and hardware tend to yield better results. In general:
- Wide lenses maximize coverage and overlap, giving the model more data to help with reconstruction.
- A deep depth of field keeps more of the environment in focus so the model can reconstruct more of the scene.
- Fast shutter speeds minimize blur from movement and camera shake, which aids reconstruction.
Mirrorless and DSLR
The same principles apply to capture using professional mirrorless, DSLR and other cameras with interchangeable lenses:
- Use a lens that is 24mm or wider, or even a fisheye. However, circular fisheye lenses or other lenses so wide that they heavily vignette the image will not work well.
- Use a high aperture value (i.e., a narrow aperture) to keep more of the scene in sharp focus for reconstruction.
- Use a high enough ISO to allow a shutter speed that eliminates motion blur and camera shake.
- Manually set a white balance or batch process images afterward to achieve consistent color and tone.
iPhones and GoPros
GoPros, as well as smartphones with built-in wide angle cameras (like the 0.5× iPhone lens), are very good at capturing gaussian splats. These devices yield very deep focus, and their small size makes them maneuverable around complex objects and tight spaces.
Using a rig made of three GoPros mounted at different heights and orientations vastly reduces capture time.
Video vs. Photo
Video and photographs work equally well for gaussian splats, provided you adhere to the basic capture principles outlined above. Consider ahead of time what method fits into your production workflow. Some things to think about:
- Experience: If you are new to creating 3D captures, capturing video tends to be simpler.
- Quality: You might opt for high resolution RAW stills when photographing a stationary object or space that requires lots of detail. More resolution generally equates to a more detailed splat — up to a point. We tested images up to 45 megapixels and video up to 8K and found little increase in quality above 20 megapixels or 6K video. If you opt for extreme resolutions, you may need to rescale your images or video before processing.
- Speed: Using a high frame rate during video capture (e.g., 120fps vs. 24fps) means less subject movement between frames, more overlap between adjacent frames, less motion blur, and a better guarantee of sufficient coverage. Using a burst mode or automated continuous capture mode when shooting still images similarly speeds up the process.
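If you do capture at an extreme resolution, the downscaling mentioned above can be reduced to a small calculation before you hand it to any resizing tool. A minimal sketch, assuming the roughly 20-megapixel ceiling we observed (the function name and default are illustrative):

```python
def downscale_dims(width: int, height: int, max_pixels: int = 20_000_000) -> tuple[int, int]:
    """Return new (width, height) with the same aspect ratio and at most
    max_pixels total pixels. The 20-megapixel default reflects the point
    of diminishing returns we observed; treat it as a starting value."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = (max_pixels / pixels) ** 0.5  # shrink both axes equally
    return round(width * scale), round(height * scale)

# e.g. a 45 MP still (8192 x 5464) comes down to roughly 20 MP
print(downscale_dims(8192, 5464))
```

The resulting dimensions can then be fed to whatever batch-resize tool fits your workflow.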
Capture Patterns

Claudia Miranda
Capturing at least three "orbits" of a person or object helps ensure adequate coverage and overlap between frames for a complete reconstruction.
People and Things
When capturing, you’ll want to ensure each frame has at least ⅓ overlap with the previous frame and the following frame to help with alignment. You’ll also want to make sure you cover as much of your subject as possible. A capture pattern we call the “three donuts” is an extremely efficient way to ensure overlap and coverage:
- Orbit around the person or object you are recreating with the camera facing in, held over your head, and angled down at them.
- Do a second orbit at medium height with the camera pointed straight ahead, i.e., parallel to the ground and, for a person, at roughly chest height.
- Finally, complete a third orbit at ground level or the base of the object/person with the camera angled up.
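The ⅓-overlap rule can be turned into a rough shot count for each of these orbits. This is a back-of-the-envelope sketch that ignores subject distance, and the example field-of-view value is an assumption:

```python
import math

def shots_per_orbit(horizontal_fov_deg: float, min_overlap: float = 1 / 3) -> int:
    """Rough number of frames for one inward-facing orbit.

    Adjacent frames share roughly 1 - (step / fov) of their width, so to
    keep at least min_overlap overlap, the angular step between frames
    must be at most (1 - min_overlap) * fov.
    """
    max_step = (1 - min_overlap) * horizontal_fov_deg
    return math.ceil(360 / max_step)

# e.g. a 24mm full-frame lens covers roughly 74 degrees horizontally
print(shots_per_orbit(74))  # -> 8 frames per orbit, at minimum
```

In practice you will want comfortably more frames than this minimum, especially around areas of fine detail.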
In addition, you may want to photograph a 180° x 360° panorama or a simple “skybox” from the same location as the person or object you are capturing. This allows the environment to be projected around the person or object reconstruction for a more fully immersive result.
- Take photographs in 360° from the viewpoint of the person or object you just captured.
- Ensure that at least ⅓ of each frame overlaps with the previous frame to aid reconstruction.
- Remember to capture images looking up and down in 360° as well.

Claudia Miranda
When capturing spaces, the camera is oriented to face outside of the orbit, instead of inward.
Places
The “three donuts” approach still works here, but you’ll be facing outwards instead of in.
- Your donut does not need to be a perfect circle — follow the boundaries of the space.
- Spend more time capturing areas with high or fine detail.
- Capture wide enough imagery to reconstruct the full floor and ceiling of your environment.
- Depending on the size of the environment you may need additional coverage of the middle of the space, or around objects in the space.
This capture pattern also lends itself to the use of a multiple camera rig. Mounting several cameras at varying heights and angles to a monopod or dolly rig can produce high fidelity captures in a fraction of the time.

Claudia Miranda
For complex captures, or to have more data on hand for reconstruction, a single orbit with many sine-wave-style undulations captures lots of overlap between frames in an efficient movement.
Additional tips
We have also had success using a capture pattern where the camera orbits a scene in its entirety at least once while also moving in a slow, deliberate sine wave, oscillating from top to bottom continuously. Combining this pattern with the “three donuts” may be necessary when capturing and modeling highly detailed or “messy” scenes and objects.
Some more tips for making more complex captures or capturing a difficult source:
- Plan your route: Can you physically reach all the areas you need to capture? Can your camera? ID any problems before you start capturing to reduce headaches.
- Vary the “orbit distance”: before or after your “three donuts” try repeating some areas from further away and very close to add texture and fine detail.
- Location, location, location: Scenes that include foreground and background in addition to the subject translate into 3D a little bit better. So do scenes with textured, closer backgrounds, as opposed to a landscape background far in the distance.
- Work swiftly: When capturing people, the faster you can work, the less they will move, and the better the reconstruction will go.
- Sync your equipment: If you are working with a multicam rig or capture environment, make sure settings are synced across all your cameras: frame rate, exposure settings, and, most importantly, white balance.
Extracting a person from a larger scene using SuperSplat.
Apps, Processing and Software
Scaniverse
Scaniverse, a free gaussian splatting app produced by Niantic, computes splats natively and locally, without the need for an internet or mobile network connection. When used on a compatible phone, it incorporates LiDAR data into the point cloud. Scaniverse can only be used on a smartphone, with the phone’s integrated camera. In the field, Scaniverse can be used to quickly assess whether model creation is feasible or worthwhile before beginning a more involved capture process. Scaniverse is also useful for capturing a “quick and dirty” model with accurate geometry, which a more complex capture can later be referenced against for scaling.
Lumalabs AI
Luma is a powerful mobile and web platform for processing NeRFs and gaussian splats; however, processing in the cloud can take a long time. The software also has a well-featured tool for plotting and exporting camera paths. In our testing, Luma automatically generates a skybox image from the capture data to make the splats it renders feel more immersive.
Polycam
Polycam is a mobile- and web-accessible cloud-processing platform for photogrammetry, 360 photography, LiDAR scanning, and Gaussian splats. The mobile app allows you to perform guided captures on your phone, which is excellent for those new to 3D capture. Images can be processed in the cloud and downloaded to be reprocessed later. Users can edit and crop splats online and add 360 photographs or skyboxes to renders. Polycam is one of the most established apps in mobile 3D capture and processing.
PostShot
A simplified, Windows-only implementation of 3D Gaussian Splatting, PostShot allows you to train on your data locally without a command-line installation. The software is currently in beta. PostShot offers more control over a wider variety of processing parameters than any other solution mentioned here. Each project will need some tweaking of these settings to yield the best results from a given input, but in our testing we found little difference when increasing the processing steps above 30,000. You will need a machine with a powerful GPU. You can speed up processing by using RealityCapture to align your images first and then importing that data.
SuperSplat
Different from the other tools listed here, but in our experience integral to finalizing your scene, is SuperSplat. Splats are not currently fully supported by industry-standard desktop 3D editing software like Unreal or Blender, and SuperSplat fills that critical gap in the post-production workflow. This web-based tool allows you to edit, combine and “clean up” existing splat PLY files with precise, easy-to-learn tools. It also allows you to compress splat PLY files for easier web delivery. Finally, it can be used as a quick viewer for PLY files exported from other programs or apps. Combining splat cleaning and cropping with SuperSplat’s compression option can yield substantial results: with these techniques, we were able to compress an original 739MB file down to a much friendlier 23MB model with no noticeable visual degradation.
While the table has a discrete volume in the photogrammetry model (left), the glass cloche completely fails to be captured. Conversely, the gaussian splat (right) has no problem recreating the thin details, the transparency of the glass, and the varied textures throughout the scene, but this point cloud does not have discrete bounds that can be accurately measured.
Splats vs. Photogrammetry
Photogrammetry and gaussian splatting create two fundamentally different types of 3D representations. Photogrammetry generates a mesh: a solid 3D object defined by a series of 2D polygons. Gaussian splatting creates a point cloud: a collection of points in 3D space that coalesce into a model when viewed in aggregate. The points used to render gaussian splats are 2D discs with values for radius, color, opacity, gradient, and orientation. Meshes, on the other hand, have discrete volumes, and so can be measured and scaled. High resolution imagery can be mapped to the surface of a mesh to create a high fidelity model. Splat point clouds are arranged for visual accuracy, but can’t be measured in the same way.
With photogrammetry, if we have reference measurements, we can infer other measurements. E.g., if we know the dimensions of one object in the scene are accurate, we can measure an unknown object in the scene and trust the general accuracy of those measurements. While we could make similar inferences with a gaussian splat model, the issue is that measuring within a splat is somewhat position dependent. Because the point cloud is recreating a novel viewpoint, and not a defined mesh, measurements can be inaccurate if taken from a vantage point that departs from the original camera path.
Use photogrammetry when…
… you need high fidelity geometry
… high resolution surfaces are important
… you are reconstructing simple objects or scenes with lots of capture data
Use gaussian splatting when…
… you need to accurately display reflections, light, fog, translucent objects, or textures
… only sparse datasets are available
… you aim to show humans in 3D
… creating a 3D representation quickly is most important

Humans from a variety of separate captures composited together in SuperSplat.
Ensuring more geometrically accurate splats
In our testing, Scaniverse consistently created accurately scaled models of objects with known dimensions, allowing us to obtain relatively accurate measurements of objects with unknown dimensions. For example: in a scene containing a ruler, the ruler is accurately sized, and virtual measurements of other items in the same capture appear to align with their real-world counterparts. While it’s currently possible to create highly detailed splats with a great degree of geometric accuracy, it’s best to view this accuracy as perceptual rather than absolute. You can consistently rely on objects, people and spaces being scaled correctly in relation to each other in a splat, but we would not recommend relying on splats for precise measurements of objects or scenes with unknown geometry.
In addition to Scaniverse, there is one further step journalists in the field can take to help ensure an accurately scaled splat. Because all splats can be rescaled in post-processing, as long as we know the measurements of a single 3D object in the scene, we can rescale the splat to 1:1 with ease. While there are commercially available 3D calibration objects for this need, a Rubik’s cube is a durable, cost-effective stand-in that fits easily in a camera bag. Capturing a Scaniverse model with the Rubik’s cube or calibration object present, and using that as a guide to rescale a more complex splat of the same scene in post-production, is a simple and effective way to increase accuracy without sacrificing other aspects of visual fidelity.
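The rescaling itself works out to a single scale factor applied to every point. A minimal sketch, assuming a standard ~57 mm Rubik’s cube as the reference object (the function name and measured value are illustrative):

```python
def rescale_points(points, measured_size: float, true_size: float = 0.057):
    """Scale a splat's point positions so a reference object (e.g. a
    standard Rubik's cube, ~57 mm / 0.057 m per side) comes out at its
    real size.

    measured_size: the cube's edge length as measured inside the splat.
    true_size:     the known real-world edge length, in whatever units
                   the final model should use (meters here).
    """
    factor = true_size / measured_size
    return [(x * factor, y * factor, z * factor) for (x, y, z) in points]

# If the cube measures 0.19 units in the raw splat, every coordinate
# shrinks by a factor of 0.057 / 0.19 = 0.3
scaled = rescale_points([(1.0, 2.0, 0.0)], measured_size=0.19)
```

In practice you would apply the equivalent of this scale factor in your editing tool rather than by hand, but the arithmetic is the same.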
This splat, reconstructed from video footage used for a Times documentary, illustrates the limits of splatting when reconstructing from limited data.
Looking Ahead
The available software and capabilities for gaussian splatting are evolving quickly, and R&D plans to continue to explore this space. Read more about our work with splats: Pushing the Limits of Gaussian Splatting for Spatial Storytelling.
If you’re working on similar problems or are interested in working together, reach out to rd@nytimes.com.