Gaussian splatting is a method for real-time radiance field rendering that represents 3D scenes as collections of millions of anisotropic 3D Gaussian primitives, each defined by a position, covariance matrix, opacity, and color (encoded via spherical harmonics). Introduced by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis of Inria and the Max Planck Institute for Informatics at SIGGRAPH 2023, the technique achieves state-of-the-art visual quality for novel view synthesis while rendering at over 100 frames per second at 1080p resolution. This combination of quality and speed represented a major departure from neural radiance fields (NeRF), which typically required seconds to minutes per frame. The original paper, "3D Gaussian Splatting for Real-Time Radiance Field Rendering," was published in ACM Transactions on Graphics and quickly became one of the most influential papers in computer vision and graphics of the decade [1].
To understand why Gaussian splatting had such an impact, it helps to understand the state of the art it displaced. Neural radiance fields (NeRFs), introduced by Mildenhall et al. in 2020, represented a breakthrough in novel view synthesis. A NeRF encodes a 3D scene as a continuous function, mapping a 5D input (3D spatial coordinates plus 2D viewing direction) to an output of color and density. This function is parameterized by a multilayer perceptron (MLP) that is optimized to reproduce a set of input photographs of the scene [2].
NeRFs produced remarkably photorealistic renderings, but they had significant practical limitations. Rendering a single image required casting rays through the scene and evaluating the MLP at hundreds of sample points along each ray, a process called volumetric ray marching. This was computationally expensive: even on high-end GPUs, rendering a single 1080p frame from a NeRF could take 30 seconds or more. Training a NeRF also required hours of optimization. These speed limitations made NeRFs impractical for real-time applications like virtual reality, augmented reality, or interactive 3D visualization [2].
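The cost described above comes from the volume rendering integral evaluated per ray. The sketch below illustrates the per-ray compositing step in numpy with made-up sample values; it is a simplification for intuition, not NeRF's actual implementation (which also includes hierarchical sampling and positional encoding):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Composite color/density samples along one ray (NeRF-style).

    densities: (N,) volume density sigma at each sample point
    colors:    (N, 3) RGB predicted at each sample point
    deltas:    (N,) distance between adjacent samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)  # per-sample opacity
    # Transmittance: probability the ray reaches sample i unoccluded
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# A ray passing through empty space, then a dense red region:
densities = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.tile([1.0, 0.0, 0.0], (4, 1))
deltas = np.full(4, 0.1)
print(composite_ray(densities, colors, deltas))  # close to [1, 0, 0]
```

In a real NeRF, each of the hundreds of samples per ray requires a full MLP evaluation, which is why a 1080p frame (about 2 million rays) is so expensive.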
Several follow-up works attempted to accelerate NeRFs. Instant-NGP (Müller et al., 2022) introduced multi-resolution hash encoding to speed up both training and inference, reducing training time from hours to minutes and rendering time to near-interactive rates. Plenoxels (Yu et al., 2022) replaced the MLP with a sparse voxel grid, eliminating neural network evaluation entirely during rendering. These approaches improved speed substantially but still fell short of consistent real-time rendering at high resolutions [3].
Splatting is a rendering technique with roots in the computer graphics literature of the 1990s. Rather than casting rays from the camera into the scene (as in ray marching), splatting projects primitives from the scene onto the image plane. Each primitive "splats" its contribution onto nearby pixels, and the final image is formed by compositing all contributions.
Westover (1990) introduced splatting for volume rendering, and Zwicker et al. (2001) developed the mathematical framework for rendering elliptical Gaussians ("EWA splatting"). The key insight of these methods was that Gaussians are well-suited as rendering primitives because they are smooth, differentiable, and their projection from 3D to 2D has a closed-form solution: a 3D Gaussian projected onto a 2D image plane is simply a 2D Gaussian [4].
Kerbl et al. combined this classical splatting approach with modern differentiable rendering and gradient-based optimization, creating a system that could be trained from photographs (like a NeRF) but rendered in real time (unlike a NeRF).
In 3D Gaussian splatting, a scene is represented as a set of 3D Gaussian primitives, typically numbering in the hundreds of thousands to millions depending on scene complexity. Each Gaussian is defined by the following parameters:
| Parameter | Description | Representation |
|---|---|---|
| Position (mean) | The 3D center of the Gaussian | 3D vector (x, y, z) |
| Covariance matrix | Controls the shape, size, and orientation of the Gaussian (an ellipsoid in 3D) | 3x3 symmetric positive semi-definite matrix, parameterized as a rotation quaternion + scale vector |
| Opacity | Controls how opaque the Gaussian is | Scalar in [0, 1] |
| Color | View-dependent appearance of the Gaussian | Spherical harmonics coefficients (typically degree 3, giving 48 coefficients for RGB) |
The covariance matrix is particularly important because it determines the shape of each Gaussian. A spherical covariance produces a round blob, while an anisotropic covariance produces an elongated ellipsoid that can represent surfaces, edges, and other geometric features. To ensure the covariance matrix remains valid (positive semi-definite) during optimization, it is parameterized as a rotation (represented by a quaternion) and a scale vector, from which the covariance is reconstructed [1].
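This parameterization can be sketched in a few lines of numpy. The construction Sigma = R S S^T R^T is symmetric positive semi-definite for any quaternion and scale vector, which is exactly why the paper optimizes the quaternion and scales rather than the matrix entries directly (illustrative sketch, not the paper's CUDA code):

```python
import numpy as np

def covariance_from_quat_scale(q, s):
    """Build a valid 3x3 covariance as Sigma = R S S^T R^T.

    q: quaternion (w, x, y, z) giving the rotation R (normalized here)
    s: per-axis scales (sx, sy, sz) giving S = diag(s)
    """
    w, x, y, z = q / np.linalg.norm(q)
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    M = R @ np.diag(s)
    return M @ M.T  # symmetric positive semi-definite by construction

# An ellipsoid stretched along one axis, rotated 90 degrees about z:
q = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])
sigma = covariance_from_quat_scale(q, np.array([2.0, 0.5, 0.5]))
print(np.linalg.eigvalsh(sigma))  # eigenvalues are the squared scales
```

During optimization, gradients flow through this reconstruction to the quaternion and scale parameters, so every gradient step still yields a valid covariance.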
Color is encoded using spherical harmonics (SH), a set of basis functions defined on the sphere. This allows each Gaussian to have view-dependent color: the color changes depending on the viewing angle, enabling the representation of specular highlights, reflections, and other view-dependent effects. Degree-3 spherical harmonics use 16 coefficients per color channel (48 total for RGB), providing a good balance between expressiveness and compactness [1].
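Evaluating SH color amounts to a small dot product between the coefficients and basis functions evaluated at the viewing direction. The sketch below is truncated to degree 1 (4 basis functions) for brevity; the constants and sign conventions follow those commonly used in public 3DGS implementations, and degree 3 simply extends the basis to 16 terms per channel:

```python
import numpy as np

# Real spherical harmonics constants for bands 0 and 1
SH_C0 = 0.28209479177387814   # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # sqrt(3 / (4 * pi))

def sh_to_color(sh, view_dir):
    """Evaluate view-dependent RGB from SH coefficients (degree 1 only here).

    sh:       (4, 3) coefficients, one row per basis function, per channel
    view_dir: vector from the Gaussian center toward the camera
    """
    x, y, z = view_dir / np.linalg.norm(view_dir)
    color = (SH_C0 * sh[0]
             - SH_C1 * y * sh[1]
             + SH_C1 * z * sh[2]
             - SH_C1 * x * sh[3])
    return np.clip(color + 0.5, 0.0, 1.0)  # 0.5 offset as in common 3DGS code

# A Gaussian that appears brighter when viewed from +z:
sh = np.zeros((4, 3))
sh[0] = [0.5, 0.5, 0.5]   # base (view-independent) term
sh[2] = [0.4, 0.4, 0.4]   # z-direction term
print(sh_to_color(sh, np.array([0.0, 0.0, 1.0])))   # brighter
print(sh_to_color(sh, np.array([0.0, 0.0, -1.0])))  # dimmer
```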
Training a Gaussian splatting model begins with a sparse point cloud, typically produced by Structure-from-Motion (SfM) using a tool like COLMAP. Each point in the SfM output becomes the initial position of a Gaussian. The training process then optimizes all Gaussian parameters to minimize the difference between rendered images and the ground-truth photographs [1].
The optimization proceeds as follows:
Initialization. Sparse SfM points are used as initial Gaussian positions. Each Gaussian is initialized with a small isotropic covariance, uniform opacity, and SH coefficients derived from the SfM point colors (the base term is set from the point's RGB; higher-order bands start at zero).
Differentiable rendering. For each training view, the Gaussians are projected onto the image plane using a differentiable splatting operation. The 3D covariance of each Gaussian is projected to a 2D covariance using the viewing transformation and the Jacobian of the projective transformation (following the EWA splatting formulation), and the resulting 2D Gaussians are alpha-composited in depth order to produce the rendered image.
Loss computation. The rendered image is compared to the ground-truth photograph using a weighted combination of an L1 loss and a structural dissimilarity (D-SSIM) term; the paper weights the D-SSIM term at 0.2.
Gradient-based optimization. Gradients are backpropagated through the differentiable renderer to all Gaussian parameters (position, covariance, opacity, SH coefficients), and parameters are updated using the Adam optimizer.
Adaptive density control. Periodically during training, the system performs densification and pruning. Gaussians in under-reconstructed regions (indicated by large positional gradients) are split or cloned to add detail. Gaussians that have become nearly transparent (low opacity) or excessively large are removed. This adaptive process allows the model to allocate more Gaussians to complex regions and fewer to simple ones.
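The photometric loss in the steps above can be sketched as follows. The weighting (1 - lambda) * L1 + lambda * (1 - SSIM) with lambda = 0.2 matches the paper; for brevity this sketch computes a single global SSIM rather than the windowed SSIM the paper uses, so treat it as an approximation:

```python
import numpy as np

def training_loss(rendered, target, lam=0.2):
    """3DGS-style photometric loss: (1 - lam) * L1 + lam * (1 - SSIM).

    Global (unwindowed) SSIM is used here as a coarse stand-in for the
    windowed SSIM of the original paper. Images are floats in [0, 1].
    """
    l1 = np.abs(rendered - target).mean()
    mu_x, mu_y = rendered.mean(), target.mean()
    var_x, var_y = rendered.var(), target.var()
    cov = ((rendered - mu_x) * (target - mu_y)).mean()
    c1, c2 = 0.01**2, 0.03**2  # standard SSIM stabilizing constants
    ssim = ((2*mu_x*mu_y + c1) * (2*cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    return (1 - lam) * l1 + lam * (1 - ssim)

img = np.random.default_rng(0).random((64, 64, 3))
print(training_loss(img, img))        # identical images: loss is 0
print(training_loss(img, 1.0 - img))  # inverted target: large loss
```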
Training on a typical scene (such as those in the Mip-NeRF 360 dataset) takes approximately 20 to 40 minutes on a single NVIDIA GPU, comparable to or faster than the fastest NeRF variants [1].
The key technical contribution that enables real-time rendering is a custom CUDA-based tile-based rasterizer. Rather than using volumetric ray marching (which requires evaluating many samples per pixel), the rasterizer projects each Gaussian onto the image plane, determines which screen tiles it overlaps, and sorts Gaussians by depth within each tile. The contribution of each Gaussian to each pixel is then computed using the projected 2D Gaussian function, and Gaussians are alpha-composited front to back [1].
This rasterization approach has two major advantages over ray marching. First, it avoids the need to sample the scene densely along each ray, which is the primary bottleneck in NeRF rendering. Second, it is naturally parallelizable across tiles and pixels, making it well-suited to GPU execution. The entire rasterizer is differentiable, enabling end-to-end gradient-based optimization during training.
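For a single pixel, the front-to-back compositing performed inside a tile can be sketched as follows. This is scalar Python standing in for the parallel CUDA kernel, and the data layout is illustrative; the real rasterizer also clips each Gaussian to the tiles it overlaps and shares the sorted list across a whole tile:

```python
import numpy as np

def shade_pixel(pixel, gaussians):
    """Front-to-back alpha compositing of depth-sorted 2D Gaussians at one pixel.

    Each entry: (mean2d, inv_cov2d, opacity, color), already sorted near-to-far.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for mean, inv_cov, opacity, rgb in gaussians:
        d = pixel - mean
        # Evaluate the projected 2D Gaussian at this pixel
        alpha = opacity * np.exp(-0.5 * d @ inv_cov @ d)
        color += transmittance * alpha * np.asarray(rgb)
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination: pixel is saturated
            break
    return color

# A nearly opaque red Gaussian in front of a green one, both centered on the pixel:
front = (np.array([5.0, 5.0]), np.eye(2) / 4.0, 0.99, [1.0, 0.0, 0.0])
back  = (np.array([5.0, 5.0]), np.eye(2) / 4.0, 0.99, [0.0, 1.0, 0.0])
print(shade_pixel(np.array([5.0, 5.0]), [front, back]))  # mostly red
```

The early-termination test is one reason splatting is fast in practice: once a pixel is saturated, the remaining (farther) Gaussians in the sorted list are skipped entirely.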
The following table compares 3D Gaussian splatting with NeRF-based methods across several dimensions:
| Feature | 3D Gaussian splatting | NeRF (original) | Instant-NGP |
|---|---|---|---|
| Scene representation | Millions of explicit 3D Gaussians | Implicit MLP | Hash grid + small MLP |
| Rendering method | Tile-based rasterization (splatting) | Volumetric ray marching | Volumetric ray marching |
| Rendering speed (1080p) | 100-200+ FPS | 0.03-0.1 FPS | 5-15 FPS |
| Training time | 20-40 minutes | 12-48 hours | 5-10 minutes |
| Visual quality (PSNR on Mip-NeRF 360) | 27-28 dB | 25-27 dB | 26-28 dB |
| Real-time capable | Yes | No | Near-interactive |
| Editability | High (explicit primitives can be directly manipulated) | Low (implicit representation is difficult to edit) | Low to moderate |
| Memory usage | High (hundreds of MB to several GB for complex scenes) | Low (tens of MB for the MLP) | Moderate (hash grid storage) |
| View-dependent effects | Yes (via spherical harmonics) | Yes (via directional input to MLP) | Yes |
| Dynamic scenes | Requires extensions (4D-GS) | Requires extensions (D-NeRF) | Requires extensions |
The most significant advantage of Gaussian splatting is rendering speed. At over 100 FPS at 1080p, it is the first radiance field method suitable for real-time applications like VR headsets (which require 90+ FPS per eye) and interactive 3D viewers. The tradeoff is higher memory consumption: a Gaussian splatting model of a complex scene may occupy several gigabytes, compared to tens of megabytes for a NeRF [1] [2].
Another notable advantage is editability. Because the scene is represented as an explicit collection of primitives (rather than encoded implicitly in neural network weights), individual Gaussians can be selected, moved, deleted, or modified. This makes scene editing, object removal, and composition much more straightforward than with NeRFs, where editing the scene typically requires retraining the entire model.
The real-time rendering capability of Gaussian splatting makes it particularly suited for VR and AR applications. VR headsets require consistent frame rates of 90 FPS or higher per eye to avoid motion sickness, a threshold that NeRF-based methods cannot meet but Gaussian splatting comfortably exceeds. Several VR platforms have integrated Gaussian splatting for creating immersive environments from real-world captures [5].
AR applications benefit from the speed of Gaussian splatting for real-time scene overlay and reconstruction. Researchers have demonstrated mobile AR applications that use Gaussian splatting to reconstruct and render 3D environments in real time, enabling more convincing placement of virtual objects in physical spaces.
Gaussian splatting has found applications in autonomous driving for scene reconstruction and simulation. DrivingGaussian (Peking University, 2024) presents a framework for reconstructing surrounding dynamic autonomous driving scenes that models static backgrounds and handles multiple moving objects separately, using LiDAR priors for greater detail and panoramic consistency [6].
SplatAD (CVPR 2025) demonstrates real-time LiDAR and camera rendering with 3D Gaussian splatting for autonomous driving, addressing the challenge of efficiently rendering multi-sensor data. Creating digital twins of driving scenes using Gaussian splatting enables simulation of safety-critical corner cases that are difficult and costly to capture in the real world. These reconstructed scenes can be used for closed-loop evaluation, improving the reliability of autonomous driving systems [6].
Gaussian splatting enables rapid creation of photorealistic digital twins from imagery. UAVTwin, a 2025 project, creates digital twins from real-world environments captured by unmanned aerial vehicles, integrating 3D Gaussian splatting for background reconstruction with controllable synthetic human models for data augmentation. These digital twins can be used to train and evaluate computer vision models for downstream tasks [7].
For mapping applications, researchers have combined Gaussian splatting with LiDAR data. GS-SDF (2025) augments Gaussian splatting with LiDAR data and neural signed distance functions for geometrically consistent rendering and reconstruction, improving surface accuracy for applications that require precise geometry in addition to photorealistic appearance [7].
Gaussian splatting offers a practical tool for documenting cultural heritage sites. By capturing a set of photographs of a historical building, sculpture, or archaeological site, researchers can create a photorealistic 3D model that can be viewed interactively from any angle. Compared to traditional photogrammetry, Gaussian splatting produces more visually convincing results (particularly for view-dependent effects like the sheen of marble or the translucency of stained glass) and can be rendered in real time for virtual museum exhibits.
Several cultural heritage projects have adopted Gaussian splatting for digitizing artifacts and sites that are deteriorating or at risk, creating permanent records that can be explored interactively.
The visual effects industry has taken notice of Gaussian splatting as a tool for set reconstruction, environment capture, and previz. By scanning a real location with photographs or video, a production team can create a photorealistic digital double of the set that can be viewed from any angle in real time. This is useful for planning camera moves, lighting setups, and visual effects integration before committing to expensive on-set shooting days [5].
Since the original SIGGRAPH 2023 paper, the Gaussian splatting ecosystem has expanded rapidly. By 2025, hundreds of papers extending the technique had been published, covering dynamic scenes, scene editing, 3D generation, SLAM, and more.
The original 3D Gaussian splatting method assumes a static scene. Extending it to dynamic scenes with moving objects or changing environments requires modeling how Gaussians change over time.
4D Gaussian Splatting (4D-GS), presented at CVPR 2024 by Wu et al. from Huazhong University of Science and Technology, introduced a holistic representation for dynamic scenes. Rather than applying 3D-GS independently per frame, 4D-GS uses a decomposed neural voxel encoding algorithm (inspired by HexPlane) to build Gaussian features from 4D neural voxels. A lightweight MLP predicts Gaussian deformations at novel timestamps, enabling smooth interpolation between observed frames [8].
4D-GS achieves real-time rendering at 82 FPS at 800x800 resolution on an RTX 3090 GPU while maintaining quality comparable to or better than previous methods for dynamic scene rendering. The approach has been further refined in subsequent work, including Anchored 4D Gaussian Splatting (SIGGRAPH Asia 2025), which uses anchor points to regulate 4D Gaussian attributes, achieving better rendering quality with reasonable storage requirements [8].
GaussianEditor, presented at CVPR 2024, is a framework for editing 3D Gaussian splatting scenes using text instructions. Given a reconstructed Gaussian splatting scene and a text prompt (e.g., "make the car red" or "add snow to the roof"), GaussianEditor identifies the region of interest corresponding to the instruction, aligns it to the relevant 3D Gaussians, and applies targeted edits [9].
The system uses InstructPix2Pix to generate edited 2D images as editing guidance and ControlNet-Inpainting to transfer 2D edits into 3D. Benefiting from the explicit nature of 3D Gaussians, GaussianEditor achieves more precise edits than methods that operate on implicit representations. Editing can be completed within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF, which requires 45 minutes to 2 hours [9].
DreamGaussian, presented as an oral paper at ICLR 2024, tackles 3D content generation rather than reconstruction. Given a single image or text prompt, DreamGaussian generates a textured 3D mesh in approximately 2 minutes, achieving roughly 10 times the speed of previous methods like DreamFusion [10].
The key insight is that the progressive densification of 3D Gaussians converges significantly faster for generative tasks compared to the occupancy pruning used in NeRF-based generation. DreamGaussian pairs the Gaussian optimization with a mesh extraction step and UV-space texture refinement, producing photo-realistic 3D assets with explicit mesh and texture maps suitable for use in game engines and other standard 3D pipelines [10].
DreamGaussian4D extends this framework to generate dynamic 4D scenes, using Gaussian splatting's explicit modeling of spatial transformations to enable flexible control of generated 3D motion. The optimization time is reduced from several hours (for implicit methods) to a few minutes [10].
Simultaneous Localization and Mapping (SLAM) is the problem of building a map of an unknown environment while simultaneously tracking the camera's position within it. Several research groups have combined Gaussian splatting with SLAM systems to create dense, photorealistic maps in real time.
Gaussian-SLAM approaches represent the map as a Gaussian splatting model that is updated incrementally as new frames arrive. The explicit nature of the Gaussian representation makes it straightforward to add new primitives for newly observed regions and to refine existing ones. Compared to traditional SLAM systems that produce sparse point clouds or mesh reconstructions, Gaussian-SLAM produces maps that can be rendered photorealistically from any viewpoint, which is useful for VR telepresence, robotic navigation, and remote inspection [11].
The following table summarizes additional notable extensions of Gaussian splatting:
| Variant | Description | Venue / Year |
|---|---|---|
| Mip-Splatting | Anti-aliased Gaussian splatting that handles multi-scale rendering without artifacts | CVPR 2024 |
| SuGaR | Surface-Aligned Gaussian Splatting for efficient mesh extraction from Gaussian scenes | CVPR 2024 |
| Scaffold-GS | Uses neural Gaussians anchored to a scaffold structure for better scene coverage | CVPR 2024 |
| GaussianDreamer | Text-to-3D generation combining 3D Gaussian splatting with 2D and 3D diffusion models | ICLR 2024 |
| PhysGaussian | Physically-based simulation with Gaussian splatting for deformable objects | CVPR 2024 |
| LangSplat | Language-embedded Gaussian splatting for open-vocabulary 3D scene understanding | CVPR 2024 |
| GaussianAvatar | Animatable human avatar creation using Gaussian splatting | CVPR 2024 |
| Spacetime Gaussian Feature Splatting | Gaussian splatting with spacetime features for real-time dynamic novel view synthesis | CVPR 2024 |
Gaussian splatting models can be large. A scene represented by several million Gaussians, each with position (3 floats), covariance parameters (7 floats for quaternion + scale), opacity (1 float), and SH coefficients (48 floats for degree-3 RGB), requires significant storage. Complex scenes can occupy several gigabytes in memory, which is a constraint for mobile and embedded applications. Compression techniques, including vector quantization of SH coefficients and pruning of low-contribution Gaussians, are active areas of research [1].
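The arithmetic behind these figures, assuming float32 storage and the degree-3 parameter counts given above:

```python
# Back-of-the-envelope storage for an uncompressed Gaussian splatting model.
floats_per_gaussian = 3 + 7 + 1 + 48  # position + quat/scale + opacity + SH
bytes_per_gaussian = floats_per_gaussian * 4  # float32
num_gaussians = 3_000_000             # plausible for a complex outdoor scene

total_mb = num_gaussians * bytes_per_gaussian / 1e6
print(f"{bytes_per_gaussian} bytes per Gaussian, ~{total_mb:.0f} MB total")
```

At 236 bytes per Gaussian, three million Gaussians already exceed 700 MB before any auxiliary data, which is why quantization and pruning matter so much for mobile deployment.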
While Gaussian splatting produces excellent rendering quality, the underlying geometry (the positions and shapes of the Gaussians) does not always correspond to clean, watertight surfaces. Extracting a traditional mesh from a Gaussian splatting representation requires additional processing (as in SuGaR, which aligns Gaussians to surfaces). For applications that require precise geometry, such as 3D printing or physics simulation, additional post-processing is needed [12].
Like NeRFs, Gaussian splatting requires a set of calibrated input images with known camera poses. The quality of the reconstruction depends on adequate view coverage; sparse or poorly distributed input views can result in artifacts and missing regions. SfM preprocessing with tools like COLMAP is a standard requirement, and the quality of the SfM output directly affects the initialization and final quality of the Gaussian model.
The original Gaussian splatting method uses spherical harmonics for view-dependent color, which works well for diffuse and moderately specular surfaces but struggles with mirror-like reflections, transparency, and complex light transport. Extensions like 3DGS-DR (3D Gaussian Splatting with Deferred Reflection) have been developed to address these limitations, separating reflection components for more accurate rendering of reflective surfaces [12].
As of early 2026, Gaussian splatting has become one of the dominant paradigms in 3D scene representation and rendering, alongside (and increasingly displacing) NeRF-based approaches. Several trends characterize the current state of the field:
Industry adoption. Gaussian splatting has moved beyond academic research into commercial products. Companies in the VR/AR, visual effects, real estate, and autonomous driving sectors are integrating Gaussian splatting into their pipelines. Tools like Luma AI, Polycam, and others offer consumer-facing Gaussian splatting capture and viewing.
Convergence with generative AI. The intersection of Gaussian splatting with generative models is producing tools that can create 3D content from text or images in minutes rather than hours. DreamGaussian, GaussianDreamer, and related methods are making 3D content creation accessible to non-experts, with implications for gaming, e-commerce, and digital media.
Mobile and web deployment. Efforts to compress Gaussian splatting models for mobile devices and web browsers are progressing. WebGL and WebGPU-based Gaussian splatting viewers allow interactive 3D scene exploration directly in a browser, and lightweight variants are being optimized for smartphones.
Dynamic and interactive scenes. The extension of Gaussian splatting to dynamic scenes (4D-GS), physics simulation (PhysGaussian), and interactive editing (GaussianEditor) is expanding the range of applications beyond static reconstruction. Real-time rendering of dynamic Gaussian scenes at high frame rates opens possibilities for live event capture, sports replay, and interactive training simulations.
Research volume. The pace of research is extraordinary. A curated list of Gaussian splatting papers on arXiv for 2025 alone contains hundreds of entries, covering topics from few-shot reconstruction and relighting to audio-driven avatars and text-to-4D generation. The field is evolving so quickly that techniques considered state-of-the-art at the beginning of 2025 may already be superseded by mid-year.
Gaussian splatting has, in less than three years since its introduction, fundamentally changed how researchers and practitioners think about 3D scene representation. Its combination of real-time performance, high visual quality, and explicit editability has addressed many of the practical limitations that held back NeRF-based methods from real-world deployment. While challenges remain (particularly around memory efficiency, geometry quality, and handling of complex light transport), the trajectory of the field suggests that these limitations will continue to be addressed through the rapid pace of ongoing research.