fvdb_reality_capture.radiance_fields
Gaussian Splat Radiance Field Reconstruction
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionConfig(seed: int = 42, max_epochs: int = 200, max_steps: int | None = None, eval_at_percent: ~typing.List[int] = <factory>, save_at_percent: ~typing.List[int] = <factory>, batch_size: int = 1, crops_per_image: int = 1, sh_degree: int = 3, increase_sh_degree_every_epoch: int = 5, initial_opacity: float = 0.1, initial_covariance_scale: float = 1.0, ssim_lambda: float = 0.2, lpips_net: ~typing.Literal['vgg', 'alex'] = 'alex', opacity_reg: float = 0.0, scale_reg: float = 0.0, random_bkgd: bool = False, refine_start_epoch: int = 3, refine_stop_epoch: int = 100, refine_every_epoch: float = 0.65, ignore_masks: bool = False, remove_gaussians_outside_scene_bbox: bool = False, optimize_camera_poses: bool = True, pose_opt_lr: float = 1e-05, pose_opt_reg: float = 1e-06, pose_opt_lr_decay: float = 1.0, pose_opt_start_epoch: int = 0, pose_opt_stop_epoch: int = 200, pose_opt_init_std: float = 0.0001, near_plane: float = 0.01, far_plane: float = 10000000000.0, min_radius_2d: float = 0.0, eps_2d: float = 0.3, antialias: bool = False, tile_size: int = 16)[source]
Configuration parameters for reconstructing a Gaussian splat radiance field from posed images.
See GaussianSplatReconstruction for details on how these parameters are used.
- antialias: bool = False
Whether to use anti-aliasing when rendering the Gaussians.
Default: False
- batch_size: int = 1
Batch size for optimization. Each step of optimization will compute losses on batch_size images. Note that learning rates are scaled automatically based on the batch size.
Default: 1
- crops_per_image: int = 1
Number of crops to use per image during reconstruction. If you're using very large images, you can set this to a value greater than 1 to run the forward pass on crops and accumulate gradients. This can help reduce memory usage.
Default: 1 (no cropping, use full images).
- eps_2d: float = 0.3
Amount of padding (in pixels) to add to the screen space bounding box of each Gaussian when determining which pixels it affects.
Default: 0.3
- eval_at_percent: List[int]
Percentage of the total optimization epochs at which to perform evaluation on the validation set.
For example, if eval_at_percent is set to [10, 50, 100] and max_epochs is set to 200, then evaluation will be performed after 20, 100, and 200 epochs.
Default: [10, 20, 30, 40, 50, 75, 100]
- far_plane: float = 10000000000.0
Far plane clipping distance when rendering the Gaussians.
Default: 1e10
- ignore_masks: bool = False
If set to True, then ignore any masks in the data and treat all pixels as valid during optimization.
Default: False
- increase_sh_degree_every_epoch: int = 5
When reconstructing a Gaussian splat radiance field, we start by only optimizing the diffuse (degree 0) spherical harmonics coefficients per Gaussian, and progressively increase the degree of spherical harmonics used every increase_sh_degree_every_epoch epochs until we reach sh_degree. This helps stabilize the early stages of optimization.
Default: 5
- initial_covariance_scale: float = 1.0
Initial scale of each Gaussian. This controls the initial size of the Gaussians in the scene. Each Gaussian's covariance matrix will be initialized to a diagonal matrix with this value on the diagonal.
Default: 1.0
- initial_opacity: float = 0.1
Initial opacity of each Gaussian. This is the alpha value used when rendering the Gaussians at the start of optimization.
Default: 0.1
- lpips_net: Literal['vgg', 'alex'] = 'alex'
During evaluation, we compute the Learned Perceptual Image Patch Similarity (LPIPS) metric as a measure of the quality of the reconstruction. This parameter controls which network architecture is used for the LPIPS metric.
Default: "alex", meaning the AlexNet architecture.
- max_epochs: int = 200
The maximum number of optimization epochs, i.e., the number of times each image in the dataset will be visited.
An epoch is defined as one full pass through the dataset. If you have a dataset with 100 images and a batch size of 10, then one epoch corresponds to 10 steps.
Default: 200
- max_steps: int | None = None
The maximum number of optimization steps. If set, this overrides the number of steps calculated from max_epochs and the dataset size.
You shouldn't use this parameter unless you have a specific reason to do so.
Default: None
- min_radius_2d: float = 0.0
Minimum screen space radius (in pixels) below which Gaussians are ignored after projection.
Default: 0.0
- near_plane: float = 0.01
Near plane clipping distance when rendering the Gaussians.
Default: 0.01
- opacity_reg: float = 0.0
Weight for the opacity regularization loss \(L_{opacity} = \frac{1}{N} \sum_i |opacity_i|\).
If set to a value greater than 0, this will encourage the opacities of the Gaussians to be small.
Default: 0.0 (no opacity regularization).
- optimize_camera_poses: bool = True
If set to True, optimize camera poses during reconstruction. This can help improve the quality of the reconstruction if the initial poses are not accurate.
Default: True
- pose_opt_init_std: float = 0.0001
Standard deviation for the normal distribution used to initialize the embeddings for camera pose optimization.
Default: 1e-4
- pose_opt_lr: float = 1e-05
Learning rate for camera pose optimization.
Default: 1e-5
- pose_opt_lr_decay: float = 1.0
Learning rate decay factor for camera pose optimization (the learning rate will decay to this fraction of its initial value).
Default: 1.0 (no decay).
- pose_opt_reg: float = 1e-06
Weight for regularization of camera pose optimization. This encourages small changes to the initial camera poses.
The pose regularization loss is defined as \(L_{pose} = \frac{1}{M} \sum_j \|\Delta R_j\|^2 + \|\Delta t_j\|^2\), i.e. the Frobenius norm of the change in rotation and translation for each of the M camera poses in the dataset.
Default: 1e-6
- pose_opt_start_epoch: int = 0
At which epoch to start optimizing camera poses.
Default: 0 (start from the beginning of optimization).
- pose_opt_stop_epoch: int = 200
At which epoch to stop optimizing camera poses.
Default: max_epochs (optimize poses for the entire duration of optimization).
- random_bkgd: bool = False
Whether to render images with the radiance field against a background of random values during optimization. This discourages the model from using transparency to minimize loss.
Default: False
- refine_every_epoch: float = 0.65
How often to refine Gaussians during optimization, in terms of epochs. For example, a value of 0.65 means refinement occurs approximately every 0.65 epochs.
Default: 0.65
- refine_start_epoch: int = 3
At which epoch to start refining the Gaussians by inserting and deleting Gaussians based on their contribution to the optimization. e.g. if this value is 3, the first refinement will occur at the start of epoch 3.
Default: 3
- refine_stop_epoch: int = 100
At which epoch to stop refining the Gaussians by inserting and deleting Gaussians based on their contribution to the optimization.
Default: 100
- remove_gaussians_outside_scene_bbox: bool = False
If set to True, Gaussians that fall outside the scene bounding box will be removed during refinement.
Default: False
- save_at_percent: List[int]
Percentage of the total optimization epochs at which to save model checkpoints.
For example, if save_at_percent is set to [50, 100] and max_epochs is set to 200, then checkpoints will be saved after 100 and 200 epochs.
Default: [20, 100]
- scale_reg: float = 0.0
Weight for the scale regularization loss \(L_{scale} = \frac{1}{N} \sum_i |scale_i|\).
If set to a value greater than 0, this will encourage the scales of the Gaussians to be small.
Default: 0.0 (no scale regularization).
- seed: int = 42
A random seed for reproducibility.
Default: 42 (the meaning of life, the universe, and everything).
- sh_degree: int = 3
Maximum degree of spherical harmonics to use for each Gaussian's view-dependent color. Higher degrees allow for more complex view-dependent effects, but increase memory usage and computation time.
Default: 3
- ssim_lambda: float = 0.2
Weight for the SSIM loss. Reconstruction includes a loss term based on the Structural Similarity Index Measure (SSIM) between images rendered with the radiance field and the ground truth images. This weight applies to that SSIM loss term.
Default: 0.2
- tile_size: int = 16
Tile size (in pixels) to use when rendering the Gaussians. You should generally leave this at the default value unless you have a specific reason to change it.
Default: 16
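As a quick orientation, the sketch below constructs a config with a few common overrides. The field names and defaults come from the signature above; treat the exact import path as an assumption based on this page's module name.

    # Sketch: build a reconstruction config with a few overrides.
    # Field names and defaults come from the class signature above.
    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstructionConfig

    config = GaussianSplatReconstructionConfig(
        max_epochs=100,               # fewer passes over the dataset than the default 200
        sh_degree=3,                  # maximum spherical harmonics degree
        random_bkgd=True,             # discourage the model from abusing transparency
        optimize_camera_poses=True,   # jointly refine slightly inaccurate poses
        eval_at_percent=[50, 100],    # evaluate halfway through and at the end
    )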
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstruction(model: GaussianSplat3d, sfm_scene: SfmScene, optimizer: BaseGaussianSplatOptimizer, config: GaussianSplatReconstructionConfig, train_indices: ndarray, val_indices: ndarray, pose_adjust_model: CameraPoseAdjustment | None, pose_adjust_optimizer: Adam | None, pose_adjust_scheduler: ExponentialLR | None, writer: GaussianSplatReconstructionBaseWriter, start_step: int, viz_scene: Scene | None, log_interval_steps: int, viz_update_interval_epochs: float, _private: object | None = None)[source]
Engine for reconstructing a Gaussian splat radiance field from posed images in an SfmScene.
This class implements the reconstruction algorithm using a fvdb.GaussianSplat3d model and a differentiable rendering pipeline.
The reconstruction process optimizes the parameters of the Gaussian splats to minimize the difference between rendered images and the input images. The optimization process can be configured using a GaussianSplatReconstructionConfig instance, and the underlying fvdb.GaussianSplat3d model can be customized as well.
The reconstruction can also optionally optimize camera poses if they are not accurate, using a simple pose adjustment model which stores a per-camera embedding that is decoded into a small change in rotation and translation for each camera.
To create a GaussianSplatReconstruction instance, use the from_sfm_scene() class method, which initializes the model and optimizer from an SfmScene and a GaussianSplatReconstructionConfig.
You can configure logging and checkpointing during the optimization process using an instance of GaussianSplatReconstructionBaseWriter. By default, this class uses a GaussianSplatReconstructionWriter which logs metrics, images, and checkpoints to a directory.
You can also visualize the optimization process using an optional fvdb.viz.Scene instance, which can display the current state of the Gaussian splat radiance field interactively in a web browser or notebook.
The reconstruction process is started by calling the optimize() method, which runs the optimization loop.
To get the reconstructed model, use the model attribute, which is a fvdb.GaussianSplat3d instance.
You can also get a dictionary of metadata about the reconstruction using the reconstruction_metadata attribute. This metadata is useful for downstream tasks such as extracting meshes or exporting to USDZ.
The state of the reconstruction can be saved and loaded using the state_dict() and from_state_dict() methods. These methods allow you to pause and resume reconstructions from checkpoints.
- property config: GaussianSplatReconstructionConfig
Get the configuration object for the current reconstruction. See GaussianSplatReconstructionConfig for details.
- Returns:
config (GaussianSplatReconstructionConfig) – The configuration object containing all parameters for the reconstruction.
- eval(show_progress: bool = True, log_tag: str = 'eval') → None[source]
Evaluate the quality of the Gaussian Splat radiance field on the validation dataset.
This method evaluates the model by rendering images from the Gaussian Splat radiance field and computing various image quality metrics including PSNR, SSIM, and LPIPS. It also saves the rendered images and ground truth images to the log writer for visualization.
- Parameters:
show_progress (bool) – Whether to display a progress bar during evaluation.
log_tag (str) – Tag to use for logging metrics and images. Data logged will use this tag as a prefix. For metrics, this will be "{log_tag}/metric_name". For images, this will be "{log_tag}/predicted_imageXXXX.jpg" and "{log_tag}/ground_truth_imageXXXX.jpg".
- classmethod from_sfm_scene(sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene, writer: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionBaseWriter = <fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionWriter object>, viz_scene: ~fvdb.viz._scene.Scene | None = None, config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction.GaussianSplatReconstructionConfig = GaussianSplatReconstructionConfig(seed=42, max_epochs=200, max_steps=None, eval_at_percent=[10, 20, 30, 40, 50, 75, 100], save_at_percent=[20, 100], batch_size=1, crops_per_image=1, sh_degree=3, increase_sh_degree_every_epoch=5, initial_opacity=0.1, initial_covariance_scale=1.0, ssim_lambda=0.2, lpips_net='alex', opacity_reg=0.0, scale_reg=0.0, random_bkgd=False, refine_start_epoch=3, refine_stop_epoch=100, refine_every_epoch=0.65, ignore_masks=False, remove_gaussians_outside_scene_bbox=False, optimize_camera_poses=True, pose_opt_lr=1e-05, pose_opt_reg=1e-06, pose_opt_lr_decay=1.0, pose_opt_start_epoch=0, pose_opt_stop_epoch=200, pose_opt_init_std=0.0001, near_plane=0.01, far_plane=10000000000.0, min_radius_2d=0.0, eps_2d=0.3, antialias=False, tile_size=16), optimizer_config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_optimizer.GaussianSplatOptimizerConfig = GaussianSplatOptimizerConfig(max_gaussians=-1, insertion_grad_2d_threshold_mode=<InsertionGrad2dThresholdMode.CONSTANT: 'constant'>, deletion_opacity_threshold=0.005, deletion_scale_3d_threshold=0.1, deletion_scale_2d_threshold=0.15, insertion_grad_2d_threshold=0.0002, insertion_scale_3d_threshold=0.01, insertion_scale_2d_threshold=0.05, opacity_updates_use_revised_formulation=False, insertion_split_factor=2, insertion_duplication_factor=2, reset_opacities_every_n_refinements=30, use_scales_for_deletion_after_n_refinements=30, use_screen_space_scales_for_refinement_until=0, spatial_scale_mode=<SpatialScaleMode.MEDIAN_CAMERA_DEPTH: 'median_camera_depth'>, spatial_scale_multiplier=1.1, means_lr=0.00016, log_scales_lr=0.005, quats_lr=0.001, logit_opacities_lr=0.05, sh0_lr=0.0025, shN_lr=0.000125), use_every_n_as_val: int = -1, viz_update_interval_epochs: float = 10, log_interval_steps: int = 10, device: str | ~torch.device = 'cuda')[source]
Create a GaussianSplatReconstruction instance from an SfmScene, used to reconstruct a 3D Gaussian Splat radiance field from posed images. The reconstruction process and optimizer can be configured using the config (see GaussianSplatReconstructionConfig) and optimizer_config (see GaussianSplatOptimizerConfig) parameters, though the defaults should produce acceptable results.
You can also configure logging and checkpointing during the reconstruction process using an instance of GaussianSplatReconstructionBaseWriter. By default, this class uses a GaussianSplatReconstructionWriter which logs metrics, images, and checkpoints to a directory.
You can interactively visualize the state of the current reconstruction using an optional fvdb.viz.Scene instance, which can display the current Gaussian splat radiance field in a web browser or notebook.
- Parameters:
sfm_scene (SfmScene) – The Structure-from-Motion scene containing images and camera poses.
config (GaussianSplatReconstructionConfig) – Configuration for the reconstruction process.
optimizer_config (GaussianSplatOptimizerConfig) – Configuration for the optimizer.
writer (GaussianSplatReconstructionBaseWriter) – Writer instance to handle logging metrics and saving images, checkpoints, PLY files, and other results.
viz_scene (Scene | None) – Optional fvdb.viz.Scene instance for visualizing optimization progress. If None, no visualization is performed.
use_every_n_as_val (int) – Use every n-th image as a validation image. The default of -1 means no validation images are used.
viz_update_interval_epochs (float) – Interval in epochs at which to update the visualization if viz_scene is not None. An epoch is one full pass through the dataset.
log_interval_steps (int) – Interval (in steps) to log metrics to the writer.
device (str | torch.device) – Device to run the reconstruction on.
- Returns:
gaussian_splat_reconstruction (GaussianSplatReconstruction) – A GaussianSplatReconstruction instance ready to reconstruct the scene.
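A minimal end-to-end sketch of the workflow described above, assuming you already have an SfmScene (loading one is outside the scope of this page) and that the classes are importable from fvdb_reality_capture.radiance_fields:

    # Sketch: reconstruct a radiance field from an existing SfmScene.
    # Assumes `sfm_scene` is an SfmScene you have already loaded.
    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatReconstruction,
        GaussianSplatReconstructionConfig,
    )

    config = GaussianSplatReconstructionConfig(max_epochs=100)
    recon = GaussianSplatReconstruction.from_sfm_scene(
        sfm_scene,                  # an SfmScene with images and camera poses
        config=config,
        use_every_n_as_val=10,      # hold out every 10th image for validation
        device="cuda",
    )

    recon.optimize(show_progress=True)    # run the optimization loop
    recon.eval()                          # PSNR / SSIM / LPIPS on the validation split
    recon.save_ply("reconstruction.ply")  # export the optimized Gaussians
    model = recon.model                   # the underlying fvdb.GaussianSplat3d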
- classmethod from_state_dict(state_dict: dict[str, ~typing.Any], override_sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene | None = None, override_use_every_n_as_val: int | None = None, writer: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionBaseWriter = <fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionWriter object>, viz_scene: ~fvdb.viz._scene.Scene | None = None, viz_update_interval_epochs: float = 1.0, log_interval_steps: int = 10, device: str | ~torch.device = 'cuda')[source]
Load a GaussianSplatReconstruction instance from a state dictionary (extracted with the state_dict() method). This will restore the model, optimizer, and configuration. You can optionally override the SfmScene and the train/validation split (via the override_use_every_n_as_val parameter). This is useful for resuming reconstruction on a different dataset or with a different train/validation split.
- Parameters:
state_dict (dict) – State dictionary containing the model, optimizer, and configuration state. Generated by the state_dict() method.
override_sfm_scene (SfmScene | None) – Optional SfmScene to use instead of the one in the state_dict.
override_use_every_n_as_val (int | None) – If specified, will override the train/validation split using this value. The default of None means to use the train/validation split from the state_dict.
writer (GaussianSplatReconstructionBaseWriter) – GaussianSplatReconstructionBaseWriter instance to handle logging metrics and saving images, checkpoints, PLY files, and other results.
viz_scene (Scene | None) – Optional fvdb.viz.Scene instance for visualizing optimization progress. If None, no visualization is performed.
viz_update_interval_epochs (float) – Interval in epochs at which to update the visualization if viz_scene is not None. An epoch is one full pass through the dataset.
log_interval_steps (int) – Interval in steps to log metrics to the writer.
device (str | torch.device) – Device to run the reconstruction on.
- property model: GaussianSplat3d
Get the Gaussian Splatting model being optimized.
- Returns:
model (GaussianSplat3d) – The fvdb.GaussianSplat3d instance being optimized.
- optimize(show_progress: bool = True, log_tag: str = 'reconstruct') → None[source]
Run the reconstruction optimization loop to reconstruct a Gaussian Splatting radiance field from a set of posed images.
The optimization loop iterates over the images and poses in the dataset, computes losses, updates the Gaussians' parameters, and logs metrics at each step. It also handles scheduling refinement steps at specified intervals.
- Parameters:
show_progress (bool) – Whether to display a progress bar during reconstruction.
log_tag (str) – Tag to use for logging metrics (e.g., "train"). Data logged will use this tag as a prefix. For metrics, this will be "{log_tag}/metric_name". For checkpoints, this will be "{log_tag}_ckpt.pt". For PLY files, this will be "{log_tag}_ckpt.ply".
Note
When calling evaluation from the reconstruction loop, the log_tag for evaluation will be log_tag + "_eval".
- property optimizer: BaseGaussianSplatOptimizer
Get the optimizer used for optimizing the Gaussian Splat radiance field's parameters.
- Returns:
optimizer (BaseGaussianSplatOptimizer) – The optimizer instance. See GaussianSplatOptimizer for details.
- property pose_adjust_model: CameraPoseAdjustment | None
Get the camera pose adjustment model used for optimizing camera poses during reconstruction.
- Returns:
pose_adjust_model (CameraPoseAdjustment | None) – The pose adjustment model instance, or None if not used.
- property pose_adjust_optimizer: Adam | None
Get the optimizer used for adjusting camera poses during reconstruction.
- Returns:
pose_adjust_optimizer (torch.optim.Optimizer | None) – The pose adjustment optimizer instance, or None if not used.
- property pose_adjust_scheduler: ExponentialLR | None
Get the learning rate scheduler used for adjusting camera poses during reconstruction.
- Returns:
pose_adjust_scheduler (torch.optim.lr_scheduler.ExponentialLR | None) – The pose adjustment scheduler instance, or None if not used.
- property reconstruction_metadata: dict[str, Tensor | float | int | str]
Get metadata about the reconstruction, including camera parameters and Gaussian rendering parameters.
This metadata is useful for downstream tasks such as extracting meshes or point clouds. It includes:
- normalization_transform: The transformation matrix used to normalize the scene.
- camera_to_world_matrices: The optimized camera-to-world matrices for the images used during reconstruction.
- projection_matrices: The projection matrices for the images used during reconstruction.
- image_sizes: The sizes of the images used during reconstruction.
- median_depths: The median depth values (distance from camera to scene) for each image used during reconstruction.
- eps2d: The 2D epsilon value used when rendering the Gaussian splat radiance field.
- near_plane: The near plane distance used when rendering the Gaussian splat radiance field.
- far_plane: The far plane distance used when rendering the Gaussian splat radiance field.
- min_radius_2d: The minimum 2D radius below which splats are not rendered.
- antialias: Whether anti-aliasing is enabled (1) or not (0).
- tile_size: The tile size used to render the Gaussian splat radiance field.
- Returns:
metadata (dict[str, torch.Tensor | float | int | str]) – A dictionary containing metadata about the reconstruction.
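The keys listed above can be read straight from the returned dictionary. A short sketch, assuming recon is a GaussianSplatReconstruction that has already been optimized:

    # Sketch: inspect reconstruction metadata for downstream tasks.
    # `recon` is assumed to be an optimized GaussianSplatReconstruction.
    meta = recon.reconstruction_metadata

    cam_to_world = meta["camera_to_world_matrices"]    # optimized camera poses
    projections = meta["projection_matrices"]          # per-image projection matrices
    near, far = meta["near_plane"], meta["far_plane"]  # clipping planes used for rendering
    print(f"rendered with tile_size={meta['tile_size']}, antialias={meta['antialias']}")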
- save_ply(path: str | Path) → None[source]
Save the current Gaussian Splatting model to a PLY file.
- Parameters:
path (str | Path) – The file path where the PLY file will be saved.
- save_usdz(path: str | Path) → None[source]
Save the current Gaussian Splatting model to a USDZ file.
- Parameters:
path (str | Path) – The file path where the USDZ file will be saved.
- state_dict() → dict[str, Any][source]
Get the state dictionary of the current optimization state, including model, optimizer, and configuration parameters.
The state dictionary can be used to save and resume optimization from checkpoints. Its keys include:
- "magic": A magic string to identify the checkpoint type.
- "version": The version of the checkpoint format.
- "step": The current global optimization step.
- "config": The configuration parameters used for optimization.
- "sfm_scene": The state dictionary of the SfM scene.
- "model": The state dictionary of the Gaussian Splatting model.
- "optimizer": The state dictionary of the optimizer.
- "train_indices": The indices of the training images in the dataset.
- "val_indices": The indices of the validation images in the dataset.
- "num_training_poses": The number of training poses if pose adjustment is used, otherwise None.
- "pose_adjust_model": The state dictionary of the camera pose adjustment model if used, otherwise None.
- "pose_adjust_optimizer": The state dictionary of the pose adjustment optimizer if used, otherwise None.
- "pose_adjust_scheduler": The state dictionary of the pose adjustment scheduler if used, otherwise None.
- Returns:
state_dict (dict[str, Any]) – A dictionary containing the state of the optimization process.
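A sketch of pausing and resuming, assuming you persist the state dictionary yourself with torch.save / torch.load (the built-in writer also saves checkpoints; its file layout is described later on this page):

    # Sketch: pause a reconstruction and resume it later.
    # Assumes `recon` is an existing GaussianSplatReconstruction instance.
    import torch

    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstruction

    # Pause: capture model, optimizer, and configuration state.
    torch.save(recon.state_dict(), "recon_ckpt.pt")

    # Resume: restore everything and continue optimizing.
    # weights_only=False because the state dict contains non-tensor entries.
    state = torch.load("recon_ckpt.pt", weights_only=False)
    resumed = GaussianSplatReconstruction.from_state_dict(state, device="cuda")
    resumed.optimize()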
- property training_dataset: SfmDataset
Get the training dataset used for training the Gaussian Splatting model.
- Returns:
training_dataset (SfmDataset) – The training dataset instance.
- property validation_dataset: SfmDataset
Get the validation dataset used for evaluating the Gaussian Splatting model.
- Returns:
validation_dataset (SfmDataset) – The validation dataset instance.
- version = '0.1.0'
Gaussian Splat Optimizers
- class fvdb_reality_capture.radiance_fields.BaseGaussianSplatOptimizer[source]
Base class for optimizers that reconstruct a scene using Gaussian Splat radiance fields over a collection of posed images.
This class defines the interface for optimizers that optimize the parameters of a fvdb.GaussianSplat3d model, and provides utilities to refine the model by inserting and deleting Gaussians based on their contribution to the optimization.
Currently, the only concrete implementation is GaussianSplatOptimizer, which implements the algorithm in the original Gaussian Splatting paper.
- abstractmethod filter_gaussians(indices_or_mask: Tensor)[source]
Abstract method to filter the Gaussians in the model based on the given indices or mask, and update the corresponding optimizer state accordingly. This can be used to delete, shuffle, or duplicate the Gaussians during optimization.
- Parameters:
indices_or_mask (torch.Tensor) – A 1D tensor of indices or a boolean mask indicating which Gaussians to keep.
- abstractmethod classmethod from_state_dict(model: GaussianSplat3d, state_dict: dict[str, Any]) → BaseGaussianSplatOptimizer[source]
Abstract method to create a new BaseGaussianSplatOptimizer instance from a model and a state dict (obtained from state_dict()).
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
state_dict (dict[str, Any]) – A state dict previously obtained from state_dict().
- Returns:
optimizer (BaseGaussianSplatOptimizer) – A new BaseGaussianSplatOptimizer instance.
- abstractmethod refine(zero_gradients: bool = True) → dict[str, Any][source]
Abstract method to refine the model by inserting and deleting Gaussians based on their contribution to the optimization.
- Parameters:
zero_gradients (bool) – If True, zero the gradients of all tensors being optimized after refining.
- Returns:
refinement_stats (dict[str, Any]) – A dictionary containing statistics about the refinement step.
- abstractmethod reset_learning_rates_and_decay(batch_size: int, expected_steps: int) → None[source]
Abstract method to set the learning rates and learning rate decay factor based on the batch size and the expected number of optimization steps (i.e. the number of times step() is called).
This is useful if you want to change the batch size or expected number of steps after creating the optimizer.
- Parameters:
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
expected_steps (int) – The expected number of optimization steps.
- abstractmethod state_dict() → dict[str, Any][source]
Abstract method to return a serializable state dict for the optimizer.
- Returns:
state_dict (dict[str, Any]) – A state dict containing the state of the optimizer.
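To illustrate the interface, here is a bare-bones skeleton of a custom optimizer. It only shows the abstract methods documented on this page with placeholder bodies, not a working refinement strategy; the import locations are assumptions based on how this page refers to the types.

    # Sketch: the methods a BaseGaussianSplatOptimizer subclass must implement.
    # Bodies are placeholders; a real implementation would mirror GaussianSplatOptimizer.
    from typing import Any

    import torch

    from fvdb import GaussianSplat3d  # assumed import location for GaussianSplat3d
    from fvdb_reality_capture.radiance_fields import BaseGaussianSplatOptimizer


    class MyOptimizer(BaseGaussianSplatOptimizer):
        def filter_gaussians(self, indices_or_mask: torch.Tensor) -> None:
            ...  # keep only the requested Gaussians and sync optimizer state

        @classmethod
        def from_state_dict(cls, model: GaussianSplat3d, state_dict: dict[str, Any]) -> "MyOptimizer":
            ...  # rebuild the optimizer from a saved state dict

        def refine(self, zero_gradients: bool = True) -> dict[str, Any]:
            ...  # insert/delete Gaussians and return refinement statistics

        def reset_learning_rates_and_decay(self, batch_size: int, expected_steps: int) -> None:
            ...  # rescale learning rates for a new batch size / step budget

        def state_dict(self) -> dict[str, Any]:
            ...  # return a serializable snapshot of the optimizer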
- class fvdb_reality_capture.radiance_fields.GaussianSplatOptimizerConfig(max_gaussians: int = -1, insertion_grad_2d_threshold_mode: InsertionGrad2dThresholdMode = InsertionGrad2dThresholdMode.CONSTANT, deletion_opacity_threshold: float = 0.005, deletion_scale_3d_threshold: float = 0.1, deletion_scale_2d_threshold: float = 0.15, insertion_grad_2d_threshold: float = 0.0002, insertion_scale_3d_threshold: float = 0.01, insertion_scale_2d_threshold: float = 0.05, opacity_updates_use_revised_formulation: bool = False, insertion_split_factor: int = 2, insertion_duplication_factor: int = 2, reset_opacities_every_n_refinements: int = 30, use_scales_for_deletion_after_n_refinements: int = 30, use_screen_space_scales_for_refinement_until: int = 0, spatial_scale_mode: SpatialScaleMode = SpatialScaleMode.MEDIAN_CAMERA_DEPTH, spatial_scale_multiplier: float = 1.1, means_lr: float = 0.00016, log_scales_lr: float = 0.005, quats_lr: float = 0.001, logit_opacities_lr: float = 0.05, sh0_lr: float = 0.0025, shN_lr: float = 0.000125)[source]
Parameters for configuring the GaussianSplatOptimizer.
- deletion_opacity_threshold: float = 0.005
If a Gaussian's opacity drops below this value, delete it during refinement.
- deletion_scale_2d_threshold: float = 0.15
If the maximum projected size of a Gaussian between refinement steps exceeds this value, then delete it during refinement.
Note
This parameter is only used if use_screen_space_scales_for_refinement_until is greater than 0.
- deletion_scale_3d_threshold: float = 0.1
If a Gaussian's 3D scale is above this value, then delete it during refinement.
- insertion_duplication_factor: int = 2
When duplicating Gaussians during insertion, this value specifies the total number of copies (including the original) that will result for each selected source Gaussian. The original is kept, and insertion_duplication_factor - 1 new identical copies are added. e.g. if this value is 3, each duplicated Gaussian becomes 3 copies of itself (the original plus 2 new). This value must be >= 2.
- insertion_grad_2d_threshold: float = 0.0002
Threshold value on the accumulated norm of projected mean gradients between refinement steps to determine whether a Gaussian has high error and is a candidate for duplication or splitting.
Note
If insertion_grad_2d_threshold_mode is InsertionGrad2dThresholdMode.CONSTANT, then this value is used directly as the threshold, and must be positive.
Note
If insertion_grad_2d_threshold_mode is InsertionGrad2dThresholdMode.PERCENTILE_FIRST_ITERATION or InsertionGrad2dThresholdMode.PERCENTILE_EVERY_ITERATION, then this value must be in the range (0.0, 1.0) (exclusive).
- insertion_grad_2d_threshold_mode: InsertionGrad2dThresholdMode = 'constant'
Whether to use a fixed threshold for insertion_grad_2d_threshold (constant), a value computed as a percentile of the distribution of screen space mean gradients on the first iteration, or a percentile value computed at each refinement step.
See InsertionGrad2dThresholdMode for details on the available modes.
- insertion_scale_2d_threshold: float = 0.05
Split high-error (determined by insertion_grad_2d_threshold) Gaussians whose maximum projected size exceeds this value. These Gaussians are too large to capture the detail in the region they cover, so we split them to allow them to specialize.
Note
This parameter is only used if use_screen_space_scales_for_refinement_until is greater than 0.
- insertion_scale_3d_threshold: float = 0.01
Duplicate high-error (determined by insertion_grad_2d_threshold) Gaussians whose 3D scale is below this value. These Gaussians are too small to capture the detail in the region they cover, so we duplicate them to allow them to specialize.
- insertion_split_factor: int = 2
When splitting Gaussians during insertion, this value specifies the total number of new Gaussians that will replace each selected source Gaussian. The original is removed and replaced by insertion_split_factor new Gaussians. e.g. if this value is 2, each split Gaussian is replaced by 2 new smaller Gaussians (the original is removed). This value must be >= 2.
- log_scales_lr: float = 0.005
Learning rate for the log scales of the Gaussians.
- logit_opacities_lr: float = 0.05
Learning rate for the logit opacities of the Gaussians.
- max_gaussians: int = -1
The maximum number of Gaussians to allow in the model. If -1, no limit.
- means_lr: float = 0.00016
Learning rate for the means of the Gaussians. This is also scaled by the spatial scale computed from the scene.
See spatial_scale_mode for details on how the spatial scale is computed.
- opacity_updates_use_revised_formulation: bool = False
When splitting Gaussians, whether to update the opacities of the new Gaussians using the revised formulation from *"Revising Densification in Gaussian Splatting"*. This removes a bias which weighs the contribution of newly split Gaussians to the image more heavily than that of older Gaussians.
- quats_lr: float = 0.001
Learning rate for the quaternions of the Gaussians.
- reset_opacities_every_n_refinements: int = 30
If set to a positive value, then clamp all opacities to be at most twice the value of deletion_opacity_threshold every reset_opacities_every_n_refinements calls to GaussianSplatOptimizer.refine(). This prevents Gaussians from becoming completely occluded by denser Gaussians and thus unable to be optimized.
- sh0_lr: float = 0.0025
Learning rate for the diffuse spherical harmonics (order 0).
- shN_lr: float = 0.000125
Learning rate for the specular spherical harmonics (order > 0).
- spatial_scale_mode: SpatialScaleMode = 'median_camera_depth'
How to interpret 3D optimization scale thresholds and learning rates (i.e. insertion_scale_3d_threshold, deletion_scale_3d_threshold, and means_lr). These are scaled by a spatial scale computed from the scene, so they are relative to the size of the scene being optimized.
See SpatialScaleMode for details on the available modes.
- spatial_scale_multiplier: float = 1.1
Multiplier to apply to the spatial scale computed from the scene to get a slightly larger scale.
- use_scales_for_deletion_after_n_refinements: int = 30
If set to a positive value, then after use_scales_for_deletion_after_n_refinements calls to GaussianSplatOptimizer.refine(), use the 3D scales of the Gaussians to determine whether to delete them. This will delete Gaussians that have grown too large in 3D space and are not contributing to the optimization.
By default, this value matches reset_opacities_every_n_refinements so that both behaviors are enabled at the same time.
- use_screen_space_scales_for_refinement_until: int = 0
If set to a positive value, then threshold the maximum projected size of Gaussians between refinement steps to decide whether to split or delete Gaussians that are too large. This behavior is enabled until GaussianSplatOptimizer.refine() has been called use_screen_space_scales_for_refinement_until times. After that, only 3D scales are used for refinement.
- class fvdb_reality_capture.radiance_fields.SpatialScaleMode(*values)[source]
How to interpret 3D optimization scale thresholds (insertion_scale_3d_threshold, deletion_scale_3d_threshold) and learning rates. These thresholds are specified in a unitless space and are subsequently multiplied by a spatial scale computed from the scene being optimized. There are several heuristics for computing this spatial scale, specified by the config:
- ABSOLUTE_UNITS = 'absolute_units'
Use the thresholds and learning rates as-is, in absolute world units (e.g. meters).
- MAX_CAMERA_DEPTH = 'max_camera_depth'
Compute the maximum depth of SfmPoints across all cameras in the scene, and use that as the spatial scale.
- MAX_CAMERA_TO_CENTROID = 'max_camera_diagonal'
Compute the maximum distance from any camera to the centroid of all camera positions (good for orbits around an object).
- MEDIAN_CAMERA_DEPTH = 'median_camera_depth'
Compute the median depth of SfmPoints across all cameras in the scene, and use that as the spatial scale.
- SCENE_DIAGONAL_PERCENTILE = 'relative_to_scene_diagonal'
Compute the axis-aligned bounding box of all points within the 5th to 95th percentile range along each axis, and use the given percentile of the length of the diagonal of this box as the spatial scale.
- class fvdb_reality_capture.radiance_fields.InsertionGrad2dThresholdMode(*values)[source]
The GaussianSplatOptimizer uses a threshold on the accumulated norm of 2D mean gradients during refinement.
There are several modes for computing this threshold, specified by the config. These modes let you adapt the refinement behavior to the statistics of the gradients during training.
- CONSTANT = 'constant'
Always use the fixed threshold specified by insertion_grad_2d_threshold. With the default value (0.0002), this mode will produce acceptable results, but may not be optimal for all types of captures.
- PERCENTILE_EVERY_ITERATION = 'percentile_every_iteration'
During every refinement step, set the threshold to the given percentile of the gradients. For highly detailed scenes, this mode may be useful to adaptively insert more Gaussians as the model learns more detail. This generally produces many more Gaussians and more detailed results at the cost of more memory and compute.
- PERCENTILE_FIRST_ITERATION = 'percentile_first_iteration'
During the first refinement step, set the threshold to the given percentile of the gradients. For all subsequent refinement steps, use that fixed threshold. Using this mode will have similar behavior to CONSTANT but will adapt to the scale of the gradients, which can be more robust across different capture types.
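As a sketch, the percentile-based mode described above is selected through the optimizer config; when a percentile mode is used, insertion_grad_2d_threshold is interpreted as a percentile in (0.0, 1.0) rather than an absolute gradient value. The specific numbers below are illustrative, not recommendations:

    # Sketch: pick the insertion threshold from the gradient distribution at the
    # first refinement step. Values are illustrative, not recommendations.
    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatOptimizerConfig,
        InsertionGrad2dThresholdMode,
        SpatialScaleMode,
    )

    optimizer_config = GaussianSplatOptimizerConfig(
        insertion_grad_2d_threshold_mode=InsertionGrad2dThresholdMode.PERCENTILE_FIRST_ITERATION,
        insertion_grad_2d_threshold=0.9,  # interpreted as a percentile in (0.0, 1.0)
        max_gaussians=5_000_000,          # cap model size; -1 would mean no limit
        spatial_scale_mode=SpatialScaleMode.MEDIAN_CAMERA_DEPTH,
    )
    # Pass this as `optimizer_config` to GaussianSplatReconstruction.from_sfm_scene(...).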
- class fvdb_reality_capture.radiance_fields.GaussianSplatOptimizer(model: GaussianSplat3d, optimizer: Adam, config: GaussianSplatOptimizerConfig, spatial_scale: float, refine_count: int, step_count: int, _private: Any = None)[source]
Optimizer for reconstructing a scene using Gaussian Splat radiance fields over a collection of posed images.
The optimizer uses an Adam optimizer to optimize the parameters of a fvdb.GaussianSplat3d model, and provides utilities to refine the model by inserting and deleting Gaussians based on their contribution to the optimization. The tools here mostly follow the algorithm in the original Gaussian Splatting paper (https://arxiv.org/abs/2308.04079).
Note
You should not call the constructor of this class directly. Instead use from_model_and_scene() or from_state_dict().
- filter_gaussians(indices_or_mask: Tensor)[source]
Filter the Gaussians in the model to only those specified by the given indices or mask and update the optimizer state accordingly. This can be used to delete, shuffle, or duplicate the Gaussians during optimization.
- Parameters:
indices_or_mask (torch.Tensor) – A 1D tensor of indices or a boolean mask indicating which Gaussians to keep.
- classmethod from_model_and_scene(model: ~fvdb.gaussian_splatting.GaussianSplat3d, sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene, config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_optimizer.GaussianSplatOptimizerConfig = GaussianSplatOptimizerConfig(max_gaussians=-1, insertion_grad_2d_threshold_mode=<InsertionGrad2dThresholdMode.CONSTANT: 'constant'>, deletion_opacity_threshold=0.005, deletion_scale_3d_threshold=0.1, deletion_scale_2d_threshold=0.15, insertion_grad_2d_threshold=0.0002, insertion_scale_3d_threshold=0.01, insertion_scale_2d_threshold=0.05, opacity_updates_use_revised_formulation=False, insertion_split_factor=2, insertion_duplication_factor=2, reset_opacities_every_n_refinements=30, use_scales_for_deletion_after_n_refinements=30, use_screen_space_scales_for_refinement_until=0, spatial_scale_mode=<SpatialScaleMode.MEDIAN_CAMERA_DEPTH: 'median_camera_depth'>, spatial_scale_multiplier=1.1, means_lr=0.00016, log_scales_lr=0.005, quats_lr=0.001, logit_opacities_lr=0.05, sh0_lr=0.0025, shN_lr=0.000125)) → GaussianSplatOptimizer[source]
Create a new GaussianSplatOptimizer instance from a model and config.
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
config (GaussianSplatOptimizerConfig) – Configuration options for the optimizer.
means_lr_scale (float) – A scale factor to apply to the means learning rate.
means_lr_decay_exponent (float) – The exponent used for decaying the means learning rate.
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
- Returns:
GaussianSplatOptimizer – A new GaussianSplatOptimizer instance.
- classmethod from_state_dict(model: GaussianSplat3d, state_dict: dict[str, Any]) → GaussianSplatOptimizer[source]
Create a new GaussianSplatOptimizer instance from a model and a state dict.
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
state_dict (dict[str, Any]) – A state dict previously obtained from state_dict().
- Returns:
optimizer (GaussianSplatOptimizer) – A new GaussianSplatOptimizer instance.
- refine(zero_gradients: bool = True) → dict[str, int][source]
Perform a step of refinement by inserting Gaussians where more detail is needed and deleting Gaussians that are not contributing to the optimization. Refinement happens via three mechanisms:
Duplication: Make insertion_duplication_factor copies of a Gaussian.
We duplicate a Gaussian if its 3D size is below some threshold and the gradient of its projected means over time is high on average. Intuitively, this means the Gaussian is not taking up a lot of space in the scene, but consistently wants to change positions when viewed from different cameras. Likely this Gaussian is stuck trying to represent too much of the scene and should be duplicated into multiple copies.
Splitting: Split a Gaussian into insertion_split_factor smaller ones.
We split a Gaussian when its 3D size exceeds a threshold value and the gradient of its projected mean over time is high on average. In this case, a Gaussian is likely too large for the amount of detail it represents and should be split to capture detail in the image.
Deletion: Remove a Gaussian from the scene.
We delete a Gaussian if its opacity falls below a threshold, since it is not contributing much to rendered images.
- Parameters:
zero_gradients (bool) – If True, zero the gradients after refinement.
- Returns:
refine_stats (dict[str, int]) – A dictionary containing statistics about the refinement step with the keys:
- "num_duplicated": The number of Gaussians that were duplicated.
- "num_split": The number of Gaussians that were split.
- "num_deleted": The number of Gaussians that were deleted.
- reset_learning_rates_and_decay(batch_size: int, expected_steps: int)[source]
Set the learning rates and learning rate decay factor based on the batch size and the expected number of optimization steps (i.e. the number of times step() is called).
This is useful if you want to change the batch size or expected number of steps after creating the optimizer.
- Parameters:
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
expected_steps (int) – The expected number of optimization steps.
- state_dict() → dict[str, Any][source]
Return a serializable state dict for the optimizer.
- Returns:
state_dict (dict[str, Any]) – A state dict containing the state of the optimizer.
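Tying the optimizer API together, the sketch below reads the statistics returned by refine() described above. It assumes optimizer is an existing GaussianSplatOptimizer whose model has accumulated gradients from recent optimization steps; the surrounding training loop, which GaussianSplatReconstruction normally handles for you, is not reproduced here.

    # Sketch: inspect the statistics returned by a refinement step.
    # Assumes `optimizer` is an existing GaussianSplatOptimizer whose model has
    # accumulated gradients from recent optimization steps.
    stats = optimizer.refine(zero_gradients=True)
    print(
        f"refinement: {stats['num_duplicated']} duplicated, "
        f"{stats['num_split']} split, {stats['num_deleted']} deleted"
    )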
Gaussian Splat Logging and Checkpointing
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionBaseWriter[source]
Base class for logging and saving data during Gaussian splat reconstruction.
This class defines the interface for logging metrics, saving images, checkpoints, and PLY files during Gaussian splat reconstruction. Concrete implementations must implement all abstract methods.
To implement custom logging/saving behavior, subclass this class and implement the abstract methods.
- abstractmethod log_metric(global_step: int, metric_name: str, metric_value: Tensor | ndarray | int | float | integer | floating) → None[source]
Abstract method to log a scalar metric value. This function is called during reconstruction to log metrics such as loss, PSNR, etc.
- Parameters:
global_step (int) – The global step at which the metric is being logged.
metric_name (str) – The name of the metric being logged.
metric_value (NumericScalar) – The value of the metric being logged. Must be a scalar type (int, float, np.number, torch.number, etc.).
- abstractmethod save_checkpoint(global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) → None[source]
Abstract method to save a checkpoint. This function is called during reconstruction to save model checkpoints.
- Parameters:
global_step (int) – The global step at which the checkpoint is being saved.
checkpoint_name (str) – The name of the checkpoint being saved.
checkpoint (dict[str, Any]) – The checkpoint data to be saved.
- abstractmethod save_image(global_step: int, image_name: str, image: Tensor) → None[source]
Abstract method to save an image. This function is called during reconstruction to save images such as rendered outputs or intermediate results.
- Parameters:
global_step (int) – The global step at which the image is being saved.
image_name (str) – The name of the image being saved.
image (torch.Tensor) – The image tensor to be saved.
- abstractmethod save_ply(global_step: int, ply_name: str, model: GaussianSplat3d, metadata: dict[str, Any] | None = None) → None[source]
Abstract method to save a Gaussian splat model to a PLY file. This function is called during reconstruction to save the current state of the model.
- Parameters:
global_step (int) – The global step at which the PLY file is being saved.
ply_name (str) – The name of the PLY file being saved.
model (GaussianSplat3d) – The Gaussian splat model to be saved.
metadata (dict[str, Any] | None) – Optional metadata to be saved with the PLY file.
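For example, a minimal writer that just prints metrics and ignores everything else might look like the sketch below. The method signatures mirror the abstract methods above; the import locations and everything else are illustrative assumptions.

    # Sketch: a minimal custom writer that prints metrics and discards other outputs.
    from typing import Any

    import torch

    from fvdb import GaussianSplat3d  # assumed import location
    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstructionBaseWriter


    class PrintOnlyWriter(GaussianSplatReconstructionBaseWriter):
        def log_metric(self, global_step: int, metric_name: str, metric_value) -> None:
            print(f"[{global_step}] {metric_name} = {float(metric_value):.4f}")

        def save_checkpoint(self, global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) -> None:
            pass  # skip checkpoints

        def save_image(self, global_step: int, image_name: str, image: torch.Tensor) -> None:
            pass  # skip images

        def save_ply(self, global_step: int, ply_name: str, model: GaussianSplat3d,
                     metadata: dict[str, Any] | None = None) -> None:
            pass  # skip PLY export

    # Pass an instance as `writer=` to GaussianSplatReconstruction.from_sfm_scene(...).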
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionWriterConfig(save_images: bool = False, save_checkpoints: bool = True, save_plys: bool = True, save_metrics: bool = True, metrics_file_buffer_size: int = 8388608, use_tensorboard: bool = False, save_images_to_tensorboard: bool = False)[source]
Parameters for configuring the behavior of a GaussianSplatReconstructionWriter. Controls what data gets saved to disk, how much buffering to use, and whether to use TensorBoard.
- metrics_file_buffer_size: int = 8388608
How much buffering (in bytes) to use for metrics file logging. Larger values can improve performance when logging many metrics.
Default is 8 MiB.
- save_checkpoints: bool = True
Whether to save checkpoints to disk. If False, checkpoints will not be saved to disk.
Default is True.
- save_images: bool = False
Whether to save images to disk. If False, images will not be saved to disk.
Default is False.
- save_images_to_tensorboard: bool = False
Whether to also save images to TensorBoard if use_tensorboard is True. If True, images will be saved to TensorBoard.
Default is False.
- save_metrics: bool = True
Whether to save metrics to a CSV file. If False, metrics will not be saved to a CSV file.
Default is True.
- save_plys: bool = True
Whether to save PLY files to disk. If False, PLY files will not be saved to disk.
Default is True.
- use_tensorboard: bool = False
Whether to use TensorBoard for logging metrics and images. If True, metrics and images will be logged to TensorBoard.
Default is False.
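Putting the config and writer together, a sketch of constructing a writer that keeps checkpoints and PLY files on disk and mirrors metrics and images to TensorBoard. The paths and run name are illustrative:

    # Sketch: a writer that also logs to TensorBoard. Paths and names are illustrative.
    from pathlib import Path

    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatReconstructionWriter,
        GaussianSplatReconstructionWriterConfig,
    )

    writer_config = GaussianSplatReconstructionWriterConfig(
        save_images=True,                 # also dump rendered / ground truth images
        use_tensorboard=True,             # log metrics to TensorBoard
        save_images_to_tensorboard=True,  # and mirror the images there too
    )
    writer = GaussianSplatReconstructionWriter(
        run_name="garden_scene",          # illustrative run name
        save_path=Path("runs"),           # results land under runs/garden_scene/...
        config=writer_config,
    )
    # Pass `writer=writer` to GaussianSplatReconstruction.from_sfm_scene(...).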
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionWriter(run_name: str | None, save_path: Path | None, exist_ok: bool = False, config: GaussianSplatReconstructionWriterConfig = GaussianSplatReconstructionWriterConfig(save_images=False, save_checkpoints=True, save_plys=True, save_metrics=True, metrics_file_buffer_size=8388608, use_tensorboard=False, save_images_to_tensorboard=False))[source]
Class to handle logging and saving data during Gaussian splat reconstruction. This class is responsible for saving checkpoints, PLY files, images, and metrics. It can also log metrics and images to TensorBoard if requested.
Results are written using the following directory layout:
save_path/
  run_name/
    checkpoints/
      <step>/
        <first_checkpoint>.pth
        <second_checkpoint>.pth
        ...
      <step>/
      ...
    ply/
      <step>/
        <first_ply>.ply
        <second_ply>.ply
        ...
      <step>/
      ...
    images/
      <step>/
        <first_image>.png
        <second_image>.png
        ...
      <step>/
      ...
    tensorboard/
      events.out.tfevents...
    metrics_log.csv
- log_metric(global_step: int, metric_name: str, metric_value: Tensor | ndarray | int | float | integer | floating) → None[source]
Log a scalar metric value. This function is called during reconstruction to log metrics such as loss, PSNR, etc.
- Parameters:
global_step (int) – The global step at which the metric is being logged.
metric_name (str) – The name of the metric being logged.
metric_value (NumericScalar) – The value of the metric being logged.
- property log_path: Path | None
Return the path where logged results are being saved, or None if no results are being saved.
- Returns:
log_path (pathlib.Path | None) – The path where logged results are being saved, or None if no logged results are being saved.
- property run_name: str | None
Return the name of this reconstruction run, or None if the writer is not saving any data. The name of the run matches the name of the directory where logged results are being saved.
- Returns:
str | None – The name of this reconstruction run, or None if the writer is not saving any data.
- save_checkpoint(global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) → None[source]
Save a reconstruction checkpoint to disk. This function is called during reconstruction to save model and optimizer state.
- Parameters:
global_step (int) – The global step at which the checkpoint is being saved.
checkpoint_name (str) – The name of the checkpoint file. This will be used as the file name. Must have a .pth or .pt suffix.
checkpoint (dict[str, Any]) – The checkpoint dictionary to be saved. Typically contains model state, optimizer state, etc.
- save_image(global_step: int, image_name: str, image: Tensor, jpeg_quality: int = 98)[source]
Save an image to disk and/or TensorBoard. This function is called during reconstruction to save rendered images, error maps, etc.
- Parameters:
global_step (int) – The global step at which the image is being saved.
image_name (str) – The name of the image being saved. This will be used as the file name. Must have a .png or .jpg/.jpeg suffix.
image (torch.Tensor) – The image tensor to be saved. Must have shape (H, W), (H, W, C), or (B, H, W, C) and have a floating point or uint8 dtype.
jpeg_quality (int) – Quality of JPEG images if saving as JPEG. Must be between 0 and 100. Default is 98.
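A small usage sketch, saving a synthetic image. The tensor shape and dtype follow the parameter description above; the writer, file name, and step number are illustrative assumptions.

    # Sketch: save a synthetic uint8 image through the writer.
    # Assumes `writer` is a GaussianSplatReconstructionWriter with save_images=True.
    import torch

    image = (torch.rand(270, 480, 3) * 255).to(torch.uint8)  # (H, W, C) uint8 image
    writer.save_image(global_step=500, image_name="debug_render.jpg", image=image, jpeg_quality=90)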
- save_ply(global_step: int, ply_name: str, model: GaussianSplat3d, metadata: dict[str, Any] | None = None) → None[source]
Save the current Gaussian splat model to a PLY file. This function is called during reconstruction to save the reconstructed model at various stages.
- Parameters:
global_step (int) – The global step at which the PLY file is being saved.
ply_name (str) – The name of the PLY file. This will be used as the file name. Must have a .ply suffix.
model (GaussianSplat3d) – The Gaussian splat model to be saved.
metadata (dict[str, Any] | None) – Optional metadata to include in the PLY file (e.g. camera parameters, reconstruction config, etc.).