fvdb_reality_capture.radiance_fields
Gaussian Splat Radiance Field Reconstruction
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionConfig(seed: int = 42, max_epochs: int = 200, max_steps: int | None = None, eval_at_percent: ~typing.List[int] = <factory>, save_at_percent: ~typing.List[int] = <factory>, batch_size: int = 1, crops_per_image: int = 1, sh_degree: int = 3, increase_sh_degree_every_epoch: int = 5, initial_opacity: float = 0.1, initial_covariance_scale: float = 1.0, ssim_lambda: float = 0.2, lpips_net: ~typing.Literal['vgg', 'alex'] = 'alex', opacity_reg: float = 0.0, scale_reg: float = 0.0, random_bkgd: bool = False, refine_start_epoch: int = 3, refine_stop_epoch: int = 100, refine_every_epoch: float = 0.65, ignore_masks: bool = False, remove_gaussians_outside_scene_bbox: bool = False, optimize_camera_poses: bool = True, pose_opt_lr: float = 1e-05, pose_opt_reg: float = 1e-06, pose_opt_lr_decay: float = 1.0, pose_opt_start_epoch: int = 0, pose_opt_stop_epoch: int = 200, pose_opt_init_std: float = 0.0001, near_plane: float = 0.01, far_plane: float = 10000000000.0, min_radius_2d: float = 0.0, eps_2d: float = 0.3, antialias: bool = False, tile_size: int = 16)[source]
Configuration parameters for reconstructing a Gaussian splat radiance field from posed images.
See GaussianSplatReconstruction for details on how these parameters are used.
- antialias: bool = False
Whether to use anti-aliasing when rendering the Gaussians.
Default: False
- batch_size: int = 1
Batch size for optimization. Each step of optimization will compute losses on batch_size images. Note that learning rates are scaled automatically based on the batch size.
Default: 1
- crops_per_image: int = 1
Number of crops to use per image during reconstruction. If you're using very large images, you can set this to a value greater than 1 to run the forward pass on crops and accumulate gradients. This can help reduce memory usage.
Default: 1 (no cropping, use full images).
- eps_2d: float = 0.3
Amount of padding (in pixels) to add to the screen space bounding box of each Gaussian when determining which pixels it affects.
Default: 0.3
- eval_at_percent: List[int]
Percentage of the total optimization epochs at which to perform evaluation on the validation set.
For example, if eval_at_percent is set to [10, 50, 100] and max_epochs is set to 200, then evaluation will be performed after 20, 100, and 200 epochs.
Default: [10, 20, 30, 40, 50, 75, 100]
- far_plane: float = 10000000000.0
Far plane clipping distance when rendering the Gaussians.
Default: 1e10
- ignore_masks: bool = False
If set to True, then ignore any masks in the data and treat all pixels as valid during optimization.
Default: False
- increase_sh_degree_every_epoch: int = 5
When reconstructing a Gaussian splat radiance field, we start by only optimizing the diffuse (degree 0) spherical harmonics coefficients per Gaussian, and progressively increase the degree of spherical harmonics used every increase_sh_degree_every_epoch epochs until we reach sh_degree. This helps stabilize the early stages of optimization.
Default: 5
- initial_covariance_scale: float = 1.0
Initial scale of each Gaussian. This controls the initial size of the Gaussians in the scene. Each Gaussian's covariance matrix will be initialized to a diagonal matrix with this value on the diagonal.
Default: 1.0
- initial_opacity: float = 0.1
Initial opacity of each Gaussian. This is the alpha value used when rendering the Gaussians at the start of optimization.
Default: 0.1
- lpips_net: Literal['vgg', 'alex'] = 'alex'
During evaluation, we compute the Learned Perceptual Image Patch Similarity (LPIPS) metric as a measure of the quality of the reconstruction. This parameter controls which network architecture is used for the LPIPS metric.
Default: "alex", meaning the AlexNet architecture.
- max_epochs: int = 200
The maximum number of optimization epochs, i.e., the number of times each image in the dataset will be visited.
An epoch is defined as one full pass through the dataset. If you have a dataset with 100 images and a batch size of 10, then one epoch corresponds to 10 steps.
Default: 200
- max_steps: int | None = None
The maximum number of optimization steps. If set, this overrides the number of steps calculated from max_epochs and the dataset size.
You shouldn't use this parameter unless you have a specific reason to do so.
Default: None
- min_radius_2d: float = 0.0
Minimum screen space radius (in pixels) below which Gaussians are ignored after projection.
Default: 0.0
- near_plane: float = 0.01
Near plane clipping distance when rendering the Gaussians.
Default: 0.01
- opacity_reg: float = 0.0
Weight for the opacity regularization loss \(L_{opacity} = \frac{1}{N} \sum_i |opacity_i|\).
If set to a value greater than 0, this will encourage the opacities of the Gaussians to be small.
Default: 0.0 (no opacity regularization).
- optimize_camera_poses: bool = True
If set to True, optimize camera poses during reconstruction. This can help improve the quality of the reconstruction if the initial poses are not accurate.
Default: True
- pose_opt_init_std: float = 0.0001
Standard deviation for the normal distribution used to initialize the embeddings for camera pose optimization.
Default: 1e-4
- pose_opt_lr: float = 1e-05
Learning rate for camera pose optimization.
Default: 1e-5
- pose_opt_lr_decay: float = 1.0
Learning rate decay factor for camera pose optimization (the learning rate will decay to this fraction of its initial value).
Default: 1.0 (no decay).
- pose_opt_reg: float = 1e-06
Weight for regularization of camera pose optimization. This encourages small changes to the initial camera poses.
The pose regularization loss is defined as \(L_{pose} = \frac{1}{M} \sum_j \|\Delta R_j\|^2 + \|\Delta t_j\|^2\), i.e. the Frobenius norm of the change in rotation and translation for each of the M camera poses in the dataset.
Default: 1e-6
- pose_opt_start_epoch: int = 0
At which epoch to start optimizing camera poses.
Default: 0 (start from the beginning of optimization).
- pose_opt_stop_epoch: int = 200
At which epoch to stop optimizing camera poses.
Default: max_epochs (optimize poses for the entire duration of optimization).
- random_bkgd: bool = False
Whether to render images with the radiance field against a background of random values during optimization. This discourages the model from using transparency to minimize loss.
Default: False
- refine_every_epoch: float = 0.65
How often to refine Gaussians during optimization, in terms of epochs. For example, a value of 0.65 means refinement occurs approximately every 0.65 epochs.
Default: 0.65
- refine_start_epoch: int = 3
At which epoch to start refining the Gaussians by inserting and deleting Gaussians based on their contribution to the optimization. e.g. if this value is 3, the first refinement will occur at the start of epoch 3.
Default: 3
- refine_stop_epoch: int = 100
At which epoch to stop refining the Gaussians by inserting and deleting Gaussians based on their contribution to the optimization.
Default: 100
- remove_gaussians_outside_scene_bbox: bool = False
If set to True, Gaussians that fall outside the scene bounding box will be removed during refinement.
Default: False
- save_at_percent: List[int]
Percentage of the total optimization epochs at which to save model checkpoints.
For example, if save_at_percent is set to [50, 100] and max_epochs is set to 200, then checkpoints will be saved after 100 and 200 epochs.
Default: [20, 100]
- scale_reg: float = 0.0
Weight for the scale regularization loss \(L_{scale} = \frac{1}{N} \sum_i |scale_i|\).
If set to a value greater than 0, this will encourage the scales of the Gaussians to be small.
Default: 0.0 (no scale regularization).
- seed: int = 42
A random seed for reproducibility.
Default: 42 (the meaning of life, the universe, and everything).
- sh_degree: int = 3
Maximum degree of spherical harmonics to use for each Gaussian's view-dependent color. Higher degrees allow for more complex view-dependent effects, but increase memory usage and computation time.
Default: 3
- ssim_lambda: float = 0.2
Weight for the SSIM loss. Reconstruction includes a loss term based on the Structural Similarity Index Measure (SSIM) between images rendered with the radiance field and the ground truth images. This weight applies to that SSIM loss term.
Default: 0.2
- tile_size: int = 16
Tile size (in pixels) to use when rendering the Gaussians. You should generally leave this at the default value unless you have a specific reason to change it.
Default: 16
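As a quick orientation, the sketch below constructs a config with a few common overrides. The field names and defaults come from the signature above; treat the exact import path as an assumption based on this page's module name.

    # Sketch: build a reconstruction config with a few overrides.
    # Field names and defaults come from the class signature above.
    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstructionConfig

    config = GaussianSplatReconstructionConfig(
        max_epochs=100,               # fewer passes over the dataset than the default 200
        sh_degree=3,                  # maximum spherical harmonics degree
        random_bkgd=True,             # discourage the model from abusing transparency
        optimize_camera_poses=True,   # jointly refine slightly inaccurate poses
        eval_at_percent=[50, 100],    # evaluate halfway through and at the end
    )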
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstruction(model: GaussianSplat3d, sfm_scene: SfmScene, optimizer: BaseGaussianSplatOptimizer, config: GaussianSplatReconstructionConfig, train_indices: ndarray, val_indices: ndarray, pose_adjust_model: CameraPoseAdjustment | None, pose_adjust_optimizer: Adam | None, pose_adjust_scheduler: ExponentialLR | None, writer: GaussianSplatReconstructionBaseWriter, start_step: int, viz_scene: Scene | None, log_interval_steps: int, viz_update_interval_epochs: float, _private: object | None = None)[source]
Engine for reconstructing a Gaussian splat radiance field from posed images in an SfmScene.
This class implements the reconstruction algorithm using a fvdb.GaussianSplat3d model and a differentiable rendering pipeline.
The reconstruction process optimizes the parameters of the Gaussian splats to minimize the difference between rendered images and the input images. The optimization process can be configured using a GaussianSplatReconstructionConfig instance, and the underlying fvdb.GaussianSplat3d model can be customized as well.
The reconstruction can also optionally optimize camera poses if they are not accurate, using a simple pose adjustment model which stores a per-camera embedding that is decoded into a small change in rotation and translation for each camera.
To create a GaussianSplatReconstruction instance, use the from_sfm_scene() class method, which initializes the model and optimizer from an SfmScene and a GaussianSplatReconstructionConfig.
You can configure logging and checkpointing during the optimization process using an instance of GaussianSplatReconstructionBaseWriter. By default, this class uses a GaussianSplatReconstructionWriter which logs metrics, images, and checkpoints to a directory.
You can also visualize the optimization process using an optional fvdb.viz.Scene instance, which can display the current state of the Gaussian splat radiance field interactively in a web browser or notebook.
The reconstruction process is started by calling the optimize() method, which runs the optimization loop.
To get the reconstructed model, use the model attribute, which is a fvdb.GaussianSplat3d instance.
You can also get a dictionary of metadata about the reconstruction using the reconstruction_metadata attribute. This metadata is useful for downstream tasks such as extracting meshes or exporting to USDZ.
The state of the reconstruction can be saved and loaded using the state_dict() and from_state_dict() methods. These methods allow you to pause and resume reconstructions from checkpoints.
- property config: GaussianSplatReconstructionConfig
Get the configuration object for the current reconstruction. See GaussianSplatReconstructionConfig for details.
- Returns:
config (GaussianSplatReconstructionConfig) – The configuration object containing all parameters for the reconstruction.
- eval(show_progress: bool = True, log_tag: str = 'eval') → None[source]
Evaluate the quality of the Gaussian Splat radiance field on the validation dataset.
This method evaluates the model by rendering images from the Gaussian Splat radiance field and computing various image quality metrics including PSNR, SSIM, and LPIPS. It also saves the rendered images and ground truth images to the log writer for visualization.
- Parameters:
show_progress (bool) – Whether to display a progress bar during evaluation.
log_tag (str) – Tag to use for logging metrics and images. Data logged will use this tag as a prefix. For metrics, this will be "{log_tag}/metric_name". For images, this will be "{log_tag}/predicted_imageXXXX.jpg" and "{log_tag}/ground_truth_imageXXXX.jpg".
- classmethod from_sfm_scene(sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene, writer: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionBaseWriter = <fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionWriter object>, viz_scene: ~fvdb.viz._scene.Scene | None = None, config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction.GaussianSplatReconstructionConfig = GaussianSplatReconstructionConfig(seed=42, max_epochs=200, max_steps=None, eval_at_percent=[10, 20, 30, 40, 50, 75, 100], save_at_percent=[20, 100], batch_size=1, crops_per_image=1, sh_degree=3, increase_sh_degree_every_epoch=5, initial_opacity=0.1, initial_covariance_scale=1.0, ssim_lambda=0.2, lpips_net='alex', opacity_reg=0.0, scale_reg=0.0, random_bkgd=False, refine_start_epoch=3, refine_stop_epoch=100, refine_every_epoch=0.65, ignore_masks=False, remove_gaussians_outside_scene_bbox=False, optimize_camera_poses=True, pose_opt_lr=1e-05, pose_opt_reg=1e-06, pose_opt_lr_decay=1.0, pose_opt_start_epoch=0, pose_opt_stop_epoch=200, pose_opt_init_std=0.0001, near_plane=0.01, far_plane=10000000000.0, min_radius_2d=0.0, eps_2d=0.3, antialias=False, tile_size=16), optimizer_config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_optimizer.GaussianSplatOptimizerConfig = GaussianSplatOptimizerConfig(max_gaussians=-1, insertion_grad_2d_threshold_mode=<InsertionGrad2dThresholdMode.CONSTANT: 'constant'>, deletion_opacity_threshold=0.005, deletion_scale_3d_threshold=0.1, deletion_scale_2d_threshold=0.15, insertion_grad_2d_threshold=0.0002, insertion_scale_3d_threshold=0.01, insertion_scale_2d_threshold=0.05, opacity_updates_use_revised_formulation=False, insertion_split_factor=2, insertion_duplication_factor=2, reset_opacities_every_n_refinements=30, use_scales_for_deletion_after_n_refinements=30, use_screen_space_scales_for_refinement_until=0, spatial_scale_mode=<SpatialScaleMode.MEDIAN_CAMERA_DEPTH: 'median_camera_depth'>, spatial_scale_multiplier=1.1, means_lr=0.00016, log_scales_lr=0.005, quats_lr=0.001, logit_opacities_lr=0.05, sh0_lr=0.0025, shN_lr=0.000125), use_every_n_as_val: int = -1, viz_update_interval_epochs: float = 10, log_interval_steps: int = 10, device: str | ~torch.device = 'cuda')[source]
Create a GaussianSplatReconstruction instance from an SfmScene, used to reconstruct a 3D Gaussian Splat radiance field from posed images. The reconstruction process and optimizer can be configured using the config (see GaussianSplatReconstructionConfig) and optimizer_config (see GaussianSplatOptimizerConfig) parameters, though the defaults should produce acceptable results.
You can also configure logging and checkpointing during the reconstruction process using an instance of GaussianSplatReconstructionBaseWriter. By default, this class uses a GaussianSplatReconstructionWriter which logs metrics, images, and checkpoints to a directory.
You can interactively visualize the state of the current reconstruction using an optional fvdb.viz.Scene instance, which can display the current Gaussian splat radiance field in a web browser or notebook.
- Parameters:
sfm_scene (SfmScene) – The Structure-from-Motion scene containing images and camera poses.
config (GaussianSplatReconstructionConfig) – Configuration for the reconstruction process.
optimizer_config (GaussianSplatOptimizerConfig) – Configuration for the optimizer.
writer (GaussianSplatReconstructionBaseWriter) – Writer instance to handle logging metrics and saving images, checkpoints, PLY files, and other results.
viz_scene (Scene | None) – Optional fvdb.viz.Scene instance for visualizing optimization progress. If None, no visualization is performed.
use_every_n_as_val (int) – Use every n-th image as a validation image. The default of -1 means no validation images are used.
viz_update_interval_epochs (float) – Interval in epochs at which to update the visualization if viz_scene is not None. An epoch is one full pass through the dataset.
log_interval_steps (int) – Interval (in steps) to log metrics to the writer.
device (str | torch.device) – Device to run the reconstruction on.
- Returns:
gaussian_splat_reconstruction (GaussianSplatReconstruction) – A GaussianSplatReconstruction instance ready to reconstruct the scene.
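A minimal end-to-end sketch of the workflow described above, assuming you already have an SfmScene (loading one is outside the scope of this page) and that the classes are importable from fvdb_reality_capture.radiance_fields:

    # Sketch: reconstruct a radiance field from an existing SfmScene.
    # Assumes `sfm_scene` is an SfmScene you have already loaded.
    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatReconstruction,
        GaussianSplatReconstructionConfig,
    )

    config = GaussianSplatReconstructionConfig(max_epochs=100)
    recon = GaussianSplatReconstruction.from_sfm_scene(
        sfm_scene,                  # an SfmScene with images and camera poses
        config=config,
        use_every_n_as_val=10,      # hold out every 10th image for validation
        device="cuda",
    )

    recon.optimize(show_progress=True)    # run the optimization loop
    recon.eval()                          # PSNR / SSIM / LPIPS on the validation split
    recon.save_ply("reconstruction.ply")  # export the optimized Gaussians
    model = recon.model                   # the underlying fvdb.GaussianSplat3d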
- classmethod from_state_dict(state_dict: dict[str, ~typing.Any], override_sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene | None = None, override_use_every_n_as_val: int | None = None, writer: ~fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionBaseWriter = <fvdb_reality_capture.radiance_fields.gaussian_splat_reconstruction_writer.GaussianSplatReconstructionWriter object>, viz_scene: ~fvdb.viz._scene.Scene | None = None, viz_update_interval_epochs: float = 1.0, log_interval_steps: int = 10, device: str | ~torch.device = 'cuda')[source]
Load a GaussianSplatReconstruction instance from a state dictionary (extracted with the state_dict() method). This will restore the model, optimizer, and configuration. You can optionally override the SfmScene and the train/validation split (via the override_use_every_n_as_val parameter). This is useful for resuming reconstruction on a different dataset or with a different train/validation split.
- Parameters:
state_dict (dict) – State dictionary containing the model, optimizer, and configuration state. Generated by the state_dict() method.
override_sfm_scene (SfmScene | None) – Optional SfmScene to use instead of the one in the state_dict.
override_use_every_n_as_val (int | None) – If specified, will override the train/validation split using this value. The default of None means to use the train/validation split from the state_dict.
writer (GaussianSplatReconstructionBaseWriter) – GaussianSplatReconstructionBaseWriter instance to handle logging metrics and saving images, checkpoints, PLY files, and other results.
viz_scene (Scene | None) – Optional fvdb.viz.Scene instance for visualizing optimization progress. If None, no visualization is performed.
viz_update_interval_epochs (float) – Interval in epochs at which to update the visualization if viz_scene is not None. An epoch is one full pass through the dataset.
log_interval_steps (int) – Interval in steps to log metrics to the writer.
device (str | torch.device) – Device to run the reconstruction on.
- property model: GaussianSplat3d
Get the Gaussian Splatting model being optimized.
- Returns:
model (GaussianSplat3d) – The fvdb.GaussianSplat3d instance being optimized.
- optimize(show_progress: bool = True, log_tag: str = 'reconstruct') → None[source]
Run the reconstruction optimization loop to reconstruct a Gaussian Splatting radiance field from a set of posed images.
The optimization loop iterates over the images and poses in the dataset, computes losses, updates the Gaussians' parameters, and logs metrics at each step. It also handles scheduling refinement steps at specified intervals.
- Parameters:
show_progress (bool) – Whether to display a progress bar during reconstruction.
log_tag (str) – Tag to use for logging metrics (e.g., "train"). Data logged will use this tag as a prefix. For metrics, this will be "{log_tag}/metric_name". For checkpoints, this will be "{log_tag}_ckpt.pt". For PLY files, this will be "{log_tag}_ckpt.ply".
Note
When calling evaluation from the reconstruction loop, the log_tag for evaluation will be log_tag + "_eval".
- property optimizer: BaseGaussianSplatOptimizer
Get the optimizer used for optimizing the Gaussian Splat radiance field's parameters.
- Returns:
optimizer (BaseGaussianSplatOptimizer) – The optimizer instance. See GaussianSplatOptimizer for details.
- property pose_adjust_model: CameraPoseAdjustment | None
Get the camera pose adjustment model used for optimizing camera poses during reconstruction.
- Returns:
pose_adjust_model (CameraPoseAdjustment | None) – The pose adjustment model instance, or None if not used.
- property pose_adjust_optimizer: Adam | None
Get the optimizer used for adjusting camera poses during reconstruction.
- Returns:
pose_adjust_optimizer (torch.optim.Optimizer | None) – The pose adjustment optimizer instance, or None if not used.
- property pose_adjust_scheduler: ExponentialLR | None
Get the learning rate scheduler used for adjusting camera poses during reconstruction.
- Returns:
pose_adjust_scheduler (torch.optim.lr_scheduler.ExponentialLR | None) – The pose adjustment scheduler instance, or None if not used.
- property reconstruction_metadata: dict[str, Tensor | float | int | str]
Get metadata about the reconstruction, including camera parameters and Gaussian rendering parameters.
This metadata is useful for downstream tasks such as extracting meshes or point clouds. It includes:
- normalization_transform: The transformation matrix used to normalize the scene.
- camera_to_world_matrices: The optimized camera-to-world matrices for the images used during reconstruction.
- projection_matrices: The projection matrices for the images used during reconstruction.
- image_sizes: The sizes of the images used during reconstruction.
- median_depths: The median depth values (distance from camera to scene) for each image used during reconstruction.
- eps2d: The 2D epsilon value used when rendering the Gaussian splat radiance field.
- near_plane: The near plane distance used when rendering the Gaussian splat radiance field.
- far_plane: The far plane distance used when rendering the Gaussian splat radiance field.
- min_radius_2d: The minimum 2D radius below which splats are not rendered.
- antialias: Whether anti-aliasing is enabled (1) or not (0).
- tile_size: The tile size used to render the Gaussian splat radiance field.
- Returns:
metadata (dict[str, torch.Tensor | float | int | str]) – A dictionary containing metadata about the reconstruction.
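The keys listed above can be read straight from the returned dictionary. A short sketch, assuming recon is a GaussianSplatReconstruction that has already been optimized:

    # Sketch: inspect reconstruction metadata for downstream tasks.
    # `recon` is assumed to be an optimized GaussianSplatReconstruction.
    meta = recon.reconstruction_metadata

    cam_to_world = meta["camera_to_world_matrices"]    # optimized camera poses
    projections = meta["projection_matrices"]          # per-image projection matrices
    near, far = meta["near_plane"], meta["far_plane"]  # clipping planes used for rendering
    print(f"rendered with tile_size={meta['tile_size']}, antialias={meta['antialias']}")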
- save_ply(path: str | Path) → None[source]
Save the current Gaussian Splatting model to a PLY file.
- Parameters:
path (str | Path) – The file path where the PLY file will be saved.
- save_usdz(path: str | Path) → None[source]
Save the current Gaussian Splatting model to a USDZ file.
- Parameters:
path (str | Path) – The file path where the USDZ file will be saved.
- state_dict() → dict[str, Any][source]
Get the state dictionary of the current optimization state, including model, optimizer, and configuration parameters.
The state dictionary can be used to save and resume optimization from checkpoints. Its keys include:
- "magic": A magic string to identify the checkpoint type.
- "version": The version of the checkpoint format.
- "step": The current global optimization step.
- "config": The configuration parameters used for optimization.
- "sfm_scene": The state dictionary of the SfM scene.
- "model": The state dictionary of the Gaussian Splatting model.
- "optimizer": The state dictionary of the optimizer.
- "train_indices": The indices of the training images in the dataset.
- "val_indices": The indices of the validation images in the dataset.
- "num_training_poses": The number of training poses if pose adjustment is used, otherwise None.
- "pose_adjust_model": The state dictionary of the camera pose adjustment model if used, otherwise None.
- "pose_adjust_optimizer": The state dictionary of the pose adjustment optimizer if used, otherwise None.
- "pose_adjust_scheduler": The state dictionary of the pose adjustment scheduler if used, otherwise None.
- Returns:
state_dict (dict[str, Any]) – A dictionary containing the state of the optimization process.
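A sketch of pausing and resuming, assuming you persist the state dictionary yourself with torch.save / torch.load (the built-in writer also saves checkpoints; its file layout is described later on this page):

    # Sketch: pause a reconstruction and resume it later.
    # Assumes `recon` is an existing GaussianSplatReconstruction instance.
    import torch

    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstruction

    # Pause: capture model, optimizer, and configuration state.
    torch.save(recon.state_dict(), "recon_ckpt.pt")

    # Resume: restore everything and continue optimizing.
    # weights_only=False because the state dict contains non-tensor entries.
    state = torch.load("recon_ckpt.pt", weights_only=False)
    resumed = GaussianSplatReconstruction.from_state_dict(state, device="cuda")
    resumed.optimize()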
- property training_dataset: SfmDataset
Get the training dataset used for training the Gaussian Splatting model.
- Returns:
training_dataset (SfmDataset) – The training dataset instance.
- property validation_dataset: SfmDataset
Get the validation dataset used for evaluating the Gaussian Splatting model.
- Returns:
validation_dataset (SfmDataset) – The validation dataset instance.
- version = '0.1.0'
Gaussian Splat Optimizers
- class fvdb_reality_capture.radiance_fields.BaseGaussianSplatOptimizer[source]
Base class for optimizers that reconstruct a scene using Gaussian Splat radiance fields over a collection of posed images.
This class defines the interface for optimizers that optimize the parameters of a fvdb.GaussianSplat3d model, and provides utilities to refine the model by inserting and deleting Gaussians based on their contribution to the optimization.
Currently, the only concrete implementation is GaussianSplatOptimizer, which implements the algorithm in the original Gaussian Splatting paper.
- abstractmethod filter_gaussians(indices_or_mask: Tensor)[source]
Abstract method to filter the Gaussians in the model based on the given indices or mask, and update the corresponding optimizer state accordingly. This can be used to delete, shuffle, or duplicate the Gaussians during optimization.
- Parameters:
indices_or_mask (torch.Tensor) – A 1D tensor of indices or a boolean mask indicating which Gaussians to keep.
- abstractmethod classmethod from_state_dict(model: GaussianSplat3d, state_dict: dict[str, Any]) → BaseGaussianSplatOptimizer[source]
Abstract method to create a new BaseGaussianSplatOptimizer instance from a model and a state dict (obtained from state_dict()).
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
state_dict (dict[str, Any]) – A state dict previously obtained from state_dict().
- Returns:
optimizer (BaseGaussianSplatOptimizer) – A new BaseGaussianSplatOptimizer instance.
- abstractmethod refine(zero_gradients: bool = True) → dict[str, Any][source]
Abstract method to refine the model by inserting and deleting Gaussians based on their contribution to the optimization.
- Parameters:
zero_gradients (bool) – If True, zero the gradients of all tensors being optimized after refining.
- Returns:
refinement_stats (dict[str, Any]) – A dictionary containing statistics about the refinement step.
- abstractmethod reset_learning_rates_and_decay(batch_size: int, expected_steps: int) → None[source]
Abstract method to set the learning rates and learning rate decay factor based on the batch size and the expected number of optimization steps (i.e. the number of times step() is called).
This is useful if you want to change the batch size or expected number of steps after creating the optimizer.
- Parameters:
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
expected_steps (int) – The expected number of optimization steps.
- abstractmethod state_dict() → dict[str, Any][source]
Abstract method to return a serializable state dict for the optimizer.
- Returns:
state_dict (dict[str, Any]) – A state dict containing the state of the optimizer.
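To illustrate the interface, here is a bare-bones skeleton of a custom optimizer. It only shows the abstract methods documented on this page with placeholder bodies, not a working refinement strategy; the import locations are assumptions based on how this page refers to the types.

    # Sketch: the methods a BaseGaussianSplatOptimizer subclass must implement.
    # Bodies are placeholders; a real implementation would mirror GaussianSplatOptimizer.
    from typing import Any

    import torch

    from fvdb import GaussianSplat3d  # assumed import location for GaussianSplat3d
    from fvdb_reality_capture.radiance_fields import BaseGaussianSplatOptimizer


    class MyOptimizer(BaseGaussianSplatOptimizer):
        def filter_gaussians(self, indices_or_mask: torch.Tensor) -> None:
            ...  # keep only the requested Gaussians and sync optimizer state

        @classmethod
        def from_state_dict(cls, model: GaussianSplat3d, state_dict: dict[str, Any]) -> "MyOptimizer":
            ...  # rebuild the optimizer from a saved state dict

        def refine(self, zero_gradients: bool = True) -> dict[str, Any]:
            ...  # insert/delete Gaussians and return refinement statistics

        def reset_learning_rates_and_decay(self, batch_size: int, expected_steps: int) -> None:
            ...  # rescale learning rates for a new batch size / step budget

        def state_dict(self) -> dict[str, Any]:
            ...  # return a serializable snapshot of the optimizer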
- class fvdb_reality_capture.radiance_fields.GaussianSplatOptimizerConfig(max_gaussians: int = -1, insertion_grad_2d_threshold_mode: InsertionGrad2dThresholdMode = InsertionGrad2dThresholdMode.CONSTANT, deletion_opacity_threshold: float = 0.005, deletion_scale_3d_threshold: float = 0.1, deletion_scale_2d_threshold: float = 0.15, insertion_grad_2d_threshold: float = 0.0002, insertion_scale_3d_threshold: float = 0.01, insertion_scale_2d_threshold: float = 0.05, opacity_updates_use_revised_formulation: bool = False, insertion_split_factor: int = 2, insertion_duplication_factor: int = 2, reset_opacities_every_n_refinements: int = 30, use_scales_for_deletion_after_n_refinements: int = 30, use_screen_space_scales_for_refinement_until: int = 0, spatial_scale_mode: SpatialScaleMode = SpatialScaleMode.MEDIAN_CAMERA_DEPTH, spatial_scale_multiplier: float = 1.1, means_lr: float = 0.00016, log_scales_lr: float = 0.005, quats_lr: float = 0.001, logit_opacities_lr: float = 0.05, sh0_lr: float = 0.0025, shN_lr: float = 0.000125)[source]
Parameters for configuring the GaussianSplatOptimizer.
- deletion_opacity_threshold: float = 0.005
If a Gaussian's opacity drops below this value, delete it during refinement.
- deletion_scale_2d_threshold: float = 0.15
If the maximum projected size of a Gaussian between refinement steps exceeds this value, then delete it during refinement.
Note
This parameter is only used if use_screen_space_scales_for_refinement_until is greater than 0.
- deletion_scale_3d_threshold: float = 0.1
If a Gaussian's 3D scale is above this value, then delete it during refinement.
- insertion_duplication_factor: int = 2
When duplicating Gaussians during insertion, this value specifies the total number of copies (including the original) that will result for each selected source Gaussian. The original is kept, and insertion_duplication_factor - 1 new identical copies are added. e.g. if this value is 3, each duplicated Gaussian becomes 3 copies of itself (the original plus 2 new). This value must be >= 2.
- insertion_grad_2d_threshold: float = 0.0002
Threshold value on the accumulated norm of projected mean gradients between refinement steps to determine whether a Gaussian has high error and is a candidate for duplication or splitting.
Note
If insertion_grad_2d_threshold_mode is InsertionGrad2dThresholdMode.CONSTANT, then this value is used directly as the threshold, and must be positive.
Note
If insertion_grad_2d_threshold_mode is InsertionGrad2dThresholdMode.PERCENTILE_FIRST_ITERATION or InsertionGrad2dThresholdMode.PERCENTILE_EVERY_ITERATION, then this value must be in the range (0.0, 1.0) (exclusive).
- insertion_grad_2d_threshold_mode: InsertionGrad2dThresholdMode = 'constant'
Whether to use a fixed threshold for insertion_grad_2d_threshold (constant), a value computed as a percentile of the distribution of screen space mean gradients on the first iteration, or a percentile value computed at each refinement step.
See InsertionGrad2dThresholdMode for details on the available modes.
- insertion_scale_2d_threshold: float = 0.05
Split high-error (determined by insertion_grad_2d_threshold) Gaussians whose maximum projected size exceeds this value. These Gaussians are too large to capture the detail in the region they cover, so we split them to allow them to specialize.
Note
This parameter is only used if use_screen_space_scales_for_refinement_until is greater than 0.
- insertion_scale_3d_threshold: float = 0.01
Duplicate high-error (determined by insertion_grad_2d_threshold) Gaussians whose 3D scale is below this value. These Gaussians are too small to capture the detail in the region they cover, so we duplicate them to allow them to specialize.
- insertion_split_factor: int = 2
When splitting Gaussians during insertion, this value specifies the total number of new Gaussians that will replace each selected source Gaussian. The original is removed and replaced by insertion_split_factor new Gaussians. e.g. if this value is 2, each split Gaussian is replaced by 2 new smaller Gaussians (the original is removed). This value must be >= 2.
- log_scales_lr: float = 0.005
Learning rate for the log scales of the Gaussians.
- logit_opacities_lr: float = 0.05
Learning rate for the logit opacities of the Gaussians.
- max_gaussians: int = -1
The maximum number of Gaussians to allow in the model. If -1, no limit.
- means_lr: float = 0.00016
Learning rate for the means of the Gaussians. This is also scaled by the spatial scale computed from the scene.
See spatial_scale_mode for details on how the spatial scale is computed.
- opacity_updates_use_revised_formulation: bool = False
When splitting Gaussians, whether to update the opacities of the new Gaussians using the revised formulation from *"Revising Densification in Gaussian Splatting"*. This removes a bias which weighs the contribution of newly split Gaussians to the image more heavily than that of older Gaussians.
- quats_lr: float = 0.001
Learning rate for the quaternions of the Gaussians.
- reset_opacities_every_n_refinements: int = 30
If set to a positive value, then clamp all opacities to be at most twice the value of deletion_opacity_threshold every reset_opacities_every_n_refinements calls to GaussianSplatOptimizer.refine(). This prevents Gaussians from becoming completely occluded by denser Gaussians and thus unable to be optimized.
- sh0_lr: float = 0.0025
Learning rate for the diffuse spherical harmonics (order 0).
- shN_lr: float = 0.000125
Learning rate for the specular spherical harmonics (order > 0).
- spatial_scale_mode: SpatialScaleMode = 'median_camera_depth'
How to interpret 3D optimization scale thresholds and learning rates (i.e. insertion_scale_3d_threshold, deletion_scale_3d_threshold, and means_lr). These are scaled by a spatial scale computed from the scene, so they are relative to the size of the scene being optimized.
See SpatialScaleMode for details on the available modes.
- spatial_scale_multiplier: float = 1.1
Multiplier to apply to the spatial scale computed from the scene to get a slightly larger scale.
- use_scales_for_deletion_after_n_refinements: int = 30
If set to a positive value, then after use_scales_for_deletion_after_n_refinements calls to GaussianSplatOptimizer.refine(), use the 3D scales of the Gaussians to determine whether to delete them. This will delete Gaussians that have grown too large in 3D space and are not contributing to the optimization.
By default, this value matches reset_opacities_every_n_refinements so that both behaviors are enabled at the same time.
- use_screen_space_scales_for_refinement_until: int = 0
If set to a positive value, then threshold the maximum projected size of Gaussians between refinement steps to decide whether to split or delete Gaussians that are too large. This behavior is enabled until GaussianSplatOptimizer.refine() has been called use_screen_space_scales_for_refinement_until times. After that, only 3D scales are used for refinement.
- class fvdb_reality_capture.radiance_fields.SpatialScaleMode(*values)[source]
How to interpret 3D optimization scale thresholds (insertion_scale_3d_threshold, deletion_scale_3d_threshold) and learning rates. These thresholds are specified in a unitless space and are subsequently multiplied by a spatial scale computed from the scene being optimized. There are several heuristics for computing this spatial scale, specified by the config:
- ABSOLUTE_UNITS = 'absolute_units'
Use the thresholds and learning rates as-is, in absolute world units (e.g. meters).
- MAX_CAMERA_DEPTH = 'max_camera_depth'
Compute the maximum depth of SfmPoints across all cameras in the scene, and use that as the spatial scale.
- MAX_CAMERA_TO_CENTROID = 'max_camera_diagonal'
Compute the maximum distance from any camera to the centroid of all camera positions (good for orbits around an object).
- MEDIAN_CAMERA_DEPTH = 'median_camera_depth'
Compute the median depth of SfmPoints across all cameras in the scene, and use that as the spatial scale.
- SCENE_DIAGONAL_PERCENTILE = 'relative_to_scene_diagonal'
Compute the axis-aligned bounding box of all points within the 5th to 95th percentile range along each axis, and use the given percentile of the length of the diagonal of this box as the spatial scale.
- class fvdb_reality_capture.radiance_fields.InsertionGrad2dThresholdMode(*values)[source]
The GaussianSplatOptimizer uses a threshold on the accumulated norm of 2D mean gradients during refinement.
There are several modes for computing this threshold, specified by the config. These modes let you adapt the refinement behavior to the statistics of the gradients during training.
- CONSTANT = 'constant'
Always use the fixed threshold specified by insertion_grad_2d_threshold. With the default value (0.0002), this mode will produce acceptable results, but may not be optimal for all types of captures.
- PERCENTILE_EVERY_ITERATION = 'percentile_every_iteration'
During every refinement step, set the threshold to the given percentile of the gradients. For highly detailed scenes, this mode may be useful to adaptively insert more Gaussians as the model learns more detail. This generally produces many more Gaussians and more detailed results at the cost of more memory and compute.
- PERCENTILE_FIRST_ITERATION = 'percentile_first_iteration'
During the first refinement step, set the threshold to the given percentile of the gradients. For all subsequent refinement steps, use that fixed threshold. Using this mode will have similar behavior to CONSTANT but will adapt to the scale of the gradients, which can be more robust across different capture types.
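As a sketch, the percentile-based mode described above is selected through the optimizer config; when a percentile mode is used, insertion_grad_2d_threshold is interpreted as a percentile in (0.0, 1.0) rather than an absolute gradient value. The specific numbers below are illustrative, not recommendations:

    # Sketch: pick the insertion threshold from the gradient distribution at the
    # first refinement step. Values are illustrative, not recommendations.
    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatOptimizerConfig,
        InsertionGrad2dThresholdMode,
        SpatialScaleMode,
    )

    optimizer_config = GaussianSplatOptimizerConfig(
        insertion_grad_2d_threshold_mode=InsertionGrad2dThresholdMode.PERCENTILE_FIRST_ITERATION,
        insertion_grad_2d_threshold=0.9,  # interpreted as a percentile in (0.0, 1.0)
        max_gaussians=5_000_000,          # cap model size; -1 would mean no limit
        spatial_scale_mode=SpatialScaleMode.MEDIAN_CAMERA_DEPTH,
    )
    # Pass this as `optimizer_config` to GaussianSplatReconstruction.from_sfm_scene(...).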
- class fvdb_reality_capture.radiance_fields.GaussianSplatOptimizer(model: GaussianSplat3d, optimizer: Adam, config: GaussianSplatOptimizerConfig, spatial_scale: float, refine_count: int, step_count: int, _private: Any = None)[source]
Optimizer for reconstructing a scene using Gaussian Splat radiance fields over a collection of posed images.
The optimizer uses an Adam optimizer to optimize the parameters of a fvdb.GaussianSplat3d model, and provides utilities to refine the model by inserting and deleting Gaussians based on their contribution to the optimization. The tools here mostly follow the algorithm in the original Gaussian Splatting paper (https://arxiv.org/abs/2308.04079).
Note
You should not call the constructor of this class directly. Instead use from_model_and_scene() or from_state_dict().
- filter_gaussians(indices_or_mask: Tensor)[source]
Filter the Gaussians in the model to only those specified by the given indices or mask and update the optimizer state accordingly. This can be used to delete, shuffle, or duplicate the Gaussians during optimization.
- Parameters:
indices_or_mask (torch.Tensor) – A 1D tensor of indices or a boolean mask indicating which Gaussians to keep.
- classmethod from_model_and_scene(model: ~fvdb.gaussian_splatting.GaussianSplat3d, sfm_scene: ~fvdb_reality_capture.sfm_scene.sfm_scene.SfmScene, config: ~fvdb_reality_capture.radiance_fields.gaussian_splat_optimizer.GaussianSplatOptimizerConfig = GaussianSplatOptimizerConfig(max_gaussians=-1, insertion_grad_2d_threshold_mode=<InsertionGrad2dThresholdMode.CONSTANT: 'constant'>, deletion_opacity_threshold=0.005, deletion_scale_3d_threshold=0.1, deletion_scale_2d_threshold=0.15, insertion_grad_2d_threshold=0.0002, insertion_scale_3d_threshold=0.01, insertion_scale_2d_threshold=0.05, opacity_updates_use_revised_formulation=False, insertion_split_factor=2, insertion_duplication_factor=2, reset_opacities_every_n_refinements=30, use_scales_for_deletion_after_n_refinements=30, use_screen_space_scales_for_refinement_until=0, spatial_scale_mode=<SpatialScaleMode.MEDIAN_CAMERA_DEPTH: 'median_camera_depth'>, spatial_scale_multiplier=1.1, means_lr=0.00016, log_scales_lr=0.005, quats_lr=0.001, logit_opacities_lr=0.05, sh0_lr=0.0025, shN_lr=0.000125)) → GaussianSplatOptimizer[source]
Create a new GaussianSplatOptimizer instance from a model and config.
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
config (GaussianSplatOptimizerConfig) – Configuration options for the optimizer.
means_lr_scale (float) – A scale factor to apply to the means learning rate.
means_lr_decay_exponent (float) – The exponent used for decaying the means learning rate.
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
- Returns:
GaussianSplatOptimizer – A new GaussianSplatOptimizer instance.
- classmethod from_state_dict(model: GaussianSplat3d, state_dict: dict[str, Any]) → GaussianSplatOptimizer[source]
Create a new GaussianSplatOptimizer instance from a model and a state dict.
- Parameters:
model (GaussianSplat3d) – The GaussianSplat3d model to optimize.
state_dict (dict[str, Any]) – A state dict previously obtained from state_dict().
- Returns:
optimizer (GaussianSplatOptimizer) – A new GaussianSplatOptimizer instance.
- refine(zero_gradients: bool = True) → dict[str, int][source]
Perform a step of refinement by inserting Gaussians where more detail is needed and deleting Gaussians that are not contributing to the optimization. Refinement happens via three mechanisms:
Duplication: Make insertion_duplication_factor copies of a Gaussian.
We duplicate a Gaussian if its 3D size is below some threshold and the gradient of its projected means over time is high on average. Intuitively, this means the Gaussian is not taking up a lot of space in the scene, but consistently wants to change positions when viewed from different cameras. Likely this Gaussian is stuck trying to represent too much of the scene and should be duplicated into multiple copies.
Splitting: Split a Gaussian into insertion_split_factor smaller ones.
We split a Gaussian when its 3D size exceeds a threshold value and the gradient of its projected mean over time is high on average. In this case, a Gaussian is likely too large for the amount of detail it represents and should be split to capture detail in the image.
Deletion: Remove a Gaussian from the scene.
We delete a Gaussian if its opacity falls below a threshold, since it is not contributing much to rendered images.
- Parameters:
zero_gradients (bool) – If True, zero the gradients after refinement.
- Returns:
refine_stats (dict[str, int]) – A dictionary containing statistics about the refinement step with the keys:
- "num_duplicated": The number of Gaussians that were duplicated.
- "num_split": The number of Gaussians that were split.
- "num_deleted": The number of Gaussians that were deleted.
- reset_learning_rates_and_decay(batch_size: int, expected_steps: int)[source]
Set the learning rates and learning rate decay factor based on the batch size and the expected number of optimization steps (i.e. the number of times step() is called).
This is useful if you want to change the batch size or expected number of steps after creating the optimizer.
- Parameters:
batch_size (int) – The batch size used for training. This is used to scale the learning rates.
expected_steps (int) – The expected number of optimization steps.
- state_dict() → dict[str, Any][source]
Return a serializable state dict for the optimizer.
- Returns:
state_dict (dict[str, Any]) – A state dict containing the state of the optimizer.
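Tying the optimizer API together, the sketch below reads the statistics returned by refine() described above. It assumes optimizer is an existing GaussianSplatOptimizer whose model has accumulated gradients from recent optimization steps; the surrounding training loop, which GaussianSplatReconstruction normally handles for you, is not reproduced here.

    # Sketch: inspect the statistics returned by a refinement step.
    # Assumes `optimizer` is an existing GaussianSplatOptimizer whose model has
    # accumulated gradients from recent optimization steps.
    stats = optimizer.refine(zero_gradients=True)
    print(
        f"refinement: {stats['num_duplicated']} duplicated, "
        f"{stats['num_split']} split, {stats['num_deleted']} deleted"
    )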
Gaussian Splat Logging and Checkpointing
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionBaseWriter[source]
Base class for logging and saving data during Gaussian splat reconstruction.
This class defines the interface for logging metrics, saving images, checkpoints, and PLY files during Gaussian splat reconstruction. Concrete implementations must implement all abstract methods.
To implement custom logging/saving behavior, subclass this class and implement the abstract methods.
- abstractmethod log_metric(global_step: int, metric_name: str, metric_value: Tensor | ndarray | int | float | integer | floating) → None[source]
Abstract method to log a scalar metric value. This function is called during reconstruction to log metrics such as loss, PSNR, etc.
- Parameters:
global_step (int) – The global step at which the metric is being logged.
metric_name (str) – The name of the metric being logged.
metric_value (NumericScalar) – The value of the metric being logged. Must be a scalar type (int, float, np.number, torch.number, etc.).
- abstractmethod save_checkpoint(global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) → None[source]
Abstract method to save a checkpoint. This function is called during reconstruction to save model checkpoints.
- Parameters:
global_step (int) – The global step at which the checkpoint is being saved.
checkpoint_name (str) – The name of the checkpoint being saved.
checkpoint (dict[str, Any]) – The checkpoint data to be saved.
- abstractmethod save_image(global_step: int, image_name: str, image: Tensor) → None[source]
Abstract method to save an image. This function is called during reconstruction to save images such as rendered outputs or intermediate results.
- Parameters:
global_step (int) – The global step at which the image is being saved.
image_name (str) – The name of the image being saved.
image (torch.Tensor) – The image tensor to be saved.
- abstractmethod save_ply(global_step: int, ply_name: str, model: GaussianSplat3d, metadata: dict[str, Any] | None = None) → None[source]
Abstract method to save a Gaussian splat model to a PLY file. This function is called during reconstruction to save the current state of the model.
- Parameters:
global_step (int) – The global step at which the PLY file is being saved.
ply_name (str) – The name of the PLY file being saved.
model (GaussianSplat3d) – The Gaussian splat model to be saved.
metadata (dict[str, Any] | None) – Optional metadata to be saved with the PLY file.
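For example, a minimal writer that just prints metrics and ignores everything else might look like the sketch below. The method signatures mirror the abstract methods above; the import locations and everything else are illustrative assumptions.

    # Sketch: a minimal custom writer that prints metrics and discards other outputs.
    from typing import Any

    import torch

    from fvdb import GaussianSplat3d  # assumed import location
    from fvdb_reality_capture.radiance_fields import GaussianSplatReconstructionBaseWriter


    class PrintOnlyWriter(GaussianSplatReconstructionBaseWriter):
        def log_metric(self, global_step: int, metric_name: str, metric_value) -> None:
            print(f"[{global_step}] {metric_name} = {float(metric_value):.4f}")

        def save_checkpoint(self, global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) -> None:
            pass  # skip checkpoints

        def save_image(self, global_step: int, image_name: str, image: torch.Tensor) -> None:
            pass  # skip images

        def save_ply(self, global_step: int, ply_name: str, model: GaussianSplat3d,
                     metadata: dict[str, Any] | None = None) -> None:
            pass  # skip PLY export

    # Pass an instance as `writer=` to GaussianSplatReconstruction.from_sfm_scene(...).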
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionWriterConfig(save_images: bool = False, save_checkpoints: bool = True, save_plys: bool = True, save_metrics: bool = True, metrics_file_buffer_size: int = 8388608, use_tensorboard: bool = False, save_images_to_tensorboard: bool = False)[source]
Parameters for configuring the behavior of a GaussianSplatReconstructionWriter. Controls what data gets saved to disk, how much buffering to use, and whether to use TensorBoard.
- metrics_file_buffer_size: int = 8388608
How much buffering (in bytes) to use for metrics file logging. Larger values can improve performance when logging many metrics.
Default is 8 MiB.
- save_checkpoints: bool = True
Whether to save checkpoints to disk. If False, checkpoints will not be saved to disk.
Default is True.
- save_images: bool = False
Whether to save images to disk. If False, images will not be saved to disk.
Default is False.
- save_images_to_tensorboard: bool = False
Whether to also save images to TensorBoard if use_tensorboard is True. If True, images will be saved to TensorBoard.
Default is False.
- save_metrics: bool = True
Whether to save metrics to a CSV file. If False, metrics will not be saved to a CSV file.
Default is True.
- save_plys: bool = True
Whether to save PLY files to disk. If False, PLY files will not be saved to disk.
Default is True.
- use_tensorboard: bool = False
Whether to use TensorBoard for logging metrics and images. If True, metrics and images will be logged to TensorBoard.
Default is False.
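Putting the config and writer together, a sketch of constructing a writer that keeps checkpoints and PLY files on disk and mirrors metrics and images to TensorBoard. The paths and run name are illustrative:

    # Sketch: a writer that also logs to TensorBoard. Paths and names are illustrative.
    from pathlib import Path

    from fvdb_reality_capture.radiance_fields import (
        GaussianSplatReconstructionWriter,
        GaussianSplatReconstructionWriterConfig,
    )

    writer_config = GaussianSplatReconstructionWriterConfig(
        save_images=True,                 # also dump rendered / ground truth images
        use_tensorboard=True,             # log metrics to TensorBoard
        save_images_to_tensorboard=True,  # and mirror the images there too
    )
    writer = GaussianSplatReconstructionWriter(
        run_name="garden_scene",          # illustrative run name
        save_path=Path("runs"),           # results land under runs/garden_scene/...
        config=writer_config,
    )
    # Pass `writer=writer` to GaussianSplatReconstruction.from_sfm_scene(...).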
- class fvdb_reality_capture.radiance_fields.GaussianSplatReconstructionWriter(run_name: str | None, save_path: Path | None, exist_ok: bool = False, config: GaussianSplatReconstructionWriterConfig = GaussianSplatReconstructionWriterConfig(save_images=False, save_checkpoints=True, save_plys=True, save_metrics=True, metrics_file_buffer_size=8388608, use_tensorboard=False, save_images_to_tensorboard=False))[source]
Class to handle logging and saving data during Gaussian splat reconstruction. This class is responsible for saving checkpoints, PLY files, images, and metrics. It can also log metrics and images to TensorBoard if requested.
Results are written using the following directory layout:
save_path/
  run_name/
    checkpoints/
      <step>/
        <first_checkpoint>.pth
        <second_checkpoint>.pth
        ...
      <step>/
      ...
    ply/
      <step>/
        <first_ply>.ply
        <second_ply>.ply
        ...
      <step>/
      ...
    images/
      <step>/
        <first_image>.png
        <second_image>.png
        ...
      <step>/
      ...
    tensorboard/
      events.out.tfevents...
    metrics_log.csv
- log_metric(global_step: int, metric_name: str, metric_value: Tensor | ndarray | int | float | integer | floating) → None[source]
Log a scalar metric value. This function is called during reconstruction to log metrics such as loss, PSNR, etc.
- Parameters:
global_step (int) – The global step at which the metric is being logged.
metric_name (str) – The name of the metric being logged.
metric_value (NumericScalar) – The value of the metric being logged.
- property log_path: Path | None
Return the path where logged results are being saved, or None if no results are being saved.
- Returns:
log_path (pathlib.Path | None) – The path where logged results are being saved, or None if no logged results are being saved.
- property run_name: str | None
Return the name of this reconstruction run, or None if the writer is not saving any data. The name of the run matches the name of the directory where logged results are being saved.
- Returns:
str | None – The name of this reconstruction run, or None if the writer is not saving any data.
- save_checkpoint(global_step: int, checkpoint_name: str, checkpoint: dict[str, Any]) → None[source]
Save a reconstruction checkpoint to disk. This function is called during reconstruction to save model and optimizer state.
- Parameters:
global_step (int) – The global step at which the checkpoint is being saved.
checkpoint_name (str) – The name of the checkpoint file. This will be used as the file name. Must have a .pth or .pt suffix.
checkpoint (dict[str, Any]) – The checkpoint dictionary to be saved. Typically contains model state, optimizer state, etc.
- save_image(global_step: int, image_name: str, image: Tensor, jpeg_quality: int = 98)[source]
Save an image to disk and/or TensorBoard. This function is called during reconstruction to save rendered images, error maps, etc.
- Parameters:
global_step (int) – The global step at which the image is being saved.
image_name (str) – The name of the image being saved. This will be used as the file name. Must have a .png or .jpg/.jpeg suffix.
image (torch.Tensor) – The image tensor to be saved. Must have shape (H, W), (H, W, C), or (B, H, W, C) and have a floating point or uint8 dtype.
jpeg_quality (int) – Quality of JPEG images if saving as JPEG. Must be between 0 and 100. Default is 98.
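A small usage sketch, saving a synthetic image. The tensor shape and dtype follow the parameter description above; the writer, file name, and step number are illustrative assumptions.

    # Sketch: save a synthetic uint8 image through the writer.
    # Assumes `writer` is a GaussianSplatReconstructionWriter with save_images=True.
    import torch

    image = (torch.rand(270, 480, 3) * 255).to(torch.uint8)  # (H, W, C) uint8 image
    writer.save_image(global_step=500, image_name="debug_render.jpg", image=image, jpeg_quality=90)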
- save_ply(global_step: int, ply_name: str, model: GaussianSplat3d, metadata: dict[str, Any] | None = None) → None[source]
Save the current Gaussian splat model to a PLY file. This function is called during reconstruction to save the reconstructed model at various stages.
- Parameters:
global_step (int) – The global step at which the PLY file is being saved.
ply_name (str) – The name of the PLY file. This will be used as the file name. Must have a .ply suffix.
model (GaussianSplat3d) – The Gaussian splat model to be saved.
metadata (dict[str, Any] | None) – Optional metadata to include in the PLY file (e.g. camera parameters, reconstruction config, etc.).