fvdb_reality_capture.tools
- fvdb_reality_capture.tools.download_example_data(dataset='all', download_path: str | Path = Path.cwd() / 'data')[source]
Download example datasets for fvdb-reality-capture.
The fvdb-reality-capture package provides several example datasets that can be used for testing and experimentation. This function allows users to easily download these datasets for use in their own projects.
The available datasets are:
"mipnerf360": A dataset of 360-degree images originally captured for the Mip-NeRF 360 paper."gettysburg": A dataset featuring an aeriel flyover of the city of Gettysburg, Pennsylvania."safety_park": An aerial drone orbit of a safety training park."miris_factory": A dataset of synthetically rendered images of a car factory, generated by Miris."all": Download all available datasets.
Each dataset is downloaded as a compressed archive and extracted into its own subdirectory in the path specified by
download_path.
- Parameters:
dataset (str) – The name of the dataset to download. Use "all" to download all datasets. Default is "all".
download_path (str | pathlib.Path) – The directory where datasets will be downloaded. Default is the data subdirectory of the current working directory.
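Example
A minimal usage sketch; the dataset name and target directory here are illustrative, not required values:
>>> from pathlib import Path
>>> from fvdb_reality_capture import tools
>>> # Download a single example dataset into a local data/ directory.
>>> tools.download_example_data(dataset="gettysburg", download_path=Path("data"))
>>> # Or fetch every dataset using the defaults:
>>> tools.download_example_data()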
- fvdb_reality_capture.tools.export_splats_to_usdz(model: GaussianSplat3d, out_path: str | Path) → None[source]
Export an fvdb.GaussianSplat3d model to a USDZ file.
- Parameters:
model (fvdb.GaussianSplat3d) – The Gaussian splat model to save to a USDZ file.
out_path (str | Path) – The output path for the USDZ file. If the file extension is not .usdz, it will be added; e.g., ./scene will save to ./scene.usdz.
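Example
A minimal sketch, assuming model is an existing fvdb.GaussianSplat3d (e.g. loaded from a trained checkpoint; obtaining it is outside this function's scope):
>>> from fvdb_reality_capture import tools
>>> # The .usdz extension is appended automatically if missing.
>>> tools.export_splats_to_usdz(model, "./scene")  # writes ./scene.usdz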
- fvdb_reality_capture.tools.filter_splats_above_scale(splats: GaussianSplat3d, prune_scale3d_threshold=0.05) → GaussianSplat3d[source]
Remove all gaussians with scales larger than the provided threshold (relative to the scene scale).
- Parameters:
splats (fvdb.GaussianSplat3d) – The GaussianSplat3d to filter.
prune_scale3d_threshold (float) – Drop all splats with scales larger than this threshold (relative to the scene scale).
- Returns:
filtered_splats (fvdb.GaussianSplat3d) – The GaussianSplat3d after removing gaussians above the scale threshold.
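Example
A minimal sketch, assuming splats is an existing fvdb.GaussianSplat3d:
>>> from fvdb_reality_capture import tools
>>> # Drop oversized gaussians (larger than 5% of the scene scale).
>>> filtered = tools.filter_splats_above_scale(splats, prune_scale3d_threshold=0.05)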
- fvdb_reality_capture.tools.filter_splats_below_scale(splats: GaussianSplat3d, prune_scale3d_threshold=0.05) → GaussianSplat3d[source]
Remove all gaussians with scales smaller than the provided threshold (relative to the scene scale).
- Parameters:
splats (fvdb.GaussianSplat3d) – The GaussianSplat3d to filter.
prune_scale3d_threshold (float) – Drop all splats with scales smaller than this threshold (relative to the scene scale).
- Returns:
filtered_splats (fvdb.GaussianSplat3d) – The GaussianSplat3d after removing gaussians below the scale threshold.
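Example
A minimal sketch, assuming splats is an existing fvdb.GaussianSplat3d; the thresholds are illustrative. Combined with filter_splats_above_scale, this keeps only gaussians within a size band:
>>> from fvdb_reality_capture import tools
>>> # Drop tiny gaussians (smaller than 1% of the scene scale).
>>> filtered = tools.filter_splats_below_scale(splats, prune_scale3d_threshold=0.01)
>>> # Optionally also drop oversized ones:
>>> filtered = tools.filter_splats_above_scale(filtered, prune_scale3d_threshold=0.05)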
- fvdb_reality_capture.tools.filter_splats_by_mean_percentile(splats: GaussianSplat3d, percentile=[0.98, 0.98, 0.98, 0.98, 0.98, 0.98], decimate=4) → GaussianSplat3d[source]
Remove all gaussians whose means fall outside the provided percentile ranges.
- Parameters:
splats (fvdb.GaussianSplat3d) – The GaussianSplat3d to filter.
percentile (NumericMaxRank1) – The percentiles to use for filtering, in the order (minx, maxx, miny, maxy, minz, maxz).
decimate (int) – Decimate the number of splats by this factor when calculating the percentile range.
- Returns:
filtered_splats (fvdb.GaussianSplat3d) – The GaussianSplat3d after removing gaussians outside the percentile bounds.
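Example
A minimal sketch, assuming splats is an existing fvdb.GaussianSplat3d:
>>> from fvdb_reality_capture import tools
>>> # Keep gaussians whose means fall within the 98th percentile along each axis,
>>> # estimating the bounds from every 4th splat.
>>> filtered = tools.filter_splats_by_mean_percentile(
...     splats,
...     percentile=[0.98, 0.98, 0.98, 0.98, 0.98, 0.98],  # (minx, maxx, miny, maxy, minz, maxz)
...     decimate=4,
... )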
- fvdb_reality_capture.tools.filter_splats_by_opacity_percentile(splats: GaussianSplat3d, percentile=0.98, decimate=4) → GaussianSplat3d[source]
Remove all gaussians whose logit_opacities fall outside the provided percentile range.
- Parameters:
splats (fvdb.GaussianSplat3d) – The GaussianSplat3d to filter.
percentile (float) – The percentile of the logit_opacities to use for filtering.
decimate (int) – Decimate the number of splats by this factor when calculating the percentile range.
- Returns:
filtered_splats (fvdb.GaussianSplat3d) – The GaussianSplat3d after removing gaussians outside the opacity percentile range.
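Example
A minimal sketch, assuming splats is an existing fvdb.GaussianSplat3d:
>>> from fvdb_reality_capture import tools
>>> # Remove gaussians outside the 98th percentile of logit opacities.
>>> filtered = tools.filter_splats_by_opacity_percentile(splats, percentile=0.98, decimate=4)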
- fvdb_reality_capture.tools.mesh_from_splats(model: GaussianSplat3d, camera_to_world_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], projection_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], image_sizes: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]], truncation_margin: float, grid_shell_thickness: float = 3.0, near: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 0.1, far: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 10000000000.0, alpha_threshold: float = 0.1, image_downsample_factor: int = 1, dtype: dtype = torch.float16, feature_dtype: dtype = torch.uint8, show_progress: bool = True) → tuple[Tensor, Tensor, Tensor][source]
Extract a triangle mesh from an fvdb.GaussianSplat3d using TSDF fusion from depth maps rendered from the Gaussian splat radiance field.
The algorithm proceeds in three steps:
First, it renders depth and color/feature images from the Gaussian splat radiance field at each of the specified camera views.
Second, it integrates the depths and colors/features into a sparse fvdb.Grid in a narrow band around the surface using sparse truncated signed distance field (TSDF) fusion. The result is a sparse voxel grid representation of the scene where each voxel stores a signed distance value and color (or other features).
Third, it extracts a mesh using the sparse marching cubes algorithm implemented in fvdb.Grid.marching_cubes over the Grid and TSDF values. This step produces a triangle mesh with vertex colors sampled from the colors/features stored in the Grid.
Note
For higher quality meshes, consider using fvdb_reality_capture.tools.mesh_from_splats_dlnr(), which uses depth maps estimated with a deep-learning-based depth estimation approach instead of the raw depth maps rendered from the Gaussian splat model.
Note
If you want to extract the TSDF grid and colors/features without extracting a mesh, you can use fvdb_reality_capture.tools.tsdf_from_splats() directly.
Note
If you want to extract a point cloud from a Gaussian splat model instead of a mesh, consider using fvdb_reality_capture.tools.point_cloud_from_splats(), which extracts a point cloud directly from depth images rendered from the Gaussian splat model.
Note
The TSDF fusion algorithm is a method for integrating multiple depth maps into a single volumetric representation of a scene encoded as a truncated signed distance field (i.e. a signed distance field in a narrow band around the surface). TSDF fusion was first described in the paper "KinectFusion: Real-Time Dense Surface Mapping and Tracking". We use a modified version of this algorithm which only allocates voxels in a narrow band around the surface of the model to reduce memory usage and speed up computation.
- Parameters:
model (GaussianSplat3d) – The Gaussian splat radiance field to extract a mesh from.
camera_to_world_matrices (NumericMaxRank3) – A (C, 4, 4)-shaped Tensor containing the camera-to-world matrices to render depth images from for mesh extraction, where C is the number of camera views.
projection_matrices (NumericMaxRank3) – A (C, 3, 3)-shaped Tensor containing the perspective projection matrices used to render images for mesh extraction, where C is the number of camera views.
image_sizes (NumericMaxRank2) – A (C, 2)-shaped Tensor containing the height and width of each image to extract from the Gaussian splat, where C is the number of camera views, i.e., image_sizes[c] = (height_c, width_c).
truncation_margin (float) – Margin for truncating the TSDF, in world units. This defines the half-width of the band around the surface where the TSDF is defined.
grid_shell_thickness (float) – The number of voxels along each axis to include in the TSDF volume. This defines the resolution of the Grid in the narrow band around the surface. Default is 3.0.
near (NumericMaxRank1) – Near plane distance below which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 0.1.
far (NumericMaxRank1) – Far plane distance above which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 1e10.
alpha_threshold (float) – Alpha threshold to mask pixels where the Gaussian splat model is transparent (usually indicating the background). Default is 0.1.
image_downsample_factor (int) – Factor by which to downsample the rendered images for depth estimation. Default is 1, i.e. no downsampling.
dtype (torch.dtype) – Data type for the TSDF grid values. Default is torch.float16.
feature_dtype (torch.dtype) – Data type for the color features. Default is torch.uint8.
show_progress (bool) – Whether to show a progress bar during processing. Default is True.
- Returns:
mesh_vertices (torch.Tensor) – A (V, 3)-shaped tensor of vertices of the extracted mesh.
mesh_faces (torch.Tensor) – A (F, 3)-shaped tensor of faces of the extracted mesh.
mesh_colors (torch.Tensor) – A (V, D)-shaped tensor of colors of the extracted mesh vertices, where D is the number of channels encoded by the Gaussian splat model (usually 3 for RGB colors).
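Example
A minimal sketch, assuming model is a trained fvdb.GaussianSplat3d and that cam_to_world (C, 4, 4), projection (C, 3, 3), and image_sizes (C, 2) come from your capture (these variable names are placeholders, e.g. the camera poses used during training); truncation_margin is scene dependent:
>>> from fvdb_reality_capture import tools
>>> vertices, faces, colors = tools.mesh_from_splats(
...     model,
...     camera_to_world_matrices=cam_to_world,
...     projection_matrices=projection,
...     image_sizes=image_sizes,
...     truncation_margin=0.05,  # half-width of the TSDF band, in world units
...     alpha_threshold=0.1,     # mask transparent/background pixels
... )
>>> vertices.shape, faces.shape, colors.shape  # (V, 3), (F, 3), (V, D)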
- fvdb_reality_capture.tools.mesh_from_splats_dlnr(model: GaussianSplat3d, camera_to_world_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], projection_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], image_sizes: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]], truncation_margin: float, grid_shell_thickness: float = 3.0, baseline: float = 0.07, near: float = 4.0, far: float = 20.0, disparity_reprojection_threshold: float = 3.0, alpha_threshold: float = 0.1, image_downsample_factor: int = 1, dtype: dtype = torch.float16, feature_dtype: dtype = torch.uint8, dlnr_backbone: str = 'middleburry', use_absolute_baseline: bool = False, show_progress: bool = True, num_workers: int = 8) → tuple[Tensor, Tensor, Tensor][source]
Extract a triangle mesh from an fvdb.GaussianSplat3d using TSDF fusion from depth maps predicted from the Gaussian splat radiance field and the DLNR foundation model. DLNR is a high-frequency stereo matching network that computes optical flow and disparity maps between two images, which can be used to compute depth.
This algorithm proceeds in three steps:
First, it renders stereo pairs of images from the Gaussian splat radiance field, and uses DLNR to compute depth maps from these stereo pairs in the frame of the first image in the pair. The result is a set of depth maps aligned with the rendered images.
Second, it integrates the depths and colors/features into a sparse fvdb.Grid in a narrow band around the surface using sparse truncated signed distance field (TSDF) fusion. The result is a sparse voxel grid representation of the scene where each voxel stores a signed distance value and color (or other features).
Third, it extracts a mesh using the sparse marching cubes algorithm implemented in fvdb.Grid.marching_cubes over the Grid and TSDF values. This step produces a triangle mesh with vertex colors sampled from the colors/features stored in the Grid.
Note
If you want to extract the TSDF grid and colors/features without extracting a mesh, you can use fvdb_reality_capture.tools.tsdf_from_splats_dlnr() directly.
Note
If you want to extract a point cloud from a Gaussian splat model instead of a mesh, consider using fvdb_reality_capture.tools.point_cloud_from_splats(), which extracts a point cloud directly from depth images rendered from the Gaussian splat model.
Note
This algorithm is based on the paper "GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views". We make key improvements to the method by using a more robust stereo baseline estimation method and by using a much more efficient sparse TSDF fusion implementation built on fVDB.
Note
The TSDF fusion algorithm is a method for integrating multiple depth maps into a single volumetric representation of a scene encoded as a truncated signed distance field (i.e. a signed distance field in a narrow band around the surface). TSDF fusion was first described in the paper "KinectFusion: Real-Time Dense Surface Mapping and Tracking". We use a modified version of this algorithm which only allocates voxels in a narrow band around the surface of the model to reduce memory usage and speed up computation.
Note
The DLNR model is a high-frequency stereo matching network that computes optical flow and disparity maps between two images. The DLNR model is described in the paper "High-Frequency Stereo Matching Network".
- Parameters:
model (GaussianSplat3d) – The Gaussian splat radiance field to extract a mesh from.
camera_to_world_matrices (NumericMaxRank3) – A (C, 4, 4)-shaped Tensor containing the camera-to-world matrices to render depth images from for mesh extraction, where C is the number of camera views.
projection_matrices (NumericMaxRank3) – A (C, 3, 3)-shaped Tensor containing the perspective projection matrices used to render images for mesh extraction, where C is the number of camera views.
image_sizes (NumericMaxRank2) – A (C, 2)-shaped Tensor containing the height and width of each image to extract from the Gaussian splat, where C is the number of camera views, i.e., image_sizes[c] = (height_c, width_c).
truncation_margin (float) – Margin for truncating the TSDF, in world units. This defines the half-width of the band around the surface where the TSDF is defined.
grid_shell_thickness (float) – The number of voxels along each axis to include in the TSDF volume. This defines the resolution of the Grid in the narrow band around the surface. Default is 3.0.
baseline (float) – Baseline distance for stereo depth estimation. If use_absolute_baseline is False, this is interpreted as a fraction of the mean depth of each image. Otherwise, it is interpreted as an absolute distance in world units. Default is 0.07.
near (float) – Near plane distance below which to ignore depth samples, as a multiple of the baseline. Default is 4.0.
far (float) – Far plane distance above which to ignore depth samples, as a multiple of the baseline. Default is 20.0.
disparity_reprojection_threshold (float) – Reprojection error threshold for occlusion masking (in pixel units). Default is 3.0.
alpha_threshold (float) – Alpha threshold to mask pixels where the Gaussian splat model is transparent (usually indicating the background). Default is 0.1.
image_downsample_factor (int) – Factor by which to downsample the rendered images for depth estimation. Default is 1, i.e. no downsampling.
dtype (torch.dtype) – Data type for the TSDF grid values. Default is torch.float16.
feature_dtype (torch.dtype) – Data type for the color features. Default is torch.uint8.
dlnr_backbone (str) – Backbone to use for the DLNR model, either "middleburry" or "sceneflow". Default is "middleburry".
use_absolute_baseline (bool) – If True, treat the provided baseline as an absolute distance in world units. If False, treat the baseline as a fraction of the mean depth of each image estimated using the Gaussian splat radiance field. Default is False.
show_progress (bool) – Whether to show a progress bar during processing. Default is True.
num_workers (int) – Number of workers to use for loading data generated by DLNR. Default is 8.
- Returns:
mesh_vertices (torch.Tensor) – A (V, 3)-shaped tensor of vertices of the extracted mesh.
mesh_faces (torch.Tensor) – A (F, 3)-shaped tensor of faces of the extracted mesh.
mesh_colors (torch.Tensor) – A (V, D)-shaped tensor of colors of the extracted mesh vertices, where D is the number of channels encoded by the Gaussian splat model (usually 3 for RGB colors).
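Example
A minimal sketch of the DLNR variant with the same placeholder camera inputs as mesh_from_splats above; truncation_margin is scene dependent:
>>> from fvdb_reality_capture import tools
>>> vertices, faces, colors = tools.mesh_from_splats_dlnr(
...     model,
...     camera_to_world_matrices=cam_to_world,
...     projection_matrices=projection,
...     image_sizes=image_sizes,
...     truncation_margin=0.05,
...     baseline=0.07,               # fraction of each image's mean depth (default behavior)
...     dlnr_backbone="middleburry",
... )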
- fvdb_reality_capture.tools.point_cloud_from_splats(model: GaussianSplat3d, camera_to_world_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], projection_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], image_sizes: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]], near: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 0.1, far: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 10000000000.0, alpha_threshold: float = 0.1, image_downsample_factor: int = 1, canny_edge_std: float = 1.0, canny_mask_dilation: int = 5, dtype: dtype = torch.float16, feature_dtype: dtype = torch.uint8, show_progress: bool = True) → tuple[Tensor, Tensor][source]
Extract a point cloud with colors/features from a Gaussian splat radiance field by unprojecting depth images rendered from it.
This algorithm can optionally filter out points near depth discontinuities using the following heuristic:
Apply a small Gaussian filter to the depth images to reduce noise.
Run a Canny edge detector on the depth image to find depth discontinuities. The result is an image mask where pixels near depth edges are marked.
Dilate the edge mask to remove depth samples near edges.
Remove points from the point cloud where the corresponding depth pixel is marked in the dilated edge mask.
- Parameters:
model (GaussianSplat3d) – The Gaussian splat radiance field to extract a point cloud from.
camera_to_world_matrices (NumericMaxRank3) – A (C, 4, 4)-shaped Tensor containing the camera-to-world matrices to render depth images from for point cloud extraction, where C is the number of camera views.
projection_matrices (NumericMaxRank3) – A (C, 3, 3)-shaped Tensor containing the perspective projection matrices used to render images for point cloud extraction, where C is the number of camera views.
image_sizes (NumericMaxRank2) – A (C, 2)-shaped Tensor containing the height and width of each image to extract from the Gaussian splat, where C is the number of camera views, i.e., image_sizes[c] = (height_c, width_c).
near (NumericMaxRank1) – Near plane distance below which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 0.1.
far (NumericMaxRank1) – Far plane distance above which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 1e10.
alpha_threshold (float) – Alpha threshold to mask pixels rendered by the Gaussian splat model that are transparent (usually indicating the background). Default is 0.1.
image_downsample_factor (int) – Factor by which to downsample the rendered images before extracting points. This is useful to reduce the number of points extracted and to speed up the extraction process. A value of 2 downsamples the rendered images by a factor of 2 in both dimensions, yielding a point cloud with approximately 1/4 the number of points. Default is 1, i.e. no downsampling.
canny_edge_std (float) – Standard deviation (in pixel units) for the Gaussian filter applied to the depth image before Canny edge detection. Set to 0.0 to disable Canny edge filtering. Default is 1.0.
canny_mask_dilation (int) – Dilation size (in pixels) for the Canny edge mask. Default is 5.
dtype (torch.dtype) – Data type to store the point cloud positions. Default is torch.float16.
feature_dtype (torch.dtype) – Data type to store per-point colors/features. Default is torch.uint8, which is good for RGB colors.
show_progress (bool) – Whether to show a progress bar. Default is True.
- Returns:
points (torch.Tensor) – A (num_points, 3)-shaped tensor of point positions in world space.
colors (torch.Tensor) – A (num_points, D)-shaped tensor of colors/features per point, where D is the number of channels encoded by the Gaussian splat model (usually 3 for RGB colors).
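Example
A minimal sketch with the same placeholder camera inputs as above:
>>> from fvdb_reality_capture import tools
>>> points, colors = tools.point_cloud_from_splats(
...     model,
...     camera_to_world_matrices=cam_to_world,
...     projection_matrices=projection,
...     image_sizes=image_sizes,
...     image_downsample_factor=2,  # roughly 1/4 as many points
...     canny_edge_std=1.0,         # 0.0 would disable edge filtering
...     canny_mask_dilation=5,
... )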
- fvdb_reality_capture.tools.tsdf_from_splats(model: GaussianSplat3d, camera_to_world_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], projection_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], image_sizes: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]], truncation_margin: float, grid_shell_thickness: float = 3.0, near: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 0.1, far: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 10000000000.0, alpha_threshold: float = 0.1, image_downsample_factor: int = 1, dtype: dtype = torch.float16, feature_dtype: dtype = torch.uint8, show_progress: bool = True) → tuple[Grid, Tensor, Tensor][source]
Extract a Truncated Signed Distance Field (TSDF) from an fvdb.GaussianSplat3d using TSDF fusion from depth maps rendered from the Gaussian splat model.
The algorithm proceeds in two steps:
First, it renders depth and color/feature images from the Gaussian splat radiance field at each of the specified camera views.
Second, it integrates the depths and colors/features into a sparse fvdb.Grid in a narrow band around the surface using sparse truncated signed distance field (TSDF) fusion. The result is a sparse voxel grid representation of the scene where each voxel stores a signed distance value and color (or other features).
Note
You can extract a mesh from the TSDF using the marching cubes algorithm implemented in fvdb.Grid.marching_cubes. If your goal is to extract a mesh from a Gaussian splat model, consider using fvdb_reality_capture.tools.mesh_from_splats(), which combines this function with marching cubes to directly extract a mesh.
Note
For higher quality TSDFs, consider using fvdb_reality_capture.tools.tsdf_from_splats_dlnr(), which uses depth maps estimated with a deep-learning-based depth estimation approach instead of the raw depth maps rendered from the Gaussian splat model.
Note
The TSDF fusion algorithm is a method for integrating multiple depth maps into a single volumetric representation of a scene encoded as a truncated signed distance field (i.e. a signed distance field in a narrow band around the surface). TSDF fusion was first described in the paper "KinectFusion: Real-Time Dense Surface Mapping and Tracking". We use a modified version of this algorithm which only allocates voxels in a narrow band around the surface of the model to reduce memory usage and speed up computation.
- Parameters:
model (GaussianSplat3d) – The Gaussian splat radiance field to extract a TSDF from.
camera_to_world_matrices (NumericMaxRank3) – A (C, 4, 4)-shaped Tensor containing the camera-to-world matrices to render depth images from for TSDF extraction, where C is the number of camera views.
projection_matrices (NumericMaxRank3) – A (C, 3, 3)-shaped Tensor containing the perspective projection matrices used to render images for TSDF extraction, where C is the number of camera views.
image_sizes (NumericMaxRank2) – A (C, 2)-shaped Tensor containing the height and width of each image to extract from the Gaussian splat, where C is the number of camera views, i.e., image_sizes[c] = (height_c, width_c).
truncation_margin (float) – Margin for truncating the TSDF, in world units. This defines the half-width of the band around the surface where the TSDF is defined.
grid_shell_thickness (float) – The number of voxels along each axis to include in the TSDF volume. This defines the resolution of the Grid in the narrow band around the surface. Default is 3.0.
near (NumericMaxRank1) – Near plane distance below which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 0.1.
far (NumericMaxRank1) – Far plane distance above which to ignore depth samples. Can be a scalar to use a single value for all images or a tensor-like object of shape (C,) to use a different value for each image. Default is 1e10.
alpha_threshold (float) – Alpha threshold to mask pixels where the Gaussian splat model is transparent (usually indicating the background). Default is 0.1.
image_downsample_factor (int) – Factor by which to downsample the rendered images for depth estimation. Default is 1, i.e. no downsampling.
dtype (torch.dtype) – Data type for the TSDF grid values. Default is torch.float16.
feature_dtype (torch.dtype) – Data type for the color features. Default is torch.uint8.
show_progress (bool) – Whether to show a progress bar during processing. Default is True.
- Returns:
accum_grid (Grid) – The accumulated fvdb.Grid representing the voxels in the TSDF volume.
tsdf (torch.Tensor) – The TSDF values for each voxel in the grid.
colors (torch.Tensor) – The colors/features for each voxel in the grid.
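Example
A minimal sketch with the same placeholder camera inputs as above. The returned grid can then be meshed with fvdb.Grid.marching_cubes, which is what mesh_from_splats does for you:
>>> from fvdb_reality_capture import tools
>>> grid, tsdf, colors = tools.tsdf_from_splats(
...     model,
...     camera_to_world_matrices=cam_to_world,
...     projection_matrices=projection,
...     image_sizes=image_sizes,
...     truncation_margin=0.05,
... )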
- fvdb_reality_capture.tools.tsdf_from_splats_dlnr(model: GaussianSplat3d, camera_to_world_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], projection_matrices: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]] | Sequence[Sequence[Sequence[int | float | integer | floating]]], image_sizes: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | Sequence[Sequence[int | float | integer | floating]], truncation_margin: float, grid_shell_thickness: float | int = 3.0, baseline: float = 0.07, near: float = 4.0, far: float = 20.0, disparity_reprojection_threshold: float = 3.0, alpha_threshold: float = 0.1, image_downsample_factor: int = 1, dtype: dtype = torch.float16, feature_dtype: dtype = torch.uint8, dlnr_backbone: str = 'middleburry', use_absolute_baseline: bool = False, show_progress: bool = True, num_workers: int = 8) → tuple[Grid, Tensor, Tensor][source]
Extract a Truncated Signed Distance Field (TSDF) from an fvdb.GaussianSplat3d using TSDF fusion from depth maps predicted from the Gaussian splat radiance field and the DLNR foundation model. DLNR is a high-frequency stereo matching network that computes optical flow and disparity maps between two images, which can be used to compute depth.
This algorithm proceeds in two steps:
First, it renders stereo pairs of images from the Gaussian splat radiance field, and uses DLNR to compute depth maps from these stereo pairs in the frame of the first image in the pair. The result is a set of depth maps aligned with the rendered images.
Second, it integrates the depths and colors/features into a sparse fvdb.Grid in a narrow band around the surface using sparse truncated signed distance field (TSDF) fusion. The result is a sparse voxel grid representation of the scene where each voxel stores a signed distance value and color (or other features).
Note
You can extract a mesh from the TSDF using the marching cubes algorithm implemented in fvdb.Grid.marching_cubes. If your goal is to extract a mesh from a Gaussian splat model, consider using fvdb_reality_capture.tools.mesh_from_splats_dlnr(), which combines this function with marching cubes to directly extract a mesh.
Note
This algorithm is based on the paper "GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views". We make key improvements to the method by using a more robust stereo baseline estimation method and by using a much more efficient sparse TSDF fusion implementation built on fVDB.
Note
The TSDF fusion algorithm is a method for integrating multiple depth maps into a single volumetric representation of a scene encoded as a truncated signed distance field (i.e. a signed distance field in a narrow band around the surface). TSDF fusion was first described in the paper "KinectFusion: Real-Time Dense Surface Mapping and Tracking". We use a modified version of this algorithm which only allocates voxels in a narrow band around the surface of the model to reduce memory usage and speed up computation.
Note
The DLNR model is a high-frequency stereo matching network that computes optical flow and disparity maps between two images. The DLNR model is described in the paper "High-Frequency Stereo Matching Network".
- Parameters:
model (GaussianSplat3d) – The Gaussian splat radiance field to extract a TSDF from.
camera_to_world_matrices (NumericMaxRank3) – A (C, 4, 4)-shaped Tensor containing the camera-to-world matrices to render depth images from for TSDF extraction, where C is the number of camera views.
projection_matrices (NumericMaxRank3) – A (C, 3, 3)-shaped Tensor containing the perspective projection matrices used to render images for TSDF extraction, where C is the number of camera views.
image_sizes (NumericMaxRank2) – A (C, 2)-shaped Tensor containing the height and width of each image to extract from the Gaussian splat, where C is the number of camera views, i.e., image_sizes[c] = (height_c, width_c).
truncation_margin (float) – Margin for truncating the TSDF, in world units. This defines the half-width of the band around the surface where the TSDF is defined.
grid_shell_thickness (float) – The number of voxels along each axis to include in the TSDF volume. This defines the resolution of the Grid in the narrow band around the surface. Default is 3.0.
baseline (float) – Baseline distance for stereo depth estimation. If use_absolute_baseline is False, this is interpreted as a fraction of the mean depth of each image. Otherwise, it is interpreted as an absolute distance in world units. Default is 0.07.
near (float) – Near plane distance below which to ignore depth samples, as a multiple of the baseline. Default is 4.0.
far (float) – Far plane distance above which to ignore depth samples, as a multiple of the baseline. Default is 20.0.
disparity_reprojection_threshold (float) – Reprojection error threshold for occlusion masking (in pixel units). Default is 3.0.
alpha_threshold (float) – Alpha threshold to mask pixels where the Gaussian splat model is transparent (usually indicating the background). Default is 0.1.
image_downsample_factor (int) – Factor by which to downsample the rendered images for depth estimation. Default is 1, i.e. no downsampling.
dtype (torch.dtype) – Data type for the TSDF grid values. Default is torch.float16.
feature_dtype (torch.dtype) – Data type for the color features. Default is torch.uint8.
dlnr_backbone (str) – Backbone to use for the DLNR model, either "middleburry" or "sceneflow". Default is "middleburry".
use_absolute_baseline (bool) – If True, treat the provided baseline as an absolute distance in world units. If False, treat the baseline as a fraction of the mean depth of each image estimated using the Gaussian splat radiance field. Default is False.
show_progress (bool) – Whether to show a progress bar during processing. Default is True.
num_workers (int) – Number of workers to use for loading data generated by DLNR. Default is 8.
- Returns:
accum_grid (Grid) – The accumulated fvdb.Grid representing the voxels in the TSDF volume.
tsdf (torch.Tensor) – The TSDF values for each voxel in the grid.
colors (torch.Tensor) – The colors/features for each voxel in the grid.
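Example
A minimal sketch with the same placeholder camera inputs as above:
>>> from fvdb_reality_capture import tools
>>> grid, tsdf, colors = tools.tsdf_from_splats_dlnr(
...     model,
...     camera_to_world_matrices=cam_to_world,
...     projection_matrices=projection,
...     image_sizes=image_sizes,
...     truncation_margin=0.05,
...     use_absolute_baseline=False,  # baseline=0.07 means 7% of each image's mean depth
... )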