fvdb_reality_capture.transforms

class fvdb_reality_capture.transforms.base_transform.BaseTransform[source]

Base class for all transforms.

Transforms are used to modify an SfmScene before it is used for reconstruction or other processing. They can be used to filter images, adjust camera parameters, or perform other modifications to the scene.

Subclasses of BaseTransform must implement the following methods:

abstractmethod __call__(input_scene: SfmScene) → SfmScene[source]

Abstract method to apply the transform to the input scene and return the transformed scene.

Parameters:

input_scene (SfmScene) – The input scene to transform.

Returns:

output_scene (SfmScene) – The transformed scene.

abstractmethod static from_state_dict(state_dict: dict[str, Any]) → BaseTransform[source]

Abstract method to create a transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

Returns:

transform (BaseTransform) – An instance of the transform.

abstractmethod static name() → str[source]

Abstract method to return the name of the transform.

Returns:

str – The name of the transform.

abstractmethod state_dict() → dict[str, Any][source]

Abstract method to return a dictionary containing information to serialize/deserialize the transform.

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.
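
A minimal sketch of a custom transform implementing these four methods (the class name LogScene, its no-op behavior, and the "name" key in the state dictionary are hypothetical; this assumes subclassing BaseTransform is all that is required):

from typing import Any

from fvdb_reality_capture.sfm_scene import SfmScene
from fvdb_reality_capture.transforms.base_transform import BaseTransform

class LogScene(BaseTransform):
    """Hypothetical transform: logs the scene and returns it unchanged."""

    def __call__(self, input_scene: SfmScene) -> SfmScene:
        # A real transform would return a modified copy of the scene here.
        print(f"Applying {self.name()}")
        return input_scene

    @staticmethod
    def from_state_dict(state_dict: dict[str, Any]) -> "LogScene":
        # Rebuild the transform from the keys written by state_dict().
        assert state_dict["name"] == "LogScene"
        return LogScene()

    @staticmethod
    def name() -> str:
        return "LogScene"

    def state_dict(self) -> dict[str, Any]:
        # The "name" key is an assumed serialization convention.
        return {"name": self.name()}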

class fvdb_reality_capture.transforms.Compose(*transforms)[source]

A BaseTransform that composes multiple transforms together in sequence. This is useful for encoding a sequence of transforms into a single object.

The transforms are applied in the order they are provided, allowing for complex data processing pipelines.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene

scene_transform = transforms.Compose(
    transforms.NormalizeScene("pca"),
    transforms.DownsampleImages(4),
)
input_scene: SfmScene = ...  # Load or create an SfmScene
transformed_scene: SfmScene = scene_transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene which is the result of applying the composed transforms sequentially to the input scene.

Parameters:

input_scene (SfmScene) – The input SfmScene to transform.

Returns:

output_scene (SfmScene) – A new SfmScene that has been transformed by all the composed transforms.

__init__(*transforms)[source]

Initialize the Compose transform with a sequence of transforms.

Parameters:

  • *transforms (tuple[BaseTransform, ...]) – A tuple of BaseTransform instances to compose.

static from_state_dict(state_dict: dict[str, Any]) → Compose[source]

Create a Compose transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

Returns:

transform (Compose) – An instance of the Compose transform loaded from the state dictionary.

static name() → str[source]

Return the name of the Compose transform, i.e. "Compose".

Returns:

str – The name of the Compose transform, i.e. "Compose".

state_dict() → dict[str, Any][source]

Return the state of the Compose transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.
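
For example, a minimal serialization round-trip sketch using only the documented API (the contents of the state dictionary are an implementation detail and not inspected here):

from fvdb_reality_capture import transforms

pipeline = transforms.Compose(
    transforms.NormalizeScene("pca"),
    transforms.DownsampleImages(4),
)

# state_dict() returns a plain dictionary, so it can be persisted however you like.
state = pipeline.state_dict()

# Recreate an equivalent Compose transform from the saved state.
restored = transforms.Compose.from_state_dict(state)
assert restored.name() == "Compose"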

version = '1.0.0'

class fvdb_reality_capture.transforms.CropScene(bbox: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, mask_format: Literal['png', 'jpg', 'npy'] = 'png', composite_with_existing_masks: bool = True)[source]

A BaseTransform which crops the input SfmScene points to lie within a specified bounding box. This transform additionally updates the scene’s masks to nullify pixels whose rays do not intersect the bounding box.

Note

If the input scene already has masks, the masks generated by this transform will be composited with the existing masks to ensure that pixels outside the cropped region are properly masked. This can be disabled by setting composite_with_existing_masks to False.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene
import numpy as np

# Bounding box in the format (min_x, min_y, min_z, max_x, max_y, max_z)
scene_transform = transforms.CropScene(bbox=np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]))

input_scene: SfmScene = ...  # Load or create an SfmScene

# The transformed scene will have points only within the bounding box, and posed images will have
# masks updated to nullify pixels corresponding to regions outside the cropped scene.
transformed_scene: SfmScene = scene_transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene with points cropped to lie within the bounding box specified at initialization, and with masks updated to nullify pixels whose rays do not intersect the bounding box.

Parameters:

input_scene (SfmScene) – The scene to be cropped.

Returns:

output_scene (SfmScene) – The cropped scene.

__init__(bbox: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, mask_format: Literal['png', 'jpg', 'npy'] = 'png', composite_with_existing_masks: bool = True)[source]

Create a new CropScene transform with a bounding box.

Parameters:
  • bbox (NumericMaxRank1) – A bounding box in the format (min_x, min_y, min_z, max_x, max_y, max_z).

  • mask_format (Literal["png", "jpg", "npy"]) – The format to save the masks in. Defaults to "png".

  • composite_with_existing_masks (bool) – Whether to composite the generated masks with any existing masks for pixels corresponding to regions outside the cropped scene. If set to True, existing masks will be loaded and composited with the new mask. Defaults to True. A pixel in the composited mask is valid only if it is valid in both the existing and new masks.

static from_state_dict(state_dict: dict) → CropScene[source]

Create a CropScene transform from a state dictionary created with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (CropScene) – An instance of the CropScene transform.

static name() → str[source]

Return the name of the CropScene transform, i.e. "CropScene".

Returns:

str – The name of the CropScene transform, i.e. "CropScene".

state_dict() → dict[source]

Return the state of the CropScene transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'

class fvdb_reality_capture.transforms.CropSceneToPoints(margin: float = 0.0, mask_format: Literal['png', 'jpg', 'npy'] = 'png', composite_with_existing_masks: bool = True)[source]

A BaseTransform which crops the input SfmScene points to lie within the bounding box around its points, expanded or shrunk by a padding margin. This transform additionally updates the scene’s masks to nullify pixels whose rays do not intersect the bounding box.

Note

If the input scene already has masks, the masks generated by this transform will be composited with the existing masks to ensure that pixels outside the cropped region are properly masked. This can be disabled by setting composite_with_existing_masks to False.

Note

You may want to use this over CropScene if you want the bounding box to depend on the input scene points rather than being fixed (e.g. if you don’t know the bounding box ahead of time). This transform is also useful if you just want to apply conservative masking to the input scene based on its points.

Note

The margin is specified as a fraction of the bounding box size. For example, a margin of 0.1 will expand the bounding box by 10% (5% in all directions). So if the scene’s bounding box is (0, 0, 0) to (1, 1, 1), a margin of 0.1 will result in a bounding box of (-0.05, -0.05, -0.05) to (1.05, 1.05, 1.05). The margin can also be negative to shrink the bounding box.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene
import numpy as np

# Crop the scene to be 0.1 times smaller than the bounding box around its points
# (i.e. a margin of -0.1)
scene_transform = transforms.CropSceneToPoints(margin=-0.1)

input_scene: SfmScene = ...  # Load or create an SfmScene

# The transformed scene will have points only within the bounding box of its points
# minus a factor of 0.1 times the size. (i.e. a margin of -0.1).
# Posed images will have masks updated to nullify pixels corresponding to regions outside the cropped scene.
transformed_scene: SfmScene = scene_transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene with points cropped to lie within the bounding box of the input scene’s points plus or minus the margin specified at initialization, and with masks updated to nullify pixels whose rays do not intersect the bounding box.

Parameters:

input_scene (SfmScene) – The scene to be cropped.

Returns:

output_scene (SfmScene) – The cropped scene.

__init__(margin: float = 0.0, mask_format: Literal['png', 'jpg', 'npy'] = 'png', composite_with_existing_masks: bool = True)[source]

Create a new CropSceneToPoints transform with the given margin.

Parameters:
  • margin (float) – The margin factor to apply around the bounding box of the points. Can be negative to shrink the bounding box. This is a fraction of the bounding box size. For example, a margin of 0.1 will expand the bounding box by 10% (5% in all directions), while a margin of -0.1 will shrink the bounding box by 10% (-5% in all directions). Defaults to 0.0.

  • mask_format (Literal["png", "jpg", "npy"]) – The format to save the masks in. Defaults to "png".

  • composite_with_existing_masks (bool) – Whether to composite the masks generated into existing masks for pixels corresponding to regions outside the cropped scene. If set to True, existing masks will be loaded and composited with the new mask. Defaults to True.

static from_state_dict(state_dict: dict) → CropSceneToPoints[source]

Create a CropSceneToPoints transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

Returns:

transform (CropSceneToPoints) – An instance of the CropSceneToPoints transform loaded from the state dictionary.

static name() → str[source]

Return the name of the CropSceneToPoints transform, i.e. "CropSceneToPoints".

Returns:

str – The name of the CropSceneToPoints transform, i.e. "CropSceneToPoints".

state_dict() → dict[source]

Return the state of the CropSceneToPoints transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'

class fvdb_reality_capture.transforms.DownsampleImages(image_downsample_factor: int, image_type: Literal['jpg', 'png'] = 'jpg', rescale_sampling_mode: int = 3, rescaled_jpeg_quality: int = 98)[source]

A BaseTransform which downsamples all images in an SfmScene by a specified factor and caches the downsampled images for future use.

You can specify the cached downsampled image type (e.g., "jpg" or "png"), the mode for downsampling (e.g., cv2.INTER_AREA), and the rescaled JPEG quality (if using JPEG).

If the downsampled images already exist in the scene’s cache with the correct parameters, they will be loaded from the cache instead of being regenerated.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene

scene_transform = transforms.DownsampleImages(4)
input_scene: SfmScene = ...  # Load or create an SfmScene

# The returned scene will have paths pointing to downsampled images by a factor of 4.
transformed_scene: SfmScene = scene_transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene with images downsampled by the specified factor, i.e. images will be resized to (width / image_downsample_factor, height / image_downsample_factor).

Parameters:

input_scene (SfmScene) – The input scene with images to be downsampled.

Returns:

output_scene (SfmScene) – The scene with downsampled images.

__init__(image_downsample_factor: int, image_type: Literal['jpg', 'png'] = 'jpg', rescale_sampling_mode: int = 3, rescaled_jpeg_quality: int = 98)[source]

Create a new DownsampleImages transform with the specified downsampling factor and image caching parameters (image type, downsampling mode, and quality).

Note

We use enums from OpenCV for the rescale_sampling_mode parameter, e.g., cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC, etc. This means if you want to change the resampling mode, you will need to import cv2 and pass in the appropriate enum value. See the OpenCV documentation (https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121) for more details on valid enum values.

Parameters:
  • image_downsample_factor (int) – The factor by which to downsample the images.

  • image_type (str) – The type of the cached downsampled images, either "jpg" or "png".

  • rescale_sampling_mode (int) – The interpolation method to use for rescaling images. Note that we use enums from OpenCV for this parameter, e.g., cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC, etc.

  • rescaled_jpeg_quality (int) – The quality of the JPEG images when saving them to the cache (1-100).
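
For example, a short sketch of overriding the resampling mode with an OpenCV enum (the default value of 3 corresponds to cv2.INTER_AREA):

import cv2

from fvdb_reality_capture import transforms

# Downsample by 2x, caching as PNG, using bilinear interpolation instead of
# the default area interpolation.
scene_transform = transforms.DownsampleImages(
    image_downsample_factor=2,
    image_type="png",
    rescale_sampling_mode=cv2.INTER_LINEAR,
)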

static from_state_dict(state_dict: dict[str, Any]) → DownsampleImages[source]

Create a DownsampleImages transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (DownsampleImages) – An instance of the DownsampleImages transform.

static name() → str[source]

Return the name of the DownsampleImages transform, i.e. "DownsampleImages".

Returns:

str – The name of the DownsampleImages transform, i.e. "DownsampleImages".

state_dict() → dict[str, Any][source]

Return the state of the DownsampleImages transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'

class fvdb_reality_capture.transforms.FilterImagesWithLowPoints(min_num_points: int = 0)[source]

A BaseTransform which filters out posed images from an SfmScene that have too few visible points.

Any images that have a number of visible points less than or equal to min_num_points will be removed from the scene.

Note

If the input SfmScene does not have point indices for its posed images (i.e. it has has_visible_point_indices set to False), then this transform is a no-op.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene

# Create a transform to filter out images with 50 or fewer visible points.
scene_transform = transforms.FilterImagesWithLowPoints(min_num_points=50)

input_scene: SfmScene = ...  # Load or create an SfmScene

# The transformed scene will only contain posed images with more than 50 visible points.
transformed_scene: SfmScene = scene_transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene containing only posed images which have more than min_num_points visible points.

Note

If the input SfmScene does not have point indices for its posed images (i.e. fvdb_reality_capture.sfm_scene.SfmScene.has_visible_point_indices is False), then this transform is a no-op.

Parameters:

input_scene (SfmScene) – The input scene.

Returns:

output_scene (SfmScene) – A new SfmScene containing only posed images which have more than min_num_points visible points. If the input scene does not have point indices for its posed images, the input scene is returned unmodified.

__init__(min_num_points: int = 0)[source]

Create a new FilterImagesWithLowPoints transform which removes posed images from the scene that have min_num_points or fewer visible points.

Parameters:

min_num_points (int) – The visible-point threshold. Posed images with this many or fewer visible points will be removed from the scene.

static from_state_dict(state_dict: dict[str, Any]) → FilterImagesWithLowPoints[source]

Create a FilterImagesWithLowPoints transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (FilterImagesWithLowPoints) – An instance of the FilterImagesWithLowPoints transform.

property min_num_points: int

Get the visible-point threshold used by this transform when filtering posed images.

Returns:

min_num_points (int) – The visible-point threshold. Posed images with this many or fewer visible points are removed when the transform is applied.

static name() → str[source]

Return the name of the FilterImagesWithLowPoints transform, i.e. "FilterImagesWithLowPoints".

Returns:

str – The name of the FilterImagesWithLowPoints transform, i.e. "FilterImagesWithLowPoints".

state_dict() → dict[str, Any][source]

Return the state of the FilterImagesWithLowPoints transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'

class fvdb_reality_capture.transforms.NormalizeScene(normalization_type: Literal['pca', 'none', 'ecef2enu', 'similarity'])[source]

A BaseTransform which normalizes an SfmScene using a variety of approaches. This transform applies a rotation/translation/scaling to the entire scene, including both points and camera poses.

The normalization types available are:

  • "pca": Normalizes by centering the scene about its median point, and rotating the point cloud to align with its principal axes.

  • "ecef2enu": Converts a scene whose points and camera poses are in Earth-Centered, Earth-Fixed (ECEF) coordinates to East-North-Up (ENU) coordinates, centering the scene around the median point.

  • "similarity": Rotate the scene so that +z aligns with the average up vector of the cameras, center the scene around the median camera position, and rescale the scene to fit within a unit cube.

  • "none": Do not apply any normalization to the scene. Effectively a no-op.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene

# Create a NormalizeScene transform to normalize the scene using PCA
transform = transforms.NormalizeScene(normalization_type="pca")

# Apply the transform to an SfmScene
input_scene: SfmScene = ...
output_scene: SfmScene = transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene which is the result of applying the normalization transform to the input scene.

The normalization transform is computed based on the specified normalization type and the contents of the input scene. It is applied to both the points and camera poses in the scene.

Parameters:

input_scene (SfmScene) – Input SfmScene object containing camera and point data

Returns:

output_scene (SfmScene) – A new SfmScene after applying the normalization transform.

__init__(normalization_type: Literal['pca', 'none', 'ecef2enu', 'similarity'])[source]

Create a new NormalizeScene transform which normalizes an SfmScene using the specified normalization type.

Normalization is applied to both the points and camera poses in the scene.

Parameters:

normalization_type (str) – The type of normalization to apply. Options are "pca", "none", "ecef2enu", or "similarity".

static from_state_dict(state_dict: dict[str, Any]) → NormalizeScene[source]

Create a NormalizeScene transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (NormalizeScene) – An instance of the NormalizeScene transform.

static name() → str[source]

Return the name of the NormalizeScene transform, i.e. "NormalizeScene".

Returns:

str – The name of the NormalizeScene transform, i.e. "NormalizeScene".

state_dict() → dict[str, Any][source]

Return the state of the NormalizeScene transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

valid_normalization_types = ['pca', 'ecef2enu', 'similarity', 'none']

version = '1.0.0'

class fvdb_reality_capture.transforms.PercentileFilterPoints(percentile_min: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, percentile_max: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size)[source]

A BaseTransform that filters points in an SfmScene based on percentile bounds for x, y, and z coordinates.

When applied to an input scene, this transform returns a new SfmScene with points that fall within the specified percentile bounds of the input scene’s points along each axis.

For example, if percentile_min is (0, 0, 0) and percentile_max is (100, 100, 100), all points will be included in the output scene.

If percentile_min is (10, 20, 30) and percentile_max is (90, 80, 70), only points with x-coordinates in the 10th to 90th percentile, y-coordinates in the 20th to 80th percentile, and z-coordinates in the 30th to 70th percentile will be included in the output scene.

Example usage:

from fvdb_reality_capture.transforms import PercentileFilterPoints
from fvdb_reality_capture.sfm_scene import SfmScene

# Create a PercentileFilterPoints transform to filter points between the 10th and 90th percentiles
transform = PercentileFilterPoints(percentile_min=(10, 10, 10), percentile_max=(90, 90, 90))

# Apply the transform to an SfmScene
input_scene: SfmScene = ...
output_scene: SfmScene = transform(input_scene)

__call__(input_scene: SfmScene) → SfmScene[source]

Return a new SfmScene with points filtered based on the specified percentile bounds.

Parameters:

input_scene (SfmScene) – The input SfmScene containing points to be filtered.

Returns:

output_scene (SfmScene) – A new SfmScene with points filtered based on the specified percentile bounds.

__init__(percentile_min: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, percentile_max: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size)[source]

Create a new PercentileFilterPoints transform which filters points in an SfmScene based on percentile bounds for x, y, and z coordinates.

Parameters:
  • percentile_min (NumericMaxRank1) – The minimum percentiles (from 0 to 100) for the x, y, and z coordinates. Use (0, 0, 0) to impose no lower bound.

  • percentile_max (NumericMaxRank1) – The maximum percentiles (from 0 to 100) for the x, y, and z coordinates. Use (100, 100, 100) to impose no upper bound.

static from_state_dict(state_dict: dict[str, Any]) → PercentileFilterPoints[source]

Create a PercentileFilterPoints transform from a state dictionary generated with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (PercentileFilterPoints) – An instance of the PercentileFilterPoints transform.

static name() → str[source]

Return the name of the PercentileFilterPoints transform, i.e. "PercentileFilterPoints".

Returns:

str – The name of the PercentileFilterPoints transform, i.e. "PercentileFilterPoints".

state_dict() → dict[str, Any][source]

Return the state of the PercentileFilterPoints transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'

class fvdb_reality_capture.transforms.Identity[source]

A BaseTransform that performs the identity transform on an SfmScene. This transform returns the input scene unchanged. It can be useful as a placeholder or default transform in a processing pipeline.

Example usage:

from fvdb_reality_capture import transforms
from fvdb_reality_capture.sfm_scene import SfmScene

# Use Identity as a default parameter value
def append_normalize(transform: transforms.BaseTransform = transforms.Identity()):
    return transforms.Compose(
        transform,
        transforms.NormalizeScene("pca"),
    )

# Use Identity to return a no-op for later use
def get_transform(condition: bool) -> transforms.BaseTransform:
    if condition:
        return transforms.DownsampleImages(2)
    else:
        # Still return a valid transform that is a no-op
        return transforms.Identity()

get_transform(False)  # Returns an Identity transform
get_transform(True)   # Returns a DownsampleImages transform

__call__(input_scene: SfmScene) → SfmScene[source]

Return the input SfmScene unchanged.

Parameters:

input_scene (SfmScene) – The input scene.

Returns:

output_scene (SfmScene) – The input scene, unchanged.

__init__()[source]

Create a new Identity transform representing a no-op.

static from_state_dict(state_dict: dict[str, Any]) → Identity[source]

Create an Identity transform from a state dictionary created with state_dict().

Parameters:

state_dict (dict) – The state dictionary for the transform.

Returns:

transform (Identity) – An instance of the Identity transform.

static name() → str[source]

Return the name of the Identity transform, i.e. "Identity".

Returns:

str – The name of the Identity transform, i.e. "Identity".

state_dict() → dict[str, Any][source]

Return the state of the Identity transform for serialization.

You can use this state dictionary to recreate the transform using from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing information to serialize/deserialize the transform.

version = '1.0.0'