fvdb_reality_capture.sfm_scene

class fvdb_reality_capture.sfm_scene.SfmScene(cameras: dict[int, SfmCameraMetadata], images: Sequence[SfmPosedImageMetadata], points: ndarray, points_err: ndarray, points_rgb: ndarray, scene_bbox: ndarray | None, transformation_matrix: ndarray | None, cache: SfmCache)

Lightweight in-memory representation of a scene extracted from a structure-from-motion (SfM) pipeline such as COLMAP or GLOMAP.

This class does not load large data, but instead stores metadata about the scene, including camera parameters, image paths, and 3D points. It also provides methods to manipulate and transform the scene, such as filtering points or images, applying transformations, and accessing camera and image properties.

Note

The SfmScene class is immutable. Methods that modify the scene (e.g. filtering points or images, applying transformations) return new instances rather than modifying the existing one.

Note

The SfmScene class does not load or store the actual image data. It only stores paths to the images on disk.

In general, an SfmScene consists of the following components:

Cameras (cameras): A dictionary mapping unique integer camera identifiers to SfmCameraMetadata objects which contain information about each camera used to capture the scene (e.g. focal length, distortion parameters).

Posed Images (images): A list of SfmPosedImageMetadata objects containing metadata for each posed image in the scene (e.g. the ID of the camera that captured it, the path to the image on disk, the camera-to-world matrix, etc.).

Points (points): An Nx3 array of 3D points in the scene, where N is the number of points.

Point Errors (points_err): An array of shape (N,) representing the error or uncertainty of each point, where N is the number of points.

Point RGB Colors (points_rgb): An Nx3 uint8 array of RGB color values for each point in the scene, where N is the number of points.

Scene Bounding Box (scene_bbox): An array of shape (6,) representing a bounding box containing the scene, in the form (bbmin_x, bbmin_y, bbmin_z, bbmax_x, bbmax_y, bbmax_z).

Transformation Matrix (transformation_matrix): A 4x4 matrix encoding a transformation from some canonical coordinate space to scene coordinates.

Cache (cache): An SfmCache object representing a cache folder for storing quantities derived from the scene (e.g. depth maps, feature matches, etc.).

__init__(cameras: dict[int, SfmCameraMetadata], images: Sequence[SfmPosedImageMetadata], points: ndarray, points_err: ndarray, points_rgb: ndarray, scene_bbox: ndarray | None, transformation_matrix: ndarray | None, cache: SfmCache)

Initialize an SfmScene instance from the given components.

Parameters:
  • cameras (dict[int, SfmCameraMetadata]) – A dictionary mapping camera IDs to SfmCameraMetadata objects containing information about each camera used to capture the scene (e.g. focal length, distortion parameters, etc.).

  • images (Sequence[SfmPosedImageMetadata]) – A sequence of SfmPosedImageMetadata objects containing metadata for each image in the scene (e.g. camera ID, image path, view transform, etc.).

  • points (np.ndarray) – An (N, 3)-shaped array of 3D points in the scene, where N is the number of points.

  • points_err (np.ndarray) – An array of shape (N,) representing the error or uncertainty of each point in points.

  • points_rgb (np.ndarray) – An (N,3)-shaped uint8 array of RGB color values for each point in the scene, where N is the number of points.

  • scene_bbox (np.ndarray | None) – A (6,)-shaped array of the form [bmin_x, bmin_y, bmin_z, bmax_x, bmax_y, bmax_z] defining the bounding box of the scene. If None is passed in, it will default to [-inf, -inf, -inf, inf, inf, inf] (i.e. all of \(\mathbb{R}^3\)).

  • transformation_matrix (np.ndarray | None) – A 4x4 transformation matrix encoding the transformation from a reference coordinate system to the scene’s coordinate system. Note that this is not applied to the scene but simply stored to track transformations applied to the scene (e.g. via apply_transformation_matrix()). If None is passed in, it will default to the identity matrix.

apply_transformation_matrix(transformation_matrix: ndarray) → SfmScene

Return a new SfmScene instance with the transformation applied.

The transformation applies to the camera poses and the 3D points in the scene.

Parameters:

transformation_matrix (np.ndarray) – A 4x4 transformation matrix to apply to the scene.

Returns:

SfmScene – A new SfmScene instance with the transformed cameras and points.
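The effect of such a transformation on points and camera poses can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper names `transform_points` and `transform_camera_to_world` are hypothetical, the matrix is assumed to be affine (last row `[0, 0, 0, 1]`), and the left-multiplication of camera-to-world matrices follows the convention documented for `SfmPosedImageMetadata.transform()`.

```python
import numpy as np


def transform_points(points: np.ndarray, transformation_matrix: np.ndarray) -> np.ndarray:
    """Apply an affine 4x4 transform to an (N, 3) array of points (hypothetical helper)."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)  # (N, 4)
    # Each row becomes transformation_matrix @ [x, y, z, 1].
    return (homogeneous @ transformation_matrix.T)[:, :3]


def transform_camera_to_world(camera_to_world: np.ndarray, transformation_matrix: np.ndarray) -> np.ndarray:
    """Left-multiply (I, 4, 4) camera-to-world matrices by a 4x4 transform (hypothetical helper)."""
    return transformation_matrix[None, :, :] @ camera_to_world


# Made-up data: two points and one camera located at the world origin.
points = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
camera_to_world = np.eye(4)[None, :, :]

# Rigid transform: rotate 90 degrees about z, then translate by +1 along x.
T = np.array(
    [
        [0.0, -1.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)

new_points = transform_points(points, T)
new_camera_to_world = transform_camera_to_world(camera_to_world, T)
# World-to-camera matrices are the inverses of the camera-to-world matrices.
new_world_to_camera = np.linalg.inv(new_camera_to_world)
```

Because points and camera poses move together, the relative geometry between cameras and points is unchanged by a rigid transform like the one above.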

property cache: SfmCache

Return an SfmCache object representing a cache folder for storing quantities derived from the scene.

Returns:

cache (SfmCache) – The SfmCache object associated with this scene.

property camera_to_world_matrices: ndarray

Return the camera-to-world matrices for each posed image in the scene.

Returns:

camera_to_world_matrices (np.ndarray) – An (I, 4, 4)-shaped array representing the camera-to-world transformation matrix of each posed image in the scene, where I is the number of images.

property cameras: dict[int, SfmCameraMetadata]

Return a dictionary mapping unique (integer) camera identifiers to SfmCameraMetadata objects which contain information about each camera used to capture the scene (e.g. its focal length, projection matrix, etc.).

Returns:

dict[int, SfmCameraMetadata] – A dictionary mapping camera IDs to SfmCameraMetadata objects.

filter_images(mask: ndarray | Sequence[bool]) → SfmScene

Return a new SfmScene instance containing only the images for which the mask is True.

Parameters:

mask (np.ndarray | Sequence[bool]) – A Boolean array of shape (I,) where I is the number of images. True values indicate that the corresponding image should be kept.

Returns:

SfmScene – A new SfmScene instance with filtered images and corresponding metadata.

filter_points(mask: ndarray | Sequence[bool]) → SfmScene

Return a new SfmScene instance containing only the points for which the mask is True.

Parameters:

mask (np.ndarray | Sequence[bool]) – A boolean array of shape (N,) where N is the number of points. True values indicate that the corresponding point should be kept.

Returns:

SfmScene – A new SfmScene instance with filtered points and corresponding metadata.
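A typical mask combines a bounding-box test with an error threshold. The sketch below builds such an (N,)-shaped boolean mask with NumPy only; `build_point_mask` and the `max_err` threshold are hypothetical names introduced for illustration, and the arrays stand in for `scene.points` and `scene.points_err`.

```python
import numpy as np


def build_point_mask(points, points_err, bbox, max_err):
    """Return an (N,) boolean mask: True for points inside bbox with error <= max_err."""
    bbox = np.asarray(bbox, dtype=float)
    bb_min, bb_max = bbox[:3], bbox[3:]  # [xmin, ymin, zmin], [xmax, ymax, zmax]
    inside = np.all((points >= bb_min) & (points <= bb_max), axis=1)
    return inside & (points_err <= max_err)


# Made-up stand-ins for scene.points and scene.points_err.
points = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
points_err = np.array([0.1, 0.1, 3.0])

mask = build_point_mask(points, points_err, bbox=[-1, -1, -1, 1, 1, 1], max_err=1.0)

# With the default infinite bounding box, only the error threshold matters.
inf = float("inf")
mask_unbounded = build_point_mask(points, points_err, [-inf, -inf, -inf, inf, inf, inf], 1.0)
```

Passing a mask like this to `filter_points` would, per the description above, yield a new scene that keeps only the points where the mask is True.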

classmethod from_colmap(colmap_path: str | Path) → SfmScene

Load an SfmScene (with a cache to store derived quantities) from the output of a COLMAP structure-from-motion (SfM) pipeline. COLMAP produces a directory of images, a set of 3D correspondence points, as well as a lightweight SQLite database containing image poses (camera-to-world matrices), camera intrinsics (projection matrices, camera type, etc.), and indices of which points are seen from which images.

These are loaded into memory as an SfmScene object which provides easy access to the camera parameters, image paths, and 3D points, as well as methods to manipulate and transform the scene.

Parameters:

colmap_path (str | pathlib.Path) – The path to the output of a COLMAP run.

Returns:

scene (SfmScene) – An in-memory representation of the loaded scene.

classmethod from_e57(e57_path: str | Path, point_downsample_factor: int = 1) → SfmScene

Load an SfmScene (with a cache to store derived quantities) from a set of E57 files.

Parameters:
  • e57_path (str | pathlib.Path) – The path to a directory containing E57 files.

  • point_downsample_factor (int) – Factor by which to downsample the points loaded from the E57 files. Defaults to 1 (i.e. no downsampling).

Returns:

scene (SfmScene) – An in-memory representation of the loaded scene.

classmethod from_simple_directory(data_path: str | Path) → SfmScene

Load an SfmScene (with a cache to store derived quantities) from a simple directory structure containing images, camera parameters (stored as JSON), and 3D points (stored as a PLY).

The directory should contain:

  • images/: A directory of images.

  • cameras.json: A JSON file containing camera parameters. The cameras.json file is a list of dictionaries, each containing the following keys:

    • "camera_name": The name of the image file.

    • "width": The width of the image.

    • "height": The height of the image.

    • "camera_intrinsics": The perspective projection matrix

    • "world_to_camera": The world-to-camera transformation matrix.

    • "image_path": The path to the image file relative to the images directory.

  • points.ply: A PLY file containing 3D points.

Parameters:

data_path (str | pathlib.Path) – The path to the data directory.

Returns:

scene (SfmScene) – An in-memory representation of the loaded scene.
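To make the expected layout concrete, the following sketch writes an `images/` directory and a one-entry `cameras.json` using only the standard library. The keys come from the list above; the matrix encodings (row-major nested lists, 3x3 for the intrinsics and 4x4 for the pose), the file names, and the helper `write_example_scene_dir` are assumptions made for illustration. It does not write any images or `points.ply`.

```python
import json
import tempfile
from pathlib import Path


def write_example_scene_dir(root: Path) -> Path:
    """Create images/ and a one-entry cameras.json under root (hypothetical helper)."""
    (root / "images").mkdir(parents=True, exist_ok=True)
    entry = {
        "camera_name": "frame_0000.png",
        "width": 640,
        "height": 480,
        # Assumed encoding: 3x3 row-major nested lists.
        "camera_intrinsics": [
            [500.0, 0.0, 320.0],
            [0.0, 500.0, 240.0],
            [0.0, 0.0, 1.0],
        ],
        # Assumed encoding: 4x4 row-major nested lists (identity pose here).
        "world_to_camera": [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ],
        # Path relative to the images directory.
        "image_path": "frame_0000.png",
    }
    cameras_path = root / "cameras.json"
    # cameras.json is a list of dictionaries, one per image.
    cameras_path.write_text(json.dumps([entry], indent=2))
    return cameras_path


with tempfile.TemporaryDirectory() as tmp:
    cameras_path = write_example_scene_dir(Path(tmp))
    loaded = json.loads(cameras_path.read_text())
    image_dir_exists = (Path(tmp) / "images").is_dir()
```

A directory prepared this way would still need the actual image files and a `points.ply` before it could be passed to `from_simple_directory`.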

classmethod from_state_dict(state_dict: dict[str, Any]) → SfmScene

Create an SfmScene from a state dictionary previously obtained via state_dict().

Parameters:

state_dict (dict[str, Any]) – A state dictionary representing the SfmScene originally created with state_dict().

Returns:

scene (SfmScene) – An in-memory representation of the loaded scene.

property has_visible_point_indices: bool

Return whether the images in the scene have point indices indicating which 3D points are visible in each image.

Returns:

has_visible_point_indices (bool) – True if the images have point indices, False otherwise.

property image_camera_positions: ndarray

Returns the position where each posed image was captured in the scene (i.e. the position of the camera when it captured the image).

Returns:

image_camera_positions (np.ndarray) – An (I, 3)-shaped array representing the 3D position of the camera that captured each posed image in the scene, where I is the number of images.

property image_sizes: ndarray

Return the resolution of each posed image in the scene as a numpy array of shape (I, 2), where I is the number of images and each entry is (height, width).

Returns:

image_sizes (np.ndarray) – An (I, 2)-shaped array representing the resolution of each posed image in the scene, where I is the number of images, image_sizes[i, 0] is the height of image i, and image_sizes[i, 1] is the width of image i.

property images: list[SfmPosedImageMetadata]

Get a list of image metadata objects (SfmPosedImageMetadata) with information about each image in the scene (e.g. its camera ID, path on the filesystem, etc.).

Returns:

list[SfmPosedImageMetadata] – A list of SfmPosedImageMetadata objects containing metadata for each image in the scene.

property median_depth_per_image: ndarray

Return an array containing the median depth of the points observed in each image.

Returns:

median_depth_per_image (np.ndarray) – An array of shape (I,) where I is the number of images. Each value represents the median depth of the points observed in the corresponding image. If this scene does not have visible points per image (i.e., has_visible_point_indices is False), an array of np.nan values is returned.
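One plausible way to compute such a value for a single image is sketched below. It assumes that "depth" means the z-coordinate of a point after mapping it into the camera frame with the world-to-camera matrix; the reference above does not spell out the exact definition, so treat this as an illustration rather than the library's implementation. The helper `median_depth` is hypothetical.

```python
import numpy as np


def median_depth(points, world_to_camera, point_indices):
    """Median camera-space z of the points visible in one image, NaN if indices are missing."""
    if point_indices is None:
        return float("nan")
    visible = points[point_indices]  # (M, 3) world-space points
    rotation = world_to_camera[:3, :3]
    translation = world_to_camera[:3, 3]
    camera_space = visible @ rotation.T + translation  # (M, 3) camera-space points
    # Assumption: depth is the camera-space z-coordinate.
    return float(np.median(camera_space[:, 2]))


# Made-up data: four points, of which the first three are visible in the image.
points = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 3.0], [0.0, 0.0, 10.0], [5.0, 5.0, 5.0]])
visible_indices = np.array([0, 1, 2])

depth_identity = median_depth(points, np.eye(4), visible_indices)

# Shift the world one unit further along the camera z-axis.
shifted = np.eye(4)
shifted[2, 3] = 1.0
depth_shifted = median_depth(points, shifted, visible_indices)

depth_missing = median_depth(points, np.eye(4), None)
```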

property num_cameras: int

Return the total number of cameras used to capture the scene.

Returns:

num_cameras (int) – The number of cameras in the scene.

property num_images: int

Return the total number of posed images in the scene.

Returns:

num_images (int) – The number of posed images in the scene.

property points: ndarray

Get the 3D points in the scene as a numpy array of shape (N, 3).

Note: The points are in the same coordinate system as the camera poses.

Returns:

points (np.ndarray) – An (N, 3)-shaped array of 3D points in the scene where N is the number of points.

property points_err: ndarray

Return an un-normalized confidence value for each point in the scene (see points).

The error is a measure of the uncertainty in the 3D point position, typically derived from the SfM pipeline.

Returns:

points_err (np.ndarray) – An array of shape (N,) where N is the number of points in the scene. points_err[i] encodes the error or uncertainty of the i-th corresponding point in points.

property points_rgb: ndarray

Return the RGB color values for each point in the scene as a uint8 array of shape (N, 3) where N is the number of points.

Returns:

points_rgb (np.ndarray) – An (N, 3)-shaped uint8 array of RGB color values for each point in the scene where N is the number of points.

property projection_matrices: ndarray

Return the projection matrices for each posed image in the scene.

Returns:

projection_matrices (np.ndarray) – An (I, 3, 3)-shaped array representing the projection matrix of each posed image in the scene, where I is the number of images. The projection matrix maps 3D points in camera coordinates to 2D points in pixel coordinates.

property scene_bbox: ndarray

Return the clip bounds of the scene as a numpy array of shape (6,) in the form [xmin, ymin, zmin, xmax, ymax, zmax].

If the scene was not constructed with a bounding box, the default clip bounds are [-inf, -inf, -inf, inf, inf, inf].

Returns:

scene_bbox (np.ndarray) – A 1D array of shape (6,) representing the bounding box of the scene. If the scene was not constructed with a bounding box, then return [-inf, -inf, -inf, inf, inf, inf].

select_images(indices: ndarray | Sequence[int]) → SfmScene

Return a new SfmScene instance containing only the images specified by the given indices.

Parameters:

indices (np.ndarray | Sequence[int]) – An array of integer indices specifying which images to select. The indices should be in the range [0, num_images - 1].

Returns:

SfmScene – A new SfmScene instance with the selected images and corresponding metadata.
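Index-based selection is convenient for holding out every k-th image, while `filter_images` expects a Boolean mask instead. The sketch below builds both forms with plain Python lists, which match the `Sequence[int]` and `Sequence[bool]` types in the signatures; `split_indices` and `indices_to_mask` are hypothetical helpers.

```python
def split_indices(num_images: int, holdout_every: int = 8) -> tuple[list[int], list[int]]:
    """Split range(num_images) into (train, validation) index lists."""
    if holdout_every <= 0:
        raise ValueError("holdout_every must be a positive integer")
    validation = [i for i in range(num_images) if i % holdout_every == 0]
    train = [i for i in range(num_images) if i % holdout_every != 0]
    return train, validation


def indices_to_mask(indices: list[int], num_images: int) -> list[bool]:
    """Convert a list of image indices into a Boolean mask of length num_images."""
    keep = set(indices)
    return [i in keep for i in range(num_images)]


# Stand-in for scene.num_images.
num_images = 10
train_idx, val_idx = split_indices(num_images, holdout_every=4)
val_mask = indices_to_mask(val_idx, num_images)
```

Under the signatures documented here, `val_idx` has the form `select_images` accepts and `val_mask` has the form `filter_images` accepts; whether the two calls return images in the same order is not stated in this reference.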

state_dict() → dict[str, Any]

Get a state dictionary representing the SfmScene. This can be used to serialize the scene to disk or to create a new SfmScene instance via from_state_dict().

Returns:

state_dict (dict[str, Any]) – A dictionary containing the state of the SfmScene.

property transformation_matrix: ndarray

Return the 4x4 transformation matrix for the scene. This matrix encodes the transformation from the coordinate system the scene was loaded in to the current scene’s coordinates.

Returns:

transformation_matrix (np.ndarray) – A 4x4 numpy array representing the transformation matrix from the original coordinate system to the current scene’s coordinate system.

property world_to_camera_matrices: ndarray

Return the world-to-camera matrices for each posed image in the scene.

Returns:

world_to_camera_matrices (np.ndarray) – An (I, 4, 4)-shaped array representing the world-to-camera transformation matrix of each posed image in the scene, where I is the number of images.

class fvdb_reality_capture.sfm_scene.SfmCameraMetadata(img_width: int, img_height: int, fx: float, fy: float, cx: float, cy: float, camera_type: SfmCameraType, distortion_parameters: ndarray)

This class encodes metadata about a camera used to capture images in an SfmScene.

It contains information about the camera’s intrinsic parameters (focal length, principal point, etc.), the camera type (see SfmCameraType) (e.g., pinhole, radial distortion), and distortion parameters if applicable.

The camera metadata is used to project 3D points into 2D pixel coordinates and to undistort images captured by the camera.

__init__(img_width: int, img_height: int, fx: float, fy: float, cx: float, cy: float, camera_type: SfmCameraType, distortion_parameters: ndarray)

Create a new SfmCameraMetadata object.

Parameters:
  • img_width (int) – The width of the camera image in pixel units (must be a positive integer).

  • img_height (int) – The height of the camera image in pixel units (must be a positive integer).

  • fx (float) – The focal length in the x direction in pixel units.

  • fy (float) – The focal length in the y direction in pixel units.

  • cx (float) – The x-coordinate of the principal point (optical center) in pixel units.

  • cy (float) – The y-coordinate of the principal point (optical center) in pixel units.

  • camera_type (SfmCameraType) – The type of camera used to capture the image (e.g., “PINHOLE”, “SIMPLE_PINHOLE”, etc.). See SfmCameraType for details.

  • distortion_parameters (np.ndarray) – An array of distortion coefficients corresponding to the camera type, or an empty array if no distortion is present.

property aspect: float

Return the aspect ratio of the camera image.

The aspect ratio is defined as the width divided by the height.

Returns:

aspect (float) – The aspect ratio of the camera image.

property camera_type: SfmCameraType

Return the type of camera used to capture the image.

Returns:

camera_type (SfmCameraType) – The camera type (e.g., “PINHOLE”, “SIMPLE_PINHOLE”, etc.). See SfmCameraType for details.

property cx: float

Return the x-coordinate of the principal point (optical center) in pixel units.

Returns:

cx (float) – The x-coordinate of the principal point in pixel units.

property cy: float

Return the y-coordinate of the principal point (optical center) in pixel units.

Returns:

cy (float) – The y-coordinate of the principal point in pixel units.

property distortion_parameters: ndarray

Return the distortion parameters of the camera.

The distortion parameters are used to correct lens distortion in the captured images.

Returns:

distortion_parameters (np.ndarray) – An array of distortion coefficients.

property fovx: float

Return the horizontal field of view in radians.

Returns:

fovx (float) – The horizontal field of view in radians.

property fovy: float

Return the vertical field of view in radians.

Returns:

fovy (float) – The vertical field of view in radians.
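For a pinhole camera, the field of view is commonly derived from the image size and the focal length in pixels. The sketch below uses the standard relation fov = 2 * atan(size / (2 * f)), which assumes the principal point sits at the image center; whether the library uses exactly this formula is an assumption. The helper `field_of_view` is hypothetical.

```python
import math


def field_of_view(size_px: int, focal_px: float) -> float:
    """Field of view in radians for an image extent and focal length, both in pixels."""
    return 2.0 * math.atan(size_px / (2.0 * focal_px))


# Made-up camera: 640x480 image with fx = fy = 320 pixels.
fovx = field_of_view(640, 320.0)  # horizontal: uses the width and fx
fovy = field_of_view(480, 320.0)  # vertical: uses the height and fy

fovx_degrees = math.degrees(fovx)
```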

classmethod from_state_dict(state_dict: dict[str, Any]) → SfmCameraMetadata

Create a new SfmCameraMetadata object from a state dictionary originally created by state_dict().

Parameters:

state_dict (dict[str, Any]) – A dictionary containing the camera metadata.

Returns:

SfmCameraMetadata – A new SfmCameraMetadata object.

property fx: float

Return the focal length in the x direction in pixel units.

Returns:

fx (float) – The focal length in the x direction in pixel units.

property fy: float

Return the focal length in the y direction in pixel units.

Returns:

fy (float) – The focal length in the y direction in pixel units.

property height: int

Return the height of the camera image in pixel units.

Returns:

height (int) – The height of the camera image in pixels.

property projection_matrix: ndarray

Return the camera projection matrix.

The projection matrix is a 3x3 matrix that maps 3D points in camera coordinates to 2D points in pixel coordinates.

Returns:

projection_matrix (np.ndarray) – The camera projection matrix as a 3x3 numpy array.
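As an illustration of what such a matrix does, the sketch below assembles a 3x3 matrix from fx, fy, cx, and cy in the conventional pinhole layout and projects camera-space points to pixels. The layout is the standard one but is assumed here rather than confirmed by this reference, lens distortion is ignored, and the helpers `pinhole_matrix` and `project` are hypothetical.

```python
import numpy as np


def pinhole_matrix(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Conventional 3x3 pinhole intrinsics matrix (assumed layout)."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


def project(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project (N, 3) camera-space points with nonzero z to (N, 2) pixel coordinates."""
    uvw = points_cam @ K.T  # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth


K = pinhole_matrix(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
points_cam = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 2.0]])
pixels = project(points_cam, K)
```

A point on the optical axis lands on the principal point (cx, cy); off-axis points are offset by the focal length times x/z and y/z.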

resize(new_width, new_height) → SfmCameraMetadata

Return a new SfmCameraMetadata object with the camera parameters resized to the new image dimensions.

Parameters:
  • new_width (int) – The new width of the camera image (must be a positive integer).

  • new_height (int) – The new height of the camera image (must be a positive integer).

Returns:

SfmCameraMetadata – A new SfmCameraMetadata object with the resized camera parameters.
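Resizing an image usually means scaling the intrinsics by the same per-axis factors. The sketch below shows that common approach on a small stand-in dataclass; `Intrinsics` and `resize_intrinsics` are hypothetical, and the library may treat the principal point differently (for example with a half-pixel offset), so this illustrates the idea rather than the exact behavior of `resize`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Hypothetical stand-in for the intrinsic fields of SfmCameraMetadata."""

    width: int
    height: int
    fx: float
    fy: float
    cx: float
    cy: float


def resize_intrinsics(cam: Intrinsics, new_width: int, new_height: int) -> Intrinsics:
    """Return new intrinsics scaled to the new image size; the input is left unchanged."""
    if new_width <= 0 or new_height <= 0:
        raise ValueError("new_width and new_height must be positive integers")
    sx = new_width / cam.width  # horizontal scale factor
    sy = new_height / cam.height  # vertical scale factor
    return Intrinsics(new_width, new_height, cam.fx * sx, cam.fy * sy, cam.cx * sx, cam.cy * sy)


full = Intrinsics(width=640, height=480, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
half = resize_intrinsics(full, 320, 240)
```

Returning a new object rather than mutating the input mirrors the way `resize` is documented to return a new SfmCameraMetadata.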

state_dict() → dict[str, Any]

Return a state dictionary representing the camera metadata.

This dictionary can be used to serialize and deserialize the camera metadata.

Returns:

state_dict (dict[str, Any]) – A dictionary containing the camera metadata.

undistort_image(image: ndarray) → ndarray

Undistort an image using the camera’s distortion parameters.

Parameters:

image (np.ndarray) – The distorted image to undistort.

Returns:

undistorted_image (np.ndarray) – The undistorted image.

property undistort_map_x: ndarray | None

Return the undistortion map for the x-coordinates of the image. The undistortion map is used to remap the pixel coordinates in a distorted image to correct for lens distortion. If the camera does not have distortion parameters, this will be None.

Returns:

undistort_map_x (np.ndarray | None) – The undistortion map for the x-coordinates or None if no distortion parameters are present.

property undistort_map_y: ndarray | None

Return the undistortion map for the y-coordinates of the image. The undistortion map is used to remap the pixel coordinates in a distorted image to correct for lens distortion. If the camera does not have distortion parameters, this will be None.

Returns:

undistort_map_y (np.ndarray | None) – The undistortion map for the y-coordinates or None if no distortion parameters are present.

property undistort_roi: tuple[int, int, int, int] | None

Return the region of interest (ROI) for undistorted images. The ROI is defined as a tuple of (x, y, width, height) that specifies the valid pixel range in an undistorted image. If the camera does not have distortion parameters, this will be None.

Returns:

undistort_roi (tuple[int, int, int, int] | None) – The ROI for undistorted images or None if no distortion parameters are present.

property width: int

Return the width of the camera image in pixel units.

Returns:

width (int) – The width of the camera image in pixels.

class fvdb_reality_capture.sfm_scene.SfmPosedImageMetadata(world_to_camera_matrix: ndarray, camera_to_world_matrix: ndarray, camera_metadata: SfmCameraMetadata, camera_id: int, image_path: str, mask_path: str, point_indices: ndarray | None, image_id: int)

This class encodes metadata about a single posed image in an SfmScene.

It contains information about the camera pose (world-to-camera and camera-to-world matrices), a reference to the metadata for the camera that captured the image (see SfmCameraMetadata), and the image and (optionally) mask file paths.

__init__(world_to_camera_matrix: ndarray, camera_to_world_matrix: ndarray, camera_metadata: SfmCameraMetadata, camera_id: int, image_path: str, mask_path: str, point_indices: ndarray | None, image_id: int)

Create a new SfmPosedImageMetadata object.

Parameters:
  • world_to_camera_matrix (np.ndarray) – A 4x4 matrix representing the transformation from world coordinates to camera coordinates.

  • camera_to_world_matrix (np.ndarray) – A 4x4 matrix representing the transformation from camera coordinates to world coordinates.

  • camera_metadata (SfmCameraMetadata) – The metadata for the camera that captured this image.

  • camera_id (int) – The unique identifier for the camera that captured this image.

  • image_path (str) – The file path to the image on the filesystem.

  • mask_path (str) – The file path to the mask image on the filesystem (can be an empty string if no mask is available).

  • point_indices (np.ndarray | None) – An optional array of point indices that are visible in this image (can be None if not available).

  • image_id (int) – The unique identifier for the image.

property camera_id: int

Return the unique identifier for the camera that captured this image.

Returns:

camera_id (int) – The camera ID.

property camera_metadata: SfmCameraMetadata

Return metadata about the camera that captured this posed image (see SfmCameraMetadata).

The camera metadata contains information about the camera’s intrinsic parameters, such as focal length and distortion coefficients.

Returns:

SfmCameraMetadata – The camera metadata object.

property camera_to_world_matrix: ndarray

Return the camera-to-world transformation matrix for this posed image.

This matrix transforms points from camera coordinates to world coordinates.

Returns:

camera_to_world_matrix (np.ndarray) – The camera-to-world transformation matrix as a 4x4 numpy array.

classmethod from_state_dict(state_dict: dict[str, Any], camera_metadata: dict[int, SfmCameraMetadata]) → SfmPosedImageMetadata

Create a new SfmPosedImageMetadata object from a state dictionary and camera metadata (see state_dict()).

Parameters:
  • state_dict (dict[str, Any]) – A dictionary containing the image metadata.

  • camera_metadata (dict[int, SfmCameraMetadata]) – A dictionary mapping camera IDs to SfmCameraMetadata objects.

Returns:

SfmPosedImageMetadata – A new SfmPosedImageMetadata object.

property image_id: int

Return the unique identifier for this image.

This ID is used to uniquely identify the image within the dataset.

Returns:

int – The image ID.

property image_path: str

Return the file path to the color image for this posed image.

Returns:

image_path (str) – The path to the color image file for this posed image.

property image_size: tuple[int, int]

Return the resolution of the posed image in pixels as a tuple of the form (height, width).

Returns:

image_size (tuple[int, int]) – The image resolution as (height, width).

property lookat

Return the camera lookat vector.

The lookat vector is the direction the camera is pointing, which is the negative z-axis in the camera coordinate system.

Returns:

lookat (np.ndarray) – The camera lookat vector as a 3D numpy array.

property mask_path: str

Return the file path to the mask for this posed image.

The mask image is used to indicate which pixels in the image are valid (e.g., not occluded).

An empty string indicates that no mask is available.

Returns:

mask_path (str) – The path to the posed mask image file.

property origin

Return the origin of the posed image, i.e. the position of the camera in world coordinates when it captured the image.

The origin is the translation part of the camera-to-world matrix.

Returns:

origin (np.ndarray) – The camera origin as a 3D numpy array.

property point_indices: ndarray | None

Return the indices of the 3D points that are visible in this posed image or None if the indices are not available.

These indices correspond to the points in the SfmScene’s point cloud that are visible in this posed image.

Returns:

point_indices (np.ndarray | None) – An array of indices of the visible 3D points or None if not available.

property right

Return the camera right vector.

The right vector is the direction that is considered “right” in the camera coordinate system, which is the x-axis in the camera coordinate system.

Returns:

right (np.ndarray) – The camera right vector as a 3D numpy array.

state_dict() → dict[str, Any]

Return a state dictionary representing the image metadata.

This dictionary can be used to serialize and deserialize the image metadata.

Returns:

state_dict (dict[str, Any]) – A dictionary containing the image metadata.

transform(transformation_matrix: ndarray) → SfmPosedImageMetadata

Return a new SfmPosedImageMetadata object with the camera pose transformed by the given transformation matrix.

The transformation is applied on the left of the camera-to-world matrix, meaning it transforms the camera in world space, i.e. new_camera_to_world_matrix = transformation_matrix @ self.camera_to_world_matrix.

Parameters:

transformation_matrix (np.ndarray) – A 4x4 transformation matrix to apply.

Returns:

SfmPosedImageMetadata – A new SfmPosedImageMetadata object with the transformed matrices.

property up

Return the camera up vector.

The up vector is the direction that is considered “up” in the camera coordinate system, which is the negative y-axis in the camera coordinate system.

Returns:

up (np.ndarray) – The camera up vector as a 3D numpy array.
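The origin, right, up, and lookat properties can all be read off a 4x4 camera-to-world matrix. The sketch below follows the axis conventions stated in this reference (right is the camera x-axis, up is the negative y-axis, lookat is the negative z-axis, origin is the translation) and assumes the vectors are expressed in world coordinates, i.e. taken from the columns of the rotation block. The helper `camera_frame` is hypothetical and is not the library's implementation.

```python
import numpy as np


def camera_frame(camera_to_world: np.ndarray) -> dict[str, np.ndarray]:
    """Extract origin and axis vectors from a 4x4 camera-to-world matrix."""
    rotation = camera_to_world[:3, :3]  # columns are the camera axes in world space
    return {
        "origin": camera_to_world[:3, 3],  # translation part
        "right": rotation[:, 0],  # camera x-axis
        "up": -rotation[:, 1],  # negative camera y-axis
        "lookat": -rotation[:, 2],  # negative camera z-axis, as documented above
    }


# Made-up pose: camera rotated 90 degrees about the world z-axis, located at (1, 2, 3).
camera_to_world = np.array(
    [
        [0.0, -1.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 2.0],
        [0.0, 0.0, 1.0, 3.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
frame = camera_frame(camera_to_world)
```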

property world_to_camera_matrix: ndarray

Return the world-to-camera transformation matrix for this posed image.

This matrix transforms points from world coordinates to camera coordinates.

Returns:

world_to_camera_matrix (np.ndarray) – The world-to-camera transformation matrix as a 4x4 numpy array.