fvdb_reality_capture.sfm_scene
- class fvdb_reality_capture.sfm_scene.SfmScene(cameras: dict[int, SfmCameraMetadata], images: Sequence[SfmPosedImageMetadata], points: ndarray, points_err: ndarray, points_rgb: ndarray, scene_bbox: ndarray | None, transformation_matrix: ndarray | None, cache: SfmCache)
Lightweight in-memory representation of a scene extracted from a structure-from-motion (SfM) pipeline such as COLMAP or GLOMAP.
This class does not load large data such as images. Instead, it stores metadata about the scene, including camera parameters, image paths, and 3D points. It also provides methods to manipulate and transform the scene, such as filtering points or images, applying transformations, and accessing camera and image properties.
Note
The SfmScene class is immutable. Methods that transform the scene (e.g. filtering points or images, applying transformations) return new instances rather than modifying the existing one.
Note
The SfmScene class does not load or store the actual image data. It only stores paths to the images on disk.
In general, an SfmScene consists of the following components:
Cameras (cameras): A dictionary mapping unique integer camera identifiers to SfmCameraMetadata objects, which contain information about each camera used to capture the scene (e.g. focal length, distortion parameters).
Posed Images (images): A list of SfmPosedImageMetadata objects containing metadata for each posed image in the scene (e.g. the ID of the camera that captured it, the path to the image on disk, the camera-to-world matrix).
Points (points): An (N, 3) array of 3D points in the scene, where N is the number of points.
Point Errors (points_err): An array of shape (N,) representing the error or uncertainty of each point.
Point RGB Colors (points_rgb): An (N, 3) uint8 array of RGB color values for each point.
Scene Bounding Box (scene_bbox): An array of shape (6,) of the form (bbmin_x, bbmin_y, bbmin_z, bbmax_x, bbmax_y, bbmax_z) representing a bounding box containing the scene.
Transformation Matrix (transformation_matrix): A 4x4 matrix encoding a transformation from some canonical coordinate space to scene coordinates.
Cache (cache): An SfmCache object representing a cache folder for storing quantities derived from the scene (e.g. depth maps, feature matches).
- __init__(cameras: dict[int, SfmCameraMetadata], images: Sequence[SfmPosedImageMetadata], points: ndarray, points_err: ndarray, points_rgb: ndarray, scene_bbox: ndarray | None, transformation_matrix: ndarray | None, cache: SfmCache)
Initialize an SfmScene instance from the given components.
- Parameters:
cameras (dict[int, SfmCameraMetadata]) – A dictionary mapping camera IDs to SfmCameraMetadata objects containing information about each camera used to capture the scene (e.g. focal length, distortion parameters).
images (Sequence[SfmPosedImageMetadata]) – A sequence of SfmPosedImageMetadata objects containing metadata for each image in the scene (e.g. camera ID, image path, view transform).
points (np.ndarray) – An (N, 3)-shaped array of 3D points in the scene, where N is the number of points.
points_err (np.ndarray) – An array of shape (N,) representing the error or uncertainty of each point in points.
points_rgb (np.ndarray) – An (N, 3)-shaped uint8 array of RGB color values for each point in the scene, where N is the number of points.
scene_bbox (np.ndarray | None) – A (6,)-shaped array of the form [bmin_x, bmin_y, bmin_z, bmax_x, bmax_y, bmax_z] defining the bounding box of the scene. If None is passed in, it defaults to [-inf, -inf, -inf, inf, inf, inf] (i.e. all of \(\mathbb{R}^3\)).
transformation_matrix (np.ndarray | None) – A 4x4 transformation matrix encoding the transformation from a reference coordinate system to the scene's coordinate system. This matrix is not applied to the scene; it is only stored to track transformations applied to the scene (e.g. via apply_transformation_matrix()). If None is passed in, it defaults to the identity matrix.
cache (SfmCache) – An SfmCache object representing a cache folder for storing quantities derived from the scene.
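The shape requirements and defaults listed above can be summarized as a small validation routine. This is a sketch, not the library's constructor: check_scene_arrays is a hypothetical helper, and the real SfmScene may perform different checks (or none).

```python
import numpy as np

def check_scene_arrays(points, points_err, points_rgb, scene_bbox=None, transformation_matrix=None):
    """Hypothetical helper: validate the per-point array shapes described above and
    fill in the documented defaults for scene_bbox and transformation_matrix."""
    points = np.asarray(points)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError(f"points must have shape (N, 3), got {points.shape}")
    n = points.shape[0]
    if np.shape(points_err) != (n,):
        raise ValueError(f"points_err must have shape ({n},), got {np.shape(points_err)}")
    points_rgb = np.asarray(points_rgb)
    if points_rgb.shape != (n, 3) or points_rgb.dtype != np.uint8:
        raise ValueError("points_rgb must be an (N, 3) uint8 array")
    if scene_bbox is None:
        # Default bounding box: all of R^3.
        scene_bbox = np.array([-np.inf, -np.inf, -np.inf, np.inf, np.inf, np.inf])
    if transformation_matrix is None:
        # Default transformation: identity.
        transformation_matrix = np.eye(4)
    return np.asarray(scene_bbox, dtype=float), np.asarray(transformation_matrix, dtype=float)

bbox, transform = check_scene_arrays(np.zeros((5, 3)), np.ones(5), np.zeros((5, 3), dtype=np.uint8))
```

Note that points, points_err, and points_rgb are indexed by the same point index, so any operation on one of them has to be applied to all three.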
- apply_transformation_matrix(transformation_matrix: ndarray) → SfmScene
Return a new SfmScene instance with the transformation applied.
The transformation applies to the camera poses and the 3D points in the scene.
- Parameters:
transformation_matrix (np.ndarray) – A 4x4 transformation matrix to apply to the scene.
- Returns:
SfmScene – A new SfmScene instance with the transformed cameras and points.
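The effect of apply_transformation_matrix on the 3D points can be sketched with homogeneous coordinates in plain NumPy. The helpers transform_points and compose are hypothetical, and the sketch assumes the stored transformation_matrix accumulates by left-multiplication (new transform @ previous transform). That is consistent with it mapping the original coordinate system to the current one, but this reference does not confirm the implementation.

```python
import numpy as np

def transform_points(transformation_matrix, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    transformed = homogeneous @ transformation_matrix.T               # row-vector form of T @ p
    return transformed[:, :3] / transformed[:, 3:4]                   # back to (N, 3)

def compose(new_transform, accumulated):
    """Total transform from the original frame: `accumulated` first, then `new_transform`."""
    return new_transform @ accumulated

translate = np.eye(4)
translate[0, 3] = 1.0                      # shift by +1 along x
scale = np.diag([2.0, 2.0, 2.0, 1.0])      # uniform scale by 2

total = compose(scale, compose(translate, np.eye(4)))
points = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
result = transform_points(total, points)   # rows: (2, 0, 0) and (4, 4, 6)
```

Order matters: translating first and then scaling also doubles the translation, which is why the first point ends up at x = 2 rather than x = 1.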
- property cache: SfmCache
Return an SfmCache object representing a cache folder for storing quantities derived from the scene.
- Returns:
cache (SfmCache) – The SfmCache object associated with this scene.
- property camera_to_world_matrices: ndarray
Return the camera-to-world matrices for each posed image in the scene.
- Returns:
camera_to_world_matrices (np.ndarray) – An (I, 4, 4)-shaped array representing the camera-to-world transformation matrix of each posed image in the scene, where I is the number of images.
- property cameras: dict[int, SfmCameraMetadata]
Return a dictionary mapping unique (integer) camera identifiers to SfmCameraMetadata objects, which contain information about each camera used to capture the scene (e.g. its focal length, projection matrix, etc.).
- Returns:
dict[int, SfmCameraMetadata] – A dictionary mapping camera IDs to SfmCameraMetadata objects.
- filter_images(mask: ndarray | Sequence[bool]) → SfmScene
Return a new SfmScene instance containing only the images for which the mask is True.
- Parameters:
mask (np.ndarray | Sequence[bool]) – A boolean array of shape (I,) where I is the number of images. True values indicate that the corresponding image should be kept.
- Returns:
SfmScene – A new SfmScene instance with filtered images and corresponding metadata.
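The mask semantics of filter_images can be sketched on a plain Python list. In this example, filter_by_mask is a hypothetical helper and the file names stand in for SfmPosedImageMetadata objects. It mirrors the documented behavior (keep entry i where mask[i] is True), not the library's internals.

```python
import numpy as np

def filter_by_mask(items, mask):
    """Keep items[i] where mask[i] is True; the mask must have one entry per item."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != (len(items),):
        raise ValueError(f"mask must have shape ({len(items)},), got {mask.shape}")
    return [item for item, keep in zip(items, mask) if keep]

# Hypothetical image paths standing in for posed-image metadata.
image_paths = ["img_000.jpg", "img_001.jpg", "img_002.jpg", "img_003.jpg"]
# Keep every other image, e.g. to hold out the rest for validation.
keep_even = np.arange(len(image_paths)) % 2 == 0
kept = filter_by_mask(image_paths, keep_even)  # ['img_000.jpg', 'img_002.jpg']
```

A boolean mask converts to the integer form expected by select_images via np.flatnonzero(keep_even), which here gives the indices [0, 2].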
- filter_points(mask: ndarray | Sequence[bool]) → SfmScene
Return a new SfmScene instance containing only the points for which the mask is True.
- Parameters:
mask (np.ndarray | Sequence[bool]) – A boolean array of shape (N,) where N is the number of points. True values indicate that the corresponding point should be kept.
- Returns:
SfmScene – A new SfmScene instance with filtered points and corresponding metadata.
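Filtering points affects every per-point array and, when images store visible-point indices, the numbering those indices refer to. The sketch below uses a hypothetical helper, filter_point_arrays. It assumes that "corresponding metadata" includes renumbering each image's point indices and dropping removed ones. This reference does not confirm that detail.

```python
import numpy as np

def filter_point_arrays(points, points_err, points_rgb, mask, point_indices=None):
    """Apply one boolean mask to all per-point arrays so they stay aligned.

    Optional per-image visible-point indices are renumbered to match the filtered
    arrays; indices of removed points are dropped.
    """
    mask = np.asarray(mask, dtype=bool)
    # Map old point index -> new point index; -1 marks removed points.
    remap = np.full(mask.shape[0], -1, dtype=np.int64)
    remap[mask] = np.arange(np.count_nonzero(mask))
    new_indices = None
    if point_indices is not None:
        new_indices = []
        for idx in point_indices:
            mapped = remap[np.asarray(idx, dtype=np.int64)]
            new_indices.append(mapped[mapped >= 0])
    return points[mask], points_err[mask], points_rgb[mask], new_indices

points = np.arange(18, dtype=float).reshape(6, 3)
points_err = np.array([0.1, 2.0, 0.3, 0.2, 5.0, 0.4])
points_rgb = np.zeros((6, 3), dtype=np.uint8)
mask = points_err < 1.0  # keeps points 0, 2, 3, 5 (the threshold is arbitrary)
p, e, c, idx = filter_point_arrays(points, points_err, points_rgb, mask,
                                   point_indices=[np.array([0, 1, 2]), np.array([4, 5])])
# idx is [array([0, 1]), array([3])]
```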
- classmethod from_colmap(colmap_path: str | Path) → SfmScene
Load an SfmScene (with a cache to store derived quantities) from the output of a COLMAP structure-from-motion (SfM) pipeline. COLMAP produces a directory of images, a set of 3D correspondence points, and a lightweight SQLite database containing image poses (camera-to-world matrices), camera intrinsics (projection matrices, camera type, etc.), and indices of which points are seen from which images.
These are loaded into memory as an SfmScene object, which provides easy access to the camera parameters, image paths, and 3D points, as well as methods to manipulate and transform the scene.
- Parameters:
colmap_path (str | pathlib.Path) – The path to the output of a COLMAP run.
- Returns:
scene (SfmScene) – An in-memory representation of the loaded scene.
- classmethod from_e57(e57_path: str | Path, point_downsample_factor: int = 1) → SfmScene
Load an SfmScene (with a cache to store derived quantities) from a set of E57 files.
- Parameters:
e57_path (str | pathlib.Path) – The path to a directory containing E57 files.
point_downsample_factor (int) – Factor by which to downsample the points loaded from the E57 files. Defaults to 1 (i.e. no downsampling).
- Returns:
scene (SfmScene) – An in-memory representation of the loaded scene.
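A point_downsample_factor of k presumably reduces the point count by roughly a factor of k. This reference does not say how the retained points are chosen. The sketch below assumes simple strided selection (every k-th point), and downsample_points is a hypothetical helper.

```python
import numpy as np

def downsample_points(points, colors, factor=1):
    """Keep every `factor`-th point and its color; factor=1 keeps everything."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return points[::factor], colors[::factor]

points = np.random.default_rng(0).uniform(size=(10, 3))
colors = np.zeros((10, 3), dtype=np.uint8)
sub_points, sub_colors = downsample_points(points, colors, factor=3)
# keeps original rows 0, 3, 6, 9
```

Strided selection is deterministic and keeps points and colors aligned. For scanner data stored in scan order, it also thins the cloud fairly evenly.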
- classmethod from_simple_directory(data_path: str | Path) → SfmScene
Load an SfmScene (with a cache to store derived quantities) from a simple directory structure containing images, camera parameters (stored as JSON), and 3D points (stored as a PLY file).
The directory should contain:
images/: A directory of images.
cameras.json: A JSON file containing camera parameters. The file holds a list of dictionaries, each containing the following keys:
  "camera_name": The name of the image file.
  "width": The width of the image.
  "height": The height of the image.
  "camera_intrinsics": The perspective projection matrix.
  "world_to_camera": The world-to-camera transformation matrix.
  "image_path": The path to the image file relative to the images directory.
points.ply: A PLY file containing 3D points.
- Parameters:
data_path (str | pathlib.Path) – The path to the data directory.
- Returns:
scene (SfmScene) – An in-memory representation of the loaded scene.
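The cameras.json layout described above can be produced with the standard json module. The key names come from this reference. The encoding is an assumption: matrices serialized as nested lists of floats, a 3x3 camera_intrinsics, and a 4x4 world_to_camera. camera_record is a hypothetical helper, and the sketch does not write the required points.ply.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def camera_record(image_name, width, height, K, world_to_camera):
    """One cameras.json entry with the documented keys (matrix encoding is assumed)."""
    return {
        "camera_name": image_name,
        "width": int(width),
        "height": int(height),
        "camera_intrinsics": np.asarray(K, dtype=float).tolist(),
        "world_to_camera": np.asarray(world_to_camera, dtype=float).tolist(),
        "image_path": image_name,  # relative to images/
    }

K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
with tempfile.TemporaryDirectory() as data_dir:
    (Path(data_dir) / "images").mkdir()
    cameras_path = Path(data_dir) / "cameras.json"
    records = [camera_record("frame_0000.png", 640, 480, K, np.eye(4))]
    cameras_path.write_text(json.dumps(records, indent=2))
    loaded = json.loads(cameras_path.read_text())
```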
- classmethod from_state_dict(state_dict: dict[str, Any]) → SfmScene
Create an SfmScene from a state dictionary previously obtained via state_dict().
- Parameters:
state_dict (dict[str, Any]) – A state dictionary representing the SfmScene, originally created with state_dict().
- Returns:
scene (SfmScene) – An in-memory representation of the loaded scene.
- property has_visible_point_indices: bool
Return whether the images in the scene have point indices indicating which 3D points are visible in each image.
- Returns:
has_visible_point_indices (bool) – True if the images have point indices, False otherwise.
- property image_camera_positions: ndarray
Return the position where each posed image was captured in the scene (i.e. the position of the camera when it captured the image).
- Returns:
image_camera_positions (np.ndarray) – An (I, 3)-shaped array of the 3D positions of the cameras that captured each posed image in the scene, where I is the number of images.
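Camera positions can be read either from the translation column of the camera-to-world matrices or, equivalently, from world-to-camera matrices [R | t] as C = -Rᵀt. This sketch assumes image_camera_positions corresponds to the first form, which matches how origin is described for SfmPosedImageMetadata. The two helper functions are hypothetical.

```python
import numpy as np

def positions_from_camera_to_world(c2w):
    """(I, 4, 4) camera-to-world matrices -> (I, 3) camera centers (the translation column)."""
    return c2w[:, :3, 3]

def positions_from_world_to_camera(w2c):
    """Same centers from world-to-camera matrices [R | t]: C = -R^T t."""
    R = w2c[:, :3, :3]
    t = w2c[:, :3, 3]
    return -np.einsum("ikj,ik->ij", R, t)

c2w = np.eye(4)[None].repeat(2, axis=0)  # two posed images
c2w[0, :3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
c2w[0, :3, 3] = [1.0, 2.0, 3.0]
c2w[1, :3, 3] = [-4.0, 0.0, 0.5]
w2c = np.linalg.inv(c2w)
```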
- property image_sizes: ndarray
Return the resolution of each posed image in the scene as a numpy array of shape (I, 2), where I is the number of images and each entry is (height, width).
- Returns:
image_sizes (np.ndarray) – An (I, 2)-shaped array representing the resolution of each posed image in the scene, where I is the number of images, image_sizes[i, 0] is the height of image i, and image_sizes[i, 1] is the width of image i.
- property images: list[SfmPosedImageMetadata]
Get a list of image metadata objects (SfmPosedImageMetadata) with information about each image in the scene (e.g. its camera ID, path on the filesystem, etc.).
- Returns:
list[SfmPosedImageMetadata] – A list of SfmPosedImageMetadata objects containing metadata for each image in the scene.
- property median_depth_per_image: ndarray
Return an array containing the median depth of the points observed in each image.
- Returns:
median_depth_per_image (np.ndarray) – An array of shape (I,) where I is the number of images. Each value represents the median depth of the points observed in the corresponding image. If this scene does not have visible points per image (i.e. has_visible_point_indices is False), an array of np.nan values is returned.
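One plausible computation of this quantity is shown below. For each image, the sketch transforms its visible points into camera space and takes the median of their depths. It assumes depth is the camera-space z coordinate, positive in front of the camera (the OpenCV/COLMAP convention); the library's convention may differ. median_depth_per_image here is a standalone hypothetical function, not the property itself.

```python
import numpy as np

def median_depth_per_image(points, world_to_camera, point_indices):
    """Median camera-space depth of the points visible in each image (NaN if unknown).

    Assumes depth is the camera-space z coordinate, positive in front of the camera.
    """
    depths = np.full(len(world_to_camera), np.nan)
    for i, (w2c, idx) in enumerate(zip(world_to_camera, point_indices)):
        if idx is None or len(idx) == 0:
            continue  # no visibility information for this image
        visible = points[idx]                          # (K, 3) world-space points
        cam = visible @ w2c[:3, :3].T + w2c[:3, 3]     # (K, 3) camera-space points
        depths[i] = np.median(cam[:, 2])
    return depths

points = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 3.0], [0.0, 0.0, 10.0]])
w2c = np.eye(4)[None].repeat(3, axis=0)
w2c[2, 2, 3] = 1.0  # third camera: world z maps to camera z + 1
depths = median_depth_per_image(points, w2c, [np.array([0, 1, 2]), None, np.array([3])])
# depths is [2.0, nan, 11.0]
```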
- property num_cameras: int
Return the total number of cameras used to capture the scene.
- Returns:
num_cameras (int) – The number of cameras in the scene.
- property num_images: int
Return the total number of posed images in the scene.
- Returns:
num_images (int) – The number of posed images in the scene.
- property points: ndarray
Get the 3D points in the scene as a numpy array of shape (N, 3).
Note: The points are in the same coordinate system as the camera poses.
- Returns:
points (np.ndarray) – An (N, 3)-shaped array of 3D points in the scene, where N is the number of points.
- property points_err: ndarray
Return an un-normalized error value for each point in the scene (see points).
The error is a measure of the uncertainty in the 3D point position, typically derived from the SfM pipeline.
- Returns:
points_err (np.ndarray) – An array of shape (N,) where N is the number of points in the scene. points_err[i] encodes the error or uncertainty of the i-th point in points.
- property points_rgb: ndarray
Return the RGB color values for each point in the scene as a uint8 array of shape (N, 3), where N is the number of points.
- Returns:
points_rgb (np.ndarray) – An (N, 3)-shaped uint8 array of RGB color values for each point in the scene, where N is the number of points.
- property projection_matrices: ndarray
Return the projection matrices for each posed image in the scene.
- Returns:
projection_matrices (np.ndarray) – An (I, 3, 3)-shaped array representing the projection matrix of each posed image in the scene, where I is the number of images. The projection matrix maps 3D points in camera coordinates to 2D points in pixel coordinates.
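Because intrinsics belong to cameras while this property is indexed by image, an (I, 3, 3) stack can be assembled by looking up each image's camera. The sketch assumes that is how the property relates to SfmCameraMetadata.projection_matrix. stack_projection_matrices and the plain dict of matrices are hypothetical stand-ins for the library objects.

```python
import numpy as np

def stack_projection_matrices(camera_K, image_camera_ids):
    """Build an (I, 3, 3) array by looking up each image's camera intrinsics.

    camera_K: hypothetical dict mapping camera_id -> 3x3 matrix.
    image_camera_ids: camera ID of each posed image, in image order.
    """
    return np.stack([camera_K[cid] for cid in image_camera_ids])

K_a = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K_b = np.array([[800.0, 0.0, 512.0], [0.0, 800.0, 384.0], [0.0, 0.0, 1.0]])
Ks = stack_projection_matrices({1: K_a, 2: K_b}, [1, 1, 2])  # two images share camera 1
```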
- property scene_bbox: ndarray
Return the clip bounds of the scene as a numpy array of shape (6,) in the form [xmin, ymin, zmin, xmax, ymax, zmax].
If the scene was not constructed with a bounding box, the default clip bounds are [-inf, -inf, -inf, inf, inf, inf].
- Returns:
scene_bbox (np.ndarray) – A 1D array of shape (6,) representing the bounding box of the scene. If the scene was not constructed with a bounding box, this is [-inf, -inf, -inf, inf, inf, inf].
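Testing which points fall inside a [xmin, ymin, zmin, xmax, ymax, zmax] box is a per-axis comparison. With the default infinite bounds, every finite point passes. points_in_bbox is a hypothetical helper, and treating the bounds as inclusive is an assumption.

```python
import numpy as np

def points_in_bbox(points, bbox):
    """Boolean (N,) mask of points inside [xmin, ymin, zmin, xmax, ymax, zmax], bounds inclusive."""
    bbox = np.asarray(bbox, dtype=float)
    lower, upper = bbox[:3], bbox[3:]
    return np.all((points >= lower) & (points <= upper), axis=1)

points = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [2.0, 0.0, 0.0], [-1.0, -1.0, -1.0]])
default_bbox = np.array([-np.inf, -np.inf, -np.inf, np.inf, np.inf, np.inf])
unit_bbox = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
inside_default = points_in_bbox(points, default_bbox)  # all True
inside_unit = points_in_bbox(points, unit_bbox)        # [True, True, False, False]
```

The resulting mask has the (N,) shape that filter_points expects, so it could presumably be passed straight to filter_points to crop a scene to its bounding box.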
- select_images(indices: ndarray | Sequence[int]) → SfmScene
Return a new SfmScene instance containing only the images specified by the given indices.
- Parameters:
indices (np.ndarray | Sequence[int]) – An array of integer indices specifying which images to select. The indices should be in the range [0, num_images - 1].
- Returns:
SfmScene – A new SfmScene instance with the selected images and corresponding metadata.
- state_dict() → dict[str, Any]
Get a state dictionary representing the SfmScene. This can be used to serialize the scene to disk or to create a new SfmScene instance via from_state_dict().
- Returns:
state_dict (dict[str, Any]) – A dictionary containing the state of the SfmScene.
- property transformation_matrix: ndarray
Return the 4x4 transformation matrix for the scene. This matrix encodes the transformation from the coordinate system the scene was loaded in to the current scene's coordinates.
- Returns:
transformation_matrix (np.ndarray) – A 4x4 numpy array representing the transformation matrix from the original coordinate system to the current scene's coordinate system.
- property world_to_camera_matrices: ndarray
Return the world-to-camera matrices for each posed image in the scene.
- Returns:
world_to_camera_matrices (np.ndarray) – An (I, 4, 4)-shaped array representing the world-to-camera transformation matrix of each posed image in the scene, where I is the number of images.
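World-to-camera matrices are the inverses of the camera-to-world matrices. For a rigid transform [R | t], the inverse has the closed form [Rᵀ | -Rᵀt]. The sketch below (invert_rigid is hypothetical) is valid only when the upper-left 3x3 block is a pure rotation. If a scaling transform has been applied to the scene, a general inverse such as np.linalg.inv is the safe choice. How the library itself computes these matrices is not stated here.

```python
import numpy as np

def invert_rigid(matrices):
    """Invert (I, 4, 4) rigid transforms [R | t] as [R^T | -R^T t]."""
    R = matrices[:, :3, :3]
    t = matrices[:, :3, 3]
    Rt = np.transpose(R, (0, 2, 1))               # per-matrix transpose
    out = np.zeros_like(matrices)
    out[:, :3, :3] = Rt
    out[:, :3, 3] = -np.einsum("ijk,ik->ij", Rt, t)
    out[:, 3, 3] = 1.0
    return out

angle = 0.3
c, s = np.cos(angle), np.sin(angle)
c2w = np.eye(4)[None].copy()
c2w[0, :3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
c2w[0, :3, 3] = [1.0, -2.0, 0.5]
w2c = invert_rigid(c2w)
```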
- class fvdb_reality_capture.sfm_scene.SfmCameraMetadata(img_width: int, img_height: int, fx: float, fy: float, cx: float, cy: float, camera_type: SfmCameraType, distortion_parameters: ndarray)
This class encodes metadata about a camera used to capture images in an SfmScene.
It contains information about the camera's intrinsic parameters (focal length, principal point, etc.), the camera type (e.g., pinhole or radial distortion; see SfmCameraType), and distortion parameters if applicable.
The camera metadata is used to project 3D points into 2D pixel coordinates and to undistort images captured by the camera.
- __init__(img_width: int, img_height: int, fx: float, fy: float, cx: float, cy: float, camera_type: SfmCameraType, distortion_parameters: ndarray)
Create a new SfmCameraMetadata object.
- Parameters:
img_width (int) – The width of the camera image in pixel units (must be a positive integer).
img_height (int) – The height of the camera image in pixel units (must be a positive integer).
fx (float) – The focal length in the x direction in pixel units.
fy (float) – The focal length in the y direction in pixel units.
cx (float) – The x-coordinate of the principal point (optical center) in pixel units.
cy (float) – The y-coordinate of the principal point (optical center) in pixel units.
camera_type (SfmCameraType) – The type of camera used to capture the image (e.g., PINHOLE, SIMPLE_PINHOLE). See SfmCameraType for details.
distortion_parameters (np.ndarray) – An array of distortion coefficients corresponding to the camera type, or an empty array if no distortion is present.
- property aspect: float
Return the aspect ratio of the camera image.
The aspect ratio is defined as the width divided by the height.
- Returns:
aspect (float) – The aspect ratio of the camera image.
- property camera_type: SfmCameraType
Return the type of camera used to capture the image.
- Returns:
camera_type (SfmCameraType) – The camera type (e.g., PINHOLE, SIMPLE_PINHOLE). See SfmCameraType for details.
- property cx: float
Return the x-coordinate of the principal point (optical center) in pixel units.
- Returns:
cx (float) – The x-coordinate of the principal point in pixel units.
- property cy: float
Return the y-coordinate of the principal point (optical center) in pixel units.
- Returns:
cy (float) – The y-coordinate of the principal point in pixel units.
- property distortion_parameters: ndarray
Return the distortion parameters of the camera.
The distortion parameters are used to correct lens distortion in the captured images.
- Returns:
distortion_parameters (np.ndarray) – An array of distortion coefficients.
- property fovx: float
Return the horizontal field of view in radians.
- Returns:
fovx (float) – The horizontal field of view in radians.
- property fovy: float
Return the vertical field of view in radians.
- Returns:
fovy (float) – The vertical field of view in radians.
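For an ideal pinhole camera, the full field of view along an axis is 2·atan(size / (2·focal)), with size and focal length both in pixels. The sketch uses that standard relation (field_of_view is hypothetical). The library may compute fovx and fovy differently, for instance by accounting for an off-center principal point.

```python
import math

def field_of_view(size_pixels, focal_pixels):
    """Full field of view in radians for an ideal pinhole camera."""
    return 2.0 * math.atan(size_pixels / (2.0 * focal_pixels))

width, height, fx, fy = 640, 480, 320.0, 320.0
fovx = field_of_view(width, fx)   # 2 * atan(1) = pi / 2
fovy = field_of_view(height, fy)  # 2 * atan(0.75)
aspect = width / height           # 4 / 3, as defined for the aspect property
```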
- classmethod from_state_dict(state_dict: dict[str, Any]) → SfmCameraMetadata
Create a new SfmCameraMetadata object from a state dictionary originally created by state_dict().
- Parameters:
state_dict (dict[str, Any]) – A dictionary containing the camera metadata.
- Returns:
SfmCameraMetadata – A new SfmCameraMetadata object.
- property fx: float
Return the focal length in the x direction in pixel units.
- Returns:
fx (float) – The focal length in the x direction in pixel units.
- property fy: float
Return the focal length in the y direction in pixel units.
- Returns:
fy (float) – The focal length in the y direction in pixel units.
- property height: int
Return the height of the camera image in pixel units.
- Returns:
height (int) – The height of the camera image in pixels.
- property projection_matrix: ndarray
Return the camera projection matrix.
The projection matrix is a 3x3 matrix that maps 3D points in camera coordinates to 2D points in pixel coordinates.
- Returns:
projection_matrix (np.ndarray) – The camera projection matrix as a 3x3 numpy array.
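A 3x3 pinhole projection matrix built from fx, fy, cx, and cy maps a camera-space point (x, y, z) to pixel (fx·x/z + cx, fy·y/z + cy) after the perspective divide. The sketch below ignores lens distortion and assumes points in front of the camera have z > 0. intrinsics_matrix and project are hypothetical helpers, not the library's implementation.

```python
import numpy as np

def intrinsics_matrix(fx, fy, cx, cy):
    """3x3 pinhole matrix built from focal lengths and principal point (pixels)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, points_cam):
    """Project (N, 3) camera-space points (z > 0) to (N, 2) pixel coordinates."""
    uvw = points_cam @ K.T            # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by depth

K = intrinsics_matrix(500.0, 500.0, 320.0, 240.0)
pixels = project(K, np.array([[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]]))
# pixels is approximately [[320, 240], [370, 215]]
```

A point on the optical axis lands exactly on the principal point (cx, cy), which makes a quick sanity check for any intrinsics matrix.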
- resize(new_width, new_height) → SfmCameraMetadata
Return a new SfmCameraMetadata object with the camera parameters resized to the new image dimensions.
- Parameters:
new_width (int) – The new width of the camera image (must be a positive integer).
new_height (int) – The new height of the camera image (must be a positive integer).
- Returns:
SfmCameraMetadata – A new SfmCameraMetadata object with the resized camera parameters.
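Resizing an image rescales the pixel-unit intrinsics. The simplest model multiplies fx and cx by the horizontal scale factor and fy and cy by the vertical one. The sketch uses that proportional model (resize_intrinsics is hypothetical). Some libraries apply a half-pixel correction to the principal point instead, and this reference does not say which convention resize follows.

```python
def resize_intrinsics(width, height, fx, fy, cx, cy, new_width, new_height):
    """Scale pinhole intrinsics proportionally to a new image size."""
    if new_width <= 0 or new_height <= 0:
        raise ValueError("new image dimensions must be positive")
    sx = new_width / width
    sy = new_height / height
    return fx * sx, fy * sy, cx * sx, cy * sy

# Halve a 640x480 image.
new_fx, new_fy, new_cx, new_cy = resize_intrinsics(640, 480, 500.0, 500.0, 320.0, 240.0, 320, 240)
# (250.0, 250.0, 160.0, 120.0)
```

Because the width and fx scale by the same factor, the field of view is unchanged by a proportional resize.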
- state_dict() → dict[str, Any]
Return a state dictionary representing the camera metadata.
This dictionary can be used to serialize and deserialize the camera metadata.
- Returns:
state_dict (dict[str, Any]) – A dictionary containing the camera metadata.
- undistort_image(image: ndarray) → ndarray
Undistort an image using the camera's distortion parameters.
- Parameters:
image (np.ndarray) – The distorted image to undistort.
- Returns:
undistorted_image (np.ndarray) – The undistorted image.
- property undistort_map_x: ndarray | None
Return the undistortion map for the x-coordinates of the image. The undistortion map is used to remap the pixel coordinates in a distorted image to correct for lens distortion. If the camera does not have distortion parameters, this is None.
- Returns:
undistort_map_x (np.ndarray | None) – The undistortion map for the x-coordinates, or None if no distortion parameters are present.
- property undistort_map_y: ndarray | None
Return the undistortion map for the y-coordinates of the image. The undistortion map is used to remap the pixel coordinates in a distorted image to correct for lens distortion. If the camera does not have distortion parameters, this is None.
- Returns:
undistort_map_y (np.ndarray | None) – The undistortion map for the y-coordinates, or None if no distortion parameters are present.
- property undistort_roi: tuple[int, int, int, int] | None
Return the region of interest (ROI) for undistorted images. The ROI is defined as a tuple of (x, y, width, height) that specifies the valid pixel range in an undistorted image. If the camera does not have distortion parameters, this is None.
- Returns:
undistort_roi (tuple[int, int, int, int] | None) – The ROI for undistorted images, or None if no distortion parameters are present.
- property width: int
Return the width of the camera image in pixel units.
- Returns:
width (int) – The width of the camera image in pixels.
- class fvdb_reality_capture.sfm_scene.SfmPosedImageMetadata(world_to_camera_matrix: ndarray, camera_to_world_matrix: ndarray, camera_metadata: SfmCameraMetadata, camera_id: int, image_path: str, mask_path: str, point_indices: ndarray | None, image_id: int)
This class encodes metadata about a single posed image in an SfmScene.
It contains information about the camera pose (world-to-camera and camera-to-world matrices), a reference to the metadata for the camera that captured the image (see SfmCameraMetadata), and the image and (optionally) mask file paths.
- __init__(world_to_camera_matrix: ndarray, camera_to_world_matrix: ndarray, camera_metadata: SfmCameraMetadata, camera_id: int, image_path: str, mask_path: str, point_indices: ndarray | None, image_id: int)
Create a new SfmPosedImageMetadata object.
- Parameters:
world_to_camera_matrix (np.ndarray) – A 4x4 matrix representing the transformation from world coordinates to camera coordinates.
camera_to_world_matrix (np.ndarray) – A 4x4 matrix representing the transformation from camera coordinates to world coordinates.
camera_metadata (SfmCameraMetadata) – The metadata for the camera that captured this image.
camera_id (int) – The unique identifier for the camera that captured this image.
image_path (str) – The file path to the image on the filesystem.
mask_path (str) – The file path to the mask image on the filesystem (can be an empty string if no mask is available).
point_indices (np.ndarray | None) – An optional array of indices of the points that are visible in this image (can be None if not available).
image_id (int) – The unique identifier for the image.
- property camera_id: int
Return the unique identifier for the camera that captured this image.
- Returns:
camera_id (int) – The camera ID.
- property camera_metadata: SfmCameraMetadata
Return metadata about the camera that captured this posed image (see SfmCameraMetadata).
The camera metadata contains information about the camera's intrinsic parameters, such as focal length and distortion coefficients.
- Returns:
SfmCameraMetadata – The camera metadata object.
- property camera_to_world_matrix: ndarray
Return the camera-to-world transformation matrix for this posed image.
This matrix transforms points from camera coordinates to world coordinates.
- Returns:
camera_to_world_matrix (np.ndarray) – The camera-to-world transformation matrix as a 4x4 numpy array.
- classmethod from_state_dict(state_dict: dict[str, Any], camera_metadata: dict[int, SfmCameraMetadata]) → SfmPosedImageMetadata
Create a new SfmPosedImageMetadata object from a state dictionary and camera metadata (see state_dict()).
- Parameters:
state_dict (dict[str, Any]) – A dictionary containing the image metadata.
camera_metadata (dict[int, SfmCameraMetadata]) – A dictionary mapping camera IDs to SfmCameraMetadata objects.
- Returns:
SfmPosedImageMetadata – A new SfmPosedImageMetadata object.
- property image_id: int
Return the unique identifier for this image.
This ID is used to uniquely identify the image within the dataset.
- Returns:
int – The image ID.
- property image_path: str
Return the file path to the color image for this posed image.
- Returns:
image_path (str) – The path to the color image file for this posed image.
- property image_size: tuple[int, int]
Return the resolution of the posed image in pixels as a tuple of the form (height, width).
- Returns:
image_size (tuple[int, int]) – The image resolution as (height, width).
- property lookat
Return the camera lookat vector.
The lookat vector is the direction the camera is pointing, which is the negative z-axis in the camera coordinate system.
- Returns:
lookat (np.ndarray) – The camera lookat vector as a 3D numpy array.
- property mask_path: str
Return the file path to the mask for this posed image.
The mask image is used to indicate which pixels in the image are valid (e.g., not occluded).
An empty string indicates that no mask is available.
- Returns:
mask_path (str) – The path to the posed mask image file.
- property origin
Return the origin of the posed image, i.e. the position of the camera in world coordinates when it captured the image.
The origin is the translation part of the camera-to-world matrix.
- Returns:
origin (np.ndarray) – The camera origin as a 3D numpy array.
- property point_indices: ndarray | None
Return the indices of the 3D points that are visible in this posed image, or None if the indices are not available.
These indices correspond to the points in the SfmScene's point cloud that are visible in this posed image.
- Returns:
point_indices (np.ndarray | None) – An array of indices of the visible 3D points, or None if not available.
- property right
Return the camera right vector.
The right vector is the direction that is considered "right" in the camera coordinate system, which is the x-axis in the camera coordinate system.
- Returns:
right (np.ndarray) – The camera right vector as a 3D numpy array.
- state_dict() → dict[str, Any]
Return a state dictionary representing the image metadata.
This dictionary can be used to serialize and deserialize the image metadata.
- Returns:
state_dict (dict[str, Any]) – A dictionary containing the image metadata.
- transform(transformation_matrix: ndarray) → SfmPosedImageMetadata
Return a new SfmPosedImageMetadata object with the camera pose transformed by the given transformation matrix.
The transformation is applied on the left of the camera-to-world matrix, meaning it transforms the camera in world space, i.e.
new_camera_to_world_matrix = transformation_matrix @ self.camera_to_world_matrix
- Parameters:
transformation_matrix (np.ndarray) – A 4x4 transformation matrix to apply.
- Returns:
SfmPosedImageMetadata – A new SfmPosedImageMetadata object with the transformed matrices.
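Left-multiplying the camera-to-world matrix moves the camera within the world frame. The world-to-camera matrix then has to be updated to remain its inverse. The sketch recomputes it with np.linalg.inv. That is an assumption about how the second matrix is kept consistent, and transform_pose is a hypothetical helper.

```python
import numpy as np

def transform_pose(transformation_matrix, camera_to_world):
    """Apply the transform on the left of camera-to-world; return (new_c2w, new_w2c)."""
    new_c2w = transformation_matrix @ camera_to_world
    new_w2c = np.linalg.inv(new_c2w)  # keep world-to-camera consistent with camera-to-world
    return new_c2w, new_w2c

c2w = np.eye(4)
c2w[:3, 3] = [1.0, 0.0, 0.0]   # camera located at x = 1
T = np.eye(4)
T[:3, 3] = [0.0, 5.0, 0.0]     # shift the world by +5 along y
new_c2w, new_w2c = transform_pose(T, c2w)
# the camera origin (translation column) is now (1, 5, 0)
```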
- property up
Return the camera up vector.
The up vector is the direction that is considered "up" in the camera coordinate system, which is the negative y-axis in the camera coordinate system.
- Returns:
up (np.ndarray) – The camera up vector as a 3D numpy array.
- property world_to_camera_matrix: ndarray
Return the world-to-camera transformation matrix for this posed image.
This matrix transforms points from world coordinates to camera coordinates.
- Returns:
world_to_camera_matrix (np.ndarray) – The world-to-camera transformation matrix as a 4x4 numpy array.