Neural Network Layers and Blocks
- class fvdb.nn.MaxPool(kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, stride: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | None = None)
Applies 3D max pooling over an input JaggedTensor of features associated with a fvdb.GridBatch.
- Parameters:
kernel_size (NumericMaxRank1) – the size of the window to take the max over, broadcastable to (3,)
stride (NumericMaxRank1, optional) – the stride of the window. Default value is kernel_size
Note
For target voxels that are not covered by any source voxels, the output feature will be set to zero.
See also
fvdb.GridBatch.max_pool() for details on the max pooling operation.
See also
fvdb.nn.AvgPool for average pooling.
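The windowing and zero-fill behavior described above can be sketched in plain Python. This is an illustration only, not fvdb's implementation: the dict-of-voxels representation, the function name `sparse_max_pool`, and the assumed window `[target * stride, target * stride + kernel_size)` are all hypothetical choices made for the example.

```python
from itertools import product


def sparse_max_pool(features, target_voxels, kernel_size, stride=None):
    """Max-pool per-voxel features from a fine sparse grid onto target voxels.

    features: dict mapping fine (i, j, k) -> list of channel values.
    target_voxels: iterable of coarse (i, j, k) coordinates to fill.
    kernel_size, stride: 3-tuples of ints.
    """
    stride = kernel_size if stride is None else stride  # stride defaults to kernel_size
    num_channels = len(next(iter(features.values())))
    pooled = {}
    for target in target_voxels:
        # Assumed window: fine voxels in [target * stride, target * stride + kernel_size).
        origin = [t * s for t, s in zip(target, stride)]
        window = []
        for offset in product(*(range(k) for k in kernel_size)):
            ijk = tuple(o + d for o, d in zip(origin, offset))
            if ijk in features:
                window.append(features[ijk])
        if window:
            pooled[target] = [max(channel) for channel in zip(*window)]
        else:
            pooled[target] = [0.0] * num_channels  # uncovered target voxel -> zeros
    return pooled


fine = {(0, 0, 0): [1.0, -2.0], (1, 1, 0): [3.0, -5.0], (1, 0, 1): [0.5, 4.0]}
coarse = sparse_max_pool(fine, [(0, 0, 0), (1, 0, 0)], kernel_size=(2, 2, 2))
print(coarse)  # {(0, 0, 0): [3.0, 4.0], (1, 0, 0): [0.0, 0.0]}
```

The second target voxel has no source voxels under its window, so it receives zeros, matching the note above.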
- class fvdb.nn.AvgPool(kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size, stride: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size | None = None)
Applies 3D average pooling over an input JaggedTensor of features associated with a fvdb.GridBatch.
- Parameters:
kernel_size (NumericMaxRank1) – the size of the window to take the average over, broadcastable to (3,)
stride (NumericMaxRank1, optional) – the stride of the window. Default value is kernel_size
Note
For target voxels that are not covered by any source voxels, the output feature will be set to zero.
See also
fvdb.GridBatch.avg_pool() for details on the average pooling operation.
See also
fvdb.nn.MaxPool for max pooling.
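Both pooling modules take kernel_size and stride as values "broadcastable to (3,)", with stride falling back to kernel_size. The sketch below shows one way such arguments could be resolved. The helpers `to_vec3` and `resolve_pool_args` are hypothetical names for illustration, and the NumPy-style rule that a length-1 sequence also broadcasts is an assumption.

```python
from numbers import Real


def to_vec3(value):
    """Broadcast a scalar or a length-1/length-3 sequence to a 3-tuple."""
    if isinstance(value, Real):
        return (value, value, value)
    values = tuple(value)
    if len(values) == 1:
        return values * 3
    if len(values) != 3:
        raise ValueError(f"expected a scalar or 3 values, got {len(values)}")
    return values


def resolve_pool_args(kernel_size, stride=None):
    """Return (kernel, stride) as 3-tuples; stride defaults to the kernel size."""
    kernel = to_vec3(kernel_size)
    resolved_stride = kernel if stride is None else to_vec3(stride)
    return kernel, resolved_stride


print(resolve_pool_args(2))             # ((2, 2, 2), (2, 2, 2))
print(resolve_pool_args((2, 2, 1), 1))  # ((2, 2, 1), (1, 1, 1))
```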
- class fvdb.nn.UpsamplingNearest(scale_factor: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size)
Refines a JaggedTensor of features associated with a coarse fvdb.GridBatch to a fine GridBatch using nearest-neighbor upsampling, i.e., each voxel in the coarse grid expands to a cube of voxels in the fine grid.
See also
fvdb.GridBatch.refine() for details on the refinement operation.
See also
fvdb.nn.AvgPool and fvdb.nn.MaxPool for downsampling operations.
- Parameters:
scale_factor (NumericMaxRank1) – the upsampling factor, broadcastable to (3,)
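The "each coarse voxel expands to a cube of fine voxels" rule can be sketched as follows. This is a plain-Python illustration, not fvdb's refine operation: the dict representation and the name `refine_nearest` are hypothetical, and the fine index layout `coarse * scale + offset` is an assumption.

```python
from itertools import product


def refine_nearest(coarse_features, scale_factor):
    """Copy each coarse voxel's features into its cube of fine voxels."""
    sx, sy, sz = scale_factor
    fine = {}
    for (i, j, k), feat in coarse_features.items():
        for dx, dy, dz in product(range(sx), range(sy), range(sz)):
            # Assumed layout: fine index = coarse index * scale + offset.
            fine[(i * sx + dx, j * sy + dy, k * sz + dz)] = list(feat)
    return fine


coarse = {(0, 0, 0): [1.0], (2, 1, 0): [7.0]}
fine = refine_nearest(coarse, (2, 2, 2))
print(len(fine))        # 16
print(fine[(5, 3, 1)])  # [7.0]
```

Only voxels below an active coarse voxel appear in the result, so the fine grid stays sparse.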
- class fvdb.nn.SparseConv3d(in_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, stride: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 1, bias: bool = True)
A sparse 3D convolution module that operates on JaggedTensor inputs according to a provided ConvolutionPlan.
A ConvolutionPlan defines the mapping of a sparse convolution operation between data associated with an input fvdb.GridBatch and an output fvdb.GridBatch. This allows for efficient sparse convolution operations without explicitly constructing dense tensors.
See also
fvdb.ConvolutionPlan for details on creating and using convolution plans.
See also
fvdb.nn.SparseConvTranspose3d for the transposed version of this module.
- Parameters:
in_channels (int) – Number of channels in the input JaggedTensor.
out_channels (int) – Number of channels in the output JaggedTensor.
kernel_size (NumericMaxRank1, optional) – Size of the convolution kernel, broadcastable to (3,). Default: 3
stride (NumericMaxRank1, optional) – Stride of the convolution, broadcastable to (3,). Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
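To make the idea of a precomputed input-to-output mapping concrete, here is a minimal plain-Python sketch of a gather-multiply-scatter convolution over sparse voxels. It is not fvdb's ConvolutionPlan: `build_kernel_map` and `apply_conv` are hypothetical helpers, and the sketch assumes a single channel, an odd cubic kernel, and stride 1.

```python
from itertools import product


def build_kernel_map(source_voxels, target_voxels, kernel_size=3):
    """Precompute (source index, target index, kernel weight index) triples."""
    source_index = {ijk: n for n, ijk in enumerate(source_voxels)}
    half = kernel_size // 2  # assumes an odd kernel size
    offsets = list(product(range(-half, half + 1), repeat=3))
    triples = []
    for t, target in enumerate(target_voxels):
        for w, offset in enumerate(offsets):
            neighbor = tuple(c + d for c, d in zip(target, offset))
            if neighbor in source_index:  # only active voxels contribute
                triples.append((source_index[neighbor], t, w))
    return triples


def apply_conv(triples, features, weights, num_targets):
    """Single-channel convolution driven entirely by the precomputed map."""
    out = [0.0] * num_targets
    for s, t, w in triples:
        out[t] += weights[w] * features[s]
    return out


voxels = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
plan = build_kernel_map(voxels, voxels)  # same topology in and out
weights = [1.0] * 27                     # 3x3x3 box filter
result = apply_conv(plan, [1.0, 2.0, 4.0], weights, len(voxels))
print(result)  # [3.0, 3.0, 4.0]
```

The map depends only on the grid topology and kernel shape, so it can be built once and reused by every layer that shares them, and no dense volume is ever allocated.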
- class fvdb.nn.SparseConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, stride: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 1, bias: bool = True)
A sparse 3D transposed convolution module that operates on JaggedTensor inputs according to a provided ConvolutionPlan.
A ConvolutionPlan defines the mapping of a sparse convolution operation between data associated with an input fvdb.GridBatch and an output fvdb.GridBatch. This allows for efficient sparse convolution operations without explicitly constructing dense tensors.
See also
fvdb.ConvolutionPlan for details on creating and using convolution plans.
See also
fvdb.nn.SparseConv3d for the non-transposed version of this module.
- Parameters:
in_channels (int) – Number of channels in the input JaggedTensor.
out_channels (int) – Number of channels in the output JaggedTensor.
kernel_size (NumericMaxRank1, optional) – Size of the convolution kernel, broadcastable to (3,). Default: 3
stride (NumericMaxRank1, optional) – Stride of the convolution, broadcastable to (3,). Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- class fvdb.nn.BatchNorm(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device=None, dtype=None)
Applies Batch Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.BatchNorm1d for detailed information on Batch Normalization.
See also
fvdb.nn.SyncBatchNorm for distributed batch normalization across multiple processes.
- Parameters:
num_features (int) – number of features in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float, optional) – the value used for the running_mean and running_var computation. Default: 0.1
affine (bool, optional) – when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool, optional) – when set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None
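The roles of eps and momentum can be illustrated with a small plain-Python sketch for a single channel. It assumes that statistics are pooled over every voxel of every grid in the batch (as if the jagged features were flattened and passed to a 1D batch norm); that pooling, and the name `batch_norm_jagged`, are assumptions for illustration. The running-statistics update follows PyTorch's documented convention.

```python
def batch_norm_jagged(per_grid_features, running_mean, running_var,
                      eps=1e-5, momentum=0.1):
    """Normalize one channel of a jagged batch and update running statistics.

    per_grid_features: list of lists, one list of scalar features per grid.
    """
    flat = [x for grid in per_grid_features for x in grid]
    n = len(flat)
    mean = sum(flat) / n
    var = sum((x - mean) ** 2 for x in flat) / n     # biased variance, used to normalize
    unbiased = var * n / (n - 1) if n > 1 else var   # running_var uses the unbiased estimate
    normalized = [[(x - mean) / (var + eps) ** 0.5 for x in grid]
                  for grid in per_grid_features]
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * unbiased
    return normalized, new_mean, new_var


# Two grids with different voxel counts (2 and 1).
out, new_mean, new_var = batch_norm_jagged([[1.0, 3.0], [5.0]], 0.0, 1.0)
print(round(new_mean, 6), round(new_var, 6))  # 0.3 1.3
```

The output keeps the jagged shape of the input: each grid gets back as many values as it supplied.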
- class fvdb.nn.GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)
Applies Group Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.GroupNorm for detailed information on Group Normalization.
- Parameters:
num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5
affine (bool, optional) – when set to True, this module has learnable affine parameters. Default: True
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None
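As a rough illustration of how num_groups partitions the channels, the sketch below normalizes the features of a single grid group by group. It assumes statistics are computed per grid over all voxels and all channels in a group, by analogy with per-sample statistics in dense group normalization; that choice and the name `group_norm_grid` are assumptions, and the learnable affine step is omitted.

```python
def group_norm_grid(features, num_groups, eps=1e-5):
    """Group-normalize one grid; features is a list of per-voxel channel lists."""
    num_channels = len(features[0])
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    size = num_channels // num_groups
    out = [list(voxel) for voxel in features]
    for g in range(num_groups):
        channels = range(g * size, (g + 1) * size)
        # Statistics span every voxel and every channel in this group.
        values = [voxel[c] for voxel in features for c in channels]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        for voxel_out, voxel_in in zip(out, features):
            for c in channels:
                voxel_out[c] = (voxel_in[c] - mean) / (var + eps) ** 0.5
    return out


grid = [[1.0, 3.0, 10.0, 10.0], [5.0, 7.0, 20.0, 20.0]]
normed = group_norm_grid(grid, num_groups=2)
print([round(v, 3) for v in normed[0]])
```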
- class fvdb.nn.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Any | None = None, device=None, dtype=None)
Applies distributed Batch Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.SyncBatchNorm for detailed information on distributed batch normalization.
Note
Only supports DistributedDataParallel (DDP) with a single GPU per process. Use fvdb.nn.SyncBatchNorm.convert_sync_batchnorm() to convert BatchNorm layers to SyncBatchNorm before wrapping the network with DDP.
See also
fvdb.nn.BatchNorm for non-distributed batch normalization.
- Parameters:
num_features (int) – number of features in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float, optional) – the value used for the running_mean and running_var computation. Default: 0.1
affine (bool, optional) – when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool, optional) – when set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
process_group (Any, optional) – the process group to scope synchronization. Default: None
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None
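The arithmetic behind synchronized statistics can be sketched without any distributed runtime. Each process contributes a count, a sum, and a sum of squares, and the global mean and variance follow from the totals. The helper names below are hypothetical, and a real implementation would exchange these partials with collective communication rather than a Python list.

```python
def local_partial(values):
    """Per-process summary for one channel: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine_stats(partials):
    """Merge per-process partials into a global mean and biased variance."""
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var


rank0 = [1.0, 3.0]       # features held by process 0
rank1 = [5.0, 7.0, 9.0]  # process 1 holds a different number of voxels
mean, var = combine_stats([local_partial(rank0), local_partial(rank1)])
print(mean, var)  # 5.0 8.0
```

Carrying the counts matters for sparse data, since processes generally hold different numbers of voxels and a plain average of per-process means would be biased.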
U-Net Architecture Blocks
- class fvdb.nn.SimpleUNet(in_channels: int, base_channels: int, out_channels: int, channel_growth_rate: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, downup_layer_count: int = 4, block_layer_count: int = 2, momentum: float = 0.1)
Complete U-Net architecture for sparse voxel data processing.
Implements a residual U-Net designed specifically for sparse 3D voxel grids. The network follows the classic U-Net structure with encoder-decoder paths, skip connections, and residual blocks, adapted for efficient processing of sparse volumetric data using the FVDB framework.
The architecture consists of three main stages:
1. Padding: Prepares input data with appropriate spatial and channel dimensions
2. Encoder-Decoder: Recursive downsampling and upsampling with skip connections
3. Unpadding: Produces final output in original grid dimensions
The network is designed for dense prediction tasks on sparse 3D data, such as semantic segmentation, shape completion, or volumetric reconstruction. The residual connections improve gradient flow and training stability, while the U-Net structure preserves both local and global spatial information.
- Parameters:
in_channels (int) – Number of input channels from the original data.
base_channels (int) – Number of base channels used throughout the network.
out_channels (int) – Number of output channels for final predictions.
channel_growth_rate (int) – Factor by which channels grow at each encoder level.
kernel_size (NumericMaxRank1) – Size of convolution kernels used throughout. Defaults to 3.
downup_layer_count (int) – Number of encoder-decoder levels. Defaults to 4.
block_layer_count (int) – Number of basic blocks per convolutional block. Defaults to 2.
momentum (float) – Momentum parameter for all batch normalization layers. Defaults to 0.1.
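To get a feel for how base_channels, channel_growth_rate, and downup_layer_count interact, the sketch below tabulates a per-level schedule. It assumes the growth rate is multiplicative and that each level halves the resolution (as described for SimpleUNetDown); the function `unet_schedule` is hypothetical and the actual module may count levels differently.

```python
def unet_schedule(base_channels, channel_growth_rate, downup_layer_count=4):
    """List (level, channels, downsampling factor) from finest level to bottleneck."""
    return [
        (level, base_channels * channel_growth_rate ** level, 2 ** level)
        for level in range(downup_layer_count + 1)
    ]


for level, channels, factor in unet_schedule(32, 2):
    print(f"level {level}: {channels} channels at 1/{factor} resolution")
```

Under these assumptions, a base width of 32 with growth rate 2 reaches 512 channels at 1/16 resolution in the bottleneck.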
- class fvdb.nn.SimpleUNetBasicBlock(in_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, momentum: float = 0.1)
Basic convolutional block with batch normalization and ReLU activation.
This is the fundamental building block of the U-Net architecture, consisting of a 3D sparse convolution followed by batch normalization and ReLU activation.
The block may change the channel count between input and output, but it preserves the spatial topology.
Because this block can often share a convolution plan with other blocks, the plan is passed as an argument to the forward method.
- Parameters:
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
- class fvdb.nn.SimpleUNetBottleneck(channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, layer_count: int = 2, momentum: float = 0.1)
Bottleneck module at the deepest level of the U-Net architecture.
Represents the coarsest spatial resolution in the network where the receptive field is maximized and the most abstract features are learned. The bottleneck consists of a residual convolutional block that processes features at the lowest resolution before they begin the upsampling path.
This module operates at a fixed spatial resolution and channel count, serving as the bridge between the encoder and decoder paths.
- Parameters:
channels (int) – Number of input and output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
layer_count (int) – Number of basic blocks in the residual block. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
- class fvdb.nn.SimpleUNetConvBlock(in_channels: int, mid_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, layer_count: int = 2, momentum: float = 0.1)
Multi-layer residual convolutional block.
Combines multiple SimpleUNetBasicBlocks with a residual connection around the entire sequence. The input is added to the output of the block sequence, enabling improved gradient flow and training stability. Requires that the convolution plan maintains fixed topology to ensure compatible tensor shapes for the residual connection.
Takes separate input channels, mid channels, and output channels. If there’s only one layer, the mid channels are ignored.
Because this block can often share a convolution plan with other blocks, the plan is passed as an argument to the forward method.
- Parameters:
in_channels (int) – Number of input channels.
mid_channels (int) – Number of mid channels.
out_channels (int) – Number of output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
layer_count (int) – Number of basic blocks to stack. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
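One plausible reading of how in_channels, mid_channels, and out_channels are distributed across the stacked basic blocks is sketched below. The layout (first layer maps in to mid, last layer maps mid to out, intermediate layers stay at mid) is an assumption, and `block_channel_pairs` is a hypothetical helper. It does reproduce the documented property that mid_channels is ignored when there is only one layer.

```python
def block_channel_pairs(in_channels, mid_channels, out_channels, layer_count=2):
    """Return the (in, out) channel pair for each basic block in the stack."""
    if layer_count < 1:
        raise ValueError("layer_count must be at least 1")
    # With a single layer, no mid-width entry is inserted at all.
    widths = [in_channels] + [mid_channels] * (layer_count - 1) + [out_channels]
    return list(zip(widths[:-1], widths[1:]))


print(block_channel_pairs(16, 32, 16, layer_count=1))  # [(16, 16)]
print(block_channel_pairs(16, 32, 16, layer_count=2))  # [(16, 32), (32, 16)]
print(block_channel_pairs(16, 32, 16, layer_count=3))  # [(16, 32), (32, 32), (32, 16)]
```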
- class fvdb.nn.SimpleUNetDown(in_channels: int, out_channels: int, momentum: float = 0.1)
Downsampling module for the encoder path of the U-Net.
Reduces spatial resolution by a factor of 2 using max pooling while simultaneously increasing the number of channels through a 1x1 convolution. This design follows the typical U-Net encoder pattern where spatial information is traded for feature depth as the network progresses toward the bottleneck.
The module performs two operations in sequence:
1. Max pooling with factor 2 to reduce spatial resolution
2. 1x1 convolution to adjust channel count (channel fan-out linear layer)
The convolution plan cannot be reused elsewhere, so it is created inline.
- Parameters:
in_channels (int) – Number of input channels from the fine grid.
out_channels (int) – Number of output channels for the coarse grid.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
- class fvdb.nn.SimpleUNetDownUp(in_channels: int, channel_growth_rate: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, downup_layer_count: int = 4, block_layer_count: int = 2, momentum: float = 0.1)
Recursive encoder-decoder module forming the core U-Net structure.
Implements the complete encoder-decoder architecture with skip connections. The module recursively creates nested U-Net structures, where each level performs downsampling, processes features at a coarser resolution, upsamples, and combines results via skip connections.
The recursive structure enables the network to capture multi-scale features effectively. At each level:
1. Input convolution processes features at current resolution
2. Downsampling reduces resolution and increases channels
3. Inner module (either bottleneck or another DownUp) processes coarser features
4. Upsampling increases resolution and reduces channels
5. Skip connection combines encoder and decoder features (simple addition)
6. Output convolution refines the combined features
- Parameters:
in_channels (int) – Number of input and output channels.
channel_growth_rate (int) – Factor by which channels increase at each level.
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
downup_layer_count (int) – Number of recursive levels. Defaults to 4.
block_layer_count (int) – Number of layers per convolutional block. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
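The six per-level steps and their recursion can be traced with a short plain-Python sketch. It only records the order of operations and channel counts; it performs no convolution. The function `downup_trace` is hypothetical, and it assumes multiplicative channel growth and that the innermost level is the bottleneck.

```python
def downup_trace(channels, channel_growth_rate, levels, depth=0):
    """Return the ordered list of steps a recursive down-up module would run."""
    pad = "  " * depth
    if levels == 0:
        return [f"{pad}bottleneck @ {channels} ch"]
    inner = channels * channel_growth_rate
    steps = [
        f"{pad}input conv @ {channels} ch",
        f"{pad}down {channels} -> {inner} ch",
    ]
    # The inner module is either another down-up level or the bottleneck.
    steps += downup_trace(inner, channel_growth_rate, levels - 1, depth + 1)
    steps += [
        f"{pad}up {inner} -> {channels} ch",
        f"{pad}skip add @ {channels} ch",
        f"{pad}output conv @ {channels} ch",
    ]
    return steps


trace = downup_trace(8, 2, levels=2)
print("\n".join(trace))
```

Because the skip connection is a simple addition, each level must return the same channel count and grid topology it received, which is why the trace is symmetric around the bottleneck.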
- class fvdb.nn.SimpleUNetPad(in_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3, momentum: float = 0.1)
Input padding module for handling convolution boundary conditions.
Transforms the input grid to accommodate the convolution operations throughout the network. The module simultaneously handles two transformations:
1. Spatial padding: Expands the grid to ensure valid convolutions at boundaries
2. Channel adjustment: Converts input channels to the base channel count
The spatial padding is determined by the kernel size and ensures that the network can process boundary voxels correctly. This is particularly important for sparse voxel data where boundary handling affects the final output quality.
- Parameters:
in_channels (int) – Number of input channels from the original data.
out_channels (int) – Number of output channels (base channels for the network).
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
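One way to picture kernel-dependent spatial padding on a sparse grid is to dilate the set of active voxels by the kernel footprint. The sketch below does exactly that in plain Python. Whether SimpleUNetPad pads this way is an assumption, and `dilate_voxels` is a hypothetical helper that assumes an odd cubic kernel.

```python
from itertools import product


def dilate_voxels(voxels, kernel_size=3):
    """Return the active set expanded by the kernel footprint around each voxel."""
    half = kernel_size // 2  # assumes an odd kernel size
    offsets = list(product(range(-half, half + 1), repeat=3))
    return {
        tuple(c + d for c, d in zip(ijk, offset))
        for ijk in voxels
        for offset in offsets
    }


padded = dilate_voxels({(0, 0, 0), (1, 0, 0)}, kernel_size=3)
print(len(padded))  # 36
```

Two adjacent voxels grow into a 4 x 3 x 3 block, so boundary voxels gain the neighbors a 3 x 3 x 3 kernel would read.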
- class fvdb.nn.SimpleUNetUnpad(in_channels: int, out_channels: int, kernel_size: Tensor | ndarray | int | float | integer | floating | Sequence[int | float | integer | floating] | Size = 3)
Output unpadding module for producing final predictions.
Transforms the network output back to the original grid dimensions and target channel count. The module simultaneously handles two transformations:
1. Spatial unpadding: Removes padding to match original grid dimensions
2. Channel adjustment: Converts base channels to final output channels
Uses transposed convolution to ensure proper gradient flow during training while accurately mapping from the padded feature space back to the original input space. This is the final step that produces the network’s predictions.
Because this is the last module in the network, no final batch normalization is applied.
The convolution plan is not needed elsewhere, so it is built inline.
- Parameters:
in_channels (int) – Number of input channels (base channels from the network).
out_channels (int) – Number of output channels for final predictions.
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
- class fvdb.nn.SimpleUNetUp(in_channels: int, out_channels: int, momentum: float = 0.1)
Upsampling module for the decoder path of the U-Net.
Increases spatial resolution by a factor of 2 using subdivision while simultaneously decreasing the number of channels through a 1x1 convolution. This design follows the typical U-Net decoder pattern where feature depth is traded for spatial information as the network progresses toward the output.
The module performs two operations in sequence:
1. 1x1 convolution to adjust channel count (channel fan-in linear layer)
2. Grid subdivision with factor 2 to increase spatial resolution
The convolution plan cannot be reused elsewhere, so it is created inline.
- Parameters:
in_channels (int) – Number of input channels from the coarse grid.
out_channels (int) – Number of output channels for the fine grid.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.