Neural Network Layers and Blocks

Applies a 3D max pooling over an input JaggedTensor of features associated with a fvdb.GridBatch.

Parameters:

kernel_size (NumericMaxRank1) – the size of the window to take the max over, broadcastable to (3,)
stride (NumericMaxRank1, optional) – the stride of the window. Default value is kernel_size

Note

For target voxels that are not covered by any source voxels, the output feature will be set to zero.
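
The pooling rule and the zero-fill behavior in the note above can be illustrated with a small pure-Python sketch. It does not use the fvdb API: the dictionary-based grid, the name `sparse_max_pool`, and the window alignment (stride equal to kernel_size, windows anchored at multiples of the kernel size) are assumptions made for illustration only.

```python
from collections import defaultdict

def sparse_max_pool(features, kernel_size, target_voxels):
    """Conceptual sparse max pooling (hypothetical helper, not fvdb code).

    features: dict mapping fine (i, j, k) -> list of channel values.
    target_voxels: coarse (i, j, k) coordinates that should receive a value.
    """
    # Assumption: stride == kernel_size, so a fine voxel belongs to the
    # coarse voxel obtained by floor-dividing each coordinate.
    windows = defaultdict(list)
    for (i, j, k), feat in features.items():
        coarse = (i // kernel_size, j // kernel_size, k // kernel_size)
        windows[coarse].append(feat)

    num_channels = len(next(iter(features.values())))
    pooled = {}
    for voxel in target_voxels:
        covered = windows.get(voxel)
        if covered:
            # Channel-wise maximum over the source voxels in this window.
            pooled[voxel] = [max(channel) for channel in zip(*covered)]
        else:
            # No source voxel falls in this window: the output is zero.
            pooled[voxel] = [0.0] * num_channels
    return pooled

fine = {(0, 0, 0): [1.0, -2.0], (1, 1, 0): [3.0, -5.0], (2, 0, 0): [4.0, 0.5]}
coarse = sparse_max_pool(fine, 2, [(0, 0, 0), (1, 0, 0), (5, 5, 5)])
```

Here the target voxel (5, 5, 5) has no source voxels under its window, so it receives a zero feature vector.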

See also

fvdb.GridBatch.max_pool() for details on the max pooling operation.

See also

fvdb.nn.AvgPool for average pooling.

Applies a 3D average pooling over an input JaggedTensor of features associated with a fvdb.GridBatch.

Parameters:

kernel_size (NumericMaxRank1) – the size of the window to take the average over, broadcastable to (3,)
stride (NumericMaxRank1, optional) – the stride of the window. Default value is kernel_size

Note

For target voxels that are not covered by any source voxels, the output feature will be set to zero.
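
A short pure-Python sketch may help to show how a kernel_size that is "broadcastable to (3,)" and a stride that defaults to kernel_size interact. Nothing here is fvdb code: `to_triple`, `window_indices`, and `sparse_avg_pool` are hypothetical helpers, the window alignment is assumed, and the sketch divides by the number of active source voxels in each window, which the text above does not specify.

```python
from collections import defaultdict

def to_triple(value):
    """Broadcast a scalar or a length-3 sequence to a 3-tuple."""
    if isinstance(value, int):
        return (value, value, value)
    triple = tuple(value)
    if len(triple) != 3:
        raise ValueError("expected a scalar or three values")
    return triple

def window_indices(f, k, s):
    """Coarse indices c on one axis whose window [c*s, c*s + k) contains f."""
    lo = -((k - 1 - f) // s)  # ceil((f - k + 1) / s) using integer math
    hi = f // s
    return range(lo, hi + 1)

def sparse_avg_pool(features, kernel_size, stride=None):
    kernel = to_triple(kernel_size)
    # The stride falls back to the kernel size when it is not given.
    stride = kernel if stride is None else to_triple(stride)
    windows = defaultdict(list)
    for fine, feat in features.items():
        axes = [window_indices(f, k, s) for f, k, s in zip(fine, kernel, stride)]
        for ci in axes[0]:
            for cj in axes[1]:
                for ck in axes[2]:
                    windows[(ci, cj, ck)].append(feat)
    # Assumption: average over the active source voxels in each window.
    return {
        voxel: [sum(channel) / len(channel) for channel in zip(*feats)]
        for voxel, feats in windows.items()
    }

fine = {(0, 0, 0): [2.0], (1, 0, 0): [4.0], (0, 0, 3): [6.0]}
pooled = sparse_avg_pool(fine, kernel_size=(2, 2, 4))
pooled_scalar = sparse_avg_pool(fine, kernel_size=4)
```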

See also

fvdb.GridBatch.avg_pool() for details on the average pooling operation.

See also

fvdb.nn.MaxPool for max pooling.

Refines a JaggedTensor of features associated with a coarse fvdb.GridBatch to a fine GridBatch using nearest-neighbor upsampling, i.e., each voxel in the coarse grid expands to a cube of voxels in the fine grid.

See also

fvdb.GridBatch.refine() for details on the refinement operation.

See also

fvdb.nn.AvgPool and fvdb.nn.MaxPool for downsampling operations.

Parameters:

scale_factor (NumericMaxRank1) – the upsampling factor, broadcastable to (3,)
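
Nearest-neighbor refinement can be pictured with the following pure-Python sketch, in which each coarse voxel copies its features to a block of fine voxels. The dictionary-based grid and the name `refine_nearest` are illustrative assumptions and are not part of the fvdb API.

```python
def refine_nearest(features, scale_factor):
    """Copy each coarse voxel's features to a block of fine voxels.

    features: dict mapping coarse (i, j, k) -> list of channel values.
    scale_factor: three integers, one upsampling factor per axis.
    """
    sx, sy, sz = scale_factor
    fine = {}
    for (i, j, k), feat in features.items():
        for di in range(sx):
            for dj in range(sy):
                for dk in range(sz):
                    # Every child voxel inherits the parent's features.
                    fine[(i * sx + di, j * sy + dj, k * sz + dk)] = list(feat)
    return fine

coarse = {(0, 0, 0): [1.0], (1, 0, -1): [7.0]}
fine = refine_nearest(coarse, (2, 2, 2))
```

With a scale factor of 2 on every axis, the two coarse voxels expand to 16 fine voxels.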

A sparse 3D convolution module that operates on JaggedTensor inputs according to a provided ConvolutionPlan.

A ConvolutionPlan defines the mapping of a sparse convolution operation between data associated with an input fvdb.GridBatch and an output fvdb.GridBatch. This allows for efficient sparse convolution operations without explicitly constructing dense tensors.
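
The idea of a precomputed mapping between input and output voxels can be sketched in pure Python as a list of (input index, output index, kernel offset) triples. This is only a conceptual model: it does not describe how fvdb.ConvolutionPlan is implemented, it assumes an odd kernel size and stride 1, and `build_kernel_map` and `apply_plan` are hypothetical names.

```python
from itertools import product

def build_kernel_map(in_voxels, out_voxels, kernel_size=3):
    """Precompute which input voxel feeds which output voxel, and through
    which kernel offset. Only pairs of active voxels are recorded."""
    in_index = {v: n for n, v in enumerate(in_voxels)}
    half = kernel_size // 2
    offsets = list(product(range(-half, half + 1), repeat=3))
    triples = []
    for out_n, (i, j, k) in enumerate(out_voxels):
        for off_n, (di, dj, dk) in enumerate(offsets):
            src = in_index.get((i + di, j + dj, k + dk))
            if src is not None:
                triples.append((src, out_n, off_n))
    return triples

def apply_plan(triples, in_feats, weights, num_out):
    """Single-channel convolution driven entirely by the precomputed map."""
    out = [0.0] * num_out
    for src, dst, off in triples:
        out[dst] += weights[off] * in_feats[src]
    return out

voxels = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
plan = build_kernel_map(voxels, voxels)
box_filter = [1.0] * 27  # every kernel weight set to one
result = apply_plan(plan, [1.0, 2.0, 4.0], box_filter, len(voxels))
```

No dense volume is ever allocated: the work is proportional to the number of active voxel pairs, which is five in this example.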

See also

fvdb.ConvolutionPlan for details on creating and using convolution plans.

See also

fvdb.nn.SparseConvTranspose3d for the transposed version of this module.

Parameters:

in_channels (int) – Number of channels in the input JaggedTensor.
out_channels (int) – Number of channels in the output JaggedTensor.
kernel_size (NumericMaxRank1, optional) – Size of the convolution kernel, broadcastable to (3,). Default: 3
stride (NumericMaxRank1, optional) – Stride of the convolution, broadcastable to (3,). Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

A sparse 3D transposed convolution module that operates on JaggedTensor inputs according to a provided ConvolutionPlan.

A ConvolutionPlan defines the mapping of a sparse convolution operation between data associated with an input fvdb.GridBatch and an output fvdb.GridBatch. This allows for efficient sparse convolution operations without explicitly constructing dense tensors.
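
One way to think about the transposed operation is as the adjoint of the forward convolution over the same voxel pairs, with the roles of input and output swapped. The sketch below checks that property on a hand-written kernel map. The triple representation, the function names, and the weights are assumptions for illustration and do not reflect fvdb internals.

```python
# Hand-written kernel map for three voxels, as (in_index, out_index, offset)
# triples. Offsets index a flattened 3x3x3 kernel; 13 is the center.
TRIPLES = [(0, 0, 13), (1, 0, 22), (0, 1, 4), (1, 1, 13), (2, 2, 13)]
WEIGHTS = [float(n + 1) for n in range(27)]

def conv(x, num_out=3):
    """Forward pass: gather from inputs into outputs."""
    out = [0.0] * num_out
    for src, dst, off in TRIPLES:
        out[dst] += WEIGHTS[off] * x[src]
    return out

def conv_transpose(y, num_in=3):
    """Transposed pass: same pairs and weights, data flows the other way."""
    out = [0.0] * num_in
    for src, dst, off in TRIPLES:
        out[src] += WEIGHTS[off] * y[dst]
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 7.0]
lhs = dot(conv(x), y)            # <A x, y>
rhs = dot(x, conv_transpose(y))  # <x, A^T y>
```

The two inner products agree, which is the defining property of a transposed linear operator.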

See also

fvdb.ConvolutionPlan for details on creating and using convolution plans.

See also

fvdb.nn.SparseConv3d for the non-transposed version of this module.

Parameters:

in_channels (int) – Number of channels in the input JaggedTensor.
out_channels (int) – Number of channels in the output JaggedTensor.
kernel_size (NumericMaxRank1, optional) – Size of the convolution kernel, broadcastable to (3,). Default: 3
stride (NumericMaxRank1, optional) – Stride of the convolution, broadcastable to (3,). Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

class fvdb.nn.BatchNorm(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device=None, dtype=None)[source]

Applies Batch Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.BatchNorm1d for detailed information on Batch Normalization.

See also

fvdb.nn.SyncBatchNorm for distributed batch normalization across multiple processes.

Parameters:

num_features (int) – number of features in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5.
momentum (float, optional) – the value used for the running_mean and running_var computation. Default: 0.1
affine (bool, optional) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool, optional) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None.
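
The roles of eps, momentum, and the running statistics can be sketched in pure Python. The sketch assumes that statistics are pooled over every voxel of every grid in the batch, follows the torch.nn.BatchNorm1d convention of using the biased variance for normalization and the unbiased variance for the running estimate, and omits the affine parameters. `batch_norm_jagged` is a hypothetical name, not an fvdb function.

```python
def batch_norm_jagged(grids, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Training-mode batch norm over a jagged batch.

    grids: list of grids; each grid is a list of per-voxel channel rows.
    running_mean, running_var: per-channel lists, updated in place.
    """
    # Assumption: all voxels of all grids share one set of statistics.
    rows = [row for grid in grids for row in grid]
    n = len(rows)
    channels = range(len(rows[0]))
    mean = [sum(r[c] for r in rows) / n for c in channels]
    var = [sum((r[c] - mean[c]) ** 2 for r in rows) / n for c in channels]

    for c in channels:
        unbiased = var[c] * n / (n - 1)  # requires at least two voxels
        running_mean[c] = (1 - momentum) * running_mean[c] + momentum * mean[c]
        running_var[c] = (1 - momentum) * running_var[c] + momentum * unbiased

    # The jagged structure (voxels per grid) is preserved in the output.
    return [
        [[(row[c] - mean[c]) / (var[c] + eps) ** 0.5 for c in channels] for row in grid]
        for grid in grids
    ]

running_mean, running_var = [0.0], [1.0]
batch = [[[1.0], [3.0]], [[5.0], [7.0]]]
normalized = batch_norm_jagged(batch, running_mean, running_var)
```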

class fvdb.nn.GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)[source]

Applies Group Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.GroupNorm for detailed information on Group Normalization.

Parameters:

num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5.
affine (bool, optional) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None.
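
The grouping of channels can be sketched as follows. The sketch assumes that statistics are computed separately for each grid in the batch, over all voxels and all channels of a group, by analogy with the per-sample statistics of torch.nn.GroupNorm. That per-grid behavior is an assumption rather than something stated above, the affine parameters are omitted, and `group_norm_jagged` is a hypothetical name.

```python
def group_norm_jagged(grids, num_groups, eps=1e-5):
    """Group norm applied independently to each grid of a jagged batch."""
    normalized = []
    for grid in grids:
        num_channels = len(grid[0])
        if num_channels % num_groups != 0:
            raise ValueError("num_channels must be divisible by num_groups")
        size = num_channels // num_groups
        out = [list(row) for row in grid]
        for g in range(num_groups):
            cols = range(g * size, (g + 1) * size)
            # Statistics span every voxel and every channel in the group.
            values = [row[c] for row in grid for c in cols]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            scale = (var + eps) ** -0.5
            for row_out, row in zip(out, grid):
                for c in cols:
                    row_out[c] = (row[c] - mean) * scale
        normalized.append(out)
    return normalized

batch = [
    [[1.0, 3.0, 10.0, 10.0]],                        # grid with one voxel
    [[0.0, 0.0, 2.0, 4.0], [0.0, 0.0, 6.0, 8.0]],    # grid with two voxels
]
result = group_norm_jagged(batch, num_groups=2)
```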

class fvdb.nn.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Any | None = None, device=None, dtype=None)[source]

Applies distributed Batch Normalization over a JaggedTensor batch of features associated with a GridBatch. See torch.nn.SyncBatchNorm for detailed information on distributed batch normalization.

Note

Only supports DistributedDataParallel (DDP) with a single GPU per process. Use fvdb.nn.SyncBatchNorm.convert_sync_batchnorm() to convert BatchNorm layers to SyncBatchNorm before wrapping the network with DDP.

See also

fvdb.nn.BatchNorm for non-distributed batch normalization.

Parameters:

num_features (int) – number of features in the input JaggedTensor
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-5.
momentum (float, optional) – the value used for the running_mean and running_var computation. Default: 0.1
affine (bool, optional) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool, optional) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
process_group (Any, optional) – the process group to scope synchronization. Default: None
device (torch.device, optional) – device on which the module is allocated. Default: None
dtype (torch.dtype, optional) – data type of the module parameters. Default: None.
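
Why synchronization matters can be seen in a small single-channel simulation: each process sees only its own voxels, so local statistics differ, while combining per-process counts, sums, and sums of squares recovers the statistics of the whole batch. The sketch runs in one Python process with plain lists standing in for the processes. Real implementations exchange such quantities through collective communication, and `local_moments` and `synced_stats` are hypothetical names.

```python
def local_moments(values):
    """Quantities one process would contribute: count, sum, sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)

def synced_stats(per_process):
    """Combine per-process moments into the global mean and biased variance."""
    count = sum(c for c, _, _ in per_process)
    total = sum(s for _, s, _ in per_process)
    total_sq = sum(q for _, _, q in per_process)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var

process_a = [1.0, 3.0]        # voxel features held by rank 0
process_b = [5.0, 7.0, 9.0]   # voxel features held by rank 1

local_mean_a = sum(process_a) / len(process_a)
local_mean_b = sum(process_b) / len(process_b)
global_mean, global_var = synced_stats(
    [local_moments(process_a), local_moments(process_b)]
)
```

The local means are 2.0 and 7.0, whereas the synchronized mean over all five voxels is 5.0.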

U-Net Architecture Blocks

Complete U-Net architecture for sparse voxel data processing.

Implements a residual U-Net designed specifically for sparse 3D voxel grids. The network follows the classic U-Net structure with encoder-decoder paths, skip connections, and residual blocks, adapted for efficient processing of sparse volumetric data using the FVDB framework.

The architecture consists of three main stages:

1. Padding: Prepares input data with appropriate spatial and channel dimensions
2. Encoder-Decoder: Recursive downsampling and upsampling with skip connections
3. Unpadding: Produces final output in original grid dimensions

The network is designed for dense prediction tasks on sparse 3D data, such as semantic segmentation, shape completion, or volumetric reconstruction. The residual connections improve gradient flow and training stability, while the U-Net structure preserves both local and global spatial information.

Parameters:

in_channels (int) – Number of input channels from the original data.
base_channels (int) – Number of base channels used throughout the network.
out_channels (int) – Number of output channels for final predictions.
channel_growth_rate (int) – Factor by which channels grow at each encoder level.
kernel_size (NumericMaxRank1) – Size of convolution kernels used throughout. Defaults to 3.
downup_layer_count (int) – Number of encoder-decoder levels. Defaults to 4.
block_layer_count (int) – Number of basic blocks per convolutional block. Defaults to 2.
momentum (float) – Momentum parameter for all batch normalization layers. Defaults to 0.1.
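
To make the interaction of base_channels, channel_growth_rate, and downup_layer_count concrete, the sketch below lists a plausible channel count per level. It assumes that the growth rate is applied multiplicatively once per encoder level, which is one reading of "factor" above and is not confirmed by the text. `channel_schedule` is a hypothetical helper.

```python
def channel_schedule(base_channels, channel_growth_rate, downup_layer_count=4):
    """Channel count at each level, from the finest level to the bottleneck.

    Assumption: channels are multiplied by channel_growth_rate at each level.
    """
    return [
        base_channels * channel_growth_rate ** level
        for level in range(downup_layer_count + 1)
    ]

schedule = channel_schedule(base_channels=32, channel_growth_rate=2)
```

Under this assumption, the default of four levels with 32 base channels and a growth rate of 2 gives 32, 64, 128, 256, and 512 channels, the last being the bottleneck width.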

Basic convolutional block with batch normalization and ReLU activation.

This is the fundamental building block of the U-Net architecture, consisting of a 3D sparse convolution followed by batch normalization and ReLU activation.

The block may change the number of channels between input and output, but it preserves the spatial topology.

Because this block can likely share a convolution plan with other blocks, the plan is passed as an argument to the forward method.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

Bottleneck module at the deepest level of the U-Net architecture.

Represents the coarsest spatial resolution in the network where the receptive field is maximized and the most abstract features are learned. The bottleneck consists of a residual convolutional block that processes features at the lowest resolution before they begin the upsampling path.

This module operates at a fixed spatial resolution and channel count, serving as the bridge between the encoder and decoder paths.

Parameters:

channels (int) – Number of input and output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
layer_count (int) – Number of basic blocks in the residual block. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

Multi-layer residual convolutional block.

Combines multiple SimpleUNetBasicBlocks with a residual connection around the entire sequence. The input is added to the output of the block sequence, enabling improved gradient flow and training stability. Requires that the convolution plan maintains fixed topology to ensure compatible tensor shapes for the residual connection.

Takes separate input channels, mid channels, and output channels. If there’s only one layer, the mid channels are ignored.

Because this block can likely share a convolution plan with other blocks, the plan is passed as an argument to the forward method.

Parameters:

in_channels (int) – Number of input channels.
mid_channels (int) – Number of mid channels.
out_channels (int) – Number of output channels.
kernel_size (NumericMaxRank1) – Size of the convolution kernel. Defaults to 3.
layer_count (int) – Number of basic blocks to stack. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.
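
The way in_channels, mid_channels, and out_channels could be distributed over the stacked basic blocks is sketched below. The layout (first block maps in to mid, last block maps mid to out, blocks in between keep mid) is an assumption consistent with the statement that mid channels are ignored for a single layer. `block_channel_pairs` is a hypothetical helper.

```python
def block_channel_pairs(in_channels, mid_channels, out_channels, layer_count=2):
    """Return the (input, output) channel pair of each stacked basic block."""
    if layer_count < 1:
        raise ValueError("layer_count must be at least 1")
    # With a single layer the list of mid widths is empty, so mid_channels
    # has no effect and the only block maps in_channels to out_channels.
    widths = [in_channels] + [mid_channels] * (layer_count - 1) + [out_channels]
    return list(zip(widths[:-1], widths[1:]))

two_layers = block_channel_pairs(16, 32, 16)
three_layers = block_channel_pairs(16, 32, 16, layer_count=3)
one_layer = block_channel_pairs(16, 32, 16, layer_count=1)
```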

class fvdb.nn.SimpleUNetDown(in_channels: int, out_channels: int, momentum: float = 0.1)[source]

Downsampling module for the encoder path of the U-Net.

Reduces spatial resolution by a factor of 2 using max pooling while simultaneously increasing the number of channels through a 1x1 convolution. This design follows the typical U-Net encoder pattern where spatial information is traded for feature depth as the network progresses toward the bottleneck.

The module performs two operations in sequence:

1. Max pooling with factor 2 to reduce spatial resolution
2. 1x1 convolution to adjust channel count (channel fan-out linear layer)

The convolution plan cannot be reused elsewhere, so it is created inline.
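
The description of the 1x1 convolution as a linear layer can be made concrete: with a kernel that covers a single voxel, every voxel's channel vector is multiplied by the same weight matrix and no neighbors are involved. The sketch below uses plain lists, and `pointwise_conv` is a hypothetical name rather than an fvdb function.

```python
def pointwise_conv(features, weight):
    """Apply the same linear map to every voxel's channel vector.

    features: list of per-voxel rows, each with C_in values.
    weight: list of C_out rows, each with C_in values.
    """
    return [
        [sum(w * x for w, x in zip(w_row, row)) for w_row in weight]
        for row in features
    ]

features = [[1.0, 2.0], [3.0, 4.0]]             # two voxels, two channels
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # fan out from 2 to 3 channels
expanded = pointwise_conv(features, weight)
```

The number of voxels is unchanged and only the channel count grows from 2 to 3.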

Parameters:

in_channels (int) – Number of input channels from the fine grid.
out_channels (int) – Number of output channels for the coarse grid.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

Recursive encoder-decoder module forming the core U-Net structure.

Implements the complete encoder-decoder architecture with skip connections. The module recursively creates nested U-Net structures, where each level performs downsampling, processes features at a coarser resolution, upsamples, and combines results via skip connections.

The recursive structure enables the network to capture multi-scale features effectively. At each level:

1. Input convolution processes features at current resolution
2. Downsampling reduces resolution and increases channels
3. Inner module (either bottleneck or another DownUp) processes coarser features
4. Upsampling increases resolution and reduces channels
5. Skip connection combines encoder and decoder features (simple addition)
6. Output convolution refines the combined features
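
The nesting described above can be traced with a small recursive function that lists the stages in execution order. It builds only a description, not a network. It assumes multiplicative channel growth and that the recursion ends in a bottleneck once one level remains. `describe_down_up` and the stage labels are hypothetical.

```python
def describe_down_up(channels, channel_growth_rate, downup_layer_count, depth=0):
    """Return stages as (depth, stage, in_channels, out_channels) tuples."""
    inner = channels * channel_growth_rate  # assumed multiplicative growth
    stages = [
        (depth, "input_conv", channels, channels),
        (depth, "down", channels, inner),
    ]
    if downup_layer_count == 1:
        # Innermost level: the inner module is the bottleneck.
        stages.append((depth + 1, "bottleneck", inner, inner))
    else:
        # Otherwise the inner module is another encoder-decoder level.
        stages.extend(
            describe_down_up(inner, channel_growth_rate, downup_layer_count - 1, depth + 1)
        )
    stages.extend([
        (depth, "up", inner, channels),
        (depth, "skip_add", channels, channels),
        (depth, "output_conv", channels, channels),
    ])
    return stages

stages = describe_down_up(32, 2, downup_layer_count=2)
names = [stage for _, stage, _, _ in stages]
```

For two levels this yields eleven stages, with the single bottleneck in the middle and the channel counts mirrored around it.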

Parameters:

in_channels (int) – Number of input and output channels.
channel_growth_rate (int) – Factor by which channels increase at each level.
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
downup_layer_count (int) – Number of recursive levels. Defaults to 4.
block_layer_count (int) – Number of layers per convolutional block. Defaults to 2.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

Input padding module for handling convolution boundary conditions.

Transforms the input grid to accommodate the convolution operations throughout the network. The module simultaneously handles two transformations:

1. Spatial padding: Expands the grid to ensure valid convolutions at boundaries
2. Channel adjustment: Converts input channels to the base channel count

The spatial padding is determined by the kernel size and ensures that the network can process boundary voxels correctly. This is particularly important for sparse voxel data where boundary handling affects the final output quality.
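
One plausible reading of padding "determined by the kernel size" is a dilation of the active voxel set by half the kernel width in every direction. The sketch below illustrates that reading on a set of coordinates. It is an assumption about the behavior, not a description of the module's implementation, and `dilate` is a hypothetical helper.

```python
from itertools import product

def dilate(voxels, kernel_size=3):
    """Grow a set of (i, j, k) voxels by kernel_size // 2 along each axis."""
    half = kernel_size // 2
    offsets = list(product(range(-half, half + 1), repeat=3))
    return {
        (i + di, j + dj, k + dk)
        for (i, j, k) in voxels
        for (di, dj, dk) in offsets
    }

original = {(0, 0, 0), (1, 0, 0)}
padded = dilate(original, kernel_size=3)
```

Under this reading, every voxel of the original grid has a full 3x3x3 neighborhood available in the padded grid.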

Parameters:

in_channels (int) – Number of input channels from the original data.
out_channels (int) – Number of output channels (base channels for the network).
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

Output unpadding module for producing final predictions.

Transforms the network output back to the original grid dimensions and target channel count. The module simultaneously handles two transformations:

1. Spatial unpadding: Removes padding to match original grid dimensions
2. Channel adjustment: Converts base channels to final output channels

Uses transposed convolution to ensure proper gradient flow during training while accurately mapping from the padded feature space back to the original input space. This is the final step that produces the network’s predictions.

Because this is the last module in the network, no final batch normalization is applied.

The convolution plan is built inline because it is not needed elsewhere.

Parameters:

in_channels (int) – Number of input channels (base channels from the network).
out_channels (int) – Number of output channels for final predictions.
kernel_size (NumericMaxRank1) – Size of convolution kernels. Defaults to 3.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.

class fvdb.nn.SimpleUNetUp(in_channels: int, out_channels: int, momentum: float = 0.1)[source]

Upsampling module for the decoder path of the U-Net.

Increases spatial resolution by a factor of 2 using subdivision while simultaneously decreasing the number of channels through a 1x1 convolution. This design follows the typical U-Net decoder pattern where feature depth is traded for spatial information as the network progresses toward the output.

The module performs two operations in sequence:

1. 1x1 convolution to adjust channel count (channel fan-in linear layer)
2. Grid subdivision with factor 2 to increase spatial resolution

The convolution plan cannot be reused elsewhere, so it is created inline.

Parameters:

in_channels (int) – Number of input channels from the coarse grid.
out_channels (int) – Number of output channels for the fine grid.
momentum (float) – Momentum parameter for batch normalization. Defaults to 0.1.