fp::blas::tile

Function gemm_concurrent

pub fn gemm_concurrent<const M: usize, const N: usize, L: LoopOrder>(
    alpha: bool,
    a: MatrixTileSlice<'_>,
    b: MatrixTileSlice<'_>,
    beta: bool,
    c: MatrixTileSliceMut<'_>,
)


Performs tile-level GEMM with recursive parallelization.

The matrix is recursively split along rows (while a piece spans more than M blocks of rows) or along columns (while it spans more than N blocks of columns). Splitting stops once a tile is within both limits, and the resulting tiles are processed in parallel using rayon.
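
The recursive splitting described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Region`, `split`, and `run_halves` are hypothetical names, the midpoint halving and the rows-before-columns order are assumptions, and `std::thread::scope` stands in for rayon so that the example needs no external crates. The leaf case only records the region where a tile kernel would run.

```rust
use std::thread;

/// Hypothetical rectangular region of a matrix, measured in blocks.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    row: usize,
    col: usize,
    rows: usize,
    cols: usize,
}

/// Recursively halve `r` until it spans at most `M` block rows and `N` block
/// columns, handling the two halves concurrently. Returns the leaf regions.
/// Assumption: rows are checked before columns and splits are at the midpoint.
fn split<const M: usize, const N: usize>(r: Region) -> Vec<Region> {
    assert!(M >= 1 && N >= 1, "limits must be at least one block");
    if r.rows > M {
        let top = r.rows / 2;
        let upper = Region { rows: top, ..r };
        let lower = Region { row: r.row + top, rows: r.rows - top, ..r };
        run_halves::<M, N>(upper, lower)
    } else if r.cols > N {
        let left = r.cols / 2;
        let west = Region { cols: left, ..r };
        let east = Region { col: r.col + left, cols: r.cols - left, ..r };
        run_halves::<M, N>(west, east)
    } else {
        // Leaf: a real implementation would run the tile-level GEMM here.
        vec![r]
    }
}

/// Process two regions concurrently: one on a scoped thread, one on the
/// current thread (a stand-in for what a rayon-based version would do).
fn run_halves<const M: usize, const N: usize>(a: Region, b: Region) -> Vec<Region> {
    thread::scope(|s| {
        let first = s.spawn(move || split::<M, N>(a));
        let mut second = split::<M, N>(b);
        let mut leaves = first.join().expect("worker thread panicked");
        leaves.append(&mut second);
        leaves
    })
}
```

Under these assumptions, `split::<1, 16>(Region { row: 0, col: 0, rows: 4, cols: 32 })` yields eight leaves of 1 by 16 blocks each.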

§Type Parameters

M - Row limit, in blocks: a tile spanning at most M block rows is not split further along rows
N - Column limit, in blocks: a tile spanning at most N block columns is not split further along columns
L - Loop ordering strategy (see LoopOrder)

§Performance

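For best performance, choose M and N based on your matrix sizes: smaller values produce more, smaller tiles (more parallel tasks, each doing less work), while larger values produce fewer, larger tiles. The defaults used in the codebase are M=1, N=16, which work well for many workloads.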
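
To get a feel for how M and N affect the amount of parallel work, the number of leaf tiles can be estimated with a small counting function. This is a sketch under an assumed splitting rule (midpoint halving, rows checked before columns); `leaf_count` is a hypothetical helper and is not part of the crate.

```rust
/// Count the leaf tiles produced when a matrix of `rows` x `cols` blocks is
/// halved until each tile spans at most `M` block rows and `N` block columns.
/// Assumption: midpoint splits, rows before columns.
fn leaf_count<const M: usize, const N: usize>(rows: usize, cols: usize) -> usize {
    assert!(M >= 1 && N >= 1, "limits must be at least one block");
    if rows > M {
        let top = rows / 2;
        leaf_count::<M, N>(top, cols) + leaf_count::<M, N>(rows - top, cols)
    } else if cols > N {
        let left = cols / 2;
        leaf_count::<M, N>(rows, left) + leaf_count::<M, N>(rows, cols - left)
    } else {
        1
    }
}
```

For a matrix of 64 by 64 blocks, `leaf_count::<1, 16>(64, 64)` gives 256 tiles under this rule, while `leaf_count::<8, 16>(64, 64)` gives 32.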
