gemm_concurrent

Function gemm_concurrent 

pub fn gemm_concurrent<const M: usize, const N: usize, L: LoopOrder>(
    alpha: bool,
    a: MatrixTileSlice<'_>,
    b: MatrixTileSlice<'_>,
    beta: bool,
    c: MatrixTileSliceMut<'_>,
)

Performs tile-level GEMM with recursive parallelization.

The matrix is split recursively: along rows while a tile spans more than M block rows, or along columns while it spans more than N block columns. Once no tile exceeds either threshold, all resulting tiles are processed in parallel using rayon.
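The splitting scheme described above can be sketched as follows. This is an illustrative sketch only, not the crate's implementation: the `Tile` type and the `split` and `tile_areas` functions are hypothetical names, splitting at the midpoint and trying rows before columns are assumptions, and `std::thread::scope` stands in for rayon so that the example needs no external crates.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical stand-in for a tile slice: a rectangular region measured in blocks.
#[derive(Debug, Clone, PartialEq)]
struct Tile {
    rows: Range<usize>,
    cols: Range<usize>,
}

/// Recursively halve `tile` until it spans at most M block rows and
/// N block columns, pushing each leaf tile into `out`.
/// Assumption: rows are split before columns, always at the midpoint.
fn split<const M: usize, const N: usize>(tile: Tile, out: &mut Vec<Tile>) {
    // A zero threshold could never be satisfied by a non-empty tile.
    assert!(M >= 1 && N >= 1, "thresholds must be at least 1");
    let (r, c) = (tile.rows.len(), tile.cols.len());
    if r > M {
        let mid = tile.rows.start + r / 2;
        split::<M, N>(Tile { rows: tile.rows.start..mid, cols: tile.cols.clone() }, out);
        split::<M, N>(Tile { rows: mid..tile.rows.end, cols: tile.cols }, out);
    } else if c > N {
        let mid = tile.cols.start + c / 2;
        split::<M, N>(Tile { rows: tile.rows.clone(), cols: tile.cols.start..mid }, out);
        split::<M, N>(Tile { rows: tile.rows, cols: mid..tile.cols.end }, out);
    } else {
        out.push(tile);
    }
}

/// Process every leaf tile on its own scoped thread. The "work" here is
/// just computing the tile's area in blocks; a real kernel would run the
/// tile-level GEMM instead.
fn tile_areas(tiles: &[Tile]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = tiles
            .iter()
            .map(|t| s.spawn(move || t.rows.len() * t.cols.len()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Under these assumptions, calling `split::<1, 16>` on a region of 4 block rows by 32 block columns yields 8 leaf tiles of 1 by 16 blocks each, which `tile_areas` then handles concurrently.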

§Type Parameters

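  • M - Row threshold in blocks; a tile is no longer split along rows once it spans at most M block rows
  • N - Column threshold in blocks; a tile is no longer split along columns once it spans at most N block columns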
  • L - Loop ordering strategy (see LoopOrder)

§Performance

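For best performance, choose M and N based on your matrix sizes. Smaller values produce more, finer-grained tiles and therefore more parallel tasks, at the cost of higher scheduling overhead. The defaults used in the codebase are M=1, N=16, which work well for many workloads.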