pub fn gemm<L: LoopOrder>(
    alpha: bool,
    a: MatrixTileSlice<'_>,
    b: MatrixTileSlice<'_>,
    beta: bool,
    c: MatrixTileSliceMut<'_>,
)
Performs tile-level GEMM with a specified loop ordering.
This is the sequential (non-parallel) version. For large matrices, use gemm_concurrent
instead.
§Loop Ordering
The choice of loop order affects cache locality and therefore performance. Benchmarking suggests that RIC is the fastest order in most cases, although the best choice depends on the matrix dimensions.
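
To illustrate why loop order matters, here is a minimal sketch that does not use this crate's types. It multiplies plain row-major `f64` slices with two different loop orders. The function names `matmul_ijk` and `matmul_ikj` and the flat-slice layout are assumptions made for this example only; they are not this crate's API, and no mapping onto the `LoopOrder` implementations is implied.

```rust
/// Computes C += A * B with loops ordered row (i), column (j), inner (p).
/// Hypothetical helper: A is m x k, B is k x n, C is m x n, all row-major.
fn matmul_ijk(a: &[f64], b: &[f64], c: &mut [f64], m: usize, k: usize, n: usize) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                // The innermost loop walks down a column of B (stride n),
                // so consecutive reads of B are far apart in memory.
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] += acc;
        }
    }
}

/// Same computation with loops ordered row (i), inner (p), column (j).
fn matmul_ikj(a: &[f64], b: &[f64], c: &mut [f64], m: usize, k: usize, n: usize) {
    for i in 0..m {
        for p in 0..k {
            // A[i][p] is fixed for the whole innermost loop.
            let a_ip = a[i * k + p];
            for j in 0..n {
                // The innermost loop walks along a row of B and a row of C
                // (stride 1), so accesses are contiguous.
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}
```

Both functions produce the same result; they differ only in the order in which memory is touched. How much that matters in practice depends on the matrix dimensions and the cache hierarchy, so any ordering is best confirmed by benchmarking.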