pub fn gemm_block(a: MatrixBlock, b: MatrixBlock, c: &mut MatrixBlock)
Performs a block-level GEMM, computing C = A * B + C on 64 x 64 blocks of bits.
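Written element-wise, one call performs the update below. Here $+$ and $\cdot$ stand for the kernel's element arithmetic. These docs do not say which arithmetic that is; for bit matrices it is often GF(2) (XOR for addition, AND for multiplication), but treat that as an assumption.

```latex
C_{ij} \leftarrow C_{ij} + \sum_{k=0}^{63} A_{ik} \cdot B_{kj}, \qquad 0 \le i, j < 64
```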
# Arguments
- `a` - Left input block (64 x 64 bits)
- `b` - Right input block (64 x 64 bits)
- `c` - Accumulator block (64 x 64 bits)
For efficiency, the result is accumulated into `c` in place instead of being returned as a new block.
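A minimal scalar sketch of the in-place update, for illustration only. It rests on two assumptions that these docs do not confirm: `MatrixBlock` is taken to be `[u64; 64]` (row `i` is word `i`, column `j` is bit `j`), and the arithmetic is GF(2), so multiplication is AND and addition is XOR. The function name `gemm_block_scalar` is hypothetical.

```rust
/// Hypothetical layout: row `i` is `block[i]`; column `j` is bit `j` (LSB = column 0).
pub type MatrixBlock = [u64; 64];

/// Sketch of C = A * B + C over GF(2) (assumed): AND multiplies, XOR adds.
pub fn gemm_block_scalar(a: MatrixBlock, b: MatrixBlock, c: &mut MatrixBlock) {
    for i in 0..64 {
        // Row i of A * B is the XOR of every row k of B for which bit k of A[i] is set.
        let mut acc = 0u64;
        let mut row = a[i];
        while row != 0 {
            let k = row.trailing_zeros() as usize;
            acc ^= b[k];
            row &= row - 1; // clear the lowest set bit
        }
        // Accumulate into C in place using GF(2) addition (XOR).
        c[i] ^= acc;
    }
}

fn main() {
    // A is the identity, so A * B = B.
    let a: MatrixBlock = std::array::from_fn(|i| 1u64 << i);
    let b: MatrixBlock = std::array::from_fn(|i| (i as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    let mut c: MatrixBlock = [0; 64];

    gemm_block_scalar(a, b, &mut c);
    assert_eq!(c, b);

    // A second call adds B again; over GF(2), B + B = 0.
    gemm_block_scalar(a, b, &mut c);
    assert_eq!(c, [0u64; 64]);
    println!("ok");
}
```

Packing one row per `u64` lets each step of the inner loop add a whole 64-bit row with a single XOR, and skipping zero bits of `A[i]` avoids work for sparse rows.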
# Implementation Selection
- x86_64 with AVX-512: uses an optimized assembly kernel.
- All other platforms: falls back to a scalar implementation.
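These docs do not say whether the kernel is chosen at compile time or at run time. The sketch below shows one common pattern, run-time detection with `is_x86_feature_detected!`. The following are assumptions: the `avx512f` feature check, the `[u64; 64]` layout, GF(2) arithmetic, and the names `gemm_block_dispatch` and `gemm_block_avx512`. The AVX-512 branch here is a stand-in that calls the scalar code, because the real assembly kernel is not shown.

```rust
/// Hypothetical layout: row `i` is `block[i]`; column `j` is bit `j`.
pub type MatrixBlock = [u64; 64];

/// Portable fallback: C = A * B + C over GF(2) (assumed arithmetic).
fn gemm_block_scalar(a: &MatrixBlock, b: &MatrixBlock, c: &mut MatrixBlock) {
    for i in 0..64 {
        let mut acc = 0u64;
        let mut row = a[i];
        while row != 0 {
            acc ^= b[row.trailing_zeros() as usize];
            row &= row - 1; // clear the lowest set bit
        }
        c[i] ^= acc;
    }
}

/// Hypothetical stand-in for the AVX-512 assembly kernel. A real build would
/// call the optimized routine here; this sketch reuses the scalar code.
#[cfg(target_arch = "x86_64")]
fn gemm_block_avx512(a: &MatrixBlock, b: &MatrixBlock, c: &mut MatrixBlock) {
    gemm_block_scalar(a, b, c);
}

/// Picks a kernel at run time: AVX-512 on capable x86_64 CPUs, scalar elsewhere.
pub fn gemm_block_dispatch(a: MatrixBlock, b: MatrixBlock, c: &mut MatrixBlock) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            gemm_block_avx512(&a, &b, c);
            return;
        }
    }
    gemm_block_scalar(&a, &b, c);
}

fn main() {
    let a: MatrixBlock = std::array::from_fn(|i| 1u64 << i); // identity
    let b: MatrixBlock = std::array::from_fn(|i| !(i as u64));
    let mut c: MatrixBlock = [0; 64];
    gemm_block_dispatch(a, b, &mut c);
    assert_eq!(c, b);
    println!("ok");
}
```

Run-time detection lets one binary use AVX-512 where the CPU supports it. If selection happens at compile time instead, gating the fast path with `#[cfg(target_feature = "avx512f")]` is the usual alternative.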