pub fn _mm_sub_ph(a: __m128h, b: __m128h) -> __m128h
Subtracts the packed half-precision (16-bit) floating-point elements in b from the corresponding elements in a, and returns the results.
Intel’s documentation
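
The lane-wise behaviour described above can be sketched with a scalar model. This is an illustration only, not the intrinsic itself: `sub_ph_model` is a hypothetical helper, and `[f32; 8]` is assumed here as a stand-in for the eight 16-bit lanes of a `__m128h`. Calling the real `_mm_sub_ph` requires a CPU and toolchain with AVX512-FP16 and AVX512VL support.

```rust
/// Hypothetical scalar model of the lane-wise subtraction; not part of `core::arch`.
/// Each `[f32; 8]` stands in for the eight 16-bit lanes of a `__m128h`.
fn sub_ph_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Lane i of the result is a[i] - b[i]; lanes do not interact.
    std::array::from_fn(|i| a[i] - b[i])
}
```

Because the model computes in `f32`, it does not reproduce the rounding of each result to half precision. It matches the intrinsic only when every input and every difference is exactly representable as a 16-bit float, for example small integers.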