cpu_matmul_quantization_cpp_short
C++ API example demonstrating how to perform reduced-precision matrix-matrix multiplication using MatMul, and how the accuracy of the result compares to that of floating-point computation.
Concepts:
Static and dynamic quantization
Asymmetric quantization
Run-time output scales: dnnl::primitive_attr::set_output_scales() and DNNL_RUNTIME_F32_VAL
Run-time zero points: dnnl::primitive_attr::set_zero_points() and DNNL_RUNTIME_S32_VAL
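The concepts listed above can be illustrated without the library. The following is a minimal sketch in plain C++ and is not the oneDNN example itself: it assumes asymmetric quantization of the form real ≈ scale * (q - zero_point), a u8 source matrix, an s8 weights matrix, and an f32 result scaled by the product of the two input scales. All names (`QParams`, `choose_qparams`, `quantize`, `matmul_quantized`, `quantization_error`) are hypothetical helpers introduced for illustration. Because the parameters are derived from the data at run time, the sketch corresponds to the dynamic case; in the static case the same parameters would be known ahead of time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Asymmetric quantization parameters: real ~= scale * (q - zero_point).
struct QParams {
    float scale;
    int32_t zero_point;
};

// Derive parameters that map the data range (extended to include 0)
// onto the integer range [qmin, qmax].
QParams choose_qparams(const std::vector<float> &x, int32_t qmin, int32_t qmax) {
    assert(!x.empty() && qmin < qmax);
    const float lo = std::min(0.f, *std::min_element(x.begin(), x.end()));
    const float hi = std::max(0.f, *std::max_element(x.begin(), x.end()));
    float scale = (hi - lo) / static_cast<float>(qmax - qmin);
    if (scale == 0.f) scale = 1.f; // all-zero input: any scale works
    const int32_t zp = qmin - static_cast<int32_t>(std::lround(lo / scale));
    return {scale, std::min(qmax, std::max(qmin, zp))};
}

// Quantize to integer type T, saturating to [qmin, qmax].
template <typename T>
std::vector<T> quantize(const std::vector<float> &x, QParams p,
        int32_t qmin, int32_t qmax) {
    std::vector<T> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const int32_t v
                = static_cast<int32_t>(std::lround(x[i] / p.scale)) + p.zero_point;
        q[i] = static_cast<T>(std::min(qmax, std::max(qmin, v)));
    }
    return q;
}

// Reference f32 product C = A * B with row-major A (M x K) and B (K x N).
std::vector<float> matmul_f32(const std::vector<float> &A,
        const std::vector<float> &B, int M, int K, int N) {
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.f);
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n)
            for (int k = 0; k < K; ++k)
                C[m * N + n] += A[m * K + k] * B[k * N + n];
    return C;
}

// Integer product with s32 accumulation. Zero points are subtracted from the
// inputs, and a single output scale converts the accumulator back to f32.
std::vector<float> matmul_quantized(const std::vector<uint8_t> &A, QParams pa,
        const std::vector<int8_t> &B, QParams pb, int M, int K, int N) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    const float output_scale = pa.scale * pb.scale;
    std::vector<float> C(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k)
                acc += (static_cast<int32_t>(A[m * K + k]) - pa.zero_point)
                        * (static_cast<int32_t>(B[k * N + n]) - pb.zero_point);
            C[m * N + n] = output_scale * static_cast<float>(acc);
        }
    return C;
}

// Largest absolute difference between the quantized and the f32 results.
float quantization_error(const std::vector<float> &A,
        const std::vector<float> &B, int M, int K, int N) {
    const QParams pa = choose_qparams(A, 0, 255);
    const QParams pb = choose_qparams(B, -128, 127);
    const std::vector<float> ref = matmul_f32(A, B, M, K, N);
    const std::vector<float> got = matmul_quantized(
            quantize<uint8_t>(A, pa, 0, 255), pa,
            quantize<int8_t>(B, pb, -128, 127), pb, M, K, N);
    float err = 0.f;
    for (std::size_t i = 0; i < ref.size(); ++i)
        err = std::max(err, std::fabs(ref[i] - got[i]));
    return err;
}
```

Calling `quantization_error` on two small matrices returns the worst-case deviation from the f32 reference, which is the kind of accuracy comparison the example describes. In this sketch the scales and zero points are ordinary function arguments; the run-time attributes listed above play the analogous role of supplying those values when the primitive is executed rather than when it is created.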