inference_int8_matmul_cpp_short
C++ API example demonstrating how to use MatMul fused with ReLU for INT8 inference.
Concepts:
Asymmetric quantization
Run-time output scales: dnnl::primitive_attr::set_output_scales() and DNNL_RUNTIME_F32_VAL
Run-time zero points: dnnl::primitive_attr::set_zero_points() and DNNL_RUNTIME_S32_VAL
Create primitive once, use multiple times
Run-time tensor shapes: DNNL_RUNTIME_DIM_VAL
Weights pre-packing: use dnnl::memory::format_tag::any
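To make the arithmetic behind these concepts concrete, the following is a minimal reference sketch in plain C++ of what an INT8 MatMul fused with ReLU computes under asymmetric quantization. It does not call the oneDNN API. The function name `int8_matmul_relu_ref` is hypothetical, and the sketch assumes row-major layouts, u8 source and destination tensors with zero points, s8 weights with a zero point of 0, a single per-tensor output scale, and that ReLU is applied before the destination zero point is added. Because the scale, zero points, and dimensions are ordinary function arguments, they can change on every call, which loosely mirrors the run-time scales, zero points, and shapes listed above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference (non-optimized) INT8 MatMul + ReLU with asymmetric quantization.
// NOTE: illustrative only; the name and conventions are assumptions, not oneDNN code.
//   src: M x K, row-major, uint8 with zero point src_zp
//   wei: K x N, row-major, int8 (assumed symmetric, zero point 0)
//   dst: M x N, row-major, uint8 with zero point dst_zp
std::vector<std::uint8_t> int8_matmul_relu_ref(
        const std::vector<std::uint8_t> &src,
        const std::vector<std::int8_t> &wei,
        int M, int K, int N,
        std::int32_t src_zp, float output_scale, std::int32_t dst_zp) {
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            // Accumulate in 32-bit integers after removing the source zero point.
            std::int32_t acc = 0;
            for (int k = 0; k < K; ++k) {
                const std::int32_t s
                        = static_cast<std::int32_t>(src[m * K + k]) - src_zp;
                const std::int32_t w = static_cast<std::int32_t>(wei[k * N + n]);
                acc += s * w;
            }
            // Rescale the accumulator, then apply the fused ReLU.
            float v = output_scale * static_cast<float>(acc);
            v = std::max(v, 0.0f);
            // Round, shift by the destination zero point, saturate to u8.
            float q = std::nearbyint(v) + static_cast<float>(dst_zp);
            q = std::min(255.0f, std::max(0.0f, q));
            dst[m * N + n] = static_cast<std::uint8_t>(q);
        }
    }
    return dst;
}
```

For example, with a 1 x 2 source `{12, 8}`, a source zero point of 10, 2 x 2 weights `{3, -1, 1, 4}`, an output scale of 0.5, and a destination zero point of 5, the integer accumulators are 4 and -10. Scaling gives 2.0 and -5.0, ReLU clamps the second value to 0, and adding the destination zero point yields `{7, 5}`.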