From 7dcad7e55ce97246baf62d7f85d8fcd6db754e98 Mon Sep 17 00:00:00 2001
From: Gunes Bayir
Date: Tue, 2 Jul 2024 18:18:19 +0100
Subject: Waive overflow issue in a64_gemm_s8_4x4

The a64_gemm_s8_4x4 assembly kernel has an overflow issue, and fixing it
could make the kernel much slower.

For the overflow to happen, the Lhs matrix must have -128 values eight
positions apart, and the Rhs matrix must have -128 at the same
positions. The sum of the two products then becomes
(-128)*(-128) + (-128)*(-128) = 32768, which overflows the 16-bit
accumulator and wraps around to -32768.

The probability of this happening is very low, and when it does happen,
it affects only one pixel. Therefore, we waive the issue and report it
in the errata.

We also modify the relevant test to use -127 instead of -128 as the
minimum Int8 value.

Change-Id: Ia36407d67c439eb14c145aede2f07729bc41db2e
Signed-off-by: Gunes Bayir
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11818
Benchmark: Arm Jenkins
Tested-by: Arm Jenkins
Reviewed-by: Omar Al Khatib
Comments-Addressed: Arm Jenkins
---
 docs/user_guide/errata.dox                          | 10 ++++++++++
 tests/validation/fixtures/ConvolutionLayerFixture.h |  5 ++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/user_guide/errata.dox b/docs/user_guide/errata.dox
index c195dc7851..a9795489d2 100644
--- a/docs/user_guide/errata.dox
+++ b/docs/user_guide/errata.dox
@@ -30,6 +30,16 @@ namespace arm_compute
 
 @section S7_1_errata Errata
 
+- (COMPMID-7109) Under certain conditions, Quantized GEMM may result in very few mismatches due to 16-bit accumulation overflow
+    - Versions: >= v17.09
+    - Oses: Linux, Android, MacOS, Windows.
+    - Conditions:
+        - Compile the latest Arm Compute Library for armv8a
+        - Device without dot product support
+        - In the matrix multiplication
+            - Lhs matrix must have -128 values eight positions apart from each other in its row
+            - Rhs matrix must have -128 values at the same positions as Lhs
+
 - (COMPMID-6904) Fix out-of-bound memory write for non-optimized FP16 GeMM kernel.
     - Versions: >= v17.09 && < v24.06
     - Oses: Linux, Android, MacOS, Windows.
diff --git a/tests/validation/fixtures/ConvolutionLayerFixture.h b/tests/validation/fixtures/ConvolutionLayerFixture.h
index 2a317e9b9b..51084533f9 100644
--- a/tests/validation/fixtures/ConvolutionLayerFixture.h
+++ b/tests/validation/fixtures/ConvolutionLayerFixture.h
@@ -204,7 +204,10 @@ protected:
             {
                 if(_use_dynamic_output_quant)
                 {
-                    std::uniform_int_distribution distribution(-128, 127);
+                    // Using -127 as the lower bound because of possible overflow.
+                    // This is a known issue and reported in the errata.
+                    // See COMPMID-7109 for more details
+                    std::uniform_int_distribution distribution(-127, 127);
                     library->fill(tensor, distribution, i);
                 }
                 else
-- 
cgit v1.2.1