Waive overflow issue in a64_gemm_s8_4x4

The a64_gemm_s8_4x4 assembly kernel has an overflow issue; fixing it could make the kernel much slower. For the overflow to happen, the Lhs matrix must have -128 values eight positions apart and the Rhs matrix must have -128 in the same positions as well. The accumulation then becomes (-128)*(-128) + (-128)*(-128) = 32768, which overflows the 16-bit accumulator and wraps to -32768.

The probability of this happening is very low, and when it does happen it affects only one pixel. Therefore, we waive the issue and report it in the errata. We also modify the relevant test to use -127 instead of -128 as the minimum Int8 value.

Change-Id: Ia36407d67c439eb14c145aede2f07729bc41db2e
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11818
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
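
The arithmetic in the message can be checked with a small standalone sketch. It is not code from the kernel: the helper names (`pairwise_mac_s32`, `wrap_to_s16`, `pairwise_mac_s16`) are hypothetical, and it assumes only what the message states, namely that two int8 products are summed into a signed 16-bit value that wraps on overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in
// Arm Compute Library.

// Reference result: sum of two int8 products computed in 32 bits (no overflow).
constexpr int32_t pairwise_mac_s32(int8_t a0, int8_t b0, int8_t a1, int8_t b1)
{
    return int32_t{a0} * int32_t{b0} + int32_t{a1} * int32_t{b1};
}

// Keep the low 16 bits of a value and reinterpret them as a signed 16-bit
// integer, i.e. two's-complement wraparound written without relying on
// implementation-defined narrowing.
constexpr int16_t wrap_to_s16(int32_t value)
{
    return (static_cast<uint32_t>(value) & 0xFFFFu) >= 0x8000u
               ? static_cast<int16_t>(static_cast<int32_t>(static_cast<uint32_t>(value) & 0xFFFFu) - 65536)
               : static_cast<int16_t>(static_cast<uint32_t>(value) & 0xFFFFu);
}

// Assumed model of the affected step: the same sum held in a 16-bit lane.
constexpr int16_t pairwise_mac_s16(int8_t a0, int8_t b0, int8_t a1, int8_t b1)
{
    return wrap_to_s16(pairwise_mac_s32(a0, b0, a1, b1));
}

// The case from the commit message: 16384 + 16384 = 32768 does not fit in int16.
static_assert(pairwise_mac_s32(-128, -128, -128, -128) == 32768, "wide result");
static_assert(pairwise_mac_s16(-128, -128, -128, -128) == -32768, "wrapped result");

// With -127 as the lower bound the largest sum is 2 * 16129 = 32258, which fits.
static_assert(pairwise_mac_s16(-127, -127, -127, -127) == 32258, "no overflow");
```

Under this model the all -128 combination is the only one that overflows: if even one of the four operands is -127 instead, the sum is at most 16384 + 16256 = 32640, which still fits in 16 bits. That is consistent with the test change to draw values from [-127, 127].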
author: Gunes Bayir <gunes.bayir@arm.com> 2024-07-02 18:18:19 +0100
committer: Gunes Bayir <gunes.bayir@arm.com> 2024-07-03 14:18:04 +0000
commit: 7dcad7e55ce97246baf62d7f85d8fcd6db754e98 (patch)
tree: 6bd40f24ccc6552f6ed6bed6b697076796e04cad
parent: a3f238a44d9f306c77be0177f13d22ae3f3bcc57 (diff)
download: ComputeLibrary-7dcad7e55ce97246baf62d7f85d8fcd6db754e98.tar.gz
2 files changed, 14 insertions, 1 deletions
diff --git a/docs/user_guide/errata.dox b/docs/user_guide/errata.dox
index c195dc7851..a9795489d2 100644
--- a/docs/user_guide/errata.dox
+++ b/docs/user_guide/errata.dox
@@ -30,6 +30,16 @@ namespace arm_compute
 
 @section S7_1_errata Errata
 
+- (COMPMID-7109) Under certain conditions, Quantized GEMM may result in very few mismatches due to 16-bit accumulation overflow
+    - Versions: >= v17.09
+    - Oses: Linux, Android, MacOS, Windows.
+    - Conditions:
+        - Compile the latest Arm Compute Library for armv8a
+        - Device without dot product support
+        - In the matrix multiplication
+            - Lhs matrix must have -128 values eight positions apart from each other in its row
+            - Rhs matrix must have -128 values at the same positions as Lhs
+
 - (COMPMID-6904) Fix out-of-bound memory write for non-optimized FP16 GeMM kernel.
     - Versions: >= v17.09 && < v24.06
     - Oses: Linux, Android, MacOS, Windows.
diff --git a/tests/validation/fixtures/ConvolutionLayerFixture.h b/tests/validation/fixtures/ConvolutionLayerFixture.h
index 2a317e9b9b..51084533f9 100644
--- a/tests/validation/fixtures/ConvolutionLayerFixture.h
+++ b/tests/validation/fixtures/ConvolutionLayerFixture.h
@@ -204,7 +204,10 @@ protected:
             {
                 if(_use_dynamic_output_quant)
                 {
-                    std::uniform_int_distribution<int32_t> distribution(-128, 127);
+                    // Using -127 as the lower bound because of possible overflow.
+                    // This is a known issue and reported in the errata.
+                    // See COMPMID-7109 for more details
+                    std::uniform_int_distribution<int32_t> distribution(-127, 127);
                     library->fill(tensor, distribution, i);
                 }
                 else