Add in-place summation to CPU GEMM kernels

Instead of dispatching the sum post-op for GEMM kernels to a separate kernel plus an add, which requires an extra destination-sized allocation and 3 extra loads/stores per element, perform the summation directly in the GEMM kernel.

Resolves: ONCPUML-1442

Signed-off-by: Radu Salavat <radu.salavat@arm.com>
Co-authored-by: Milos Puzovic <milos.puzovic@arm.com>
Change-Id: I7a1f2da3300875fa1ac88b705a34390969518077
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11298
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
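In-place summation means the kernel computes dst += A * B directly, instead of writing A * B to a temporary buffer and adding that buffer to dst in a second pass. The following is a minimal reference sketch of that idea, assuming row-major int8 inputs and an S32 destination. `gemm_s8_s32` is a hypothetical name, not a Compute Library function, and the library's optimized assembly kernels are far more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference kernel (not part of Compute Library):
// C (M x N, int32) = A (M x K, int8) * B (K x N, int8), all row-major.
// When `accumulate` is true, the product is added to the existing contents of C
// in place, so no destination-sized temporary and no separate add pass are needed.
void gemm_s8_s32(const std::vector<int8_t> &a, const std::vector<int8_t> &b,
                 std::vector<int32_t> &c, int M, int N, int K, bool accumulate)
{
    assert(a.size() == static_cast<std::size_t>(M) * K);
    assert(b.size() == static_cast<std::size_t>(K) * N);
    assert(c.size() == static_cast<std::size_t>(M) * N);

    for (int m = 0; m < M; ++m)
    {
        for (int n = 0; n < N; ++n)
        {
            // Seed the accumulator with the current destination value when
            // accumulating, otherwise start from zero (plain overwrite).
            int32_t acc = accumulate ? c[m * N + n] : 0;
            for (int k = 0; k < K; ++k)
            {
                acc += static_cast<int32_t>(a[m * K + k]) * static_cast<int32_t>(b[k * N + n]);
            }
            c[m * N + n] = acc;
        }
    }
}
```

With A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], a first call with `accumulate = false` writes [[19, 22], [43, 50]]; a second call on the same C with `accumulate = true` doubles it. The S32 destination is what makes this straightforward: the running sum lives in the same integer domain as the GEMM accumulator.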
author: Radu Salavat <radu.salavat@arm.com> 2024-02-27 18:32:26 +0000
committer: Radu Salavat <radu.salavat@arm.com> 2024-04-11 08:47:50 +0000
commit: f1f1f87132690a8061801ef1a4638d637c780df7 (patch)
tree: 8ad4c3739217b3bc6281f4e0b9a7a63fe6c3f9bb /src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp
parent: 1322065a3fbd15b00dbfb0969d6b438b5ba15530 (diff)
download: ComputeLibrary-f1f1f87132690a8061801ef1a4638d637c780df7.tar.gz
1 files changed, 9 insertions, 1 deletions
diff --git a/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp b/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp
index b25505a85d..94e86c6077 100644
--- a/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp
+++ b/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2023 Arm Limited.
+ * Copyright (c) 2021-2024 Arm Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -65,6 +65,7 @@ cpu::AsmGemmInfo init_assembly_metadata(const GEMMInfo &info)
     asm_info.activation_info         = info.activation_info();
     asm_info.output_stage            = info.gemmlowp_output_stage();
     asm_info.fast_mode               = info.fast_math();
+    asm_info.accumulate              = info.accumulate();
 
     return asm_info;
 }
@@ -343,6 +344,13 @@ Status CpuGemmLowpMatrixMultiplyCore::validate(const ITensorInfo *a,
     ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
     ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
 
+    // When using accumulation (in-place summation), for now, the only supported DataType for output is S32.
+    if (gemm_info.accumulate())
+    {
+        ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE,
+                                        "Accumulation is not supported for output QASYMM8/QASYMM8_SIGNED");
+    }
+
     GEMMInfo           info          = gemm_info;
     const ITensorInfo *matrix_a_info = a;
     const ITensorInfo *matrix_b_info = b;
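The check added to `validate()` reduces to one rule: accumulation is accepted only when no output stage is configured, i.e. when the destination holds the raw S32 result rather than requantized QASYMM8/QASYMM8_SIGNED values. The sketch below restates that rule in isolation. `OutputStage` and `validate_accumulate` are hypothetical simplifications standing in for `GEMMLowpOutputStageType` and the `ARM_COMPUTE_RETURN_ERROR_ON_MSG` macro; they are not library APIs.

```cpp
#include <string>

// Hypothetical stand-in for GEMMLowpOutputStageType: only the distinction
// between "no output stage" and "some requantizing stage" matters here.
enum class OutputStage
{
    None,      // destination receives the raw S32 GEMM result
    Requantize // destination is QASYMM8 / QASYMM8_SIGNED after requantization
};

// Returns an empty string when the configuration is accepted,
// otherwise the same error message the diff uses.
std::string validate_accumulate(bool accumulate, OutputStage stage)
{
    if (accumulate && stage != OutputStage::None)
    {
        return "Accumulation is not supported for output QASYMM8/QASYMM8_SIGNED";
    }
    return "";
}
```

Non-accumulating configurations are unaffected by the new check; only the combination of `accumulate` with a requantizing output stage is rejected.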