author    Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>  2023-03-23 22:21:31 +0000
committer Mohmun02 <MohammedSuhail.Munshi@arm.com>  2023-04-13 09:24:52 +0000
commit    a1b1e41bb261f5613f443fed7071936a360686ed (patch)
tree      eff2978a682fb24c8078df9c6c796fde51074255 /src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
parent    8b7f42aa0e76a65a4ffa46ee875df6a6220695ae (diff)
download  ComputeLibrary-a1b1e41bb261f5613f443fed7071936a360686ed.tar.gz
Implement MatMul Function and Operator with Floating Point support for CPU

- Implements the MatMul function and operator for the FP16/FP32 floating point data types
- Includes support for transposing dynamic tensors prior to matrix multiplication
- Adds tests for 2D/3D/4D+ tensors in MatMul with the F32/F16 data types (with all combinations of transposed/not-transposed tensors)
- Updates the fixture to allow for testing fused activation in MatMul
- Adds tests for MatMul with and without fused activation

Resolved: [COMPMID-5898]

Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
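For orientation, here is a minimal usage sketch of the operator this patch introduces. It assumes the NEMatMul runtime function, the MatMulInfo descriptor and the CpuMatMulSettings type described in the commit message; the exact configure() parameter list may vary between releases, so treat it as an illustration rather than the canonical API.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/NEON/functions/NEMatMul.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    int main()
    {
        // 2D F32 matmul: lhs [K=3, M=4] x rhs [N=5, K=3] -> dst [N=5, M=4]
        Tensor lhs, rhs, dst;
        lhs.allocator()->init(TensorInfo(TensorShape(3U, 4U), 1, DataType::F32));
        rhs.allocator()->init(TensorInfo(TensorShape(5U, 3U), 1, DataType::F32));
        dst.allocator()->init(TensorInfo(TensorShape(5U, 4U), 1, DataType::F32));

        // MatMulInfo carries the lhs/rhs transpose ("adjoint") flags.
        MatMulInfo info;
        info.adj_lhs(false).adj_rhs(false);

        NEMatMul matmul;
        // The settings argument is assumed to be default-constructible here.
        matmul.configure(&lhs, &rhs, &dst, info, CpuMatMulSettings());

        lhs.allocator()->allocate();
        rhs.allocator()->allocate();
        dst.allocator()->allocate();

        matmul.run();
        return 0;
    }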
Diffstat (limited to 'src/cpu/operators/internal/CpuGemmAssemblyDispatch.h')
-rw-r--r-- src/cpu/operators/internal/CpuGemmAssemblyDispatch.h | 34
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h b/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
index 0c51c92359..588c45294a 100644
--- a/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
+++ b/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2018-2022 Arm Limited.
+ * Copyright (c) 2018-2023 Arm Limited.
*
* SPDX-License-Identifier: MIT
*
@@ -82,6 +82,38 @@ public:
public:
/** If supported, create a Compute Library function, else fall back to the arm_gemm function.
*
+ * @note Configuring "batches"
+ * The shapes of @p a, @p b and @p d are arranged as follows:
+ * Lowest dimension <-> Highest dimension
+ * a: [K, M, Batch, Multi]
+ * b: [N, K, Multi]
+ * d: [N, M, Batch, Multi]
+ *
+ * The "Batch" refers to where "Batch" number of MxK slices of tensor a multiplies with a single KxN slice of b
+ * The "Multi" refers to where "Multi" number of individual multiplication of a with b
+ *
+ * For example, the following are some valid input shape configurations:
+ *
+ * (1) Normal 2D gemm
+ * a: [K=3, M=4]
+ * b: [N=5, K=3]
+ * d: [N=5, M=4]
+ *
+ * (2) Batches of a sharing b (e.g. gemm-based batched convolution where b is the shared weights)
+ * a: [K=3, M=4, Batch=9]
+ * b: [N=5, K=3]
+ * d: [N=5, M=4, Batch=9]
+ *
+ * (3) "Batches" of independent gemm (e.g. batched matmul)
+ * a: [K=3, M=4, Batch=1, Multi=7]
+ * b: [N=5, K=3, Multi=7]
+ * d: [N=5, M=4, Batch=1, Multi=7]
+ *
+ * (4) "Batches" of independent gemm where b is also shared
+ * a: [K=3, M=4, Batch=4, Multi=7]
+ * b: [N=5, K=3, Multi=7]
+ * d: [N=5, M=4, Batch=4, Multi=7]
+ *
* @param[in] a Input tensor (Matrix A)
* @param[in] b Input tensor (Matrix B)
* @param[in] c Input tensor (Matrix C) used to pass the bias for quantized calculations
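To make the batch/multi shape convention documented in the note above concrete, the following sketch builds the tensor infos for case (3), seven independent F32 gemms, using the public TensorShape/TensorInfo types. CpuGemmAssemblyDispatch and AsmGemmInfo are internal APIs, so the final validate() call is indicated only as an assumed usage.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/TensorShape.h"
    #include "arm_compute/core/Types.h"

    using namespace arm_compute;

    // Case (3): Multi=7 independent gemms ("batched matmul").
    // TensorShape lists dimensions from lowest to highest, matching the note.
    const TensorInfo a(TensorShape(3U /*K*/, 4U /*M*/, 1U /*Batch*/, 7U /*Multi*/), 1, DataType::F32);
    const TensorInfo b(TensorShape(5U /*N*/, 3U /*K*/, 7U /*Multi*/), 1, DataType::F32);
    const TensorInfo d(TensorShape(5U /*N*/, 4U /*M*/, 1U /*Batch*/, 7U /*Multi*/), 1, DataType::F32);

    // These infos would then be handed to the internal dispatch, along the lines of
    //   CpuGemmAssemblyDispatch::validate(&a, &b, nullptr, &d, asm_gemm_info);
    // where asm_gemm_info is an AsmGemmInfo descriptor (assumed usage).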