author    Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>  2023-03-23 22:21:31 +0000
committer Mohmun02 <MohammedSuhail.Munshi@arm.com>  2023-04-13 09:24:52 +0000
commit    a1b1e41bb261f5613f443fed7071936a360686ed (patch)
tree      eff2978a682fb24c8078df9c6c796fde51074255 /src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
parent    8b7f42aa0e76a65a4ffa46ee875df6a6220695ae (diff)
download  ComputeLibrary-a1b1e41bb261f5613f443fed7071936a360686ed.tar.gz
Implement MatMul Function and Operator with Floating Point support for CPU

- Implements the MatMul function and operator for the FP16/FP32 floating point data types
- Includes support for transposing dynamic tensors prior to matrix multiplication
- Adds tests for 2D/3D/4D+ tensors in MatMul with the F32/F16 data types (with all combinations of transposed/not-transposed tensors)
- Updates the fixture to allow for testing fused activation in MatMul
- Adds tests for MatMul with and without fused activation

Resolved: [COMPMID-5898]

Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
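For orientation, here is a minimal usage sketch of the operator this patch introduces. It assumes the NEMatMul runtime function, the MatMulInfo descriptor and the CpuMatMulSettings type described in the commit message; the exact configure() parameter list may vary between releases, so treat it as an illustration rather than the canonical API.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/NEON/functions/NEMatMul.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    int main()
    {
        // 2D F32 matmul: lhs [K=3, M=4] x rhs [N=5, K=3] -> dst [N=5, M=4]
        Tensor lhs, rhs, dst;
        lhs.allocator()->init(TensorInfo(TensorShape(3U, 4U), 1, DataType::F32));
        rhs.allocator()->init(TensorInfo(TensorShape(5U, 3U), 1, DataType::F32));
        dst.allocator()->init(TensorInfo(TensorShape(5U, 4U), 1, DataType::F32));

        // MatMulInfo carries the lhs/rhs transpose ("adjoint") flags.
        MatMulInfo info;
        info.adj_lhs(false).adj_rhs(false);

        NEMatMul matmul;
        // The settings argument is assumed to be default-constructible here.
        matmul.configure(&lhs, &rhs, &dst, info, CpuMatMulSettings());

        lhs.allocator()->allocate();
        rhs.allocator()->allocate();
        dst.allocator()->allocate();

        matmul.run();
        return 0;
    }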
Diffstat (limited to 'src/cpu/operators/internal/CpuGemmAssemblyDispatch.h')
-rw-r--r-- src/cpu/operators/internal/CpuGemmAssemblyDispatch.h | 34
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h b/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
index 0c51c92359..588c45294a 100644
--- a/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
+++ b/src/cpu/operators/internal/CpuGemmAssemblyDispatch.h
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2018-2022 Arm Limited.
+ * Copyright (c) 2018-2023 Arm Limited.
*
* SPDX-License-Identifier: MIT
*
@@ -82,6 +82,38 @@ public:
public:
/** If supported, create a Compute Library function, else fall back to the arm_gemm function.
*
+ * @note Configuring "batches"
+ * The shapes of @p a, @p b and @p d are arranged as follows:
+ * Lowest dimension <-> Highest dimension
+ * a: [K, M, Batch, Multi]
+ * b: [N, K, Multi]
+ * d: [N, M, Batch, Multi]
+ *
+ * The "Batch" refers to where "Batch" number of MxK slices of tensor a multiplies with a single KxN slice of b
+ * The "Multi" refers to where "Multi" number of individual multiplication of a with b
+ *
+ * For example, the following are some valid input shape configurations:
+ *
+ * (1) Normal 2D gemm
+ * a: [K=3, M=4]
+ * b: [N=5, K=3]
+ * d: [N=5, M=4]
+ *
+ * (2) Batches of a sharing b (e.g. gemm-based batched convolution where b is the shared weights)
+ * a: [K=3, M=4, Batch=9]
+ * b: [N=5, K=3]
+ * d: [N=5, M=4, Batch=9]
+ *
+ * (3) "Batches" of independent gemm (e.g. batched matmul)
+ * a: [K=3, M=4, Batch=1, Multi=7]
+ * b: [N=5, K=3, Multi=7]
+ * d: [N=5, M=4, Batch=1, Multi=7]
+ *
+ * (4) "Batches" of independent gemm where b is also shared
+ * a: [K=3, M=4, Batch=4, Multi=7]
+ * b: [N=5, K=3, Multi=7]
+ * d: [N=5, M=4, Batch=4, Multi=7]
+ *
* @param[in] a Input tensor (Matrix A)
* @param[in] b Input tensor (Matrix B)
* @param[in] c Input tensor (Matrix C) used to pass the bias for quantized calculations
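To make the batch/multi shape convention documented in the note above concrete, the following sketch builds the tensor infos for case (3), seven independent F32 gemms, using the public TensorShape/TensorInfo types. CpuGemmAssemblyDispatch and AsmGemmInfo are internal APIs, so the final validate() call is indicated only as an assumed usage.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/TensorShape.h"
    #include "arm_compute/core/Types.h"

    using namespace arm_compute;

    // Case (3): Multi=7 independent gemms ("batched matmul").
    // TensorShape lists dimensions from lowest to highest, matching the note.
    const TensorInfo a(TensorShape(3U /*K*/, 4U /*M*/, 1U /*Batch*/, 7U /*Multi*/), 1, DataType::F32);
    const TensorInfo b(TensorShape(5U /*N*/, 3U /*K*/, 7U /*Multi*/), 1, DataType::F32);
    const TensorInfo d(TensorShape(5U /*N*/, 4U /*M*/, 1U /*Batch*/, 7U /*Multi*/), 1, DataType::F32);

    // These infos would then be handed to the internal dispatch, along the lines of
    //   CpuGemmAssemblyDispatch::validate(&a, &b, nullptr, &d, asm_gemm_info);
    // where asm_gemm_info is an AsmGemmInfo descriptor (assumed usage).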