Integrate new pretranspose_b_array with extra fused transpose of B

This patch fuses the transposition performed in ACL with the transformations done in arm_gemm (called pretranspose_b_array), provided the underlying kernel and transform support it. This should improve start-up time (the pretranspose applies to constant Rhs matrices) and reduce the memory footprint.

The transformations in arm_gemm are kernel specific: the Rhs matrix is rearranged into particular layouts to improve performance.

Resolves: COMPMID-6595
Change-Id: Id2932dd966e59f903c279417bebcea83d9a42464
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11144
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
author: Gunes Bayir <gunes.bayir@arm.com> 2024-02-12 21:32:51 +0000
committer: Gunes Bayir <gunes.bayir@arm.com> 2024-02-21 10:36:22 +0000
commit: ef637398a8c2060e15de438020c53331da8bd6dd (patch)
tree: b1a1738736c9b6b49e76767e44bf4b77bf732876 /src/core/NEON/kernels/arm_gemm/transform.cpp
parent: 0a48c4c83b598991b4d4235f870c24d9e6634b20 (diff)
download: ComputeLibrary-ef637398a8c2060e15de438020c53331da8bd6dd.tar.gz
1 files changed, 9 insertions, 2 deletions
diff --git a/src/core/NEON/kernels/arm_gemm/transform.cpp b/src/core/NEON/kernels/arm_gemm/transform.cpp
index 5aa62f0fe4..45e4f0e1de 100644
--- a/src/core/NEON/kernels/arm_gemm/transform.cpp
+++ b/src/core/NEON/kernels/arm_gemm/transform.cpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2023 Arm Limited.
+ * Copyright (c) 2021-2024 Arm Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -134,7 +134,14 @@ template void Transform<8, 1, true, VLType::None>(float *, const __fp16 *, int,
 #endif // defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
 #ifdef ARM_COMPUTE_ENABLE_BF16
 template void Transform<8, 1, true, VLType::None>(float *, const bfloat16 *, int, int, int, int, int);
-#endif
+#endif // ARM_COMPUTE_ENABLE_BF16
 #endif // AArch32
 
+#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+template void Transform<12, 1, false, VLType::None>(float *, const __fp16 *, int, int, int, int, int);
+#endif // defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+#ifdef ARM_COMPUTE_ENABLE_BF16
+template void Transform<12, 1, false, VLType::None>(float *, const bfloat16 *, int, int, int, int, int);
+#endif // ARM_COMPUTE_ENABLE_BF16
+
 } // namespace arm_gemm
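
The fusion described in the commit message can be illustrated with a small standalone sketch. It is not arm_gemm code: `pack_panels`, `transpose`, `pack_two_step` and `pack_fused` are hypothetical helpers, and the panel layout (groups of 12 columns, zero-padded) is an assumption chosen only to echo the `12` in the new `Transform<12, 1, false, VLType::None>` instantiations. The point is that transposing B and then packing it gives the same result as packing directly from the untransposed buffer with swapped strides, without the intermediate copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not arm_gemm API): pack a logical K x N matrix into
// panels of `Panel` columns. Element (kk, col) of the logical matrix is read
// from src[kk * row_stride + col * col_stride]. Columns past N are zero-padded.
template <std::size_t Panel>
std::vector<float> pack_panels(const std::vector<float> &src, std::size_t k, std::size_t n,
                               std::size_t row_stride, std::size_t col_stride)
{
    const std::size_t n_panels = (n + Panel - 1) / Panel;
    std::vector<float> out(n_panels * k * Panel, 0.0f);
    for (std::size_t p = 0; p < n_panels; ++p) {
        for (std::size_t kk = 0; kk < k; ++kk) {
            for (std::size_t j = 0; j < Panel; ++j) {
                const std::size_t col = p * Panel + j;
                if (col < n) {
                    out[(p * k + kk) * Panel + j] = src[kk * row_stride + col * col_stride];
                }
            }
        }
    }
    return out;
}

// Plain out-of-place transpose of a row-major rows x cols matrix.
std::vector<float> transpose(const std::vector<float> &src, std::size_t rows, std::size_t cols)
{
    std::vector<float> dst(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}

// Two passes: `bt` holds B transposed (N x K, row-major). Materialise the
// K x N matrix first, then pack it. Needs a temporary the size of B.
std::vector<float> pack_two_step(const std::vector<float> &bt, std::size_t k, std::size_t n)
{
    const std::vector<float> b = transpose(bt, n, k);
    return pack_panels<12>(b, k, n, /*row_stride=*/n, /*col_stride=*/1);
}

// One pass: read `bt` in place with swapped strides, so no temporary is needed.
std::vector<float> pack_fused(const std::vector<float> &bt, std::size_t k, std::size_t n)
{
    return pack_panels<12>(bt, k, n, /*row_stride=*/1, /*col_stride=*/k);
}
```

Both functions return identical buffers for the same input; the fused version simply skips the temporary transposed copy, which is the kind of start-up and memory saving the commit message refers to.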