author | Gunes Bayir <gunes.bayir@arm.com> | 2024-03-06 09:58:40 +0000 |
---|---|---|
committer | Gunes Bayir <gunes.bayir@arm.com> | 2024-03-11 10:02:41 +0000 |
commit | 9167c9cd1c684218f76a3c0ec97574dd6f381b98 (patch) | |
tree | 7a9608f1f6861ad164697a0bbdc784be92a8d3e5 /tests/datasets | |
parent | e77736fe4150648d2fd0649cf61c1bade928d69d (diff) | |
download | ComputeLibrary-9167c9cd1c684218f76a3c0ec97574dd6f381b98.tar.gz |
Prefer indirect Gemm vs. Direct convolution if supported
Indirect GEMM uses the optimized assembly path, while Direct Conv uses the fallback ACL kernel for convolution.
In certain cases, where the input tensor is large and the filter size is greater than 7 (e.g. 9x9 filters), the heuristics fall back to the Direct Conv algorithm even though they could still prefer the assembly path when the data layout is NHWC. This matters more when SME2 kernels are present.
Resolves: COMPMID-6900
Change-Id: Ia611c975eee0423615113fcaeaa8f9eef0421456
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11254
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Diffstat (limited to 'tests/datasets')
-rw-r--r-- | tests/datasets/LargeConvolutionLayerDataset.h | 12 |
1 files changed, 11 insertions, 1 deletions
diff --git a/tests/datasets/LargeConvolutionLayerDataset.h b/tests/datasets/LargeConvolutionLayerDataset.h
index 72f73ba6d9..c299f2460b 100644
--- a/tests/datasets/LargeConvolutionLayerDataset.h
+++ b/tests/datasets/LargeConvolutionLayerDataset.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2017-2020, 2023 Arm Limited.
+ * Copyright (c) 2017-2020, 2023-2024 Arm Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -294,6 +294,16 @@ public:
     }
 };
 
+class VeryLargeConvolutionLayerDataset final : public ConvolutionLayerDataset
+{
+public:
+    VeryLargeConvolutionLayerDataset()
+    {
+        // Tensor size > 1e7 bytes && weight dimensions > 7
+        add_config(TensorShape(336U, 336U, 32U), TensorShape(9U, 9U, 32U, 64U), TensorShape(64U), TensorShape(168U, 168U, 64U), PadStrideInfo(2, 2, 4, 4));
+    }
+};
+
 class LargeGroupedConvolutionLayerDataset final : public ConvolutionLayerDataset
 {
 public:
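The shapes in the new dataset entry can be sanity-checked with a little arithmetic. The sketch below is not part of the commit: `conv_out_dim`, `tensor_bytes` and `check_very_large_config` are hypothetical helper names, and it assumes that `PadStrideInfo(2, 2, 4, 4)` means stride 2 and padding 4 on both axes with floor rounding, and that elements are 4 bytes wide (F32). Under those assumptions the 336x336x32 input with a 9x9 filter yields the 168x168 output listed in `add_config`, and the input exceeds the 1e7-byte threshold mentioned in the code comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: output extent along one spatial axis for a convolution
// with symmetric padding and floor rounding (integer division truncates).
constexpr std::size_t conv_out_dim(std::size_t in, std::size_t kernel, std::size_t stride, std::size_t pad)
{
    return (in + 2 * pad - kernel) / stride + 1;
}

// Hypothetical helper: size in bytes of a W x H x C tensor.
constexpr std::size_t tensor_bytes(std::size_t w, std::size_t h, std::size_t c, std::size_t element_size)
{
    return w * h * c * element_size;
}

// Values taken from the add_config(...) call in the diff.
// Assumption: stride = 2, pad = 4 on both axes.
static_assert(conv_out_dim(336, 9, 2, 4) == 168, "output extent should match TensorShape(168U, 168U, 64U)");

// Assumption: 4-byte elements (F32). 336 * 336 * 32 * 4 = 14450688 bytes.
static_assert(tensor_bytes(336, 336, 32, 4) > 10000000, "input should exceed 1e7 bytes");

// Runtime variant of the same checks, for use inside a test body.
inline void check_very_large_config()
{
    assert(conv_out_dim(336, 9, 2, 4) == 168);
    assert(tensor_bytes(336, 336, 32, 4) == 14450688);
}
```

Note that the size threshold depends on the element width: with 2-byte elements the same input would be 7225344 bytes, which is below 1e7.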