aboutsummaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2023-03-29Fix GCC13 compiler errorsPablo Marquez Tello
* Removed namespace arm_compute::utils::requires to fix the build error ‘requires’ is a keyword in C++20 [-Wc++20-compat] * Added missing includes for cstdint.h * Resolves MLCE-1040 Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-29Add quantized support for unary elementwise in CPUViet-Hoa Do
* Add quantized unary elementwise in CPU using LUT. * Widen the input data range of the test suite. - Fix CPU exponential function overflow/underflow range. - Fix saturation issue of CL round operator. Resolves: COMPMID-5763 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-27Fix BatchToSpaceFixtureSiCong Li
* Use a vector to represent the (static) block shape instead of an N-D Tensor. The previous use of ND Tensor as block shape was wrong, not adhering to the specification, and non-functional (only first dim was used anyway). * The fixture now accepts a static block shape, because the dynamic case is not properly implemented and will be deprecated for now. * Fix an assertion error in reference implementation. Partially resolves COMPMID-5918 Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-24Add Texture Pipe Support for Matmul Lhs T/NT Rhs T kernelsRamy Elgammal
Resolves: COMPMID-5952, COMPMID-5956 Change-Id: Idbd14538e7660792254072fa9631a6f03966f89b Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9371 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-24Work around CLScale compiler-specific issueSiCong Li
Resolves COMPMID-5985 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I0e789619f09e3adefe3655df347390f057300c0f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9373 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-03-24Add Texture Pipe Support for Matmul Lhs T/NT Rhs NT kernelsGunes Bayir
Resolves: COMPMID-5945, COMPMID-5954 Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-23Round to nearest with ties to away from zero in ReluPablo Marquez Tello
* This patch adds support for rounding modes in vmlaq_qasymm8_signed which is used to compute Relu for quantized types * Partially resolves MLCE-1018 Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-21gemm_interleaved: Set up the accumulation buffer properly in alternateDavid Mansell
threading mode. Resolves: COMPMID-5844 Change-Id: Iceb0018114bbca2bfdac4d4406936f9b260539e9 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9070 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
2023-03-21Add dynamic weights for CPU fully connected layerViet-Hoa Do
Resolves: COMPMID-5917 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-20Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16Gunes Bayir
- Implement opencl kernel for LHS transposed and RHS non-transposed - Implement opencl kernel for LHS transposed and RHS transposed - Add validation tests Resolves: COMPMID-5953, COMPMID-5955 Change-Id: I55589acbffe86c44e29807574975978a1ec09bad Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implementation of RSQRT for quantized int8Ramy Elgammal
Resolves: COMPMID-5863 Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-17Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16Ramy Elgammal
- Implement ClNativeMatMulKernel class - Implement opencl kernel for LHS non-transposed and RHS non-transposed - Implement opencl kernel for LHS non-transposed and RHS transposed - Add test fixture and dataset for matmul - Implement transpose_tensor() for reference implementation to transpose high dimensional tensors Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-14Add CropInfo to BatchToSpace reference and fixtureSiCong Li
Partially resolves COMPMID-5918, COMPMID-5865 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-13arm_gemm: Add SME2 FP16 kernels.David Mansell
Resolves: COMPMID-5966 Change-Id: Ic0d694493178da029a297643855bd0cff01b174f Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-13[ONCPUML-1174] Allow src/weights mismatch for fixed formatJonathan Deakin
Without this, we have to pass in weights to be NHWC, even if they are in fact blocked/interleaved for consumption by a fixed format kernel. Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Change-Id: I9ee8720a21a16b17816dbecf6308e1668ddda59c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9285 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-08Add support for arbitrary parameters for CPU GatherViet-Hoa Do
* The shape of input and indices tensors, and the gather axis can be any number, as long as these are valid and the output tensor doesn't have more dimensions than the library supports. * Update the reference code to be more generic and straightforward. * Add necessary test cases. Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Resolves: COMPMID-5919 Change-Id: Ic7e2032777aa97ecc147f61d5388528697508ab1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9199 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-07Add sigmoid and tanh for dynamic fusionViet-Hoa Do
* Add sigmoid and tanh activation functions for dynamic fusion. * Add corresponding tests, but both activation functions share the same fixture implementation. Resolves: COMPMID-5939 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-07GEMM: SME: Allow threading for quantized GEMMs.David Mansell
The SME kernels for quantized int8/uint8 GEMMs erroneously required that maxthreads==1 before they could be selected. This resulted in them not being available on multi-thread runs. Remove that restriction. Resolves COMPMID-5962 Change-Id: Ia7933d0c66020b5e2981604ca97ff7ead95ec14e Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9274 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-03-07Resolve the presence of variables that are unused in release mode in Dynamic ↵Omar Al Khatib
Fusion source files Resloves: [COMPMID-5960] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I1b11f01c51a029082ed05823717b4c4ae4897798 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9270 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-06Fix LWS search space used by CLTunerSiCong Li
* Ensure CLTuner uses the real GWS used by run(), instead of the static GWS (which is usually changed at run time), by caching GWS in each kernel Note this is a somewhat inelegant workaround. The real issue stems from the fact that execution window and scheduler are very much coupled with our operator run() / run_op() method. (Please see COMPMID-5934) * Restrict LWS values to explore within GWS bound for exhaustive mode * Refactor gws_from_window() to include all the information required to calculate GWS * Log lws search space used for tuning * Fix ClDirectConv2dKernel config id Resolves COMPMID-5892 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I420490d8b94d13ada2e44eb0a12078f883379334 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9193 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-03Add weights_info as optional input for NEDeconvolutionLayerAnnop Wongwathanarat
This is so that we can leverage fixed format kernel when using gemm convolution method. Partially resolves: [ONCPUML-1129] Change-Id: I61ffa74f5cd9d75579dbc1f9aa187371f855e932 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9248 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-03NEGEMMLowpMatrixMultiplyCore should be configured for optimized int8 kernel.Ethan Doe
Currently the validation routine incorrectly prevents optimized INT8 Gemm kernel from being used if the input is QASYMM8 and output type is S32. This change allows QASYMM8 input and S32 output types to leverage optimized kernel. Signed-off-by: Ethan Doe <yidoe@amazon.com> Change-Id: I65b060f522795db07d6d4df86fb7c6ddd1c626d4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9250 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-02Fix direct conv2d in dynamic fusionViet-Hoa Do
* Put input and output tensor shape value directly to the CL code. * Use texture for weights when it is possible. Resolves: COMPMID-5938 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ib53b310a80ce857eac36564b352136fdde55b131 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9249 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-01Add support for kernel indices in MaxpoolAdnan AlSinan
- Add a max pooling implementation that returns kernel indices. - Add a parameter in pooling info object to pick kernel indices impl. - Add validation tests. Resolves: [ONCPUML-1187] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-28Add an option to use lowest for max-poolingAdnan AlSinan
- Add a parameter in PoolingLayerInfo class to pick which value to use as min for max-pooling. Resolves: [ONCPUML-1166] Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-27Add build option to disable threads hintViet-Hoa Do
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Resolves: COMPMID-5936 Change-Id: Iedfcc632f5d900865f38bd7b164121af48546542 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-02-22Fix configuration files required for Bazel and CMake buildsGunes Bayir
Partially Resolves: COMPMID-5909 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Ia0ae7bfe22f8866e3d82f62098757e3bc55f46ca Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9183 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-02-15Fix Intermittent Neon™ ReduceMean QASYMM8 MismatchMohammed Suhail Munshi
- Dividing scale by number of elements causes accuracy loss due to limitations in float datatype and truncation to int - Adds rounding after division on aarch64 to negate this. Resolves: [COMPMID-5839] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I54ef0f7e56f39da1fa5f30378f551b5ca419a61d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/492456 Tested-by: bsgcomp <bsgcomp@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9110 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-14Extend skip upsampling for deconvolution for non-1x1 kernelsAnnop Wongwathanarat
Skip upsampling step for deconvolution when input strides are 1 regardless of kernel size. This is achieved by setting correct paddings for unit strides convolution. Resolve: [ONCPUML-1183] Change-Id: Ief88f9fe30f6f56d3358e3cf6a506ab8b5691f18 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9134 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-09Fix performance regression in Transposed ConvolutionGunes Bayir
Resolves: COMPMID-5849 Change-Id: I86f8bbc1f3a7c12c66d5ad8fcd74dd9e69629aa0 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9102 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: Jakub Sujak <jakub.sujak@arm.com>
2023-02-08Update CPU kernels to remove x19 and w19Michael Tyler
Resolves: COMPMID-5805 Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-02-08Add support for dilation > 1 in assembly DepthwiseConvolutionPablo Marquez Tello
* Resolve COMPMID-5689 Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Signed-off-by: Andrew Mundy <andrew.mundy@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-03Fix armv7a failing GEMMConvolutionLayer testsMohammed Suhail Munshi
Resolves: [COMPMID-5854] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ib0228409be5e816acca7e123f2660eb01a79e38f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9078 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Fix GEMMLowp/Batched MatMul mismatches on CPUMohammed Suhail Munshi
- Fixes Column Offset matrix is not being iterated through in y dimension Resolves : COMPMID-5795 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0190474be404b4f0e171855739cfd0a48cbed5bc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9020 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add new operator AddMulAdd for Neon™ backend for Float/Quantized typesGunes Bayir
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types. The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel). The inputs are - input1 : nD tensor [X, Y, Z, W, ..] - input2 : nD tensor [X, Y, Z, W, ..] - add_coef : 1D tensor [X] - mul_coef : 1D tensor [X] The outputs are - out1 : nD tensor (intermediate output) [X, Y, Z, W, ..] - out2 : nD tensor (final output) [X, Y, Z, W, ..] The operation can be summarized as follows: out1 <- input1 + input2 out2 <- Act(out1 * mul_coef + add_coef) The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr. The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward. Resolves: COMPMID-5463 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Add Subtraction operator to Dynamic Fusion interfaceRamy Elgammal
Partially-Resolves: COMPMID-5518 Change-Id: I8358784815bcac461d50e384fa7bc96f476d3983 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9045 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01Remove fixed format strides hackJonathan Deakin
- Remove hack in CpuGemmAssemblyDispatch.cpp which tried to guess strides for fixed format kernels. Instead, expect that strides will have been correctly set on weights externally - Update fixed format test fixtures to set the strides - If the fixed format uses fast math mode, then weights should be of type BFLOAT16. Change the validation logic to accept this. Resolves: [ONCPUML-1131] Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31Fixed clang-cl linker errorsPablo Tello
* Linker errors caused by the declarations of the DWC functions not matching the functions implementation. Changed the functions declaration to match the implementation. * Partially resolves MLCE-996 Change-Id: Ie6458c80bc425deaa6c239828b9f4a2a6646f503 Signed-off-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9056 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31Add Multiplication operator (FP only) to Dynamic Fusion InterfaceJakub Sujak
Note: we use a separate test fixture for Multiplication op instead of reusing ElementwiseBinaryFixture to avoid exposing the internal enum ElementwiseOp to the public utils/TypePrinters.h as required by the data test case macros to print the test data. We also do not consider modifying the enum ArithmeticOp in the standard interface to include MUL without an implementation. Future work should consider refactoring this test fixture into the ElementwiseBinaryFixture to reduce the total number of fixtures/code duplication. Resolves: COMPMID-5779 Change-Id: I84207658ce0407095b028fca0ab7bfa2950255ec Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9013 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31Bazel and CMake buildsDavid Svantesson
Resolves: ONCPUML-1110, ONCPUML-1109 Co-authored-by: Georgios Pinitas <georgios.pinitas@arm.com> Co-authored-by: Joe Ramsay <joe.ramsay@arm.com> Signed-off-by: David Svantesson <david.svantesson@arm.com> Change-Id: Iea693dbe53bf0af87867d6a9e0d1fd9fbe59ef3a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8981 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-30Skip upsampling for deconvolution when not neededAnnop Wongwathanarat
If the input tensor's stride is 1 and the kernel size is 1x1, skip upsampling step and pass the input tensor pointer for convolution directly. Partially resolve: [ONCPUML-1137] Change-Id: I9de9444ff99cf35d44a51ccbe0fa6facc1035d27 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8994 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-26Fix num_threads_hint() on macos.Pablo Marquez Tello
* Query the number of big cores rather than the total number of cores on the system * Resolves MLCE-994 Change-Id: I88cb6a4fd2ece9a035edd4cc5c0f5cf4aef93468 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9006 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-25Implement dynamic fusion softmax operatorRamy Elgammal
- Return aux tensorInfo by get_aux_tensors() at runtime to init the aux tensor with the right size. - Keep softmax unfusable for this commit - Hence, added Tensor3D to template writer arguments declaration, for sake of keeping dynamic fusion softmax componenets' kernels matching their cl counterparts. Resolves: COMPMID-5523 Change-Id: I667f39545db925f667036ef448302c79a0330373 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/483924 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8986 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-24Change dynamic fusion API to return destination tensor infoGunes Bayir
The new dynamic fusion API is introduced in the following patch: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8906 For each operator (except Conv2D, which is migrated in the above patch), we - remove destination tensor from is_supported, validate and create calls - make create_op return ITensorInfo* to the intermediate destination object Affected operators: - DepthwiseConv2D - Cast - Elementwise Ops - Clamp - Reshape - Resize Resolves: COMPMID-5777 Change-Id: Ib60ec8a5f081752808455d7a7d790f2ed0627059 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8991 Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-23Fix performance regression when stride equal to kernel sizeGunes Bayir
This patch prefers Gemm-based transposed deconvolution algorithm in case kernel sizes and strides are equal to each other in each dimension. Resolves: COMPMID-5815 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I22052e48341f3284d6bafbdbcce4a48399dc8e87 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8970 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2023-01-20Add missing direct conv2d tests to dynamic fusionSiCong Li
* Add direct conv2d tests as a separate fixture so that we can enable future direct conv2d specific tests * Move Conv2dAttributes to its own file Partially resolves COMPMID-5736 Change-Id: I530649488faf3bbed1a4fc7d16a74063bfdf33db Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8928 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Improve the strided_slice layer on all data typesOmar Al Khatib
Resolves : [COMPMID-5110] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I3889a79c311b697c56d7369305c862433e856487 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8903 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Add Auxiliary tensorsSiCong Li
The asssign_memory_descriptors method could not automatically assign Auxiliary tensors. Therefore changes are made to allow developers to explicitly mark auxiliary tensors. However, to avoid ambiguity between auxiliary and "intermediate" tensors, we solidify the definitions of both: Intermediate tensors are a strictly topological term. They are defined as "inner" tensors within a workload, hidden from the user, as opposed to input and output tensors exposed to the users. Auxiliary tensors are a subcategory of Intermediate tensors, and are also about memory allocation. They are intermediate tensors that need real memory backing. For more details please see the documentation of MemoryType enum Rename MemoryType::NoAlloc to MemoryType::Virtual Partially resolves: COMPMID-5523 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ibde44c2ec1570be9423e0fb38b53bb136ffc36dd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8940 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-01-20Dynamic Fusion Pooling Layer 2dMohammed Suhail Munshi
- Adds Dynamic fusion PoolingLayer2D as Unfusable Operator - Indices are not supported - Adds tests for F32/F16 Datatypes Resolves : [COMPMID-5520] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I0d112545eb9209c836bf9ea153069f8627531e0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8893 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Add enable_fast_math for NEDeconvolutionLayerAnnop Wongwathanarat
Resolves: [ONCPUML-1128] Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Change-Id: I287a71222d3f0289d8cccfcb15383b0a930a55e6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8952 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>