aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/kernels/CLDepthwiseConvolutionLayerNativeKernel.cpp
AgeCommit message (Collapse)Author
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-10Extend cl image support to input and output tensorsGian Marco Iodice
- Add support for texture image to input and output of direct convolution - Extend T_LOAD2D_INDIRECT macro to read values from cl image storages Resolves COMPMID-5715 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Idb0410f53f6d0763cd9e39895a7cbf9bc826d33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8904 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-06Rework DepthwiseConvolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5632 Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-07Optimize depthwise convolution on OpenCLGian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02Enable Winograd-based conv2d when IFM>=8 on GpuGian Marco Iodice
From an internal performance evaluation, it seems that Winograd-based Conv2D offers better performance than alternative methods such as direct convolution and gemm-based conv already from IFM=8. Before the condition was for IFM>=16 Resolves COMPMID-5532 Change-Id: I9ff04835d6fd07f5f0abeec9645c9d9cc913b6b7 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8147 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-05-26Disable unsafe FP optimizations causing accuracy issuesGunes Bayir
Resolves: COMPMID-5324 Change-Id: I289b1bb42296c562cb90b918c20def8c6c1825d2 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7639 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-17Improve start-up time for depthwise convolutionSheri Zhang
- Pass source and destination tensor dimension info at runtime Resolves: COMPMID-4887 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-08-25Move CPU/GPU files from Core/Runtime to the respective backend foldersGeorgios Pinitas
Legacy structure contained two libraries core/runtime with two backends in each. We reduce the core/runtime libraries to a single library thus merging the backend files Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-06Fix manual LOOP_UNROLLINGGian Marco Iodice
The issue is caused by the number of iterations passed to LOOP_UNROLLING. When we use the manual LOOP_UNROLLING, the number of iterations must be less than or equal to 128. To overcome this problem, we create a utility function to check if any of the critical iterations (kernel dimensions) are beyond that limit. If so, the utility function, disable the manual loop unrolling. Resolves COMPMID-4609 Change-Id: I7221c967609e462a5abd1cbb74e2a120f344fcb3 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5913 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Implement FP GPU depthwise convolution 1x1 kernel for in-place computationSiCongLi
* Implement in-place graph node mutator for 1x1 depthwise convolution * Add in-place to validation fixture except for DepthwiseConvolutionLayerNativeValidationFixture as it would be a duplicate test otherwise (DepthwiseConvolutionLayerNative test tests the underlying kernel) Resolves: COMPMID-4432 Change-Id: Id7f10f5ebdce7d49f550c0b62dbaaab7f5b59d29 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5874 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-06-30Revert "Rework OpenCL Depthwise Convolution"Gian Marco Iodice
This reverts commit 561c176598cd14245e2e7918fdf136d1c888d1da. Reason for revert: <validation> Change-Id: I6f2d61c27520439bb538e9265736532104b24cf8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5127 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Set up the framework to choose the default LWSGiorgio Arena
Resolve COMPMID-4486 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ib38b7943bd776a6d75d1da163908724c49eae73d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5864 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-04-19Port DepthwiseConvolution to new APIMichalis Spyrou
Resolves: COMPMID-4185 Change-Id: Ib5f22356356a022d567bb18d44ea272b62d10ebf Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5424 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-11-16COMPMID-3973: CTS failure in QASYMM8_SIGNED Depthwise and Fully connected ↵Michele Di Giorgio
when fusing Bounded ReLU in Android R GpuAcc Change-Id: I6cfee002846d0c84de7e0a5f141dfc4807b93b33 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4421 Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-11-07COMPMID-3639: (3RDPARTY_UPDATE) Move CL kernels to srcSang-Hoon Park
Change-Id: I10d27db788e5086adae1841e3e2441cd9b76ef84 Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4310 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-10-23COMPMID-3713 Remove OpenCL padding: CLDepthwiseConvolutionLayerNativeKernelGiorgio Arena
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ic43aba8a6a0a106fc4c1f665ff5cc3ccb31f403d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4235 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2020-10-20COMPMID-3637: Move utility headers from arm_compute to srcSang-Hoon Park
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: If9d6fa8c900b68c4b6fd373f2fc1f9abb83ea917 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4145 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-07-09COMPMID-3324: Adjusting capitalization of Arm copyright claim to reflect Arm ↵Michele Di Giorgio
preferred presentation Change-Id: Ib7dcfcbb24b408999dfae366b9da396485aacf78 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3525 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-04-22COMPMID-3280: Make all ML primitives for CL use the new interface - Part1 - Fix1Manuel Bottini
- const fix in the CLKernels part 1 Change-Id: I17340cb6ff26afd52b14b46645efedbe07aef1b6 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3067 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-04-17COMPMID-3280: Make all ML primitives for CL use the new interface - Part 1Manuel Bottini
- Only CLKernels have been updated Change-Id: Ife55b847c2e39e712a186eb6ca452503d5b66937 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3001 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-03-10COMPMID-3069: Improve build time by splitting up ToolchainSupport.hMatthew Bentham
Split out the parts of ToolchainSupport coming from <memory> and the parts coming from <string> into their own new header files. This accounts for 99% of uses of ToolchainSupport, which means that expensive header files such as arm_neon.h don't need to be included everywhere. Knocks about 10% of compilation time off kernel files. Signed-off-by: Matthew Bentham <matthew.bentham@arm.com> Change-Id: I2ae718fe766b5ff28608812b0f686f30eeac1b21 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2852 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2020-01-06COMPMID-2757: Add support for QASYMM8_SIGNED in CLDepthwiseConvolutionLayerMichele Di Giorgio
Change-Id: I1f292f98bc3a213ba5b26ac88aa78160c809cb87 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/2540 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-11-28COMPMID-2609: Enable quantization with multiplier greater than 1 on OpenCLMichele Di Giorgio
Change-Id: I050f1f84e214e61f7cbb0197a672b68a4940edae Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/2158 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2019-10-30COMPMID-2306: CLDepthwiseConvolution: support for QUANT8_PER_CHANNEL_SYMMMichele Di Giorgio
Change-Id: I18c886400daa2dcba0b91011bc4e503d807a4732 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/2143 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-10-15COMPMID-2600: Implement a new and generic depthwise convolution for CL ↵Michele Di Giorgio
QASYMM8 NHWC The NCHW case is supported at function level by permuting the inputs/outputs to NHWC. This patch also removes CLDirectConvolutionLayerOutputStageKernel which is deprecated and some kernels which were only used in the generic case of depthwise convolution. Change-Id: I91e0f02d0a2f4a4a352e08c248e648944137fe68 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/2056 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2019-10-08COMPMID-2486: Remove disabled compiler warningsMichalis Spyrou
Removed -Wno-unused-parameter and -Wno-deprecated-declarations compilation flags. Plus, 3RDPARTY_UPDATE. Change-Id: I43098c7af527d5651aad3c597b508a56f8813dda Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/2041 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-09-12COMPMID-2599: Implement a new and generic depthwise convolution on OpenCL ↵Gian Marco Iodice
(Fp32/FP16-NHWC) Part 1 Change-Id: I5e1d27a7006199e9229e455a1df9bfc2ed4e8341 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1898 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>