path: root/src/core/CL/cl_kernels/direct_convolution5x5.cl
Age  Commit message  Author
2021-07-25  Reorganize the kernels into nhwc, nchw and common folders  Adnan AlSinan
The following kernels have been split into nchw/nhwc kernel files:
- batchnormalization_layer
- batch_to_space
- channel_shuffle
- depth_to_space
- dequantization_layer
- im2col
- normalization_layer
- normalize_planar_yuv_layer
- normalize_planar_yuv_layer_quantized
- pooling_layer
- pooling_layer_quantized
- remap
- reorg_layer
- scale
- scale_quantized
- space_to_batch
- space_to_depth
- upsample_layer
- winograd_filter_transform
- winograd_input_transform
- winograd_output_transform
The following kernels have been moved to the nchw folder:
- direct_convolution1x1
- direct_convolution3x3
- direct_convolution5x5
- direct_convolution_quantized
- prior_box_layer
The following kernels have been moved to the nhwc folder:
- direct_convolution
- dwc_native_fp_nhwc
- dwc_native_quantized_nhwc
The following kernels have been removed:
- sobel_filter
The rest of the kernels have been moved to the common folder.
Partially resolves COMPMID-4453
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-01-19  Remove padding from direct convolution - OpenCL  Gian Marco Iodice
- Refactor direct convolution for NHWC
- Remove old kernels for NHWC
- Change the heuristic in CLConvolutionLayer.cpp. The new direct convolution implementation is faster than FFT
Resolves COMPMID-3908
Change-Id: Iee15ce7b04e21847b6eaae5c6d3c1b18180e7efc
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4876
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2020-07-09  COMPMID-3324: Adjusting capitalization of Arm copyright claim to reflect Arm preferred presentation  Michele Di Giorgio
Change-Id: Ib7dcfcbb24b408999dfae366b9da396485aacf78
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3525
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2018-11-02  COMPMID-1610: Fixed CLDirectConvolution mismatches  Pablo Tello
Kernel size 5x5, layout NHWC.
Change-Id: Ia82ff211d1c954df228962b5c2c5ad8df7112449
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151740
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02  COMPMID-807: NHWC support in CLDirectConvolution.  Pablo Tello
Change-Id: I8738aca2cc0104e4c4d7c9605762ab59fce10a33
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/137333
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02  APPBROWSER-298, APPBROWSER-306: Reimplement the common code of compute shader  Joel Liang
The new common code of the compute shader is in the file helpers_cs.h.
Rewrite direct_convolution1x1.cs and softmax_layer.cs to use the new common code. This also removes the dependence on the token pasting operator (##). We'll remove the "##" support after we rewrite all of the compute shader code.
Change-Id: Icd8553ef6b61ad484a8507590ac8ed499bd47061
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/95455
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Frank Lei <frank.lei@arm.com>
(cherry picked from commit 0a4f83570d261f839d9866b68979efe8d7a95883)
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/95601
Reviewed-by: Jim He <jim.he@arm.com>
2018-11-02  COMPMID-631: Merge branches/gles_compute branch  Anthony Barbier
Last commit:
commit b25c5f68042b0c81bf611d59a1bb8535e1c42497
Author: Xinghang Zhou <xinghang.zhou@arm.com>
Date: Wed Oct 25 18:48:10 2017 +0800
Synced validation's tolerances of GCSoftmax from cl side
Change-Id: Ibe72054205c1c8721845d679a31af7ed0a7c5cf6
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/93283
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02  COMPMID-606 - Fix for S8 failures  Gian Marco Iodice
Added volatile to the for loop counter in direct_convolution1x1.cl, direct_convolution3x3.cl and direct_convolution5x5.cl.
This fix seems to solve the problems seen on the Samsung S8 with the mismatches and clEnqueueMapBuffer.
Change-Id: I51687ec94bb897af2698ceab1133c988821e4c4e
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/90601
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02  COMPMID-477 - Optimized Direct Convolution 3x3 and 5x5 (f32) for Bifrost.  Gian Marco Iodice
Each work-item computes 4x3 output elements in the case of 3x3 convolution and 4x2 in the case of 5x5 convolution.
Change-Id: I6ebbaff8b7e971c1f90d5845c0b58d2a40f39df5
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84345
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
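The per-work-item tile sizes in the COMPMID-477 message above imply how many work-items a 2D output plane needs. The following is a minimal sketch in plain C, not code from the library: the names `ceil_div`, `tile_shape`, `tile_for_kernel` and `work_items_2d` are hypothetical, and it assumes the first number of each tile (4) runs along the output width, the second along the height, and that other kernel sizes fall back to one output element per work-item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ceiling division, d must be non-zero. */
static size_t ceil_div(size_t n, size_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* Hypothetical description of the output tile one work-item computes. */
typedef struct
{
    size_t tile_w; /* output elements along width (assumed orientation) */
    size_t tile_h; /* output elements along height (assumed orientation) */
} tile_shape;

/* Tile sizes taken from the commit message: 4x3 for 3x3, 4x2 for 5x5.
 * The 1x1 fallback for any other kernel size is an assumption. */
static tile_shape tile_for_kernel(unsigned kernel_size)
{
    tile_shape t = { 1, 1 };
    if (kernel_size == 3) {
        t.tile_w = 4;
        t.tile_h = 3;
    } else if (kernel_size == 5) {
        t.tile_w = 4;
        t.tile_h = 2;
    }
    return t;
}

/* Number of work-items needed to cover an out_w x out_h output plane,
 * rounding up so that partial tiles at the borders are still covered. */
static size_t work_items_2d(size_t out_w, size_t out_h, unsigned kernel_size)
{
    tile_shape t = tile_for_kernel(kernel_size);
    return ceil_div(out_w, t.tile_w) * ceil_div(out_h, t.tile_h);
}
```

Under these assumptions, a 32x32 output plane would need 8 x 16 work-items for the 5x5 kernel and 8 x 11 for the 3x3 kernel, the last row of 3x3 tiles being only partially used.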
2018-11-02  COMPMID-478 Implement CL direct convolution 5x5  steniu01
Change-Id: I4b975aff310cda9964d8c5dcee182d5d5c82741b
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83474
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>