aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/cl_kernels
AgeCommit message (Collapse)Author
2021-11-04Add validate tests for CLConvolutionLayer and CLGEMMConvolutionLayer with ↵SiCongLi
post ops * Add validate tests * Restrict post ops support in ClGemmConv2d to only those that do not need im2col or col2im. In practice this means we only support post ops in conv1x1 with stride = 1, dilation = 1 and data layout = NHWC Resolves COMPMID-4435 Change-Id: I1fdf0c5d565a4624857250075ac76db35c2f383b Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6573 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-04Add PRelu to supported PostOps in:ramelg01
- ClGemmMatrixMultiplyReshapedKernel - ClGemmMatrixMultiplyNativeKernel - ClGemmMatrixMultiplyReshapedOnlyRhsKernel Resolves: COMPMID-4713 Change-Id: I3adcb1b3d4af37ebcbc3bee19cc1845885d08600 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6553 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-03Fix out-of-bound reads in cl gemm kernelsSiCongLi
* Revert "Remove padding in FP Cl Gemm kernels" This reverts commit 48717a3d38fef8d316cd4b9fd9a3bc1a43db736b. * Allow different boundary row handling strategies across native, reshaped and reshaped_only_rhs kernels by introducing a ELTWISE_OPERAND_ROW parameter to the macro Resolves COMPMID-4919 Change-Id: Icefc23c0760a6abb838fef1d0d5bda06b07c79e3 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6569 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-11-02Add post ops to ClGemmMatrixMultiplyReshapedOnlyRHSKernel and ↵SiCongLi
ClGemmMatrixMultiplyNativeKernel Part 3 Partially resolves: COMPMID-4435 Change-Id: Ifc5affa3a24a70942ca2d001380205df09b03ad7 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6550 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-01Remove padding in FP Cl Gemm kernelsSiCongLi
* Remove rhs and bias padding in ClGemmMatrixMultiplyNativeKernel * Rework ClGemmMatrixMultiplyReshapedOnlyRHSKernel to use the same padding boundary condition as the other kernels Partially resolves COMPMID-4435 Change-Id: I1c17af9cca0b5cb3be087ce160948b7b0e62d297 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6549 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-10-28Add experimental PostOp interface to ClGemmMatrixMultiplyReshapedKernel Part 1SiCongLi
This interface supports the fusion of multiple elementwise operations Partially resolves: COMPMID-4435 Change-Id: If68dd7dd98dcf239fde7cb1f0a4a6d4d1e899a6f Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-20Implement CLDirectConv3DKernel - uint8/int8Giorgio Arena
Resolve COMPMID-4663 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I5c3c1cffed5385c06b789543318f7f4d6096987e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6468 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-10-18Remove legacy GeMM kernels on OpenCLGian Marco Iodice
Resolves COMPMID-4446 Change-Id: I1d3c2391b67681f4d3af440826aa95b47a1288a6 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6444 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-10-18Fix precision issue in ChannelShuffleKernelPablo Marquez Tello
* Fixed the issue in NHWC Neon * Fixed the rounding error in CL * Added a new test case to reproduce the problem * Resolves COMPMID-4831 Change-Id: I1613168cad580ca5acefe8ba340130af05cffaff Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6454 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-15Fix CLConv3D filelist and commentsGiorgio Arena
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I4d48f1b8eba6681a9de0ae5f1fd8a4ad1edf7fe8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6439 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-14Implement CLDirectConv3D f32/f16Giorgio Arena
Resolve COMPMID-4660 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-10-13Improve performance of Softmax uint8 on GPUAdnan AlSinan
Resolves COMPMID-4805 Change-Id: I0acd4479f196cf9518995a60d3b57a9a49e0db57 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6413 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Freddie Liardet <frederick.liardet@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-09-23Fix inefficient store in gemmlowp_mm_reshaped_only_rhs_tGian Marco Iodice
- The out-of-boundary condition was performed also for PARTIAL_STORE_N0 = 0 Resolves: COMPMID-4774 COMPMID-4771 Change-Id: I0d7e078c67615b513ffeb66860f224999b5135fa Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6302 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-09-14Optimize ClScaleKernel on NHWC (f32/f16/int8)Gian Marco Iodice
The new kernel performs the computation on multiples elements. The OpenCL kernel has been re-implemented using the new TILE macros Resolves COMPMID-4803,COMPMID-4804 Change-Id: Iac8fead65e21b64567a05dbc4fbaa61d362443f9 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6235 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-09Remove padding from ClGemmMatrixMultiplyReshapedOnlyRhsKernelGiorgio Arena
Resolve COMPMID-4450 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I6f280d5d66ec43fb5cb06c83fe15a1f227ad165d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6232 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-08Fix vload_partial macros on OpenCLGiorgio Arena
When calling vload_partial, the macros were overriding the first values with a hidden double assignment Resolve COMPMID-4792 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I96bca60ae546fc34a71e69d5c471581a472d8ddf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6231 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-07Remove padding from ClGemmMatrixMultiplyReshapedKernelGiorgio Arena
Create new macros for loading values from memory while being aware of boundaries of the tensor to not generate page faults. Resolves: COMPMID-4447 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ia5fd0a5dcb40942bccd5e686307d0055e1a1dd82 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6226 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-06Revert "Remove padding from ClGemmMatrixMultiplyReshapedKernel"Pablo Marquez Tello
This reverts commit 50335fd3d0734157382741fcf1bfdaf630c60c4b. Resolves COMPMID-4792 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Ia6580143d9cf5a7bd5c87ca4214022f7c241ec6f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6214 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-03Remove padding from ClPool2dKernel NCHWGiorgio Arena
- Simplify NCHW kernel structure by removing old optimized paths - Merge quantized with fp kernels Resolve COMPMID-4722 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I79016b119619aed6a6193295601cd6517f14b88c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6183 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-09-03Fix CLNormalizationLayer NCHW border calculationSiCongLi
* Calculate border using both norm size and vec_size_x * Expose reference tensor printer Resolves: COMPMID-4793 Change-Id: I7bd8e49779baf7d6848271757bc7993aa1ed2960 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6201 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-01Remove padding from ClGemmMatrixMultiplyReshapedKernelMichele Di Giorgio
Create new macros for loading values from memory while being aware of boundaries of the tensor to not generate page faults. Resolves: COMPMID-4447 Change-Id: If9a455291e395ebd9070ebe5e120b3064d8fab29 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6168 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-23Remove padding from ClScaleKernelGiorgio Arena
- Merge quantized kernels with fp for bilinear interpolation (both NCHW and NHWC) - Pass dimensions at compile time rather than at run time - Use tile-based approach to rework the NCHW kernels - Remove unused functions/files Resolve COMPMID-4723 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ifcdf02beb9daa9f318395751b3c85eb2fe874082 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6138 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-25Reorganize the kernels into nhwc, nchw and common foldersAdnan AlSinan
The Following kernels have been split into nchw/nhwc kernels files: - batchnormalization_layer - batch_to_space - channel_shuffle - depth_to_space - dequantization_layer - im2col - normalization_layer - normalize_planar_yuv_layer - normalize_planar_yuv_layer_quantized - pooling_layer - pooling_layer_quantized - remap - reorg_layer - scale - scale_quantized - space_to_batch - space_to_depth - upsample_layer - winograd_filter_transform - winograd_input_transform - winograd_output_transform The following kernels have been moved to nchw folder: - direct_convolution1x1 - direct_convolution3x3 - direct_convolution5x5 - direct_convolution_quantized - prior_box_layer The following kernels have been moved to nhwc folder: - direct_convolution - dwc_native_fp_nhwc - dwc_native_quantized_nhwc The following kernels have been removed: - sobel_filter While the rest kerenls have been moved to the common folder. Partially resolves COMPMID-4453 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-22Fix oclgrind int overflow warningFreddie Liardet
Fix warning found by oclgrind. Also remove duplicated code. Resolves COMPMID-4675 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: I6ad56cc0130b5df936f1e070db116695269317df Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5974 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-14Fix CL kernel compilation failureMichalis Spyrou
A typo was causing errors when building some CL kernels. Resolves: COMPMID-4637 Change-Id: I7d1821e1a046ef8ccd306f72afe192732dd2ad1e Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5944 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-13Add in-place calculation support for CL elementwise arithmetic kernelsSheri Zhang
- Add in-place calculation support in ClArithmeticKernel, ClSaturatedArithmeticKernel and ClMulKernel - Add in-place test cases Resolves: COMPMID-4431 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Id484bdb76b74478a33fedb471ae0c7f799c599f6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5885 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-09Limit the LOOP_UNROLLING on the kernel heightGian Marco Iodice
To reduce the risk of having a long OpenCL kernel, we limit the loop unrolling on the kernel height. In particular, we unroll only if the kernel height is less than or equal to 5 Resolves COMPMID-4604 Change-Id: Iece787989f36afb90f1c7676b53d9015e652bdbd Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5916 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-08Remove redundant implementations of Add/Sub operatorsGeorgios Pinitas
Allows only implementations where inputs/output are of the same data type and removes legacy Computer Vision ones. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ia2b3d23a04236aab682f0c36a1110a30f7c06d1c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5900 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-01Add quantization helper functions for OpenCLGeorgios Pinitas
Add `T_QUANTIZE8_PER_TENSOR` and `T_QUANTIZE8_PER_CHANNEL` that can be used to perform quantization on tile constructs. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ie8e1efcb895c64715620acf2212b1de9a857ee0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5891 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-30Revert "Rework OpenCL Depthwise Convolution"Gian Marco Iodice
This reverts commit 561c176598cd14245e2e7918fdf136d1c888d1da. Reason for revert: <validation> Change-Id: I6f2d61c27520439bb538e9265736532104b24cf8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5127 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Enable global pooling optimization on OpenCLGian Marco Iodice
- Add loop unrolling on X and use POOL_X and POOL_Y defines for the for loop Resolves COMPMID-4573 Change-Id: I33cb825cfb55912ccb0ab9d03bd33a3dab4c8b44 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5872 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework gemmlowp reshaped_only_rhs using the new macrosGiorgio Arena
Resolve COMPMID-4416 Change-Id: I83cdf0de7adaf4d465ffebd494ab913182072485 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5788 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-06-22Add FP16 support to CLRemapFreddie Liardet
Add FP16 support to CLRemap when data layout is NHWC. Add relevant tests for FP16 and validation. Update NERemap function level to be consistent with CLRemap. Add depreciation notice for uint_8 only function level methods. Resolves: COMPMID-4335 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If05f06801aef7a169b73ff1ebe760a42f11ca05c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5816 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-15Add NHWC support to CLRemapFrederick Liardet
Add NHWC support to CLRemap, also add relevant tests. Partially resolves COMPMID-4335. Change-Id: I119bea99be497fb85d5cd83a10f8d4e8e1f97f17 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5773 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-07Revert "Add optimization for global pooling in pooling_layer.cl"Pablo Tello
* Resolves COMPMID-4544 This reverts commit 50929ef951880469b9d579323d2f9c9f5025327d. Signed-off-by: Pablo Tello <pablo.tello@arm.com> Change-Id: Ibc40054cecc2fe99bff48154f11fe5b602b4a64a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/330380 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5774 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-02Fix bug in PReluLayer when input is 1xN sizeFreddie Liardet
Fix issue where gpu elementwise operation kernel would not compile when input is 1xN size and prelu is the chosen operator. Add relevant tests for 1xN input. Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If0651cfa399ca1d9c65f2632b75536c7931f27d4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5760 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-01Fuse activation in ClDirectConv2dKernel for float typesGeorgios Pinitas
Resolves: COMPMID-4430 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I9a40033e09223d601460a7e52cc297c58c9a2737 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5757 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-20Enable unroll through pragma based on DDK versionGiorgio Arena
Change-Id: Id98a107d512369d3799961011a84e9cc4d99e775 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5679 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-19clCreateKernel failure of CL/ChannelShuffle/U8/Manuel Bottini
Create right vector for size of 16 elements Resolves: COMPMID-4532 Change-Id: I74dc89d158a6aa3f50db710cbc002a92156e2263 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5673 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-19Port DepthConvert to new ApiGeorgios Pinitas
- Renames DepthConvert to Cast - Ports both NEDepthConverLayer and CLDepthConvert variants - Removes legacy shift capability from DepthConvert, allowing only shifts of 0 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I806a0f8eb23d23502b632c529fda7edde19c8176 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5565 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-17Add macro to manually unroll loops in OpenCLGiorgio Arena
Change-Id: I092d10534816f5b3717325952033c351b8231380 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5643 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-17Fix MeanStdDevNormalizationLayer reference outputting nan for FP16Giorgio Arena
- Bring the epsilon up to 1e-3 for FP16 (both backends) since it was causing the reference's variance being negative and its square root being NaN - Bring the epsilon up to 1e-7 for FP16 NEON test for the same problem on the NEON kernel - Adjust the CL kernel's vec_size when input tensor's width < 16 and use macros agnostic of vector size for sum reduction - Add previously mismatching tensor shapes Resolve COMPMID-4354 Change-Id: I823c871aacb72326f90c86b24cb16c3e2d4bd15e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5630 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-13Remove padding from CLChannelShuffleLayerKernelManuel Bottini
Only for NHWC Instead of starting from the input vector and insert the values in the output, we now create an output vector by taking from the input tensor. This makes it easier to store the result in the output with border awareness Resolves: COMPMID-4445 Change-Id: I77cf2d2d5ff30a383cddb8a3c81c462d7a39fd2e Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5627 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-05-12Fix GEMMLowp output stage validation crash when input's first dimension == 1Giorgio Arena
- Change vectors' fixed length of 4 in gemmlowp_output_stage_quantize_down_float with its kernel's adjusted vec_size Resolve COMPMID-4355 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I371ecf114356fb6a8d18c5c3727f09ae247484bd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5631 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-12Fix bug in Select operator when input is 1xNFreddie Liardet
Fix issue where gpu select kernel would not compile when input was 1xN size. Also add relevant tests for 1xN input. Resolves: COMPMID-4357 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: Ib5f339379e9cc7ef05605312889dbad34cbfe5dd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5620 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-12Fix 'ARM_DOT_K0XN0' macro redefinedGiorgio Arena
Resolve COMPMID-4339 Change-Id: Ieacd288ca8f69e04d88692a093325a24198ebe3e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5623 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-07Fix missing DATA_TYPE in DOT_PRODUCT4_INTEGER8 OpenCL macroGian Marco Iodice
- DOT_PRODUCT8_INTEGER8 and DOT_PRODUCT16_INTEGER8 are calling DOT_PRODUCT4_INTEGER8 without passing DST_DATA_TYPE Resolves COMPMID-4491 Change-Id: I394bd2f9208489e820885e49ed40e607d6470620 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5594 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-07Remove TODOsSheri Zhang
Remove TODO/FIXME either already done or won't do. Partially resolve: COMPMID-4471 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Iec8f25b9bf1b2a70c072dd17d44625fa93e84ed1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5591 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>