aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/cl_kernels
AgeCommit message (Collapse)Author
2021-09-23Fix inefficient store in gemmlowp_mm_reshaped_only_rhs_tGian Marco Iodice
- The out-of-boundary condition was performed also for PARTIAL_STORE_N0 = 0 Resolves: COMPMID-4774 COMPMID-4771 Change-Id: I0d7e078c67615b513ffeb66860f224999b5135fa Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6302 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-09-14Optimize ClScaleKernel on NHWC (f32/f16/int8)Gian Marco Iodice
The new kernel performs the computation on multiples elements. The OpenCL kernel has been re-implemented using the new TILE macros Resolves COMPMID-4803,COMPMID-4804 Change-Id: Iac8fead65e21b64567a05dbc4fbaa61d362443f9 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6235 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-09Remove padding from ClGemmMatrixMultiplyReshapedOnlyRhsKernelGiorgio Arena
Resolve COMPMID-4450 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I6f280d5d66ec43fb5cb06c83fe15a1f227ad165d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6232 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-08Fix vload_partial macros on OpenCLGiorgio Arena
When calling vload_partial, the macros were overriding the first values with a hidden double assignment Resolve COMPMID-4792 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I96bca60ae546fc34a71e69d5c471581a472d8ddf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6231 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-07Remove padding from ClGemmMatrixMultiplyReshapedKernelGiorgio Arena
Create new macros for loading values from memory while being aware of boundaries of the tensor to not generate page faults. Resolves: COMPMID-4447 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ia5fd0a5dcb40942bccd5e686307d0055e1a1dd82 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6226 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-06Revert "Remove padding from ClGemmMatrixMultiplyReshapedKernel"Pablo Marquez Tello
This reverts commit 50335fd3d0734157382741fcf1bfdaf630c60c4b. Resolves COMPMID-4792 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Ia6580143d9cf5a7bd5c87ca4214022f7c241ec6f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6214 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-09-03Remove padding from ClPool2dKernel NCHWGiorgio Arena
- Simplify NCHW kernel structure by removing old optimized paths - Merge quantized with fp kernels Resolve COMPMID-4722 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I79016b119619aed6a6193295601cd6517f14b88c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6183 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-09-03Fix CLNormalizationLayer NCHW border calculationSiCongLi
* Calculate border using both norm size and vec_size_x * Expose reference tensor printer Resolves: COMPMID-4793 Change-Id: I7bd8e49779baf7d6848271757bc7993aa1ed2960 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6201 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-01Remove padding from ClGemmMatrixMultiplyReshapedKernelMichele Di Giorgio
Create new macros for loading values from memory while being aware of boundaries of the tensor to not generate page faults. Resolves: COMPMID-4447 Change-Id: If9a455291e395ebd9070ebe5e120b3064d8fab29 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6168 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-23Remove padding from ClScaleKernelGiorgio Arena
- Merge quantized kernels with fp for bilinear interpolation (both NCHW and NHWC) - Pass dimensions at compile time rather than at run time - Use tile-based approach to rework the NCHW kernels - Remove unused functions/files Resolve COMPMID-4723 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ifcdf02beb9daa9f318395751b3c85eb2fe874082 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6138 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-25Reorganize the kernels into nhwc, nchw and common foldersAdnan AlSinan
The Following kernels have been split into nchw/nhwc kernels files: - batchnormalization_layer - batch_to_space - channel_shuffle - depth_to_space - dequantization_layer - im2col - normalization_layer - normalize_planar_yuv_layer - normalize_planar_yuv_layer_quantized - pooling_layer - pooling_layer_quantized - remap - reorg_layer - scale - scale_quantized - space_to_batch - space_to_depth - upsample_layer - winograd_filter_transform - winograd_input_transform - winograd_output_transform The following kernels have been moved to nchw folder: - direct_convolution1x1 - direct_convolution3x3 - direct_convolution5x5 - direct_convolution_quantized - prior_box_layer The following kernels have been moved to nhwc folder: - direct_convolution - dwc_native_fp_nhwc - dwc_native_quantized_nhwc The following kernels have been removed: - sobel_filter While the rest kerenls have been moved to the common folder. Partially resolves COMPMID-4453 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-22Fix oclgrind int overflow warningFreddie Liardet
Fix warning found by oclgrind. Also remove duplicated code. Resolves COMPMID-4675 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: I6ad56cc0130b5df936f1e070db116695269317df Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5974 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-14Fix CL kernel compilation failureMichalis Spyrou
A typo was causing errors when building some CL kernels. Resolves: COMPMID-4637 Change-Id: I7d1821e1a046ef8ccd306f72afe192732dd2ad1e Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5944 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-13Add in-place calculation support for CL elementwise arithmetic kernelsSheri Zhang
- Add in-place calculation support in ClArithmeticKernel, ClSaturatedArithmeticKernel and ClMulKernel - Add in-place test cases Resolves: COMPMID-4431 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Id484bdb76b74478a33fedb471ae0c7f799c599f6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5885 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-09Limit the LOOP_UNROLLING on the kernel heightGian Marco Iodice
To reduce the risk of having a long OpenCL kernel, we limit the loop unrolling on the kernel height. In particular, we unroll only if the kernel height is less than or equal to 5 Resolves COMPMID-4604 Change-Id: Iece787989f36afb90f1c7676b53d9015e652bdbd Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5916 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-08Remove redundant implementations of Add/Sub operatorsGeorgios Pinitas
Allows only implementations where inputs/output are of the same data type and removes legacy Computer Vision ones. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ia2b3d23a04236aab682f0c36a1110a30f7c06d1c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5900 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-01Add quantization helper functions for OpenCLGeorgios Pinitas
Add `T_QUANTIZE8_PER_TENSOR` and `T_QUANTIZE8_PER_CHANNEL` that can be used to perform quantization on tile constructs. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ie8e1efcb895c64715620acf2212b1de9a857ee0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5891 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-30Revert "Rework OpenCL Depthwise Convolution"Gian Marco Iodice
This reverts commit 561c176598cd14245e2e7918fdf136d1c888d1da. Reason for revert: <validation> Change-Id: I6f2d61c27520439bb538e9265736532104b24cf8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5127 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Enable global pooling optimization on OpenCLGian Marco Iodice
- Add loop unrolling on X and use POOL_X and POOL_Y defines for the for loop Resolves COMPMID-4573 Change-Id: I33cb825cfb55912ccb0ab9d03bd33a3dab4c8b44 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5872 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework gemmlowp reshaped_only_rhs using the new macrosGiorgio Arena
Resolve COMPMID-4416 Change-Id: I83cdf0de7adaf4d465ffebd494ab913182072485 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5788 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-06-22Add FP16 support to CLRemapFreddie Liardet
Add FP16 support to CLRemap when data layout is NHWC. Add relevant tests for FP16 and validation. Update NERemap function level to be consistent with CLRemap. Add depreciation notice for uint_8 only function level methods. Resolves: COMPMID-4335 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If05f06801aef7a169b73ff1ebe760a42f11ca05c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5816 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-15Add NHWC support to CLRemapFrederick Liardet
Add NHWC support to CLRemap, also add relevant tests. Partially resolves COMPMID-4335. Change-Id: I119bea99be497fb85d5cd83a10f8d4e8e1f97f17 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5773 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-07Revert "Add optimization for global pooling in pooling_layer.cl"Pablo Tello
* Resolves COMPMID-4544 This reverts commit 50929ef951880469b9d579323d2f9c9f5025327d. Signed-off-by: Pablo Tello <pablo.tello@arm.com> Change-Id: Ibc40054cecc2fe99bff48154f11fe5b602b4a64a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/330380 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5774 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-02Fix bug in PReluLayer when input is 1xN sizeFreddie Liardet
Fix issue where gpu elementwise operation kernel would not compile when input is 1xN size and prelu is the chosen operator. Add relevant tests for 1xN input. Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If0651cfa399ca1d9c65f2632b75536c7931f27d4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5760 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-01Fuse activation in ClDirectConv2dKernel for float typesGeorgios Pinitas
Resolves: COMPMID-4430 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I9a40033e09223d601460a7e52cc297c58c9a2737 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5757 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-20Enable unroll through pragma based on DDK versionGiorgio Arena
Change-Id: Id98a107d512369d3799961011a84e9cc4d99e775 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5679 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-19clCreateKernel failure of CL/ChannelShuffle/U8/Manuel Bottini
Create right vector for size of 16 elements Resolves: COMPMID-4532 Change-Id: I74dc89d158a6aa3f50db710cbc002a92156e2263 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5673 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-19Port DepthConvert to new ApiGeorgios Pinitas
- Renames DepthConvert to Cast - Ports both NEDepthConverLayer and CLDepthConvert variants - Removes legacy shift capability from DepthConvert, allowing only shifts of 0 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I806a0f8eb23d23502b632c529fda7edde19c8176 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5565 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-17Add macro to manually unroll loops in OpenCLGiorgio Arena
Change-Id: I092d10534816f5b3717325952033c351b8231380 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5643 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-17Fix MeanStdDevNormalizationLayer reference outputting nan for FP16Giorgio Arena
- Bring the epsilon up to 1e-3 for FP16 (both backends) since it was causing the reference's variance being negative and its square root being NaN - Bring the epsilon up to 1e-7 for FP16 NEON test for the same problem on the NEON kernel - Adjust the CL kernel's vec_size when input tensor's width < 16 and use macros agnostic of vector size for sum reduction - Add previously mismatching tensor shapes Resolve COMPMID-4354 Change-Id: I823c871aacb72326f90c86b24cb16c3e2d4bd15e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5630 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-13Remove padding from CLChannelShuffleLayerKernelManuel Bottini
Only for NHWC Instead of starting from the input vector and insert the values in the output, we now create an output vector by taking from the input tensor. This makes it easier to store the result in the output with border awareness Resolves: COMPMID-4445 Change-Id: I77cf2d2d5ff30a383cddb8a3c81c462d7a39fd2e Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5627 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-05-12Fix GEMMLowp output stage validation crash when input's first dimension == 1Giorgio Arena
- Change vectors' fixed length of 4 in gemmlowp_output_stage_quantize_down_float with its kernel's adjusted vec_size Resolve COMPMID-4355 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I371ecf114356fb6a8d18c5c3727f09ae247484bd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5631 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-12Fix bug in Select operator when input is 1xNFreddie Liardet
Fix issue where gpu select kernel would not compile when input was 1xN size. Also add relevant tests for 1xN input. Resolves: COMPMID-4357 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: Ib5f339379e9cc7ef05605312889dbad34cbfe5dd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5620 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-12Fix 'ARM_DOT_K0XN0' macro redefinedGiorgio Arena
Resolve COMPMID-4339 Change-Id: Ieacd288ca8f69e04d88692a093325a24198ebe3e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5623 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-07Fix missing DATA_TYPE in DOT_PRODUCT4_INTEGER8 OpenCL macroGian Marco Iodice
- DOT_PRODUCT8_INTEGER8 and DOT_PRODUCT16_INTEGER8 are calling DOT_PRODUCT4_INTEGER8 without passing DST_DATA_TYPE Resolves COMPMID-4491 Change-Id: I394bd2f9208489e820885e49ed40e607d6470620 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5594 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-07Remove TODOsSheri Zhang
Remove TODO/FIXME either already done or won't do. Partially resolve: COMPMID-4471 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Iec8f25b9bf1b2a70c072dd17d44625fa93e84ed1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5591 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-05Adding S32 support to CLPixelWiseMultiplicationSuhail Munshi
Partially resolves : COMPMID-3793 Signed-off-by: Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Id82e00c784f0a039017fd896f11671bdda2dd4ab Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5530 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-04Fix bug on CLReductionOperationGiorgio Arena
Execution window along the X axis needs to be collapsed on the 3rd axis (rather than the 2nd) since there could be implicit padding added along the Y Resolve COMPMID-4425 Change-Id: I9623a31749b737fea7c623cabdcfbf77cbe8f6dc Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5551 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-30Update operator list documentation. Part 2.Teresa Charlin
All data type and data layout information for the operators are store in the function header files Signed-off-by: Teresa Charlin <teresa.charlinreyes@arm.com> Change-Id: I30b564f7eda6bbd99bf3ad36ddb6639ac118eb8b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/319829 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5531 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-04-30Add optimization for global pooling in pooling_layer.clGian Marco Iodice
- Simplify the implementation when the pooling size has the same spatial dimensions of the input tensor - Rework the heuristic for F32/F16 - Add test for validating the global pooling path - Fix compare_dimensions in validation. The validation fails because we have different number of dimensions for NCHW and NHWC (e.g. 1,1,2,1(NCHW) -> 2,1,1,1(NHWC) Resolves COMPMID-4426 Change-Id: Ia53ee659a9fbc3d011f286a8150d1be9d6d2cd05 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5533 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-04-29Remove Global pooling optimizationMichalis Spyrou
It doesn't take into consideration the padding. Change-Id: Ie7a2a0c7aee93fd7007deb6e46f6bec0e7f06066 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5529 Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-29Remove OpenCL padding: CLReductionOperationKernelGiorgio Arena
Change the parallel implementation across the X, now every thread computes one row Add missing test for MEAN_SUM Make reduction on any axis != 0 work with num_channels > 1 Resolve COMPMID-3917 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ib0f99540104e3c253bcd1ea637833db533f5e76e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5522 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-29Remove stale/solved TODOsMichele Di Giorgio
Change-Id: I5c440f4c6ca4186adcfa926e6b7d924086671f29 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5520 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-04-29Fix Global Pooling failuresMichalis Spyrou
Correctly calculate the filter size when we exclude padding. Resolves: COMPMID-4408 Change-Id: Ia96d6ddbda58d23a5a0e02854ecb22089d538f94 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-04-28Add explicit cast on GEMMLowp kernelMichalis Spyrou
Oclgrind flags this as a warning. Resolves: COMPMID-4338 Change-Id: Id5a075894027d867bdd21937626f052acb05b531 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5512 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-04-27Add optimization for global pooling in pooling_layer.clGian Marco Iodice
- Simplify the implementation when the pooling size has the same spatial dimensions of the input tensor - Rework the heuristic for F32/F16 - Add test for validating the global pooling path - Fix compare_dimensions in validation. The validation fails because we have different number of dimensions for NCHW and NHWC (e.g. 1,1,2,1(NCHW) -> 2,1,1,1(NHWC) Change-Id: Iba680cb30bf2a5d0952265a4cc9794f368549ca5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5510 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-27Fixed CTS failures CLInstanceNormPablo Tello
* Resolves COMPMID-4400 Change-Id: I54c33a017c735194fbf4437d1c7df465208bc0ca Signed-off-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5505 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-04-23[Nightly #1129] CL/Winograd/ConvolutionLayer/F16 mismatch on Mate9Manuel Bottini
Fixing Conv5x5, Conv5x1, Conv1x5 Resolves: COMPMID-4380 Change-Id: I5206d9b85b1d73f6010f02c119aae91266395ba7 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5485 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>