aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/kernels
AgeCommit message (Collapse)Author
2024-01-18Fix divide-by-zero compilation errorViet-Hoa Do
* CONVERT_TO_TENSOR4D_STRUCT_NO_STEP is implemented and used in some CL kernels in the way that causes divide-by-zero issue. - Since the steps are all zeros, the issue might have been ignored by the compiler. Resolves: COMPMID-6795 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0fb38fc62d63671b8abefa39b3d9b3ca6f49c7fe Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10967 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-31[GPU] Update Reverse layer to allow negative axis and reversed axis orderAdnan AlSinan
- Adds option to use negative axis and inverted axis. - Adds validation tests for the above. Resolves COMPMID-6459 Change-Id: I88afd845d078f92c82ec8529ce7241fccd4c417e Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10523 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-10-12Remove padding from CL comparison operatorViet-Hoa Do
* Add support for processing left-over vector to comparison kernel. * Combine native and quantized versions of CL comparison. Resolves: COMPMID-6424 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I31d43bdf0eab999cee6fa8144b5d8e921a1093e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10467 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-10-11Optimize CL reduction operationViet-Hoa Do
* Batch dimension is added to reduction operation. - All the dimensions higher than the batch dimension are collapsed so that the input and output tensors are always 3-4D. - CL kernel is called once instead of being repeatedly called to process each sliding window. Resolves: COMPMID-6443 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icd99939d52d3bb648f08537e5f52ef27e894061b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10456 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-09-28Apply clang-format on repositoryFelix Thomasmathibalan
Code is formatted as per a revised clang format configuration file(not part of this delivery). Version 14.0.6 is used. Exclusion List: - files with .cl extension - files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...) And the following directories - compute_kernel_writer/validation/ - tests/ - include/ - src/core/NEON/kernels/convolution/ - src/core/NEON/kernels/arm_gemm/ - src/core/NEON/kernels/arm_conv/ - data/ There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/. Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-08-14Optimize CLReduce for Min/Max Axis=0Gunes Bayir
Resolves: COMPMID-6400 Change-Id: Id9935f9727f77a824afc75c35f044e3f5c173e0d Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10120 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-08-03Fix CL Tile operatorViet-Hoa Do
Resolves: COMPMID-6404 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I75aebe620567ed50817747589bbe8cfb63715a7b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10036 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-28Retain back-compatibility for arm_compute/core/Types.hSiCong Li
* Some symbols have been moved from core/Types.h. This patch retains back compatibility so that the user can still include this header for those symbols * A new header core/CoreTypes.h is created to avoid circular dependency. This header includes essential small types that are used across functions * Move all function info types into function_info folder for easier tracking Resolves COMPMID-6330 Related to https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4739175c2d4d184a9bc8e28b881b497fab03ca60 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9979 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-07-21Enable S64 output in CLArgMinMaxPablo Marquez Tello
Resolves MLCE-1089 Change-Id: I8b385ef8a00ec5de60299bc7a359766ba5417e68 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9918 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-07-18Break up core/Utils.h to reduce unused code being included everywhereMatthew Bentham
Makes a small difference to compile times and opens up other opportunities to simplify code. Change-Id: I232876910bbe4fa9719f4a0ce4a54c090faeb5ef Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532429 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9856 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-10Disable kernel size 3 in argminmax for axis 0Pablo Marquez Tello
* Only kernel sizes supported are 2, 4, 8 and 16. * Resolves COMPMID-6349 Change-Id: I30c85dcb3505d47fe56a2d2a08e9221ff426ee93 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9890 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-07-05Rewrote CLArgMinMax for axis 0Pablo Marquez Tello
* Simpler implementation without stages for axis 0 * Removed considerable amount of code. Resolves COMPMID-6271 Change-Id: Ie8bcb2f0b55f87472f44b38872a23a922619a211 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9849 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-06-15Break up arm_compute/core/Types.h a bitMatthew Bentham
Split some of the larger types with inlined code into their own header files, so that the implementation of them needn't be included everywhere. Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01 Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-03Support multi-dimensional indices in the CL Gather Layer up to ↵Omar Al Khatib
four-dimensional output tensors Resolves [COMPMID-5775] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I6f6c12ac08f0b0ad070ca5d715c531c2c3762c30 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-28Fix the gather layer indices checkViet-Hoa Do
* If the index is out-of-bound, both CPU and GPU implementations of the gather layer will output 0. Resolves: COMPMID-5964 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ib029b3acfb31452f2097c8c75448fb2697cfa332 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9487 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-04-03Add Cropping to CLBatchToSpaceOmar Al Khatib
- Deprecate dynamic block shape interface - Iterate over output window instead of input window for simpler implementation and better performance. - Add cropping support and cropping tests Resolves [COMPMID-5865] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-27Fix BatchToSpaceFixtureSiCong Li
* Use a vector to represent the (static) block shape instead of an N-D Tensor. The previous use of ND Tensor as block shape was wrong, not adhering to the specification, and non-functional (only first dim was used anyway). * The fixture now accepts a static block shape, because the dynamic case is not properly implemented and will be deprecated for now. * Fix an assertion error in reference implementation. Partially resolves COMPMID-5918 Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-10Extend cl image support to input and output tensorsGian Marco Iodice
- Add support for texture image to input and output of direct convolution - Extend T_LOAD2D_INDIRECT macro to read values from cl image storages Resolves COMPMID-5715 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Idb0410f53f6d0763cd9e39895a7cbf9bc826d33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8904 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-06Rework DepthwiseConvolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5632 Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-07Optimize depthwise convolution on OpenCLGian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02F16 Specialization for MeanStdDevNormMurray Kornelsen
Ran into issues with f16 meanstddevnorm. Essentially, with large enough tensors and/or large values in tensors, output becomes all 0. This is due to the variance computation. In f16, it reaches infinity quite easily, then the division results in 0. This change modifies the OpenCL and NEON implementations to compute the sum of squares and the variance using f32, while other operations remain f16. Update: Found that the square operation also benefits from f32, rather than squaring in f16 and accumulating f32. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: Ide00afd84ec6d26fec4d53b073e295814f08ba46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7959 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02Enable Winograd-based conv2d when IFM>=8 on GpuGian Marco Iodice
From an internal performance evaluation, it seems that Winograd-based Conv2D offers better performance than alternative methods such as direct convolution and gemm-based conv already from IFM=8. Before the condition was for IFM>=16 Resolves COMPMID-5532 Change-Id: I9ff04835d6fd07f5f0abeec9645c9d9cc913b6b7 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8147 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-05-26Disable unsafe FP optimizations causing accuracy issuesGunes Bayir
Resolves: COMPMID-5324 Change-Id: I289b1bb42296c562cb90b918c20def8c6c1825d2 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7639 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-02-09Remove deprecated remap functions.Adnan AlSinan
- Remove CLRemapKernel. - Remove NERemapKernel. Partially resolves COMPMID-4984 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-17Improve start-up time for depthwise convolutionSheri Zhang
- Pass source and destination tensor dimension info at runtime Resolves: COMPMID-4887 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-10-07Per-operator build dependenciesGeorgios Pinitas
Creates a list of operators their respective dependencies. Alters the build system to walk-through them resolve the dependencies and build Compute Library. Removes the following unused kernels/functions: -[NE|CL]MinMaxLayerKernel -CLFillBorder Resolves: COMPMID-4695,COMPMID-4696 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I35ebeef38dac25ec5459cfe9c5f7c9a708621124 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/357914 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6295 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-03Fix CLNormalizationLayer NCHW border calculationSiCongLi
* Calculate border using both norm size and vec_size_x * Expose reference tensor printer Resolves: COMPMID-4793 Change-Id: I7bd8e49779baf7d6848271757bc7993aa1ed2960 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6201 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-08-25Move CPU/GPU files from Core/Runtime to the respective backend foldersGeorgios Pinitas
Legacy structure contained two libraries core/runtime with two backends in each. We reduce the core/runtime libraries to a single library thus merging the backend files Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-27Port CLGEMMConvolutionLayerManuel Bottini
Details: port CLWeightsReshapeKernel to ClWeightsReshapeKernel port CLGEMMConvolutionLayer to ClGemmConvolution Resolves: COMPMID-4515 Change-Id: I7d5b4ec72db2742f6eb9f3ffc88f717c35b4f2a3 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5983 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-16Port CLIm2ColKernel to ClIm2ColKernelManuel Bottini
Resolves: COMPMID-4516 Change-Id: I6a6db66797fa801dfe1238fceca413277241d2ec Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5946 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-15Port CLCol2ImKernel to ClCol2ImKernelManuel Bottini
Resolves: COMPMID-4517 Change-Id: I50cb02116a1ab86fc29200371944c4774e830746 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5949 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-06Fix manual LOOP_UNROLLINGGian Marco Iodice
The issue is caused by the number of iterations passed to LOOP_UNROLLING. When we use the manual LOOP_UNROLLING, the number of iterations must be less than or equal to 128. To overcome this problem, we create a utility function to check if any of the critical iterations (kernel dimensions) are beyond that limit. If so, the utility function, disable the manual loop unrolling. Resolves COMPMID-4609 Change-Id: I7221c967609e462a5abd1cbb74e2a120f344fcb3 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5913 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Implement FP GPU depthwise convolution 1x1 kernel for in-place computationSiCongLi
* Implement in-place graph node mutator for 1x1 depthwise convolution * Add in-place to validation fixture except for DepthwiseConvolutionLayerNativeValidationFixture as it would be a duplicate test otherwise (DepthwiseConvolutionLayerNative test tests the underlying kernel) Resolves: COMPMID-4432 Change-Id: Id7f10f5ebdce7d49f550c0b62dbaaab7f5b59d29 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5874 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-06-30Revert "Rework OpenCL Depthwise Convolution"Gian Marco Iodice
This reverts commit 561c176598cd14245e2e7918fdf136d1c888d1da. Reason for revert: <validation> Change-Id: I6f2d61c27520439bb538e9265736532104b24cf8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5127 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Port the ClGemmLowp kernels to the new APIGeorgios Pinitas
Ported kernels: - CLGEMMLowpMatrixMultiplyNativeKernel - CLGEMMLowpMatrixMultiplyReshapedKernel - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel - CLGEMMLowpOffsetContributionKernel - CLGEMMLowpOffsetContributionOutputStageKernel - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel - CLGEMMLowpQuantizeDownInt32ScaleKernel Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I9d5a744d6a2dd2f2726fdfb291bad000b6970de2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5870 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Set up the framework to choose the default LWSGiorgio Arena
Resolve COMPMID-4486 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ib38b7943bd776a6d75d1da163908724c49eae73d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5864 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-24Rework OpenCL Depthwise ConvolutionGian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute - Remove specialized kernels for 3x3 NHWC - Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types - Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized) - Add support to export the weights to cl_image - Extend test for depthwise convolution on opencl Resolves COMPMID-4417 Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-06-22Add FP16 support to CLRemapFreddie Liardet
Add FP16 support to CLRemap when data layout is NHWC. Add relevant tests for FP16 and validation. Update NERemap function level to be consistent with CLRemap. Add depreciation notice for uint_8 only function level methods. Resolves: COMPMID-4335 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If05f06801aef7a169b73ff1ebe760a42f11ca05c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5816 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-15Port CLWinogradConvolutionLayer with ClWinogradConv2dManuel Bottini
Port CLWinogradInputTransformKernel Port CLWinogradFilterTransformKernel Port CLWinogradOutputTransformKernel Resolves: COMPMID-4504 Change-Id: I3177dda0b9c2f56b36cb317027e94abe8d47229e Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5680 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-15Add NHWC support to CLRemapFrederick Liardet
Add NHWC support to CLRemap, also add relevant tests. Partially resolves COMPMID-4335. Change-Id: I119bea99be497fb85d5cd83a10f8d4e8e1f97f17 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5773 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-01Rename ported functionsManuel Bottini
Rename CpuPooling to CpuPool2d Rename CpuPoolingKernel to CpuPool2dKernel Rename CpuPoolingAssemblyWrapperKernel to CpuPool2dAssemblyWrapperKernel Move CpuPool2dAssemblyWrapperKernel in internal subfolder Rename CpuDepthwiseConvolutionNativeKernel to CpuDepthwiseConv2dNativeKernel Rename CpuDepthwiseConvolutionAssemblyDispatch to CpuDepthwiseConv2dAssemblyDispatch Rename CpuDepthwiseConvolution to CpuDepthwiseConv2d Rename CpuDirectConvolutionKernel to CpuDirectConv2dKernel Rename CpuDirectConvolutionOutputStageKernel to CpuDirectConv2dOutputStageKernel Rename CpuDirectConvolution to CpuDirectConv2d Rename ClPoolingKernel to ClPool2dKernel Rename ClPooling to ClPool2d Rename ClDirectConvolutionKernel to ClDirectConv2dKernel Resolves: COMPMID-4405 Change-Id: I8e48f015e4e492a76a7512f5679cb3eb0cd028f6 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5708 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-19Port DepthConvert to new ApiGeorgios Pinitas
- Renames DepthConvert to Cast - Ports both NEDepthConverLayer and CLDepthConvert variants - Removes legacy shift capability from DepthConvert, allowing only shifts of 0 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I806a0f8eb23d23502b632c529fda7edde19c8176 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5565 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-18Port CLGEMM to memory injecting interfaceGeorgios Pinitas
Moves the following kernels: - CLGEMMMatrixMultiplyKernel - CLGEMMMatrixMultiplyNativeKernel - CLGEMMMatrixMultipluReshapedKernel - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel Moves the following functions - CLGEMM Introduces facilities to easy handling of auxiliary temporary buffers under then new run interface. Such are: - CLAuxTensorHandler: That allows wrapping of workspace buffers memory to CLBuffer objects - Ability to inject TensorInfo to allocator without transferring ownership. This reduce the copy overhead if needed. Resolves: COMPMID-4188 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I7055435d831b05b749b26302082e4ac45f26dfb0 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5498 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-17Fix oclgrind error on CLGEMMLowp reshaped only RHS quantized per channelGiorgio Arena
Fix corner case in which the quantization is per channel and OFM == 1 The function can safely set the per_channel quantization flag to false since there is only one output channel This way, the kernel will avoid adding useless padding to the output multipliers and shifts Resolve COMPMID-4384 Change-Id: Ic03452bfaf52d1be536cd371721adedd2e580a08 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5648 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-17Fix MeanStdDevNormalizationLayer reference outputting nan for FP16Giorgio Arena
- Bring the epsilon up to 1e-3 for FP16 (both backends) since it was causing the reference's variance being negative and its square root being NaN - Bring the epsilon up to 1e-7 for FP16 NEON test for the same problem on the NEON kernel - Adjust the CL kernel's vec_size when input tensor's width < 16 and use macros agnostic of vector size for sum reduction - Add previously mismatching tensor shapes Resolve COMPMID-4354 Change-Id: I823c871aacb72326f90c86b24cb16c3e2d4bd15e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5630 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-13Fix integer overflow and null dereferenceMichalis Spyrou
Resolves: COMPMID-4527 Change-Id: If038d2477d8898d3ee307fe58301fb0b16b64c02 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5640 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-13Remove padding from CLChannelShuffleLayerKernelManuel Bottini
Only for NHWC Instead of starting from the input vector and insert the values in the output, we now create an output vector by taking from the input tensor. This makes it easier to store the result in the output with border awareness Resolves: COMPMID-4445 Change-Id: I77cf2d2d5ff30a383cddb8a3c81c462d7a39fd2e Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5627 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-05-07Remove TODOsSheri Zhang
Remove TODO/FIXME either already done or won't do. Partially resolve: COMPMID-4471 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Iec8f25b9bf1b2a70c072dd17d44625fa93e84ed1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5591 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-05-04Fix bug on CLReductionOperationGiorgio Arena
Execution window along the X axis needs to be collapsed on the 3rd axis (rather than the 2nd) since there could be implicit padding added along the Y Resolve COMPMID-4425 Change-Id: I9623a31749b737fea7c623cabdcfbf77cbe8f6dc Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5551 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>