aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/cl_kernels
AgeCommit message (Collapse)Author
2022-11-14Optimize Transposed Convolution for CL backend (FP32/16)Gunes Bayir
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint. Resolves: COMPMID-5676 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Optimize T_QUANTIZE8_ASYMMETRIC for Mali™ G52Pablo Marquez Tello
* Resolves MLCE-842 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Iae0521b25a5e6c9cc8046830f397d523dfbcc66e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8542 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-03Fix activation block in gemm.clGian Marco Iodice
- Replace VEC_SIZE with N0. VEC_SIZE was used in the old gemm kernel and not used anymore in the existing ones Resolves COMPMID-5678 Change-Id: Ia770200b9d6e24c51c57347e4634fb8eadd10385 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8556 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-02Partially Revert "Add threshold for floating-point SOFT_RELU activation"Gunes Bayir
Revert the range removal in tests for soft relu and bring the former implementation in CL backend back. Resolves: COMPMID-5677 Change-Id: I35d5ac03a134299041ce97aabc9fff2d4380d09f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8551 Reviewed-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Add threshold for floating-point SOFT_RELU activationMilos Puzovic
Added missing threshold for calculating SOFT_RELU when SVE and CL implementations are used. As a result removed from the testing bounds for input values that were set to be in the interval [-40, 40]. Resolves: COMPMID-5658 Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I3d14df60125e36e4eb85aeb222f4fb0cc5741521 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8536 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Rework direct convolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5634 Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-10-07Workaround CL compiler issue on FP16Viet-Hoa Do
Resolves: COMPMID-5600 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5196d1639c48d0b8a116d47ed1d6c7334dc8f41e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8374 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-06Rework DepthwiseConvolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5632 Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-06Improve start-up time in gemmlowp reshaped rhs only.Adnan AlSinan
- Does not Pass RESULT_MULTIPLIER, RESULT_SHIFT When PER_CHANNEL_QUANTIZATION is defined. Resolves COMPMID-5499 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ie433eaf83c003a6d5ccfeb89eb2783528dc2b48e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8316 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-14Adding GELU activationMurray Kornelsen
OpenCL implementation uses built in erf. NEON implementation requires new vectorized erf. Uses the following approximation: erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4 a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108 From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-09Add a macro guard in all OpenCL kernels in gemmlowp.clGian Marco Iodice
Resolves COMPMID-5498 Change-Id: I474f3f963257014255d082aab0ccbe3efe5aa067 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8222 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-07Optimize depthwise convolution on OpenCLGian Marco Iodice
The optimization concerns the case where the depth multiplier is > 1. The depth multiplier for loop has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still perform a tile with N0 columns and improve the performance of depthwise conv over 80% when depth multiplier is > 1. Resolves COMPMID-5568 Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-02F16 Specialization for MeanStdDevNormMurray Kornelsen
Ran into issues with f16 meanstddevnorm. Essentially, with large enough tensors and/or large values in tensors, output becomes all 0. This is due to the variance computation. In f16, it reaches infinity quite easily, then the division results in 0. This change modifies the OpenCL and NEON implementations to compute the sum of squares and the variance using f32, while other operations remain f16. Update: Found that the square operation also benefits from f32, rather than squaring in f16 and accumulating f32. Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca> Change-Id: Ide00afd84ec6d26fec4d53b073e295814f08ba46 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7959 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-22Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNEDFreddie Liardet
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5398 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-21Fix direct convolution cases that were failing on OdroidAdnan AlSinan
- Affects OpenCL backend. - Resolves COMPMID-5416 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-13Add Gemm MMUL Reshaped Only Rhs Support for FP32/FP16Gunes Bayir
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615 Resolves: COMPMID-5216 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-05Add G57 to GPUTargetSiCong Li
Relates to: COMPMID-5299 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I19f820f698cf11020da019f4a1334cccb1e40c7e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7880 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-06-27Implement new Elementwise Dynamic Fusion Operators: Div, FloorMichalis Spyrou
Resolves: COMPMID-5355 Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-06-15Fix performance regression in Winograd Output Transform (OpenCL)Gian Marco Iodice
The regression was caused by NUM_TILES_X passed at runtime. Resolves COMPMID-5327 Change-Id: Id6ccd93784eda93af09f420c0d786050e2bbccd7 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7727 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-05-31Add cl_khr_integer_dot_product extension supportViet-Hoa Do
* Replace arm_dot(_acc) with dot when cl_khr_integer_dot_product extension is available. Resolves: COMPMID-5206 Change-Id: I7fd763e2421987584e4dae271008972644ea2f41 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7647 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-05-09Mismatches in dynamically fused direct conv2d + add kernelMichalis Spyrou
Resolves: COMPMID-5269 Change-Id: I4372ea4365d14ead79153e4b08b690a1e20ab0b7 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7531 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-19Add CLPool3d Int8 SupportMohammed Suhail Munshi
- Adds Qasymm8 and Qasymm8_signed support to the 3d pool operator Resolves: COMPMID-4669 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-04-14Include missing embedded headersSiCong Li
Partially resolves: COMPMID-5156 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I434586ac72d0f5a530e19108e6c5c319497c4fe0 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7411 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-03-15Implementation of ClPooling3dramelg01
- For NDHWC layout - For F16 and F32 data types - Mixed Precision stil not supported Resolves: COMPMID-4670 Signed-off-by: ramy.elgammal@arm.com Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-03-08Merge kernel prototype patchGiorgio Arena
Resolves: COMPMID-5151 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ic4024d5cd4819fe917a1d49621f1866ae2e90a37 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7260 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-11Improve start-up time for concatenation layersramelg01
- pass tensor's dimensions at runtime rather than compile time - Add guard macro to compile only kernel of internest Resolves: COMPMID-5121 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I76b7c0cf56d803f58ebff5494c904ace2a86ef5a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7097 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-10Improve start-up time for winograd_output_transform_*_nhwcramelg01
- pass tensor's dimensions at runtime rather than compile time - Add guard macro to compile only kernel of internest Resolves: COMPMID-5120 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I87c3b56ce0cd3c97ffdeabdd9c5d433f361bb005 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7101 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-09Remove deprecated remap functions.Adnan AlSinan
- Remove CLRemapKernel. - Remove NERemapKernel. Partially resolves COMPMID-4984 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-09Improve start-up time for winograd_input_transform_*_nhwcramelg01
- pass tensor's dimensions at runtime rather than compile time - Add guard macro to compile only kernel(s) of internest Resolves: COMPMID-5119 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib01098e397011a1201c2800c62a8954ec70e63e8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7083 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-08Improve start-up time for winograd_filter_transform_*_nhwcramelg01
- pass tensor's dimensions at runtime rather than compile time - Add guard macro to compile only kernel of internest Resolves: COMPMID-5118 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ie42c3c07fdd817ce62e7cad354381bc22c6e9264 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7058 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-02Revert "Rework gemm_mm_reshaped_only_rhs_ kernels with new macros"Ramy Elgammal
This reverts commit 10e88a7351 "Rework gemm_mm_reshaped_only_rhs_ kernels with new macros" Resolves: COMPMID-5095 Signed-off-by: Ramy Elgammal<ramy.elgammal@arm.com> Change-Id: I46e167882f072e7508b6101d295accb6e089e740 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7045 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-01-25Rework gemm_mm_reshaped_only_rhs_ kernels with new macrosGian Marco Iodice
- Rework gemm_reshaped_rhs_only with new TILE macros - Fuse post ops in gemm_reshaped_rhs_only Resolves COMPMID-4890 Change-Id: I944948ecec6d08deaf3545b80cd3eeac26e44205 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6944 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-12-23Rework gemm_reshape_lhs_ with new macrosAdnan AlSinan
Resolves COMPMID-4892 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I52f23ca293506fc693ae829daccc6e889a050752 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6833 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-12-13Remove padding from ClDirectConv2dKernelAdnan AlSinan
- Delete old NCHW ClDirectConv2d kernels. - Merge all kernels on a single file. - Removed padding from ClDirectConv2dKernel Resolves COMPMID-4721 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I624d218fb770e7b5f3c0acd4e85a21ae48470f55 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6779 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-12-10Use #if directive instead of regular condition in CLDirectConv2DGiorgio Arena
Resolve COMPMID-5004 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ib3e1b5a891234316c411ea9825ec10c68c4ab5a3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6788 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-12-01Improve start-up direct convolution on OpenCLGian Marco Iodice
- Pass arguments at runtime - Rework ClConv2D heuristic to select direct convolution when OFM < IFM also for small kernel sizes Resolves COMPMID-5000 Change-Id: I9b538e29093829bc366d24d1e904341c247fa22b Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6771 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-29Use loop unrolling only when the kernel height is less than 5Gian Marco Iodice
- In the dwc_native_fp_nhwc.cl, loop unrolling should only be enabled when kernel height is less than 5. - No performance regression experimented - The patch reduces the compilation time required for the kernel Resolves COMPMID-4887 Change-Id: I93188b9764cf7d1ad34ac164694f6f1fd37a90e8 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6744 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-26Rework gemm_reshape_rhs_(nt,t) with new macrosGian Marco Iodice
Resolves COMPMID-4891 Change-Id: Ifdf2a0eaed23347a1b4465ea8d58c11b72083952 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6741 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-11-20Improve start-up timer for GeMM (floating-point):ramelg01
- Pass M,N,K at runtime as kernel parameters - Add a guard macro to compile only kernel of interest - Move reshpaing kernels to gemm_utils.cl - Remove the fallback reshaping kernel with Y-Padding support Resolves: COMPMID-4888 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ida3851326f0b77e410633271de9ecca106e37931 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6662 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-17Improve start-up timer for ClIm2ColGiorgio Arena
Resolve COMPMID-4889 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I4a88082b13865fdaeaba1b7216503cd640aa54df Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6680 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-17Improve start-up time for depthwise convolutionSheri Zhang
- Pass source and destination tensor dimension info at runtime Resolves: COMPMID-4887 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-11-09Improve start-up time for ClScaleAdnan AlSinan
- Add macro guard for different kernels in scale.cl - Rework TENSOR4D to the new format - Pass scale_x and scale_y at runtime Resolves COMPMID-4886 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-04Add validate tests for CLConvolutionLayer and CLGEMMConvolutionLayer with ↵SiCongLi
post ops * Add validate tests * Restrict post ops support in ClGemmConv2d to only those that do not need im2col or col2im. In practice this means we only support post ops in conv1x1 with stride = 1, dilation = 1 and data layout = NHWC Resolves COMPMID-4435 Change-Id: I1fdf0c5d565a4624857250075ac76db35c2f383b Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6573 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-04Add PRelu to supported PostOps in:ramelg01
- ClGemmMatrixMultiplyReshapedKernel - ClGemmMatrixMultiplyNativeKernel - ClGemmMatrixMultiplyReshapedOnlyRhsKernel Resolves: COMPMID-4713 Change-Id: I3adcb1b3d4af37ebcbc3bee19cc1845885d08600 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6553 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-03Fix out-of-bound reads in cl gemm kernelsSiCongLi
* Revert "Remove padding in FP Cl Gemm kernels" This reverts commit 48717a3d38fef8d316cd4b9fd9a3bc1a43db736b. * Allow different boundary row handling strategies across native, reshaped and reshaped_only_rhs kernels by introducing a ELTWISE_OPERAND_ROW parameter to the macro Resolves COMPMID-4919 Change-Id: Icefc23c0760a6abb838fef1d0d5bda06b07c79e3 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6569 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-11-02Add post ops to ClGemmMatrixMultiplyReshapedOnlyRHSKernel and ↵SiCongLi
ClGemmMatrixMultiplyNativeKernel Part 3 Partially resolves: COMPMID-4435 Change-Id: Ifc5affa3a24a70942ca2d001380205df09b03ad7 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6550 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-01Remove padding in FP Cl Gemm kernelsSiCongLi
* Remove rhs and bias padding in ClGemmMatrixMultiplyNativeKernel * Rework ClGemmMatrixMultiplyReshapedOnlyRHSKernel to use the same padding boundary condition as the other kernels Partially resolves COMPMID-4435 Change-Id: I1c17af9cca0b5cb3be087ce160948b7b0e62d297 Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6549 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-10-28Add experimental PostOp interface to ClGemmMatrixMultiplyReshapedKernel Part 1SiCongLi
This interface supports the fusion of multiple elementwise operations Partially resolves: COMPMID-4435 Change-Id: If68dd7dd98dcf239fde7cb1f0a4a6d4d1e899a6f Signed-off-by: SiCongLi <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-20Implement CLDirectConv3DKernel - uint8/int8Giorgio Arena
Resolve COMPMID-4663 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I5c3c1cffed5385c06b789543318f7f4d6096987e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6468 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-10-18Remove legacy GeMM kernels on OpenCLGian Marco Iodice
Resolves COMPMID-4446 Change-Id: I1d3c2391b67681f4d3af440826aa95b47a1288a6 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6444 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>