path: root/src/core/CL/cl_kernels/tile_helpers.h
Age | Commit message | Author
2023-07-06 | Fix nightly failures in MatMulLowpNativeKernel when using bounded activation functions | Mohammed Suhail Munshi
- Added checks for supported activation functions in MatMulLowpKernel validate
- Replaced incorrect float activation macro with quantized implementation in mat_mul_quantized
Resolves: [COMPMID-6339] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I15661f14877f1d3305644e6473feb5482a67e773 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/532858 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9855 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-05-11 | Fix invalid vector length in CL | Viet-Hoa Do
Resolves: COMPMID-6252 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I97ddf8a6c83bc2621abc712094db6bc0fe3d97b1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9620 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-04-17 | Add quantized CL MatMul kernels for Lhs NT/T, Rhs NT | Gunes Bayir
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED.
Quantized MatMul is supported with the following MatMul attributes:
* adj_x = false, adj_y = false
* adj_x = true, adj_y = false
We consider native format kernels only. In other words, no reshaping of the operand matrices is done.
Resolves: COMPMID-5921, COMPMID-5922 Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-03-20 | Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16 | Gunes Bayir
- Implement opencl kernel for LHS transposed and RHS non-transposed
- Implement opencl kernel for LHS transposed and RHS transposed
- Add validation tests
Resolves: COMPMID-5953, COMPMID-5955 Change-Id: I55589acbffe86c44e29807574975978a1ec09bad Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-03-17 | Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16 | Ramy Elgammal
- Implement ClNativeMatMulKernel class
- Implement opencl kernel for LHS non-transposed and RHS non-transposed
- Implement opencl kernel for LHS non-transposed and RHS transposed
- Add test fixture and dataset for matmul
- Implement transpose_tensor() for reference implementation to transpose high dimensional tensors
Resolves: COMPMID-5944, COMPMID-5951 Co-authored-by: Gunes Bayir <gunes.bayir@arm.com> Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I1d5b8978f41be27baddb3153ade880472141573f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-02-01 | Add Subtraction operator to Dynamic Fusion interface | Ramy Elgammal
Partially-Resolves: COMPMID-5518 Change-Id: I8358784815bcac461d50e384fa7bc96f476d3983 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9045 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: SiCong Li <sicong.li@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-31 | Add Multiplication operator (FP only) to Dynamic Fusion Interface | Jakub Sujak
Note: we use a separate test fixture for Multiplication op instead of reusing ElementwiseBinaryFixture to avoid exposing the internal enum ElementwiseOp to the public utils/TypePrinters.h as required by the data test case macros to print the test data. We also do not consider modifying the enum ArithmeticOp in the standard interface to include MUL without an implementation. Future work should consider refactoring this test fixture into the ElementwiseBinaryFixture to reduce the total number of fixtures/code duplication.
Resolves: COMPMID-5779 Change-Id: I84207658ce0407095b028fca0ab7bfa2950255ec Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9013 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-10 | Extend cl image support to input and output tensors | Gian Marco Iodice
- Add support for texture image to input and output of direct convolution
- Extend T_LOAD2D_INDIRECT macro to read values from cl image storages
Resolves COMPMID-5715 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: Idb0410f53f6d0763cd9e39895a7cbf9bc826d33a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8904 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-06 | LHS broadcasting addition for dynamic fusion | Viet-Hoa Do
* Binary elementwise operator now can have broadcasting in either X dimension, Y+Z dimension, or both, in either LHS or RHS operand.
* Fix bug in CL code to support batching.
Resolves: COMPMID-5704 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I51b04986d30861f255ca9f754adffa0e6c85a26b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8898 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09 | Implement the OpenCL kernel to compute the indirect convolution | Gian Marco Iodice
- Implement indirect convolution kernel
- Add operator support
- Add test
Resolves COMPMID-5709 Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-29 | Adding GpuAdd to dynamic fusion operators | Ramy Elgammal
- Provide support for Add operator
- Auto initialize the destination tensor before testing fusion in conv2d and elementwise binary ops.
Resolves: COMPMID-5518 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ibd815020f02b57f88eea7c2921bdcf98605d99c5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8617 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14 | Optimize T_QUANTIZE8_ASYMMETRIC for Mali™ G52 | Pablo Marquez Tello
* Resolves MLCE-842 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Iae0521b25a5e6c9cc8046830f397d523dfbcc66e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8542 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-01 | Rework direct convolution heuristic on OpenCL | Gian Marco Iodice
Resolves COMPMID-5634 Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-07-13 | Add Gemm MMUL Reshaped Only Rhs Support for FP32/FP16 | Gunes Bayir
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615
Resolves: COMPMID-5216 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-27 | Implement new Elementwise Dynamic Fusion Operators: Div, Floor | Michalis Spyrou
Resolves: COMPMID-5355 Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-05-31 | Add cl_khr_integer_dot_product extension support | Viet-Hoa Do
* Replace arm_dot(_acc) with dot when cl_khr_integer_dot_product extension is available.
Resolves: COMPMID-5206 Change-Id: I7fd763e2421987584e4dae271008972644ea2f41 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7647 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-05-09 | Mismatches in dynamically fused direct conv2d + add kernel | Michalis Spyrou
Resolves: COMPMID-5269 Change-Id: I4372ea4365d14ead79153e4b08b690a1e20ab0b7 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7531 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-14 | Include missing embedded headers | SiCong Li
Partially resolves: COMPMID-5156 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I434586ac72d0f5a530e19108e6c5c319497c4fe0 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7411 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-02 | Revert "Rework gemm_mm_reshaped_only_rhs_ kernels with new macros" | Ramy Elgammal
This reverts commit 10e88a7351 "Rework gemm_mm_reshaped_only_rhs_ kernels with new macros"
Resolves: COMPMID-5095 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I46e167882f072e7508b6101d295accb6e089e740 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7045 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-01-25 | Rework gemm_mm_reshaped_only_rhs_ kernels with new macros | Gian Marco Iodice
- Rework gemm_reshaped_rhs_only with new TILE macros
- Fuse post ops in gemm_reshaped_rhs_only
Resolves COMPMID-4890 Change-Id: I944948ecec6d08deaf3545b80cd3eeac26e44205 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6944 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-12-23 | Rework gemm_reshape_lhs_ with new macros | Adnan AlSinan
Resolves COMPMID-4892 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I52f23ca293506fc693ae829daccc6e889a050752 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6833 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-26 | Rework gemm_reshape_rhs_(nt,t) with new macros | Gian Marco Iodice
Resolves COMPMID-4891 Change-Id: Ifdf2a0eaed23347a1b4465ea8d58c11b72083952 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6741 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-11-09 | Improve start-up time for ClScale | Adnan AlSinan
- Add macro guard for different kernels in scale.cl
- Rework TENSOR4D to the new format
- Pass scale_x and scale_y at runtime
Resolves COMPMID-4886 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-14 | Implement CLDirectConv3D f32/f16 | Giorgio Arena
Resolve COMPMID-4660 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-07-22 | Fix oclgrind int overflow warning | Freddie Liardet
Fix warning found by oclgrind. Also remove duplicated code.
Resolves COMPMID-4675 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: I6ad56cc0130b5df936f1e070db116695269317df Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5974 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-14 | Fix CL kernel compilation failure | Michalis Spyrou
A typo was causing errors when building some CL kernels.
Resolves: COMPMID-4637 Change-Id: I7d1821e1a046ef8ccd306f72afe192732dd2ad1e Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5944 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-02 | Rework OpenCL Depthwise Convolution | Gian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute
- Remove specialized kernels for 3x3 NHWC
- Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types
- Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized)
- Add support to export the weights to cl_image
- Extend test for depthwise convolution on opencl
Resolves COMPMID-4417 Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-01 | Add quantization helper functions for OpenCL | Georgios Pinitas
Add `T_QUANTIZE8_PER_TENSOR` and `T_QUANTIZE8_PER_CHANNEL` that can be used to perform quantization on tile constructs.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ie8e1efcb895c64715620acf2212b1de9a857ee0a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5891 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-30 | Revert "Rework OpenCL Depthwise Convolution" | Gian Marco Iodice
This reverts commit 561c176598cd14245e2e7918fdf136d1c888d1da.
Reason for revert: <validation>
Change-Id: I6f2d61c27520439bb538e9265736532104b24cf8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5127 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-24 | Rework OpenCL Depthwise Convolution | Gian Marco Iodice
- Remove dedicated kernels for NCHW. Now we only use NHWC with permute
- Remove specialized kernels for 3x3 NHWC
- Simplify CLDepthwiseConvolutionLayer.cpp to call just the native implementation for both floating-point and quantized data types
- Develop two parametric opencl kernels for depthwise convolution layer NHWC (floating-point and quantized)
- Add support to export the weights to cl_image
- Extend test for depthwise convolution on opencl
Resolves COMPMID-4417 Change-Id: I253dd5d959a70783c82e62b1771a5e9f91621cb0 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5806 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-05-20 | Enable unroll through pragma based on DDK version | Giorgio Arena
Change-Id: Id98a107d512369d3799961011a84e9cc4d99e775 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5679 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-05-17 | Add macro to manually unroll loops in OpenCL | Giorgio Arena
Change-Id: I092d10534816f5b3717325952033c351b8231380 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5643 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-07 | Fix missing DATA_TYPE in DOT_PRODUCT4_INTEGER8 OpenCL macro | Gian Marco Iodice
- DOT_PRODUCT8_INTEGER8 and DOT_PRODUCT16_INTEGER8 are calling DOT_PRODUCT4_INTEGER8 without passing DST_DATA_TYPE
Resolves COMPMID-4491 Change-Id: I394bd2f9208489e820885e49ed40e607d6470620 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5594 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-20 | Remove OpenCL padding: CLPixelWiseMultiplicationKernel | Giorgio Arena
- Change kernel's vec_size to 16 / sizeof(output)
- Change ICLKernel.cpp to handle broadcast without padding
Resolve COMPMID-3913 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I03e884b250ef5784dc109bff8cf2c96b345d119f Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5450 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-04-12 | Add support for cl_image in CLDirectConvolutionLayer | Gian Marco Iodice
- The cl_image object can be used for the weights
- cl_image can only work for f32/f16
- Fix the implicit padding on the first dimension X
Resolves COMPMID-4341 Change-Id: I04e0901c69e7765c42afceca38c4a840645b9123 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5393 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-04-08 | Rework the OpenCL Winograd Input Transformations NHWC | Gian Marco Iodice
- Rework Winograd Input Transform 3x3 NHWC using the new macros
- Rework Winograd Input Transform 5x5 NHWC using the new macros
- Rework Winograd Input Transform 7x7 NHWC using the new macros
- The new implementation is also faster than before
  - Winograd Input Transform 5x5/7x7 3x faster
Resolves COMPMID-4139 Change-Id: Ia9c8af23a2d47d2db60ec4c44650a63a34ffa0d5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5358 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-03-25 | Improve performance of Winograd Output Transform 3x3 | Gian Marco Iodice
This patch reworks the winograd output transform 3x3 NHWC on OpenCL
- Use utility macros in tile_helpers.h to rewrite the kernel
- Implement the tile utility macro for the activation
Resolves COMPMID-4144 Change-Id: I86a9bb9ea96b9629a18642b56bb63750710e6af5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5324 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-03-23 | Extend direct convolution (F32/F16/QASYMM8) | Gian Marco Iodice
The new function can handle different block sizes (M0, N0)
New utility macros have been developed to simplify the work and the future OpenCL kernel development. In particular the work has been done to also consider cases with:
- the texture pipe support
- dynamic tensor shape support
Change-Id: Ife4c64baf07517938bb8ad18e6a5f4579345c40f Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5297 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>