path: root/src/core/CL/cl_kernels/nhwc
Age | Commit message | Author
2023-01-18 | Revert "Update the heuristic for CLDepthwiseConvolutionNative kernel" | Gian Marco Iodice
    Resolves COMPMID-5813
    Change-Id: I5ef6fe9fb6a54db18e41a71085896fd08bc08dbb
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8975
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-12 | Update the heuristic for CLDepthwiseConvolutionNative kernel | Gian Marco Iodice
    - Use T_LOAD2D_INDIRECT macro instead of T_LOAD_NHWC_WITH_DILATION in the depthwise convolution opencl kernels
    - Update the heuristic for Arm® Mali™-G77

    Resolves COMPMID-5716
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Change-Id: I32d375b220e04bf05f5d8f0af2231bde600f0665
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8930
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-01-10 | Extend cl image support to input and output tensors | Gian Marco Iodice
    - Add support for texture image to input and output of direct convolution
    - Extend T_LOAD2D_INDIRECT macro to read values from cl image storages

    Resolves COMPMID-5715
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Change-Id: Idb0410f53f6d0763cd9e39895a7cbf9bc826d33a
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8904
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-29 | Optimize CL Scale/Resize Quantized by removing (de)quant. code | Gunes Bayir
    This patch removes the quant/dequant code in CLScale and the Resize operator in dynamic fusion. We don't support different quantization information for input and output and in this case the quantization and dequantization is not necessary. The very same optimization was delivered for cpu. It also moves the SCALE_X and SCALE_Y arguments to look-up table from build options in the template writer of Resize.

    Change-Id: Icd043c8671220c8feea935dd4b24a5b17c6c4ea4
    Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8888
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
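Why the quantization and dequantization steps can be dropped when input and output share the same quantization information can be illustrated with a small sketch. This is not the library's code: the function names, the round-half-up rule, and the two-sample linear interpolation are all assumptions chosen to keep the example short. Because interpolation weights sum to one, the scale and offset cancel out, so interpolating the raw quantized values gives the same result.

```python
import math


def quantize(x, scale, offset):
    """Real value -> quantized value (hypothetical helper, round half up)."""
    return math.floor(x / scale + 0.5) + offset


def dequantize(q, scale, offset):
    """Quantized value -> real value (hypothetical helper)."""
    return scale * (q - offset)


def lerp(a, b, w):
    """Linear interpolation between a and b with weight w in [0, 1]."""
    return (1.0 - w) * a + w * b


def resize_with_requant(q0, q1, w, scale, offset):
    """Dequantize, interpolate in the real domain, then quantize again."""
    x = lerp(dequantize(q0, scale, offset), dequantize(q1, scale, offset), w)
    return quantize(x, scale, offset)


def resize_direct(q0, q1, w):
    """Interpolate the quantized values directly; no scale/offset needed."""
    return math.floor(lerp(q0, q1, w) + 0.5)


# Same scale and offset on both sides of the operator (assumed values).
print(resize_with_requant(12, 20, 0.75, scale=0.5, offset=3))
print(resize_direct(12, 20, 0.75))
```

The two paths agree exactly here because the chosen scale and weights are exactly representable in binary floating point; with arbitrary scales the two paths may differ by rounding in rare tie cases.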
2022-12-29 | Extend Transposed Conv. for tiles with N0>1 | Gunes Bayir
    Partially Resolves: COMPMID-5724
    Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
    Change-Id: I0aeddddcdd87c8c79f6dae9a76ffdc2ba0c08e17
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8883
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-21 | Update direct conv2d kernel in dynamic fusion | Gian Marco Iodice
    Resolves COMPMID-5780
    Change-Id: I34c764cd1df652f8a938772924dc49baf6ac16db
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8825
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-12-14 | Optimize Transposed Convolution for CL backend (Quantized) | Gunes Bayir
    This patch optimizes transposed convolution for QASYMM and QASYMM8_SIGNED types, by extending the transposed convolution kernel written for FP32/16.

    Resolves: COMPMID-5723
    Change-Id: Iab8f09231938adb949c506fd915ed45b885e5c7c
    Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8792
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09 | Implement the OpenCL kernel to compute the indirect convolution | Gian Marco Iodice
    - Implement indirect convolution kernel
    - Add operator support
    - Add test

    Resolves COMPMID-5709
    Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-25 | Implement address precalculation for indirect conv2d - OpenCL | Gian Marco Iodice
    - Implement kernel (ClIndirectConv2dAddressPrecalculationKernel)
    - Implement OpenCL kernel (indirect_convolution.cl)
    - Add test

    Resolves COMPMID-5708
    Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14 | Optimize Transposed Convolution for CL backend (FP32/16) | Gunes Bayir
    This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step, which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, which reduces the memory footprint.

    Resolves: COMPMID-5676
    Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
    Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
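The idea of replacing flip + upsample + conv with a single pass can be sketched in one dimension. The sketch below is an illustration under stated assumptions, not the OpenCL kernel from this commit: both function names are hypothetical, there is no padding, and plain Python lists stand in for tensors. The reference path materialises the zero-stuffed input and the flipped weights; the single-pass path reads only the real input samples and indexes the weights in the order that makes flipping unnecessary.

```python
def transposed_conv1d_reference(x, w, stride):
    """Three-step form: upsample with zeros, flip the weights, then convolve."""
    k = len(w)
    upsampled = [0] * ((len(x) - 1) * stride + 1)
    for i, value in enumerate(x):
        upsampled[i * stride] = value
    padded = [0] * (k - 1) + upsampled + [0] * (k - 1)
    flipped = w[::-1]
    out_len = (len(x) - 1) * stride + k
    return [
        sum(padded[o + j] * flipped[j] for j in range(k))
        for o in range(out_len)
    ]


def transposed_conv1d_single_pass(x, w, stride):
    """Single-pass form: no upsampled buffer and no flipped copy of w."""
    k = len(w)
    out_len = (len(x) - 1) * stride + k
    out = []
    for o in range(out_len):
        acc = 0
        for kk in range(k):
            pos = o - kk  # position in the (virtual) upsampled input
            # Only positions that are multiples of the stride hold real data;
            # every other position would have been an inserted zero.
            if pos >= 0 and pos % stride == 0 and pos // stride < len(x):
                acc += x[pos // stride] * w[kk]
        out.append(acc)
    return out


x = [1, 2, 3]
w = [1, 10, 100]
print(transposed_conv1d_reference(x, w, stride=2))
print(transposed_conv1d_single_pass(x, w, stride=2))
```

In this 1D sketch the single-pass form touches one input sample where the reference form touches `stride` positions; in 2D the same argument gives the stride_x * stride_y factor the commit message mentions.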
2022-11-01 | Rework direct convolution heuristic on OpenCL | Gian Marco Iodice
    Resolves COMPMID-5634
    Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708
    Tested-by: bsgcomp <bsgcomp@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Comments-Addressed: bsgcomp <bsgcomp@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-10-07 | Workaround CL compiler issue on FP16 | Viet-Hoa Do
    Resolves: COMPMID-5600
    Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Change-Id: I5196d1639c48d0b8a116d47ed1d6c7334dc8f41e
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8374
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
    Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-06 | Rework DepthwiseConvolution heuristic on OpenCL | Gian Marco Iodice
    Resolves COMPMID-5632
    Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
    Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-07 | Optimize depthwise convolution on OpenCL | Gian Marco Iodice
    The optimization concerns the case where the depth multiplier is > 1. The for loop over the depth multiplier has been removed from the OpenCL kernel and the GWS has been mapped to the output shape. In this way, we can still process a tile with N0 columns and improve the performance of depthwise conv by over 80% when the depth multiplier is > 1.

    Resolves COMPMID-5568
    Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
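Mapping the work to the output shape instead of looping over the depth multiplier can be sketched as follows. This is a simplified illustration, not the kernel from this commit: the function names are hypothetical, the convolution is 1D without padding, and the channel ordering `c_out = c_in * depth_multiplier + m` is an assumption of the sketch. Each iteration of the second function plays the role of one work-item along the channel dimension of the output.

```python
def conv1d_valid(signal, kernel):
    """Plain 1D correlation without padding (hypothetical helper)."""
    k = len(kernel)
    return [
        sum(signal[p + j] * kernel[j] for j in range(k))
        for p in range(len(signal) - k + 1)
    ]


def depthwise_loop_over_multiplier(x, w, depth_multiplier):
    """Iterate over input channels, with an inner loop over the multiplier."""
    out = [None] * (len(x) * depth_multiplier)
    for c_in in range(len(x)):
        for m in range(depth_multiplier):
            c_out = c_in * depth_multiplier + m
            out[c_out] = conv1d_valid(x[c_in], w[c_out])
    return out


def depthwise_mapped_to_output(x, w, depth_multiplier):
    """Iterate over output channels; the input channel is derived by division."""
    return [
        conv1d_valid(x[c_out // depth_multiplier], w[c_out])
        for c_out in range(len(x) * depth_multiplier)
    ]


x = [[1, 2, 3], [4, 5, 6]]            # x[c_in][position]
w = [[1, 0], [0, 1], [1, 1], [2, 0]]  # w[c_out][tap]
print(depthwise_loop_over_multiplier(x, w, 2))
print(depthwise_mapped_to_output(x, w, 2))
```

With the inner loop gone, consecutive output channels are independent items of work, which is what lets a tile of several adjacent output channels be processed together.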
2022-07-21 | Fix direct convolution cases that were failing on Odroid | Adnan AlSinan
    - Affects OpenCL backend.
    - Resolves COMPMID-5416

    Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
    Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-06-27 | Implement new Elementwise Dynamic Fusion Operators: Div, Floor | Michalis Spyrou
    Resolves: COMPMID-5355
    Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f
    Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: SiCong Li <sicong.li@arm.com>
2022-06-15 | Fix performance regression in Winograd Output Transform (OpenCL) | Gian Marco Iodice
    The regression was caused by NUM_TILES_X passed at runtime.

    Resolves COMPMID-5327
    Change-Id: Id6ccd93784eda93af09f420c0d786050e2bbccd7
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7727
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-04-19 | Add CLPool3d Int8 Support | Mohammed Suhail Munshi
    - Adds Qasymm8 and Qasymm8_signed support to the 3d pool operator

    Resolves: COMPMID-4669
    Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
    Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538
    Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496
    Tested-by: bsgcomp <bsgcomp@arm.com>
    Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
    Comments-Addressed: bsgcomp <bsgcomp@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-03-15 | Implementation of ClPooling3d | ramelg01
    - For NDHWC layout
    - For F16 and F32 data types
    - Mixed Precision still not supported

    Resolves: COMPMID-4670
    Signed-off-by: ramy.elgammal@arm.com
    Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: SiCong Li <sicong.li@arm.com>
    Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-02-10 | Improve start-up time for winograd_output_transform_*_nhwc | ramelg01
    - Pass tensor's dimensions at runtime rather than compile time
    - Add guard macro to compile only kernel of interest

    Resolves: COMPMID-5120
    Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
    Change-Id: I87c3b56ce0cd3c97ffdeabdd9c5d433f361bb005
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7101
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-09 | Remove deprecated remap functions. | Adnan AlSinan
    - Remove CLRemapKernel.
    - Remove NERemapKernel.

    Partially resolves COMPMID-4984
    Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
    Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-09 | Improve start-up time for winograd_input_transform_*_nhwc | ramelg01
    - Pass tensor's dimensions at runtime rather than compile time
    - Add guard macro to compile only kernel(s) of interest

    Resolves: COMPMID-5119
    Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
    Change-Id: Ib01098e397011a1201c2800c62a8954ec70e63e8
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7083
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-08 | Improve start-up time for winograd_filter_transform_*_nhwc | ramelg01
    - Pass tensor's dimensions at runtime rather than compile time
    - Add guard macro to compile only kernel of interest

    Resolves: COMPMID-5118
    Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
    Change-Id: Ie42c3c07fdd817ce62e7cad354381bc22c6e9264
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7058
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-12-10 | Use #if directive instead of regular condition in CLDirectConv2D | Giorgio Arena
    Resolve COMPMID-5004
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: Ib3e1b5a891234316c411ea9825ec10c68c4ab5a3
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6788
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-12-01 | Improve start-up direct convolution on OpenCL | Gian Marco Iodice
    - Pass arguments at runtime
    - Rework ClConv2D heuristic to select direct convolution when OFM < IFM also for small kernel sizes

    Resolves COMPMID-5000
    Change-Id: I9b538e29093829bc366d24d1e904341c247fa22b
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6771
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-29 | Use loop unrolling only when the kernel height is less than 5 | Gian Marco Iodice
    - In dwc_native_fp_nhwc.cl, loop unrolling should only be enabled when the kernel height is less than 5.
    - No performance regression observed
    - The patch reduces the compilation time required for the kernel

    Resolves COMPMID-4887
    Change-Id: I93188b9764cf7d1ad34ac164694f6f1fd37a90e8
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6744
    Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-11-17 | Improve start-up timer for ClIm2Col | Giorgio Arena
    Resolve COMPMID-4889
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: I4a88082b13865fdaeaba1b7216503cd640aa54df
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6680
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-17 | Improve start-up time for depthwise convolution | Sheri Zhang
    - Pass source and destination tensor dimension info at runtime

    Resolves: COMPMID-4887
    Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
    Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-11-09 | Improve start-up time for ClScale | Adnan AlSinan
    - Add macro guard for different kernels in scale.cl
    - Rework TENSOR4D to the new format
    - Pass scale_x and scale_y at runtime

    Resolves COMPMID-4886
    Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
    Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-20 | Implement CLDirectConv3DKernel - uint8/int8 | Giorgio Arena
    Resolve COMPMID-4663
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: I5c3c1cffed5385c06b789543318f7f4d6096987e
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6468
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-10-15 | Fix CLConv3D filelist and comments | Giorgio Arena
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: I4d48f1b8eba6681a9de0ae5f1fd8a4ad1edf7fe8
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6439
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-14 | Implement CLDirectConv3D f32/f16 | Giorgio Arena
    Resolve COMPMID-4660
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2021-09-14 | Optimize ClScaleKernel on NHWC (f32/f16/int8) | Gian Marco Iodice
    The new kernel performs the computation on multiple elements. The OpenCL kernel has been re-implemented using the new TILE macros.

    Resolves COMPMID-4803,COMPMID-4804
    Change-Id: Iac8fead65e21b64567a05dbc4fbaa61d362443f9
    Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6235
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-23 | Remove padding from ClScaleKernel | Giorgio Arena
    - Merge quantized kernels with fp for bilinear interpolation (both NCHW and NHWC)
    - Pass dimensions at compile time rather than at run time
    - Use tile-based approach to rework the NCHW kernels
    - Remove unused functions/files

    Resolve COMPMID-4723
    Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
    Change-Id: Ifcdf02beb9daa9f318395751b3c85eb2fe874082
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6138
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-25 | Reorganize the kernels into nhwc, nchw and common folders | Adnan AlSinan
    The following kernels have been split into nchw/nhwc kernel files:
    - batchnormalization_layer
    - batch_to_space
    - channel_shuffle
    - depth_to_space
    - dequantization_layer
    - im2col
    - normalization_layer
    - normalize_planar_yuv_layer
    - normalize_planar_yuv_layer_quantized
    - pooling_layer
    - pooling_layer_quantized
    - remap
    - reorg_layer
    - scale
    - scale_quantized
    - space_to_batch
    - space_to_depth
    - upsample_layer
    - winograd_filter_transform
    - winograd_input_transform
    - winograd_output_transform

    The following kernels have been moved to nchw folder:
    - direct_convolution1x1
    - direct_convolution3x3
    - direct_convolution5x5
    - direct_convolution_quantized
    - prior_box_layer

    The following kernels have been moved to nhwc folder:
    - direct_convolution
    - dwc_native_fp_nhwc
    - dwc_native_quantized_nhwc

    The following kernels have been removed:
    - sobel_filter

    The remaining kernels have been moved to the common folder.

    Partially resolves COMPMID-4453
    Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
    Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58
    Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919
    Tested-by: Arm Jenkins <bsgcomp@arm.com>
    Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
    Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>