Age | Commit message (Collapse) | Author |
|
This patch removes the quant/dequant code in CLScale and the Resize operator in dynamic fusion. We don't support different quantization information for input and output and in this case the quantization and dequantization is not necessary. The very same optimization was delivered for cpu.
It also moves the SCALE_X and SCALE_Y arguments to look-up table from build options in the template writer of Resize.
Change-Id: Icd043c8671220c8feea935dd4b24a5b17c6c4ea4
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8888
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves: COMPMID-5724
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I0aeddddcdd87c8c79f6dae9a76ffdc2ba0c08e17
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8883
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5780
Change-Id: I34c764cd1df652f8a938772924dc49baf6ac16db
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8825
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch optimizes transposed convolution for QASYMM and QASYMM8_SIGNED types, by extending the transposed convolution kernel written for FP32/16.
Resolves: COMPMID-5723
Change-Id: Iab8f09231938adb949c506fd915ed45b885e5c7c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8792
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement indirect convolution kernel
- Add operator support
- Add test
Resolves COMPMID-5709
Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel)
- Implement OpenCL kernel (indirect_convolution.cl)
- Add test
Resolves COMPMID-5708
Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint.
Resolves: COMPMID-5676
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5634
Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Resolves: COMPMID-5600
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5196d1639c48d0b8a116d47ed1d6c7334dc8f41e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8374
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5632
Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The optimization concerns the case where the depth multiplier is > 1.
The depth multiplier for loop has been removed from the OpenCL kernel
and the GWS has been mapped to the output shape. In this way, we can
still perform a tile with N0 columns and improve the performance of
depthwise conv over 80% when depth multiplier is > 1.
Resolves COMPMID-5568
Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Affects OpenCL backend.
- Resolves COMPMID-5416
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5355
Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
The regression was caused by NUM_TILES_X passed at runtime.
Resolves COMPMID-5327
Change-Id: Id6ccd93784eda93af09f420c0d786050e2bbccd7
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7727
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds Qasymm8 and Qasymm8_signed support to the 3d pool operator
Resolves: COMPMID-4669
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- For NDHWC layout
- For F16 and F32 data types
- Mixed Precision stil not supported
Resolves: COMPMID-4670
Signed-off-by: ramy.elgammal@arm.com
Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- pass tensor's dimensions at runtime rather than compile time
- Add guard macro to compile only kernel of internest
Resolves: COMPMID-5120
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I87c3b56ce0cd3c97ffdeabdd9c5d433f361bb005
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7101
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove CLRemapKernel.
- Remove NERemapKernel.
Partially resolves COMPMID-4984
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- pass tensor's dimensions at runtime rather than compile time
- Add guard macro to compile only kernel(s) of internest
Resolves: COMPMID-5119
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib01098e397011a1201c2800c62a8954ec70e63e8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7083
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- pass tensor's dimensions at runtime rather than compile time
- Add guard macro to compile only kernel of internest
Resolves: COMPMID-5118
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ie42c3c07fdd817ce62e7cad354381bc22c6e9264
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7058
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-5004
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ib3e1b5a891234316c411ea9825ec10c68c4ab5a3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6788
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
- Pass arguments at runtime
- Rework ClConv2D heuristic to select direct convolution when OFM < IFM
also for small kernel sizes
Resolves COMPMID-5000
Change-Id: I9b538e29093829bc366d24d1e904341c247fa22b
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6771
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- In the dwc_native_fp_nhwc.cl, loop unrolling should only be enabled
when kernel height is less than 5.
- No performance regression experimented
- The patch reduces the compilation time required for the kernel
Resolves COMPMID-4887
Change-Id: I93188b9764cf7d1ad34ac164694f6f1fd37a90e8
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6744
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4889
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I4a88082b13865fdaeaba1b7216503cd640aa54df
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6680
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Pass source and destination tensor dimension info at runtime
Resolves: COMPMID-4887
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
- Add macro guard for different kernels in scale.cl
- Rework TENSOR4D to the new format
- Pass scale_x and scale_y at runtime
Resolves COMPMID-4886
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4663
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I5c3c1cffed5385c06b789543318f7f4d6096987e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6468
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I4d48f1b8eba6681a9de0ae5f1fd8a4ad1edf7fe8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6439
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4660
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
The new kernel performs the computation on multiples elements. The
OpenCL kernel has been re-implemented using the new TILE macros
Resolves COMPMID-4803,COMPMID-4804
Change-Id: Iac8fead65e21b64567a05dbc4fbaa61d362443f9
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6235
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Merge quantized kernels with fp for bilinear interpolation (both NCHW and NHWC)
- Pass dimensions at compile time rather than at run time
- Use tile-based approach to rework the NCHW kernels
- Remove unused functions/files
Resolve COMPMID-4723
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ifcdf02beb9daa9f318395751b3c85eb2fe874082
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6138
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
The Following kernels have been split into nchw/nhwc kernels files:
- batchnormalization_layer
- batch_to_space
- channel_shuffle
- depth_to_space
- dequantization_layer
- im2col
- normalization_layer
- normalize_planar_yuv_layer
- normalize_planar_yuv_layer_quantized
- pooling_layer
- pooling_layer_quantized
- remap
- reorg_layer
- scale
- scale_quantized
- space_to_batch
- space_to_depth
- upsample_layer
- winograd_filter_transform
- winograd_input_transform
- winograd_output_transform
The following kernels have been moved to nchw folder:
- direct_convolution1x1
- direct_convolution3x3
- direct_convolution5x5
- direct_convolution_quantized
- prior_box_layer
The following kernels have been moved to nhwc folder:
- direct_convolution
- dwc_native_fp_nhwc
- dwc_native_quantized_nhwc
The following kernels have been removed:
- sobel_filter
While the rest kerenls have been moved to the common folder.
Partially resolves COMPMID-4453
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|