- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel)
- Implement OpenCL kernel (indirect_convolution.cl)
- Add test
Resolves COMPMID-5708
Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch optimizes transposed convolution for the CL backend by implementing it as a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step, which reduces the input space of the convolution by a factor of stride_x * stride_y and results in a significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reducing the memory footprint.
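The idea can be illustrated with a minimal 1D, single-channel C++ sketch. This is host code, not the OpenCL kernel from this patch: the function names are hypothetical, and padding, channels and batching are omitted. It compares the three-step formulation (upsample, flip, convolve) with a direct one that never materializes the upsampled input or a flipped copy of the weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference path: upsample (insert stride - 1 zeros between inputs), zero-pad by
// K - 1 on both sides, then cross-correlate with a flipped copy of the kernel.
std::vector<float> transposed_conv1d_reference(const std::vector<float> &in,
                                               const std::vector<float> &w,
                                               std::size_t stride)
{
    assert(stride > 0 && !in.empty() && !w.empty());
    const std::size_t n = in.size();
    const std::size_t k = w.size();

    std::vector<float> up((n - 1) * stride + 1, 0.0f); // upsampled input
    for (std::size_t i = 0; i < n; ++i)
    {
        up[i * stride] = in[i];
    }

    std::vector<float> padded(up.size() + 2 * (k - 1), 0.0f);
    for (std::size_t j = 0; j < up.size(); ++j)
    {
        padded[j + k - 1] = up[j];
    }

    const std::vector<float> flipped(w.rbegin(), w.rend()); // extra weight copy

    std::vector<float> out(padded.size() - k + 1, 0.0f);
    for (std::size_t o = 0; o < out.size(); ++o)
    {
        for (std::size_t t = 0; t < k; ++t)
        {
            out[o] += padded[o + t] * flipped[t];
        }
    }
    return out;
}

// Direct path: no upsampled buffer and no flipped copy. Output o only receives
// taps m with (o - m) divisible by stride, so the loop starts at o % stride
// and steps by stride, skipping every multiplication by an inserted zero.
std::vector<float> transposed_conv1d_direct(const std::vector<float> &in,
                                            const std::vector<float> &w,
                                            std::size_t stride)
{
    assert(stride > 0 && !in.empty() && !w.empty());
    const std::size_t n = in.size();
    const std::size_t k = w.size();

    std::vector<float> out((n - 1) * stride + k, 0.0f);
    for (std::size_t o = 0; o < out.size(); ++o)
    {
        for (std::size_t m = o % stride; m < k && m <= o; m += stride)
        {
            const std::size_t i = (o - m) / stride; // source input element
            if (i < n)
            {
                out[o] += in[i] * w[m]; // weights read in their stored order
            }
        }
    }
    return out;
}
```

For in = {1, 2, 3}, w = {1, -1, 2} and stride 2, both functions return {1, -1, 4, -2, 7, -3, 6}, but the direct one performs at most ceil(K / stride) multiply-adds per output. In 2D the same filtering applies along each axis, which is where the stride_x * stride_y reduction comes from.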
Resolves: COMPMID-5676
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5634
Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Resolves: COMPMID-5600
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5196d1639c48d0b8a116d47ed1d6c7334dc8f41e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8374
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5632
Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The optimization concerns the case where the depth multiplier is > 1.
The depth multiplier for-loop has been removed from the OpenCL kernel
and the global work size (GWS) has been mapped to the output shape. In
this way, we can still compute a tile with N0 columns and improve the
performance of depthwise convolution by over 80% when the depth
multiplier is > 1.
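A rough host-side C++ sketch of the mapping: GWS dimension 0 spans the output channels in blocks of N0, and each work-item derives its input channel from the output channel instead of looping over the depth multiplier. It assumes the TensorFlow-style convention output_channel = input_channel * depth_multiplier + multiplier_index; `ChannelMap`, `gws_x` and `channels_for_work_item` are hypothetical helpers, not ComputeLibrary code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration only.
struct ChannelMap
{
    std::size_t out_channel;      // channel written in the destination tensor
    std::size_t in_channel;       // source channel read for this output
    std::size_t multiplier_index; // which of the depth_multiplier filters is used
};

// Size of GWS dimension 0 when it spans the output channels in blocks of n0.
std::size_t gws_x(std::size_t out_channels, std::size_t n0)
{
    assert(n0 > 0);
    return (out_channels + n0 - 1) / n0; // ceil(out_channels / n0)
}

// Output channels handled by work-item x (assumed convention:
// out = in * depth_multiplier + m). The last block may be partial.
std::vector<ChannelMap> channels_for_work_item(std::size_t x, std::size_t n0,
                                               std::size_t depth_multiplier,
                                               std::size_t out_channels)
{
    assert(n0 > 0 && depth_multiplier > 0);
    std::vector<ChannelMap> result;
    for (std::size_t j = 0; j < n0; ++j)
    {
        const std::size_t cout = x * n0 + j;
        if (cout >= out_channels)
        {
            break; // partial tile at the channel boundary
        }
        result.push_back({cout, cout / depth_multiplier, cout % depth_multiplier});
    }
    return result;
}
```

With three input channels, depth_multiplier = 2 and N0 = 4, the six output channels need two work-items along x: the first reads input channels 0, 0, 1, 1 and the second handles the partial tile of output channels 4 and 5, both from input channel 2.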
Resolves COMPMID-5568
Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Affects OpenCL backend.
- Resolves COMPMID-5416
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5355
Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
The regression was caused by NUM_TILES_X passed at runtime.
Resolves COMPMID-5327
Change-Id: Id6ccd93784eda93af09f420c0d786050e2bbccd7
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7727
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add QASYMM8 and QASYMM8_SIGNED support to the 3D pooling operator
Resolves: COMPMID-4669
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- For NDHWC layout
- For F16 and F32 data types
- Mixed precision is still not supported
Resolves: COMPMID-4670
Signed-off-by: ramy.elgammal@arm.com
Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Pass tensor dimensions at runtime rather than at compile time
- Add guard macro to compile only the kernel of interest
Resolves: COMPMID-5120
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I87c3b56ce0cd3c97ffdeabdd9c5d433f361bb005
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7101
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove CLRemapKernel.
- Remove NERemapKernel.
Partially resolves COMPMID-4984
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Pass tensor dimensions at runtime rather than at compile time
- Add guard macro to compile only the kernel(s) of interest
Resolves: COMPMID-5119
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib01098e397011a1201c2800c62a8954ec70e63e8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7083
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Pass tensor dimensions at runtime rather than at compile time
- Add guard macro to compile only the kernel of interest
Resolves: COMPMID-5118
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ie42c3c07fdd817ce62e7cad354381bc22c6e9264
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7058
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-5004
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ib3e1b5a891234316c411ea9825ec10c68c4ab5a3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6788
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
- Pass arguments at runtime
- Rework ClConv2D heuristic to also select direct convolution for small
kernel sizes when OFM < IFM
Resolves COMPMID-5000
Change-Id: I9b538e29093829bc366d24d1e904341c247fa22b
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6771
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- In dwc_native_fp_nhwc.cl, loop unrolling is enabled only when the
kernel height is less than 5.
- No performance regression observed
- The patch reduces the compilation time required for the kernel
Resolves COMPMID-4887
Change-Id: I93188b9764cf7d1ad34ac164694f6f1fd37a90e8
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6744
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4889
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I4a88082b13865fdaeaba1b7216503cd640aa54df
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6680
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Pass source and destination tensor dimension info at runtime
Resolves: COMPMID-4887
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ib7c9f3ce6fb7cef600f7b0cd0fadafa4fa6888a1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6635
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
- Add macro guard for different kernels in scale.cl
- Rework TENSOR4D to the new format
- Pass scale_x and scale_y at runtime
Resolves COMPMID-4886
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4663
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I5c3c1cffed5385c06b789543318f7f4d6096987e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6468
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I4d48f1b8eba6681a9de0ae5f1fd8a4ad1edf7fe8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6439
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolve COMPMID-4660
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
The new kernel performs the computation on multiple elements. The
OpenCL kernel has been re-implemented using the new TILE macros.
Resolves COMPMID-4803,COMPMID-4804
Change-Id: Iac8fead65e21b64567a05dbc4fbaa61d362443f9
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6235
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Merge quantized kernels with fp for bilinear interpolation (both NCHW and NHWC)
- Pass dimensions at compile time rather than at run time
- Use tile-based approach to rework the NCHW kernels
- Remove unused functions/files
Resolve COMPMID-4723
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ifcdf02beb9daa9f318395751b3c85eb2fe874082
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6138
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
The following kernels have been split into separate nchw/nhwc kernel files:
- batchnormalization_layer
- batch_to_space
- channel_shuffle
- depth_to_space
- dequantization_layer
- im2col
- normalization_layer
- normalize_planar_yuv_layer
- normalize_planar_yuv_layer_quantized
- pooling_layer
- pooling_layer_quantized
- remap
- reorg_layer
- scale
- scale_quantized
- space_to_batch
- space_to_depth
- upsample_layer
- winograd_filter_transform
- winograd_input_transform
- winograd_output_transform
The following kernels have been moved to nchw folder:
- direct_convolution1x1
- direct_convolution3x3
- direct_convolution5x5
- direct_convolution_quantized
- prior_box_layer
The following kernels have been moved to nhwc folder:
- direct_convolution
- dwc_native_fp_nhwc
- dwc_native_quantized_nhwc
The following kernels have been removed:
- sobel_filter
The remaining kernels have been moved to the common folder.
Partially resolves COMPMID-4453
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|