Age | Commit message (Collapse) | Author |
|
- Update heuristic for CLGEMMReshapedKernel - FP16
- Update heuristic for CLGEMMReshapedOnlyRHSKernel - FP16
Change-Id: I35aa73e59d8c2d1bc6b2dd318fd8eeb3e42c27a4
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4026
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
very few rows
Also added 2D version of the 16-bit route, and altered the selection
heuristic so that 2D mode will be used in cases where 1D mode won't
thread well.
Change-Id: I0057fde08456771dc0090ac51f50d82f8bb86044
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3903
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- The change affects Mali-G71 GPUs and should improve the performance of
GEMM in case of m = 1
Change-Id: I6b0e217e93fe468ec1325a5da74684811519c42f
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4002
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
CLGEMMMatrixMultiplyReshapedKernel
Resolves: COMPMID-3671, COMPMID-3672
- Extend cl image support to f16 in CLGEMMMatrixMultiplyReshapedKernel
- Extend cl image support to f16 in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
- Change the interface of create_image2d_from_buffer
- Extend test
Change-Id: I27363be71fa515fbf71aa4be5ed0d6c730f38f34
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3992
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Solves also:
- COMPMID-3766: CTS Failures in Transpose Neon + FP16
Change-Id: I9d323f45f49cc0bce9e6329790bcf2f0eeec8572
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3949
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: Ib5b252e1b65794a8f360276d03ff94922e1991f8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3946
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Division follows the flooring division approach where for example 5/2=2 while
-5/2=-3
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I65756e0b31fe8d97f743a4c13dc5f96304722f75
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3929
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Iaf1465f3144371e153ce123ac00da5cc092f77df
Signed-off-by: morgolock <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3939
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add S32 support to NEPixelWiseMultiplication and NEPixelWiseMultiplicationKernel
* Scale == 1/255 is not supported for S32, as on non-aarch64 the
precision requirement is not met, and scale is a non-standard
parameter anyway.
* Fix the data types validation logics to also test for all invalid data
type combinations.
* Add validation tests for S32 NEON PixelWiseMultiplication
* The wrap tolerance for ScaleOther (scale == 1/2^n) cases is set to
1 instead of 0 because the reference uses floating point division
followed by rounding, which is isn't bit accurate.
Change-Id: I28839afda7a4f98c985d1763620e08d98f740142
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3923
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Prefer NEDepthwiseConvolutionLayerNativeKernel as it has a native format
of NHWC avoiding extra transformation to the NCHW domain.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: If5d8de11691b8ef7f4c3816941f87417d0c8646b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3930
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I93c3b795cf6fe0b27008543b6671a3be0a965603
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3916
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Treat bf16 memory on memset as raw memory by casting to void*. This
hides the class-memaccess warning and is safe for the current class
layout of arm_compute::bfloat16
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I5e242827d3737b4491d29abe7570eefee5b6edc1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3928
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fix convert policy validate logics and add missing validate test
* Add S32 support to NEArithmeticSubtraction and NEArithmeticSubtractionKernel
* Add S32 validation tests
Change-Id: I1b6cb15b024613c202fe9f17747a83da43a5ddcf
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3908
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Base the dimensions of the valid region generated by the reshape
kernel on the output shape dimensions.
This allows correct scaling on inputs that are in NHWC format and have
width and height equal to 1 e.g. 1x1x32.
Underlying problem causing this issue is the fact that Compute Library
removes trailing 1's of a given shape.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Icfdafc469214840998e7c198b33f7358d566d2e7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3924
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove padding from NEGEMMTranspose1xWKernel
- Extend test for validating zero padding requirement
Change-Id: I9ce4ca95a500229b045dc140cfff21fdf7373700
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3920
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove padding from NEGEMMInterleave4x4Kernel
- Extend test for validating zero padding requirement
Change-Id: I94abc271e005f9dd6e1721b185631f55f598dbfd
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3915
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I81b0c2482bc20b1ab5124ed6179bb94cbced7875
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3869
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ic0569fe9ed99e61084b601ce84ddc7ef288d1899
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3852
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Properly support "axis" in CL and NEON (and GC) SoftmaxLayer and
LogSoftmaxLayer in accord with mainstream frameworks. Axis now defines
the dimension on which softmax is performed, and supports the range
[-rank, rank)
* Extend validation tests to include valid and invalid axes
* Remove unnecessary LogSoftmaxLayer fixture, as it is only a
specialisation of the SoftmaxLayer fixture
* Change the validation fill value range from [-1000, 1000] to [-10,
10], as the former often results in sparse outputs with a single one and
zeros elsewhere
Change-Id: I8a0040453182b04ed88260de3ba434e98258d863
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3830
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: I8045166d2d3202612fec3f6bf9651f3e6bfcb20f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3783
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I2c5e98aae7698963f106d7423df0e65cd00ee2a9
Signed-off-by: morgolock <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3710
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I33436dab77a47868fbd9872e0b4cf54b3a173e65
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3694
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The issue concerned gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint.
In particular the issue was with the z index to access the elements from
the lhs reduced tensor used to calculate the offset contribution.
Change-Id: I74f6398fc08894fc323ccd04fda9220752652d31
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3726
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
CLGEMMMatrixMultiplyReshapedKernel
- Change the interface of STORE_BLOCK_BOUNDARY_AWARE passing the
conditions on Y and X rather than the X/ coordinates. This allows to
use the macro with both GEMM reshaped and GEMM reshaped rhs only
- Remove padding from the output tensor of
CLGEMMMatrixMultiplyReshapedKernel
- Add tests for validating the zero padding requirement
Change-Id: I13263cc71ce065c5be34ed198def320dd5823495
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3712
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I3f85ece08c9fc4bdbbfd72b5a872d4f2c4b76357
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3709
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove padding requirement for the input tensor of
CLGEMMReshapeLHSMatrixKernel
- Add utility function to load a boundary aware 2d tensor from buffer
- Extend validation for validating the zero padding requirement
Change-Id: I0ac6b1b517d75fd56998f406e0cce97b40918ce1
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3701
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Remove channel paddings from all nhwc kernels (im2col_3x3_nhwc,
im2col_9x9_nhwc, im2col_generic_nhwc)
* Validate that input total spatial dimensions (with x and y paddings)
are bigger than or equal to the kernel spatial dimension.
- Otherwise it would result in invalid memory reads.
- This problem likely existed before, but was made obvious with the
removal of implicit paddings
* Add zero padding validation tests
* Fix Im2ColValidationFixture by not permuting the input shape in case of
NHWC
Change-Id: I1f895e8938af0e9130cb516106f0b4b665531709
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3696
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
For the elements that shouldn't contribute to the sum, zero
is used to compute the correct sum.
Change-Id: I5360534b5b0f81ee3d3aaaf5a046b99ecd943894
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3703
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Put an additional cast for correctly handling scalar cases
According to opencl specs, logical operators, when performed on
scalar types, always return int regardless of the type of the scalar.
Thus if we were to use the results of a scalar logical op on the
method select, it would be incorrect for any types of width different
than 4 (the width of int)
A concrete example would be that if the VECTOR_SIZE is 1 (scalar case),
and DATA_TYPE is half/f16 (width < 4), then the result type of the ||
op would be int instead of short, which it's supposed to be, and this
would result in the ambiguous function call error for select.
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ibc4985f707f667116668c43b9f9bf39dda789528
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3698
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix ArmNN compiling issue through removing defulat template value for offset_no_padding() and pooling2_nchw_maxpool_indices().
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ie7114d102b8533e757b8e841afcac1adfbcb3b54
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3697
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
COMPMID-3682: Fix performance regression on Mali-G76 (Convolution)
Updated the heuristic for GEMMReshapedOnlYRHS for Mali-G76 in order to
take into account small workload cases
Change-Id: I99fccbd0e94e4e21c0d1b88e23f02af06ef16ee9
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3689
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fix out-of-bound mem reads in cases where M < M0 in
CLGEMMMatrixMultiplyNativeKernel and
CLGEMMMatrixMultiplyReshapedOnlyRHSKernel, as a result of the new
boundary-aware reading logics.
* Add fixture tests (alongside the padding configuration tests) in
these 2 kernels to catch all 5 possible scenarios with block dimension
configurations, which includes this particular bug as the
"...BoundaryHandlingFullInXSinglePartialInY" test case
Change-Id: I8a10ab67594171e3ea4fb6e35c84ddc4ab964fba
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3650
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add OpenCL kernel for Max unpooling layer
- Add tests for validating the result
Change-Id: If7ca79566a1198e3141f880abf46738980a62c81
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3606
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Fix PoolingLayer max pooling reference bug to extract indices.
Extend CLPoolingLayer max pooling to extract indices, all the paddings need to be substracted.
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: If8e82e7f7e03172ad05f5a7cd5f13cf682fd1ffc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3649
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Collapse InputTensorMap and OutputTensorMap to a single TensorPack
mechanism.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Ie2fdfc6b07d84ad589169ec99ca64fcf45a00bec
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/253783
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3641
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
* Fix invalid use of vstore_partial_1
* Add configuration tests to catch this error case
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I25a2b16a530992acc869a4335c48a8fffa420850
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3628
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Allow computations with aligned corners when the tensors have
width/height equal to 1.
Change-Id: Ia01733f6c02e0740835b26a794b9a79fa35319b4
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3634
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sadik Armagan <sadik.armagan@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: Ib14d158b9c5568981835312dcd9d5b9ca116649a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3637
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Ic05ef206f76477cc2fbb9e7ad56ec1974fa013ea
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3626
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: Idc5ac2dd2ba5295c00c88b44a783645327a27e15
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3617
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
In order to fix the issue caused by the limited precision of FP16.
mixed precision (float accumulator) is introduced to
NEInstanceNormalizationLayerKernel. Since the reference kernel
is doing the mixed precision, currently mixed preicision computation
is default when it is called from NEInstanceNormalizationLayer.
- Make NEInstanceNormalizationLayerKernel use kernel descriptor
to enable mixed precision computation
- NEInstanceNormalizationLayer is modified to use the descriptor
Change-Id: I7766622d715df054e303f9b441380b15b51f02b2
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3589
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: I94007565e688f8a0aead4f14c9fc30bfd9f9f7eb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3613
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Hybrid kernel turns to be faster for qasymm8 than quantized_wrapper with interleaved.
Signed-off-by: Aleksandr Nikolaev <aleksandr.nikolaev@arm.com>
Change-Id: I200646aee6cdcabfe125b746c7d87bfa7d06e0fc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3585
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Upgrade the current 'is_preferred()' mechanism with a new framework,
where kernels instead provide an estimated cycle count figure.
Compatibility with old mechanism is achieved via a wrapper which
replaces a "true" result with an estimate of 0, and a "false" result
with UINT64_MAX.
This mechanism is then used to select between 'interleaved' and
'hybrid' FP32 NEON kernels. This uses a simple system based on
counting MACs performed and bytes of data transferred (for
rearrange/merge operations) and dividing by fixed performance figures,
which are provided for A53, A55, A73 and 'default' figures (based on
A76).
Separately, a new route for performing int8 GEMMs by using the int16
kernel is provided. This performs significantly (for uint8) or
slightly (for int8) better on A53 than the existing int8 route.
Optimized 8-to-16 bit transforms are also included.
Change-Id: I53b2e59eb9368793c78c2081e17d2445361bcc47
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/250120
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3609
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Quantization wasn't done correctly and since we have helpers for that,
the code has been modified to use them.
Change-Id: Ia16577cea57dcb1864d91a06ab6aebf8ead67de5
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3608
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Supported strides 1 and 2
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I4b9f087c0c328234159b2d1eacc2e465b3bb3c54
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3603
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I611adf4f506d406540e920b0bd6befb4b5108918
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3601
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: I9ff7e8d2fb4d36c4b7c44e885abf34ff6d4c577c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3587
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Update the heuristic for Arm Mali-G77 (F32) in order to use the OpenCL image2d object on GEMM
Change-Id: Ife6736a22ec2a114368bb338908f0c5f239dfad6
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3593
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Add broadcast support for NEPixelWiseMultiplicationKernel with QASYMM8/QASYMM8_SIGNED
Add test case for QASYMM8 broadcast
Fix QASYMM8 saturation issue
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ie67cfa8b94ab542133b031efbff8379cc57cfc2d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3586
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|