Age | Commit message (Collapse) | Author |
|
- Performs more output elements per work-item in the case of Fp16
computation in Winograd Input/Output transform
Resolves COMPMID-6018
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Change-Id: If5e6f5182eff8c1f05a3505c437d0a997490f0bd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9447
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5988
Change-Id: I93e78edf31c9eec8242ccbb8c3c768f46a7c7c38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9456
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is required for the case where rhs (B) is dynamic and needs to be
pretransposed in every run.
In a multi-threaded setting, this means the previously single-threaded
pretranspose_B_array would become the bottleneck
Resolves COMPMID-5896
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Id508c46992188a0f76a505152931d4955d04c16d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- If input value to round has the decimal point beyond the fraction
part (23 bit) then simply return the rounded value = input value.
Resolves: COMPMID-6025
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I1994e49a9bca7daeaeec7681aec099c63a97b53f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9473
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
RHS transposed
Resolves: [COMPMID-5924]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I9ba657737eb1e3a096c8341ad4ad311571f8edeb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9454
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The existing 4x4 tiling for 32-bit transpose is not efficient on aarch64, given that there are a lot more Neon registers available. So making the tile size to 8x8 will greatly improve NETranspose latency.
For example, on AWS Graviton3 processors, with this change I have observed transposing a 768x768 matrix improves latency from 0.32ms down to 0.19ms. Improvement can also be seen across different matrix sizes.
Further enlarging the tile size to 8x16 or 16x16 won't make it perform as good as 8x8 due to register pressure.
This change is to mitigate the issue reported at:
https://github.com/ARM-software/ComputeLibrary/issues/1045
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: Ia09859cdf2f6d312e67219a9d95a3a3bf1db1999
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9448
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5899
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED.
Quantized MatMul is supported with the following MatMul attributes:
* adj_x = false, adj_y = false
* adj_x = true, adj_y = false
We consider native format kernels only. In other words, no reshaping of the operand matrices is done.
Resolves: COMPMID-5921, COMPMID-5922
Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Case: when the dequantized float value < 0.f the unary op was
not called if operator is not LOG or RSQRT
Resolves: COMPMID-5994
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I24d69db22042701f506188ace91ea4ab3dafeccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9437
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Ensure naming of MatMul on GPU conforms to the naming convention <backend><operator><config> i.e. ClMatMul operator with the backend ClMatMulNativeKernel.
Resolves: COMPMID-6015
Change-Id: I021d235b023ad17fe97bd6913e6a50d0ba4b194e
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9443
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5995
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I707b8918bebee7e70d4de5207ef555c806e7a305
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9405
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implements MatMul function and operator for floating point datatype FP16/FP32
- Includes support for transposing dynamic tensors prior to matrix multiplication.
- Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors)
- Updates fixture to allow for testing fused activation in MatMul
- Adds tests for matmul with and without fused activation
Resolved: [COMPMID-5898]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adding fallback functions neon_qasymm8_signed_elementwise_unary() and
neon_qasymm8_elementwise_unary()
- They would be called in case target is not aarch64
Resolves: COMPMID-5994
Change-Id: Id0db1e7cb0fe92f1eaef0b3a9ed2bea01b3f2a15
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9416
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The fully connected function and operator running on GPU have been adapted to support dynamic weights.
Dynamic weights require the reshape and data layout conversion of weight tensors at runtime in the prepare stage of the operator. The implementation for GPU is identical to the CPU implementation.
This patch also deprecates the `are_weights_reshaped` option in Fully Connected.
Resolves: COMPMID-5870
Change-Id: I28f967695879d82cc91a928d95308a4e0e52a597
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9403
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5949
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Idd8cfe6ea94a14f0b23178f6781251b5f0955563
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9390
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Deprecate dynamic block shape interface
- Iterate over output window instead of input window for simpler implementation and better performance.
- Add cropping support and cropping tests
Resolves [COMPMID-5865]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Deprecate dynamic block shape interface
- Iterate over output window instead of input window for simpler
implementation and better performance
- Add cropping support and cropping tests
Resolves COMPMID-5918
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed namespace arm_compute::utils::requires to fix the build error
‘requires’ is a keyword in C++20 [-Wc++20-compat]
* Added missing includes for cstdint.h
* Resolves MLCE-1040
Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add quantized unary elementwise in CPU using LUT.
* Widen the input data range of the test suite.
- Fix CPU exponential function overflow/underflow range.
- Fix saturation issue of CL round operator.
Resolves: COMPMID-5763
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use a vector to represent the (static) block shape instead of an N-D
Tensor. The previous use of ND Tensor as block shape was wrong, not
adhering to the specification, and non-functional (only first dim was
used anyway).
* The fixture now accepts a static block shape, because the dynamic
case is not properly implemented and will be deprecated for now.
* Fix an assertion error in reference implementation.
Partially resolves COMPMID-5918
Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5952, COMPMID-5956
Change-Id: Idbd14538e7660792254072fa9631a6f03966f89b
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9371
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5985
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I0e789619f09e3adefe3655df347390f057300c0f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9373
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5945, COMPMID-5954
Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* This patch adds support for rounding modes in vmlaq_qasymm8_signed
which is used to compute Relu for quantized types
* Partially resolves MLCE-1018
Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
threading mode.
Resolves: COMPMID-5844
Change-Id: Iceb0018114bbca2bfdac4d4406936f9b260539e9
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9070
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
Resolves: COMPMID-5917
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement opencl kernel for LHS transposed and RHS non-transposed
- Implement opencl kernel for LHS transposed and RHS transposed
- Add validation tests
Resolves: COMPMID-5953, COMPMID-5955
Change-Id: I55589acbffe86c44e29807574975978a1ec09bad
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5863
Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement ClNativeMatMulKernel class
- Implement opencl kernel for LHS non-transposed and RHS non-transposed
- Implement opencl kernel for LHS non-transposed and RHS transposed
- Add test fixture and dataset for matmul
- Implement transpose_tensor() for reference implementation to transpose high dimensional tensors
Resolves: COMPMID-5944, COMPMID-5951
Co-authored-by: Gunes Bayir <gunes.bayir@arm.com>
Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I1d5b8978f41be27baddb3153ade880472141573f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves COMPMID-5918, COMPMID-5865
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5966
Change-Id: Ic0d694493178da029a297643855bd0cff01b174f
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Without this, we have to pass in weights to be NHWC, even if they
are in fact blocked/interleaved for consumption by a fixed
format kernel.
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Change-Id: I9ee8720a21a16b17816dbecf6308e1668ddda59c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9285
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* The shape of input and indices tensors, and the gather axis
can be any number, as long as these are valid and the output
tensor doesn't have more dimensions than the library supports.
* Update the reference code to be more generic and straightforward.
* Add necessary test cases.
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Resolves: COMPMID-5919
Change-Id: Ic7e2032777aa97ecc147f61d5388528697508ab1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9199
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add sigmoid and tanh activation functions for dynamic fusion.
* Add corresponding tests, but both activation functions share
the same fixture implementation.
Resolves: COMPMID-5939
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The SME kernels for quantized int8/uint8 GEMMs erroneously required that
maxthreads==1 before they could be selected. This resulted in them not
being available on multi-thread runs. Remove that restriction.
Resolves COMPMID-5962
Change-Id: Ia7933d0c66020b5e2981604ca97ff7ead95ec14e
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9274
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Fusion source files
Resloves: [COMPMID-5960]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I1b11f01c51a029082ed05823717b4c4ae4897798
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9270
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Ensure CLTuner uses the real GWS used by run(), instead of the
static GWS (which is usually changed at run time), by caching GWS in
each kernel
Note this is a somewhat inelegant workaround. The real issue stems
from the fact that execution window and scheduler are very much
coupled with our operator run() / run_op() method.
(Please see COMPMID-5934)
* Restrict LWS values to explore within GWS bound for exhaustive mode
* Refactor gws_from_window() to include all the information required
to calculate GWS
* Log lws search space used for tuning
* Fix ClDirectConv2dKernel config id
Resolves COMPMID-5892
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I420490d8b94d13ada2e44eb0a12078f883379334
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9193
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is so that we can leverage fixed format kernel when
using gemm convolution method.
Partially resolves: [ONCPUML-1129]
Change-Id: I61ffa74f5cd9d75579dbc1f9aa187371f855e932
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9248
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Currently the validation routine incorrectly prevents optimized INT8 Gemm kernel from being used if the input is QASYMM8 and output type is S32.
This change allows QASYMM8 input and S32 output types to leverage optimized kernel.
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: I65b060f522795db07d6d4df86fb7c6ddd1c626d4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9250
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Put input and output tensor shape value directly to the CL code.
* Use texture for weights when it is possible.
Resolves: COMPMID-5938
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ib53b310a80ce857eac36564b352136fdde55b131
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9249
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a max pooling implementation that returns kernel indices.
- Add a parameter in pooling info object to pick kernel indices impl.
- Add validation tests.
Resolves: [ONCPUML-1187]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a parameter in PoolingLayerInfo class to pick which value to use as min for max-pooling.
Resolves: [ONCPUML-1166]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Resolves: COMPMID-5936
Change-Id: Iedfcc632f5d900865f38bd7b164121af48546542
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves: COMPMID-5909
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Ia0ae7bfe22f8866e3d82f62098757e3bc55f46ca
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9183
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Dividing scale by number of elements causes accuracy loss due to limitations in float datatype and truncation to int
- Adds rounding after division on aarch64 to negate this.
Resolves: [COMPMID-5839]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I54ef0f7e56f39da1fa5f30378f551b5ca419a61d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/492456
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9110
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Skip upsampling step for deconvolution when input strides are 1
regardless of kernel size. This is achieved by setting correct
paddings for unit strides convolution.
Resolve: [ONCPUML-1183]
Change-Id: Ief88f9fe30f6f56d3358e3cf6a506ab8b5691f18
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9134
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5849
Change-Id: I86f8bbc1f3a7c12c66d5ad8fcd74dd9e69629aa0
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9102
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Dynamic-Fusion: Jakub Sujak <jakub.sujak@arm.com>
|
|
Resolves: COMPMID-5805
Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Resolve COMPMID-5689
Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Signed-off-by: Andrew Mundy <andrew.mundy@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [COMPMID-5854]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ib0228409be5e816acca7e123f2660eb01a79e38f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9078
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|