|
The existing 4x4 tiling for the 32-bit transpose is not efficient on aarch64, which has many more Neon registers available. Increasing the tile size to 8x8 greatly reduces NETranspose latency.
For example, on AWS Graviton3 processors, I have observed that this change reduces the latency of transposing a 768x768 matrix from 0.32ms to 0.19ms. Improvements can also be seen across other matrix sizes.
Enlarging the tile size further to 8x16 or 16x16 does not perform as well as 8x8 due to register pressure.
This change mitigates the issue reported at:
https://github.com/ARM-software/ComputeLibrary/issues/1045
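The tiling idea can be sketched in portable C++. This is a scalar illustration only, assuming row-major `uint32_t` storage; it is not the Neon kernel, and the register-pressure trade-off between 4x4, 8x8 and larger tiles only appears when each tile is held in vector registers. `transpose_tiled` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Blocked (tiled) transpose of a rows x cols row-major matrix of 32-bit values.
// Tile is the tile edge (4 or 8 in the discussion above). The two inner loops
// stand in for the register-level shuffles a Neon kernel would perform.
template <std::size_t Tile>
void transpose_tiled(const std::vector<uint32_t> &src, std::vector<uint32_t> &dst,
                     std::size_t rows, std::size_t cols)
{
    assert(src.size() == rows * cols);
    dst.resize(rows * cols);
    for (std::size_t r0 = 0; r0 < rows; r0 += Tile)
    {
        for (std::size_t c0 = 0; c0 < cols; c0 += Tile)
        {
            // Clamp the tile at the matrix edges (leftover rows/columns).
            const std::size_t r_end = std::min(r0 + Tile, rows);
            const std::size_t c_end = std::min(c0 + Tile, cols);
            for (std::size_t r = r0; r < r_end; ++r)
            {
                for (std::size_t c = c0; c < c_end; ++c)
                {
                    // Element (r, c) of src becomes element (c, r) of dst.
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        }
    }
}
```

Both tile sizes produce the same result; only the memory access pattern (and, in a vectorized kernel, register usage) changes.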
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: Ia09859cdf2f6d312e67219a9d95a3a3bf1db1999
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9448
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5899
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implement OpenCL kernels for batched Matrix Multiplication for the quantized data types QASYMM8 and QASYMM8_SIGNED.
Quantized MatMul is supported with the following MatMul attributes:
* adj_x = false, adj_y = false
* adj_x = true, adj_y = false
We consider native format kernels only. In other words, no reshaping of the operand matrices is done.
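A minimal reference sketch of what one batch of such a quantized MatMul computes, assuming the asymmetric convention `real = scale * (q - offset)` and int32 accumulation. It covers the two supported attribute combinations (`adj_y` is always false); requantization of the int32 result back to 8 bits is omitted, and `matmul_q8_s32` is a hypothetical name, not the OpenCL kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// lhs is M x K, or K x M when adj_x is true (stored transposed);
// rhs is K x N. Operands are read in place ("native format", no reshaping).
std::vector<int32_t> matmul_q8_s32(const std::vector<int8_t> &lhs, int32_t lhs_offset,
                                   const std::vector<int8_t> &rhs, int32_t rhs_offset,
                                   std::size_t M, std::size_t N, std::size_t K, bool adj_x)
{
    assert(lhs.size() == M * K && rhs.size() == K * N);
    std::vector<int32_t> out(M * N, 0);
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t n = 0; n < N; ++n)
        {
            int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k)
            {
                const int8_t a = adj_x ? lhs[k * M + m] : lhs[m * K + k];
                const int8_t b = rhs[k * N + n];
                // Remove the zero points before multiplying.
                acc += (static_cast<int32_t>(a) - lhs_offset) * (static_cast<int32_t>(b) - rhs_offset);
            }
            out[m * N + n] = acc;
        }
    }
    return out;
}
```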
Resolves: COMPMID-5921, COMPMID-5922
Change-Id: I99e0f68054a2bd635c60ec2641acc2e7ff398473
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9435
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Includes an example
Resolves: [COMPMID-5867]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I02c0cafecf56e9cbaf9e8284d43f8ef87af8f4d8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/510081
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9441
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fix the case where the dequantized float value is < 0.f: the unary op
was not called unless the operator was LOG or RSQRT
Resolves: COMPMID-5994
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I24d69db22042701f506188ace91ea4ab3dafeccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9437
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Ensure the naming of MatMul on GPU conforms to the naming convention <backend><operator><config>, i.e. the ClMatMul operator with the ClMatMulNativeKernel kernel as its backend.
Resolves: COMPMID-6015
Change-Id: I021d235b023ad17fe97bd6913e6a50d0ba4b194e
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9443
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5995
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I707b8918bebee7e70d4de5207ef555c806e7a305
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9405
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5904
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I03bc51a7c5b05cca5db16a39f95e92d72240ab3a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implements MatMul function and operator for floating point datatype FP16/FP32
- Includes support for transposing dynamic tensors prior to matrix multiplication.
- Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors)
- Updates fixture to allow for testing fused activation in MatMul
- Adds tests for matmul with and without fused activation
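The combination of transposed operands, batching and a fused activation can be sketched as follows. This is an illustrative float reference, assuming that a set adjoint flag means the operand is stored transposed in memory; `FusedClamp` is a hypothetical stand-in for the library's activation description (a lower bound of 0 gives ReLU).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical clamp-style activation fused into the output write.
struct FusedClamp
{
    float lo = -std::numeric_limits<float>::infinity();
    float hi = std::numeric_limits<float>::infinity();
};

std::vector<float> batched_matmul(const std::vector<float> &lhs, const std::vector<float> &rhs,
                                  std::size_t batches, std::size_t M, std::size_t N, std::size_t K,
                                  bool adj_lhs, bool adj_rhs, FusedClamp act = {})
{
    assert(lhs.size() == batches * M * K && rhs.size() == batches * K * N);
    std::vector<float> out(batches * M * N);
    for (std::size_t b = 0; b < batches; ++b)
    {
        const float *a = lhs.data() + b * M * K; // M x K, or K x M if adj_lhs
        const float *w = rhs.data() + b * K * N; // K x N, or N x K if adj_rhs
        float *o = out.data() + b * M * N;
        for (std::size_t m = 0; m < M; ++m)
        {
            for (std::size_t n = 0; n < N; ++n)
            {
                float acc = 0.f;
                for (std::size_t k = 0; k < K; ++k)
                {
                    const float x = adj_lhs ? a[k * M + m] : a[m * K + k];
                    const float y = adj_rhs ? w[n * K + k] : w[k * N + n];
                    acc += x * y;
                }
                // The activation is applied once per output, so no separate pass is needed.
                o[m * N + n] = std::min(std::max(acc, act.lo), act.hi);
            }
        }
    }
    return out;
}
```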
Resolves: [COMPMID-5898]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adding fallback functions neon_qasymm8_signed_elementwise_unary() and
neon_qasymm8_elementwise_unary()
- They are called when the target is not aarch64
Resolves: COMPMID-5994
Change-Id: Id0db1e7cb0fe92f1eaef0b3a9ed2bea01b3f2a15
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9416
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6002
Change-Id: Ifc2b7c889679b21d7e58f533be9c865854e132ef
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9408
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
The fully connected function and operator running on GPU have been adapted to support dynamic weights.
Dynamic weights require the weight tensors to be reshaped and converted to the required data layout at runtime, in the prepare stage of the operator. The implementation for GPU is identical to the CPU implementation.
This patch also deprecates the `are_weights_reshaped` option in Fully Connected.
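Why dynamic weights force the reshape into every run can be shown with a small sketch. `FullyConnectedSketch` is a hypothetical class, not the library's operator; the "reshape" here is a plain transpose of an out x in weight matrix into an in x out copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FullyConnectedSketch
{
public:
    FullyConnectedSketch(std::size_t in_features, std::size_t out_features, bool dynamic_weights)
        : _in(in_features), _out(out_features), _dynamic(dynamic_weights) {}

    // weights: out_features x in_features (row-major); input: in_features values.
    std::vector<float> run(const std::vector<float> &weights, const std::vector<float> &input)
    {
        assert(weights.size() == _in * _out && input.size() == _in);
        prepare(weights);
        std::vector<float> out(_out, 0.f);
        for (std::size_t i = 0; i < _in; ++i)
            for (std::size_t o = 0; o < _out; ++o)
                out[o] += input[i] * _reshaped[i * _out + o]; // reads the reshaped copy
        return out;
    }

    std::size_t reshape_count() const { return _reshape_count; }

private:
    void prepare(const std::vector<float> &weights)
    {
        // Constant weights: reshape once and reuse. Dynamic weights: the caller
        // may have changed the values, so reshape again on every run.
        if (_prepared && !_dynamic)
            return;
        _reshaped.resize(_in * _out);
        for (std::size_t o = 0; o < _out; ++o)
            for (std::size_t i = 0; i < _in; ++i)
                _reshaped[i * _out + o] = weights[o * _in + i];
        _prepared = true;
        ++_reshape_count;
    }

    std::size_t _in, _out;
    bool _dynamic;
    bool _prepared = false;
    std::size_t _reshape_count = 0;
    std::vector<float> _reshaped;
};
```

With constant weights, a second run with new weight values silently reuses the stale reshaped copy, which is exactly what dynamic-weight support has to avoid.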
Resolves: COMPMID-5870
Change-Id: I28f967695879d82cc91a928d95308a4e0e52a597
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9403
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5949
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Idd8cfe6ea94a14f0b23178f6781251b5f0955563
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9390
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Deprecate dynamic block shape interface
- Iterate over output window instead of input window for simpler implementation and better performance.
- Add cropping support and cropping tests
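Iterating over the output window makes cropping straightforward: each output element is mapped back to exactly one input element, and cropped regions are never visited. The sketch below assumes NHWC layout and TensorFlow-style `batch_to_space` semantics, with a static block shape given as a vector `{block_h, block_w}` and crops as `{top, bottom, left, right}`; the function and `Shape4` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Shape4 { std::size_t n, h, w, c; }; // NHWC

std::vector<float> batch_to_space(const std::vector<float> &in, Shape4 is,
                                  const std::vector<std::size_t> &block,
                                  const std::vector<std::size_t> &crop, Shape4 &os)
{
    assert(block.size() == 2 && crop.size() == 4);
    const std::size_t bh = block[0], bw = block[1];
    assert(is.n % (bh * bw) == 0 && in.size() == is.n * is.h * is.w * is.c);
    assert(crop[0] + crop[1] <= is.h * bh && crop[2] + crop[3] <= is.w * bw);
    os.n = is.n / (bh * bw);
    os.h = is.h * bh - crop[0] - crop[1];
    os.w = is.w * bw - crop[2] - crop[3];
    os.c = is.c;
    std::vector<float> out(os.n * os.h * os.w * os.c);
    for (std::size_t n = 0; n < os.n; ++n)
        for (std::size_t y = 0; y < os.h; ++y)
            for (std::size_t x = 0; x < os.w; ++x)
            {
                // Coordinates in the uncropped output.
                const std::size_t uy = y + crop[0], ux = x + crop[2];
                // Input batch is laid out as [block_h][block_w][output batch].
                const std::size_t in_n = ((uy % bh) * bw + (ux % bw)) * os.n + n;
                const std::size_t in_y = uy / bh, in_x = ux / bw;
                for (std::size_t c = 0; c < os.c; ++c)
                    out[((n * os.h + y) * os.w + x) * os.c + c] =
                        in[((in_n * is.h + in_y) * is.w + in_x) * is.c + c];
            }
    return out;
}
```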
Resolves [COMPMID-5865]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: Ic67d44a6a39299ecdafc507f12e3dc5d517dfb62
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9385
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5989
Change-Id: I2bc34e3d1889d88ce9afbd262ea4ef1a5b0b9be5
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9397
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Deprecate dynamic block shape interface
- Iterate over output window instead of input window for simpler
implementation and better performance
- Add cropping support and cropping tests
Resolves COMPMID-5918
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed namespace arm_compute::utils::requires to fix the build error:
‘requires’ is a keyword in C++20 [-Wc++20-compat]
* Added missing includes for <cstdint>
* Resolves MLCE-1040
Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Adds additional ARM_COMPUTE_ENABLE_FP16 guards to Convolution layer
testing to ensure that validation suite passes on armv8a hardware when
built with arch=armv8a, and multi_isa=0.
Partially resolves ONCPUML-1209
Change-Id: Ib485502e534df1fa91c5c2d7b222ea08a354cc54
Signed-off-by: Nathan John Sircombe <nathan.sircombe@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9383
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add quantized unary elementwise in CPU using LUT.
* Widen the input data range of the test suite.
- Fix CPU exponential function overflow/underflow range.
- Fix saturation issue of CL round operator.
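A LUT works for quantized unary ops because an 8-bit input has only 256 possible values: the op can be evaluated once per value (dequantize, apply in float, requantize with saturation), and the kernel reduces to one lookup per element. A minimal sketch for signed 8-bit data, assuming round-half-away-from-zero requantization; mapping NaN results to the output zero point is an arbitrary choice of this sketch, and `QInfo`, `make_lut` and `apply_lut` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QInfo { float scale; int32_t offset; };

std::array<int8_t, 256> make_lut(QInfo in_q, QInfo out_q, float (*op)(float))
{
    std::array<int8_t, 256> lut{};
    for (int v = -128; v <= 127; ++v)
    {
        const float x = in_q.scale * static_cast<float>(v - in_q.offset);
        const float y = op(x);
        float q;
        if (std::isnan(y))
            q = static_cast<float>(out_q.offset); // assumption: NaN maps to the zero point
        else
            q = std::round(y / out_q.scale) + static_cast<float>(out_q.offset);
        q = std::fmin(std::fmax(q, -128.f), 127.f); // saturate (also handles +/-inf)
        lut[static_cast<std::size_t>(v + 128)] = static_cast<int8_t>(q);
    }
    return lut;
}

void apply_lut(const std::array<int8_t, 256> &lut, const std::vector<int8_t> &in, std::vector<int8_t> &out)
{
    assert(&in != &out);
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = lut[static_cast<std::size_t>(in[i] + 128)];
}
```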
Resolves: COMPMID-5763
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use a vector to represent the (static) block shape instead of an N-D
Tensor. The previous use of an N-D Tensor as the block shape was wrong:
it did not adhere to the specification and was non-functional (only the
first dim was used anyway).
* The fixture now accepts a static block shape, because the dynamic
case is not properly implemented and will be deprecated for now.
* Fix an assertion error in reference implementation.
Partially resolves COMPMID-5918
Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5952, COMPMID-5956
Change-Id: Idbd14538e7660792254072fa9631a6f03966f89b
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9371
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5985
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I0e789619f09e3adefe3655df347390f057300c0f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9373
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5945, COMPMID-5954
Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* This patch adds support for rounding modes in vmlaq_qasymm8_signed,
which is used to compute Relu for quantized types
* Partially resolves MLCE-1018
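A scalar model of the rescale step (`x * scale + offset`, then conversion to int8) shows where the rounding mode matters: values exactly halfway between two integers. The `Rounding` enum and `mla_q8_signed` are hypothetical and not the library's vector intrinsic; the nearest-even case assumes the default floating-point rounding mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Rounding { ToZero, HalfUp, ToNearestEven };

int8_t mla_q8_signed(int8_t x, float scale, float offset, Rounding mode)
{
    const float v = static_cast<float>(x) * scale + offset;
    float r = v;
    switch (mode)
    {
        case Rounding::ToZero:
            r = std::trunc(v);
            break;
        case Rounding::HalfUp:
            r = std::floor(v + 0.5f); // ties go toward +infinity
            break;
        case Rounding::ToNearestEven:
            r = std::nearbyint(v); // assumes the default FE_TONEAREST mode
            break;
    }
    r = std::fmin(std::fmax(r, -128.f), 127.f); // saturate to int8
    return static_cast<int8_t>(r);
}
```

For example, 5 * 0.5 = 2.5 becomes 2, 3 or 2 under the three modes respectively.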
Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
threading mode.
Resolves: COMPMID-5844
Change-Id: Iceb0018114bbca2bfdac4d4406936f9b260539e9
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9070
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
Resolves: COMPMID-5917
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement OpenCL kernel for LHS transposed and RHS non-transposed
- Implement OpenCL kernel for LHS transposed and RHS transposed
- Add validation tests
Resolves: COMPMID-5953, COMPMID-5955
Change-Id: I55589acbffe86c44e29807574975978a1ec09bad
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5863
Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement ClNativeMatMulKernel class
- Implement OpenCL kernel for LHS non-transposed and RHS non-transposed
- Implement OpenCL kernel for LHS non-transposed and RHS transposed
- Add test fixture and dataset for matmul
- Implement transpose_tensor() for reference implementation to transpose high dimensional tensors
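For a reference MatMul on high-dimensional tensors, transposing means swapping the two innermost dimensions while treating all outer dimensions as batch. A generic sketch, assuming the shape is listed outermost first; `transpose_last_two` is a hypothetical helper, not the library's `transpose_tensor()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// shape = {B..., R, C} becomes {B..., C, R}; shape is updated in place.
template <typename T>
std::vector<T> transpose_last_two(const std::vector<T> &in, std::vector<std::size_t> &shape)
{
    assert(shape.size() >= 2);
    const std::size_t rows = shape[shape.size() - 2];
    const std::size_t cols = shape[shape.size() - 1];
    // Product of all outer (batch) dimensions; 1 for a plain 2D matrix.
    const std::size_t batch = std::accumulate(shape.begin(), shape.end() - 2, std::size_t{1},
                                              std::multiplies<std::size_t>());
    assert(in.size() == batch * rows * cols);
    std::vector<T> out(in.size());
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                out[b * rows * cols + c * rows + r] = in[b * rows * cols + r * cols + c];
    std::swap(shape[shape.size() - 2], shape[shape.size() - 1]);
    return out;
}
```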
Resolves: COMPMID-5944, COMPMID-5951
Co-authored-by: Gunes Bayir <gunes.bayir@arm.com>
Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I1d5b8978f41be27baddb3153ade880472141573f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-5969
Change-Id: I0ab9c93c81111ddd3ee9e7feb5033a7102cc98a5
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9341
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-5969
Change-Id: I2fc6dcec53051886e46404857763ee69e2779014
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9330
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-5969
Change-Id: I05f7c77d7ff6ba1e65b752b4f705d8964b04357f
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9331
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-5969
Change-Id: I09f41abbda378cd6551bdf5bf4866b2bf4ca3096
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9332
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves COMPMID-5918, COMPMID-5865
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5966
Change-Id: Ic0d694493178da029a297643855bd0cff01b174f
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Without this, we have to pass in weights to be NHWC, even if they
are in fact blocked/interleaved for consumption by a fixed
format kernel.
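To illustrate what "blocked/interleaved" weights look like compared with a plain layout, the sketch below packs a K x N row-major matrix into column blocks of width `interleave`, zero-padding the last block. This is an assumed, simplified layout for illustration; the exact layouts consumed by the fixed-format kernels are not reproduced here, and `interleave_columns` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Block j holds columns [j*interleave, (j+1)*interleave) for every k, stored
// contiguously, so a kernel can stream one block per output tile.
std::vector<float> interleave_columns(const std::vector<float> &w, std::size_t K, std::size_t N,
                                      std::size_t interleave)
{
    assert(w.size() == K * N && interleave > 0);
    const std::size_t blocks = (N + interleave - 1) / interleave;
    std::vector<float> packed(blocks * K * interleave, 0.f); // zero padding by default
    for (std::size_t j = 0; j < blocks; ++j)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t i = 0; i < interleave; ++i)
            {
                const std::size_t n = j * interleave + i;
                if (n < N)
                    packed[(j * K + k) * interleave + i] = w[k * N + n];
            }
    return packed;
}
```

Once packed like this, the buffer no longer matches an NHWC description, which is why the tensor metadata needs to be able to describe the blocked format directly.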
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Change-Id: I9ee8720a21a16b17816dbecf6308e1668ddda59c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9285
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* The input and indices tensors can have any shape and the gather axis
can be any value, as long as these are valid and the output tensor
doesn't have more dimensions than the library supports.
* Update the reference code to be more generic and straightforward.
* Add necessary test cases.
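A rank-generic gather can be written by flattening the dimensions before and after the axis, so any rank reduces to a three-level loop, with output shape `input[:axis] ++ indices_shape ++ input[axis+1:]`. The sketch lists shapes outermost first and does not model the library's maximum-dimension limit; `gather` here is a hypothetical reference function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

std::vector<float> gather(const std::vector<float> &in, const std::vector<std::size_t> &in_shape,
                          const std::vector<int32_t> &indices, const std::vector<std::size_t> &idx_shape,
                          std::size_t axis, std::vector<std::size_t> &out_shape)
{
    assert(axis < in_shape.size());
    auto prod = [](auto first, auto last) {
        return std::accumulate(first, last, std::size_t{1}, std::multiplies<std::size_t>());
    };
    const std::size_t outer = prod(in_shape.begin(), in_shape.begin() + axis);
    const std::size_t axis_len = in_shape[axis];
    const std::size_t inner = prod(in_shape.begin() + axis + 1, in_shape.end());
    const std::size_t n_idx = prod(idx_shape.begin(), idx_shape.end());
    assert(in.size() == outer * axis_len * inner && indices.size() == n_idx);

    // Output shape: dims before the axis, then the indices shape, then dims after the axis.
    out_shape.assign(in_shape.begin(), in_shape.begin() + axis);
    out_shape.insert(out_shape.end(), idx_shape.begin(), idx_shape.end());
    out_shape.insert(out_shape.end(), in_shape.begin() + axis + 1, in_shape.end());

    std::vector<float> out(outer * n_idx * inner);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t i = 0; i < n_idx; ++i)
        {
            const int32_t idx = indices[i];
            assert(idx >= 0 && static_cast<std::size_t>(idx) < axis_len); // valid indices only
            for (std::size_t k = 0; k < inner; ++k)
                out[(o * n_idx + i) * inner + k] = in[(o * axis_len + static_cast<std::size_t>(idx)) * inner + k];
        }
    return out;
}
```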
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Resolves: COMPMID-5919
Change-Id: Ic7e2032777aa97ecc147f61d5388528697508ab1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9199
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add sigmoid and tanh activation functions for dynamic fusion.
* Add corresponding tests; both activation functions share
the same fixture implementation.
Resolves: COMPMID-5939
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add an option in the scons script to export a compile_commands.json
file to support development using a language server.
* Add .cache directory to the git ignore list. It is normally
used by clangd as the local cache.
Resolves: COMPMID-5940
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I8c2a1ac85942d34ada22adea3e7de2baf2189eb2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9258
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The SME kernels for quantized int8/uint8 GEMMs erroneously required that
maxthreads==1 before they could be selected. This resulted in them not
being available on multi-thread runs. Remove that restriction.
Resolves COMPMID-5962
Change-Id: Ia7933d0c66020b5e2981604ca97ff7ead95ec14e
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9274
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Fusion source files
Resolves: [COMPMID-5960]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I1b11f01c51a029082ed05823717b4c4ae4897798
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9270
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Ensure CLTuner uses the real GWS used by run(), instead of the
static GWS (which is usually changed at run time), by caching GWS in
each kernel
Note this is a somewhat inelegant workaround. The real issue stems
from the fact that execution window and scheduler are very much
coupled with our operator run() / run_op() method.
(Please see COMPMID-5934)
* Restrict LWS values to explore within GWS bound for exhaustive mode
* Refactor gws_from_window() to include all the information required
to calculate GWS
* Log lws search space used for tuning
* Fix ClDirectConv2dKernel config id
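Restricting the exhaustive LWS search to the real GWS can be sketched as building per-dimension candidates that never exceed the GWS and dropping combinations larger than the device's maximum work-group size. The power-of-two candidate set and the names `Range3` and `lws_search_space` are assumptions of this sketch, not CLTuner's actual search space.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Range3 = std::array<std::size_t, 3>;

std::vector<Range3> lws_search_space(const Range3 &gws, std::size_t max_wg_size)
{
    std::array<std::vector<std::size_t>, 3> per_dim;
    for (std::size_t d = 0; d < 3; ++d)
    {
        assert(gws[d] > 0);
        // Candidates are bounded by the GWS in this dimension.
        for (std::size_t v = 1; v <= gws[d]; v *= 2)
            per_dim[d].push_back(v);
    }
    std::vector<Range3> space;
    for (std::size_t x : per_dim[0])
        for (std::size_t y : per_dim[1])
            for (std::size_t z : per_dim[2])
                if (x * y * z <= max_wg_size) // respect the work-group size limit
                    space.push_back(Range3{{x, y, z}});
    return space;
}
```

Using a stale, static GWS here would either waste tuning time on LWS values that cannot be launched or miss the ones that match the real run.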
Resolves COMPMID-5892
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I420490d8b94d13ada2e44eb0a12078f883379334
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9193
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is so that we can leverage fixed-format kernels when
using the GEMM convolution method.
Partially resolves: [ONCPUML-1129]
Change-Id: I61ffa74f5cd9d75579dbc1f9aa187371f855e932
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9248
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Currently, the validation routine incorrectly prevents the optimized INT8 GEMM kernel from being used if the input is QASYMM8 and the output type is S32.
This change allows QASYMM8 input and S32 output types to leverage the optimized kernel.
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: I65b060f522795db07d6d4df86fb7c6ddd1c626d4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9250
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Embed the input and output tensor shape values directly in the CL code.
* Use textures for the weights when possible.
Resolves: COMPMID-5938
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ib53b310a80ce857eac36564b352136fdde55b131
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9249
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a max pooling implementation that returns kernel indices.
- Add a parameter to the pooling info object to select the kernel indices implementation.
- Add validation tests.
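A max pooling that also reports where each maximum came from can be sketched for a single channel without padding. Interpreting "kernel indices" as the row-major position inside the pooling window (`ky * kw + kx`) is an assumption of this sketch; ties keep the first maximum found, and `max_pool_with_indices` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct PoolResult
{
    std::vector<float> values;
    std::vector<uint32_t> indices;
    std::size_t oh, ow;
};

PoolResult max_pool_with_indices(const std::vector<float> &in, std::size_t h, std::size_t w,
                                 std::size_t kh, std::size_t kw, std::size_t stride)
{
    assert(in.size() == h * w && kh <= h && kw <= w && stride > 0);
    PoolResult r;
    r.oh = (h - kh) / stride + 1;
    r.ow = (w - kw) / stride + 1;
    r.values.resize(r.oh * r.ow);
    r.indices.resize(r.oh * r.ow);
    for (std::size_t oy = 0; oy < r.oh; ++oy)
        for (std::size_t ox = 0; ox < r.ow; ++ox)
        {
            float best = -std::numeric_limits<float>::infinity();
            uint32_t best_idx = 0;
            for (std::size_t ky = 0; ky < kh; ++ky)
                for (std::size_t kx = 0; kx < kw; ++kx)
                {
                    const float v = in[(oy * stride + ky) * w + (ox * stride + kx)];
                    if (v > best)
                    {
                        best = v;
                        best_idx = static_cast<uint32_t>(ky * kw + kx); // window-relative position
                    }
                }
            r.values[oy * r.ow + ox] = best;
            r.indices[oy * r.ow + ox] = best_idx;
        }
    return r;
}
```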
Resolves: [ONCPUML-1187]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a parameter to the PoolingLayerInfo class to select which value to use as the minimum for max-pooling.
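The choice of minimum is observable whenever a window contributes nothing larger than the starting value, for example a window made entirely of padding or of -inf inputs. A sketch contrasting the lowest finite float with -infinity; the flag name `use_neg_inf_limit` and the padding mask are hypothetical, not the library's parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

float window_max(const std::vector<float> &window, const std::vector<bool> &is_padding, bool use_neg_inf_limit)
{
    assert(window.size() == is_padding.size());
    // Starting value of the max reduction.
    float m = use_neg_inf_limit ? -std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::lowest();
    for (std::size_t i = 0; i < window.size(); ++i)
        if (!is_padding[i] && window[i] > m) // padded elements are skipped
            m = window[i];
    return m;
}
```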
Resolves: [ONCPUML-1166]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: [COMPMID-5930]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Icb6c8a9d1b5a2c5c57e37cb7c877414ed500d0cc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/494758
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Sicong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9181
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Resolves: COMPMID-5936
Change-Id: Iedfcc632f5d900865f38bd7b164121af48546542
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fix 4 failing tests for multi_isa builds when experimental_fixed_format_kernels=1
- Fixes for CMake and Bazel builds to pass validation tests
- Update documentation, remove “-DCPPTHREADS=1” flag from CMake build example
Partially resolves: ONCPUML-1181
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: I7101676260a0adcb7b6ff6f4342ae36f921e7120
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9189
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|