Age | Commit message (Collapse) | Author |
|
Resolves: COMPMID-6155
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ie651be65404b0b737464d7a79ebcc58475863ba0
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9555
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* There is an issue with quantized fully connected and matmul
when the lower bound of bounded ReLU is negative.
* Use int32_t for the calculation of min/max quantized value
rather than PixelValue to avoid this issue.
Partially resolves: COMPMID-5996
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I7b22e9d56a2441fc6a4c5c4e627f57d6e00d6ff1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9502
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is required for the case where rhs (B) is dynamic and needs to be
pretransposed in every run.
In a multi-threaded setting, this means the previously single-threaded
pretranspose_B_array would become the bottleneck
Resolves COMPMID-5896
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Id508c46992188a0f76a505152931d4955d04c16d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The existing 4x4 tiling for 32-bit transpose is not efficient on aarch64, given that there are a lot more Neon registers available. So making the tile size to 8x8 will greatly improve NETranspose latency.
For example, on AWS Graviton3 processors, with this change I have observed transposing a 768x768 matrix improves latency from 0.32ms down to 0.19ms. Improvement can also be seen across different matrix sizes.
Further enlarging the tile size to 8x16 or 16x16 won't make it perform as good as 8x8 due to register pressure.
This change is to mitigate the issue reported at:
https://github.com/ARM-software/ComputeLibrary/issues/1045
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: Ia09859cdf2f6d312e67219a9d95a3a3bf1db1999
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9448
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5899
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I89d96e292c3492ba9b1900a3e5683f9dcd11dfc6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9440
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Case: when the dequantized float value < 0.f the unary op was
not called if operator is not LOG or RSQRT
Resolves: COMPMID-5994
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I24d69db22042701f506188ace91ea4ab3dafeccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9437
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
Resolves: COMPMID-5995
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I707b8918bebee7e70d4de5207ef555c806e7a305
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9405
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implements MatMul function and operator for floating point datatype FP16/FP32
- Includes support for transposing dynamic tensors prior to matrix multiplication.
- Adds tests for 2D/3D/4D+ tensors in MatMul with F32/F16 datatype (with all combinations of transposed/not-transposed tensors)
- Updates fixture to allow for testing fused activation in MatMul
- Adds tests for matmul with and without fused activation
Resolved: [COMPMID-5898]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adding fallback functions neon_qasymm8_signed_elementwise_unary() and
neon_qasymm8_elementwise_unary()
- They would be called in case target is not aarch64
Resolves: COMPMID-5994
Change-Id: Id0db1e7cb0fe92f1eaef0b3a9ed2bea01b3f2a15
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9416
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed namespace arm_compute::utils::requires to fix the build error
‘requires’ is a keyword in C++20 [-Wc++20-compat]
* Added missing includes for cstdint.h
* Resolves MLCE-1040
Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add quantized unary elementwise in CPU using LUT.
* Widen the input data range of the test suite.
- Fix CPU exponential function overflow/underflow range.
- Fix saturation issue of CL round operator.
Resolves: COMPMID-5763
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* This patch adds support for rounding modes in vmlaq_qasymm8_signed
which is used to compute Relu for quantized types
* Partially resolves MLCE-1018
Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5917
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Without this, we have to pass in weights to be NHWC, even if they
are in fact blocked/interleaved for consumption by a fixed
format kernel.
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Change-Id: I9ee8720a21a16b17816dbecf6308e1668ddda59c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9285
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Currently the validation routine incorrectly prevents optimized INT8 Gemm kernel from being used if the input is QASYMM8 and output type is S32.
This change allows QASYMM8 input and S32 output types to leverage optimized kernel.
Signed-off-by: Ethan Doe <yidoe@amazon.com>
Change-Id: I65b060f522795db07d6d4df86fb7c6ddd1c626d4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9250
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a max pooling implementation that returns kernel indices.
- Add a parameter in pooling info object to pick kernel indices impl.
- Add validation tests.
Resolves: [ONCPUML-1187]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a parameter in PoolingLayerInfo class to pick which value to use as min for max-pooling.
Resolves: [ONCPUML-1166]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I34e1cccc15176bbf31523c61e99f3188ddca23e1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8989
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5805
Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Resolve COMPMID-5689
Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Signed-off-by: Andrew Mundy <andrew.mundy@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [COMPMID-5854]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ib0228409be5e816acca7e123f2660eb01a79e38f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9078
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fixes Column Offset matrix is not being iterated through in y dimension
Resolves : COMPMID-5795
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I0190474be404b4f0e171855739cfd0a48cbed5bc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9020
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types.
The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel).
The inputs are
- input1 : nD tensor [X, Y, Z, W, ..]
- input2 : nD tensor [X, Y, Z, W, ..]
- add_coef : 1D tensor [X]
- mul_coef : 1D tensor [X]
The outputs are
- out1 : nD tensor (intermediate output) [X, Y, Z, W, ..]
- out2 : nD tensor (final output) [X, Y, Z, W, ..]
The operation can be summarized as follows:
out1 <- input1 + input2
out2 <- Act(out1 * mul_coef + add_coef)
The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr.
The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward.
Resolves: COMPMID-5463
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove hack in CpuGemmAssemblyDispatch.cpp which tried to guess
strides for fixed format kernels. Instead, expect that strides will
have been correctly set on weights externally
- Update fixed format test fixtures to set the strides
- If the fixed format uses fast math mode, then weights should be of
type BFLOAT16. Change the validation logic to accept this.
Resolves: [ONCPUML-1131]
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Related to: COMPMID-5660
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I2314c8b21acc638402c77080d59db2f3fed58fe2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8911
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit 3c59f01c209d2732a15d97d65565ead964787a8b.
Resolves: COMPMID-5817
Change-Id: Ie2443a21854a95db1e3d0cafa2121c0187a5e237
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8974
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5805
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Change-Id: I250f64531e209625e4ff176dd5a552c1c34bc484
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8909
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed BF16 validation tests for DepthConvert
* Revert back to using inline assembly to convert to/from BF16
* Resolves COMPMID-5800
Change-Id: I803b2ad19ead297417f780c97c5b724cca6b394c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8929
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5629]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I061ea5bdafa3a01e66ff869d158f26a38d19e125
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8835
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fixes various mismatches when converting FP32 to BF16 and
BF16 to FP32
* Fixed segfault when trying logging=1 and trying to log BF16
* Resolves MLCE-979
Change-Id: Ie517d0b7411b4e3a7fecdee588f0e073d290625a
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8830
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The fp32 mws selection variables are guarded because they are flagged as unused when data type support does not include fp32.
Resolves: COMPMID-5761
Change-Id: I7ac783e3d5ca51868b562ab879d03e02140b51a1
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8707
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: fadara01 <fadi.arafeh@arm.com>
Change-Id: Ieaa2fa4a6a69e7e0a48633967dabe91c786b42b7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8682
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Added approximate values for MWS for the following binary operators:
Add, Sub, Mul, Min, Max, Div
Change-Id: I5c4c75511129982a3f44c038ee272f09598469de
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/459609
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Signed-off-by: fadara01 <fadi.arafeh@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8392
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Regression is caused by the small default mws in ActivationLayer
- Syncronization of threads takes longer than the workload on small sized tensors.
- Size 1536 is chosen arbitrarily based on the size of tensors in benchmarked networks
Resolves: [COMPMID-5655]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I02e865b578399e75484f471e67806dd4cf7502c0
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/468454
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8615
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5698]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I21ce976473e0e8807c14989e98e68aca69c7f1f3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8603
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* When the tensors are reinterpreted as 1D, any thing smaller than
10KB won't be splitted into different thread.
Resolves: COMPMID-5630
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Icff7089e37c85c8b325f099008a080a5805d36a2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8581
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Datatype supported qasymm8 and qasymm8_signed.
* Resolves COMPMID-5211
Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5461]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I89b99d267c32b00ef44f9bb6e7c714dfe4a0d29d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8420
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The range check condition is incorrect. The maximum value
of 8-bit quantized data is 255, not 1023.
* Use 256 to make the check slightly stricter than it should.
Resolves: COMPMID-5458
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5c071680531574b409a7ce71a732f6480caa10d8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8537
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Check whether weights are defined as constant, if they are not constant then do not pack them if they are already packed so that they can be updated.
Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I73447e31e3660b05f8f40e04ea4ea2003eb9b802
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8539
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Added missing threshold for calculating SOFT_RELU when SVE and CL implementations are used. As a result removed from the testing bounds for input values that were set to be in the interval [-40, 40].
Resolves: COMPMID-5658
Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I3d14df60125e36e4eb85aeb222f4fb0cc5741521
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8536
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Relates to : COMPMID-5507
Change-Id: Ia2c4ea153ac2524ffa6b2a9c10f3a0318a8a67a1
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8509
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use the same rounding function for the left-over part with the
vectorized part.
Resolves: COMPMID-5640
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I07450b2a43390b77539b78cd5d3e6772bdc38548
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8520
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use the same implementation as other layers.
Resolves: COMPMID-5108
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5a50259b398b71ca1f61b5ee8daa539bf8263fac
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8501
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds tests for batched matrix multiplication
- Bugfix for issue : 3d tensors input tensors with offsets in GemmLowp results in mismatches
Resolves : [COMPMID-5507]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I68e036fbca642c1841dd4321033045aadc8f5636
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/461298
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8482
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Do not change the tensor info after configure stage.
* By fixing this, the 1D optimization for activation layer can be
applied to all data types and tensor layout.
Resolves: COMPMID-5644
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Use a 1d execution window to improve memory access pattern.
Resolves: [COMPMID-5465]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ida30669ffa06eb002ca43a6edf15e25a6eaad2f6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8344
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5494
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I8f512745855b8ca21181a9ab21323bfff6aeb866
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/458884
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8391
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use the window instead of the tensor shape to determine the
number of elements in the x-dimension.
* Remove the LUT implementation in 32-bit build.
Resolves: COMPMID-5641
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0a79aa38d8f6a105ad01785bd94571f5a2ecb348
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8380
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5462
Change-Id: I2c7151c8faf4016cc33592fff04d492d7cbc8fd6
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8366
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5601
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I1baf92c4751d784d017c0b2f7de1fc09e42ce69c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8309
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|