Age | Commit message (Collapse) | Author |
|
* Use fixed-point arithmetic where possible.
* Various optimization for the FP32-based implementation.
This implementation is kept as the fall-back solution
in case of unrealistic quantization parameters that exceed
the range of fixed-point solution.
Resolves: COMPMID-5458
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I221d2d3801ecaae4fe0b7cf6ae8ef00ca3743665
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8317
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC.
This patch
- Reduces the memory footprint by disabling precomputation of indices and weights when they're not used
- Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8)
- Adds S8(Int8) Bilinear Scale for Border mode REPLICATE
- Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation
- Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset
Resolves: COMPMID-5453, COMPMID-5454
Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
OpenCL implementation uses built in erf.
NEON implementation requires new vectorized erf.
Uses the following approximation:
erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4
a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108
From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* scons arch=armv8.6-a translates to -march=armv8.6-a
* scons arch=armv8.6-a-sve translates to -march=armv8.6-a+sve
* scons arch=armv8.6-a-sve2 translates to -march=armv8.6-a+sve2
* Resolves COMPMID-5408
Change-Id: I0901e1de864d00109759509af7cc2b5c9ae1cd75
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7943
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* bf16 instrinsics should be used when __ARM_FEATURE_SVE_BF16 is present
* Fixed NDK14 compiler warning declaring copy ctor for Window explicitly
* Resolves MLCE-867
Change-Id: I84ac5f213d9700e2fda7da55d83bba7cf79ad52c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7728
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
A smaller core library is created using a subset of the operators.
Changed the structure of filelist.json in order to include more
information about the kernels and make the selection easier.
Resolves: COMPMID-4514
Change-Id: I079ca7d8e64346174eebdd13b834e1dd4dc36ca2
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5786
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Changes our build system to allow building both Neon(TM) and SVE
kernels and package them in the same binary. This will allow
runtime selection of the underlying architecture.
Adds new build option, fat_binary, for enabling this feature.
Change-Id: I8e8386149773ce28e071a2fb7ddd8c8ae0f28a4a
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5704
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4299
Change-Id: Ie6a52c1371b9a2a7b5bb4f019ecd5e70a2008567
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5338
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Full trademarks available in README.md
Resolves: COMPMID-4257
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ibfba2adf2eef3449433f467464ebd87d7198474d
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5116
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Implements COMPMID-3875
Change-Id: I38991eed3f4966db125862af066bfedff5994a25
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4854
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
SVE kernels are added to all previously supported arithmetic
and comparison operations with exception of S16 arithmetic
operations due to complexity of widening and narrowing of
integer vectors.
Partially implements: COMPMID-3872
Change-Id: Ic433eb7227dfcfd0d1429f18acebec2d934ca8bd
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4778
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I4ec7561a7f6a42a22b8187968ae302dbe75023bc
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4753
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Few bit-width dependent intrinsics are added.
- Few math functions are added.
Partially implements: COMPMID-3872
Change-Id: Ia6ab46bd170fec9c7c8d4410b7ef4d84710b68ed
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4718
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ib1ecd7aa10fec0b7e2b3d929e212c1af34c0f58d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4533
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- LEAKY RELU activation is supported for QASYMM8 data type
- vquantize on NEON side has been modified to match with
other backends (OpenCL and reference)
Change-Id: I194631225c8d4f3cc96027d64812ec2be2b4328a
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4593
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- For AArch64, NEActivationLayerKernel uses vsqrt rather than
vinvsqrt.
- For non-AArch64, it masks values to ensure zero input
results in zero output without producing NaN.
- Test cases for FP16 and FP32's positive boundary values
are added.
Change-Id: Ic0104ee5d7045059c2e9bd052616a4a3b43a315d
Signed-off-by: Sang-Hoon Park <sang-hoon.park@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4150
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I524b0c4b49c7a7035b7d078b9585d77b0d438e10
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4083
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|