Age | Commit message (Collapse) | Author |
|
- That would force CpuConv2d::get_convolution_method()
choose GEMM_CONV2D or GEMM methods instead.
Resolves: COMPMID-5531
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I4dec8772e8c150da003d9a89c1d036057c4d28b0
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8233
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
The optimization concerns the case where the depth multiplier is > 1.
The depth multiplier for loop has been removed from the OpenCL kernel
and the GWS has been mapped to the output shape. In this way, we can
still perform a tile with N0 columns and improve the performance of
depthwise conv over 80% when depth multiplier is > 1.
Resolves COMPMID-5568
Change-Id: I604e287d4eeb31c54b9cc6c3072a698cd0e3e136
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8184
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Ran into issues with f16 meanstddevnorm. Essentially, with large enough tensors and/or large values in tensors, output becomes all 0.
This is due to the variance computation. In f16, it reaches infinity quite easily, then the division results in 0.
This change modifies the OpenCL and NEON implementations to compute the sum of squares and the variance using f32, while other operations remain f16.
Update: Found that the square operation also benefits from f32, rather than squaring in f16 and accumulating f32.
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: Ide00afd84ec6d26fec4d53b073e295814f08ba46
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7959
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
From an internal performance evaluation, it seems that Winograd-based
Conv2D offers better performance than alternative methods such as direct
convolution and gemm-based conv already from IFM=8. Before the condition
was for IFM>=16
Resolves COMPMID-5532
Change-Id: I9ff04835d6fd07f5f0abeec9645c9d9cc913b6b7
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8147
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
repeatedly when mapping multiple children of the same parent.
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I7ff554915a37320dfd94f15a6fb01a72a235cf39
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7920
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Previously, the add_as_1d_array kernels were used on tensors with
non-matching strides which caused the wrong elements to be added. The
fix is to check that the strides are equal when selecting the addition
kernel.
Change-Id: I914ca2b95e5b8ed1875ec5ebe129bdfe2845496b
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8120
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves MLCE-799
Change-Id: I3d5b2afbf65d159aa5e645743c1e139110dfc20e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8093
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves MLCE-903
Change-Id: I39fb3b4b395a37c0f32481830f94d85ec15e205f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8124
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Removes FP Bilinear SVE kernels and uses Neon™ kernels instead
Resolves: COMPMID-5449
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8e01de44bd884cb6578ca0b9358509b69bc31ca2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8100
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Reverting commit e54d8c07e75d70baeb80fecbb43088027ea45658
because it has caused unexpected regressions.
Resolves: COMPMID-5504
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I2a0bcc6a311009a81f20a146079758ad138fff5b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8092
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Move LUT implementation to a seperate file. It will be used
for both QASYMM8 and QASYMM8_SIGNED.
* Fix wrong constant value related to QASYMM8_SIGNED leaky ReLU
in 32-bit build.
Resolves: COMPMID-5464
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I2b24d52409a38f1b66fd532f431eff8a9e4547b6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8066
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Call gemm-based convolution when the kernel is large and the stride is
unit
Resolves: COMPMID-5504
Change-Id: I8dd83bd012000ed76824b96a8e37c98c861c59e4
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8081
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Affect CL backend.
- For FP32 datatype, affect different platforms.
Resolves COMPMID-5479
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I705d718bc9b7def218034958f7ef86f2c2abe06d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8064
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
The unsafe FP optimizations flag causes accuracy issues when certain conditions are met regarding the hardware type, data type and the activation function.
Resolves: COMPMID-5375
Change-Id: I1b0b06549b8c108617962d006a20dd263d5e3c21
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8061
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The issue is caused by GPUTarget not being set explicitly
for the Depthwise convolution kernel, but it's being used
in its build configuration. This causes the default value
to be used and enables some unsafe FP optimizations.
Resolves: COMPMID-5490
Change-Id: I5300a1168962cacb62cf49db795f052cf6740c7e
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8059
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* For 3x3 kernel, only choose the implementation with larger tile
size if the input tensor is larger than the tile.
Resolves: COMPMID-5467
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I2cf95ddb25f477cb05da3b3501e0afe9548fc33a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8022
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5420
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I24dca916f49f82e7e5ec809500ae5fe32c8adc97
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8020
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Minor tweaks and test for running fixed format kernels with BF16
operations when specified by the user.
Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Allow fusing arbitrary number of existing elementwise operators
- Fix issues with 3D and 4D tensors in Elementwise Addition and Floor components
- Collapse the 3D/4D window in the same way as that used by Conv2d,
i.e. collapse dim 1 and dim 2 together
- Fix Floor component issues when used after other components
- Add Dynamic Fusion Tests (Floor + Div, Conv2d + Add + Div)
- Add Addition ElementWise Broadcasting Test
Resolves: [COMPMID-5356]
Change-Id: I58b93a90175bb0440d43531d18cac94b5f5c2689
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/433956
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7957
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implements required plumbing in order to be able to ask and execute
fixed format kernels from NEFullyConnected, NEGEMM and NEGEMMConv2d.
These APIs are used to accelerate oneDNN primitives (inner product, matrix
multiplication and indirect GEMM respectively) and without changes it
would not be possible to call fixed format kernels from those oneDNN
primitives.
Change-Id: I27534f0491ce28d0ccb98c19f318bd33dcdf2ff5
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7999
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5405
Change-Id: I995b5e79bef13529097ed17f7854763a4cf89272
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7986
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5108
Change-Id: I544f8160fbe5b4ffbef348d1fbd3dd626a6e1bdb
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8002
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
If number of work items is greater than number of available threads then
OpenMP scheduler will only execute as many work items as there are
threads. This fix makes sure that we iterate through all work items and
execute all of them.
Change-Id: I3ad4b732c01fadc70dacaf09af3007d2b31086c7
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8001
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
- related to 91780021e2 Fix for inclusion of "arm_gemm"
- arm_compute::WeightFormat was passed to a function
expecting arm_gemm::WeightFormat
Resolves: COMPMID-5415
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib53fad2eab0148f466a5e2f11b931754569da3d6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7989
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat
when needed through two map function.
- Moved to_string(WeightFormat) to TypePrinter.h
Resolves: COMPMID-5415
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Sicong Li <sicong.li@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* scons arch=armv8.6-a translates to -march=armv8.6-a
* scons arch=armv8.6-a-sve translates to -march=armv8.6-a+sve
* scons arch=armv8.6-a-sve2 translates to -march=armv8.6-a+sve2
* Resolves COMPMID-5408
Change-Id: I0901e1de864d00109759509af7cc2b5c9ae1cd75
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7943
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615
Resolves: COMPMID-5398
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5298
Change-Id: I4a7d788bc1f5f568bedcc22e7aca47ede6de71bf
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7891
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Affects OpenCL backend.
- Resolves COMPMID-5416
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I8953f9ac5c1ec9edf99399a651a544df4276ccf1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7951
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
API changes for NEGEMMConvolutionLayer and CpuGemmConv2d
Built with:
scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \
build=native -j32 Werror=false validation_tests=1 build_dir=opt \
standalone=1 asserts=1 experimental_fixed_format_kernels=1 .
Tested with:
./build/opt/tests/arm_compute_validation
Hardware where the test executable was run:
Neoverse N1
Test coverage:
* NEGEMMConvolutionLayer, CpuGemmConv2d
* NHWC (the only one supported by the fixed-format kernels)
* F16, F32
* Shapes: RunSmall
Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99
Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5401
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I432b304483236efd392dfc47d541e6759c135104
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7934
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
Resolves: COMPMID-5400
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615
Resolves: COMPMID-5216
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves MLCE-739
Change-Id: Ice06a96d6a8a26b31e334ba4e697cd41d352b026
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7364
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5298
Change-Id: Ie9b907e5dcf86aa6add8d08799fa7ba7c264edea
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7888
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
incorrect cl cache behaviour
Resolves: COMPMID-5286
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Change-Id: I1aa82825ef7d31d7830a00282d1a3523ccf1d746
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7883
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Relates to: COMPMID-5299
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I19f820f698cf11020da019f4a1334cccb1e40c7e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7880
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Extensive use of templates resulted in a compiler crash
on NDK 23 and 24. This rework solves the issue and also
reduces the library size by 101Kb.
Resolves: COMPMID-5384
Change-Id: I9c5c68c5e36f236b0891e44d25478743417fb16d
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7871
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Partially resolves COMPMID-5396
Change-Id: I7f3e692e73da548a901a07ed6d7212df40b8c26f
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7870
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5350
Change-Id: I2f5021ebe9fe4c71371f2cd6cb65cc9c0b2110b5
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7861
Reviewed-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add LUT generation function for Leaky ReLU.
* Some additional changes in the existing LUT implementation:
+ Bring back the NEON implementation of hard swish for 32-bit
build. Library size of 64-bit build is not affected.
+ Add some extra #ifdef to remove unnecessary code in 32-bit
build.
Resolves: COMPMID-5386
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I1ea49611cc922765ee741e31138c888401d33e9b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7845
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- num_tiles_x was not initialized when we were passing the argument at
compilation time
Resolves COMPMID-5394
Change-Id: I6004a2d47656edda3c832f980c78622045de7068
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7857
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5355
Change-Id: I92f73fbe885f28bbe7b07965b90cfd807c93602f
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7745
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
* Changed window_step from 16 to tensor_shape().x() when calling
into the assembly byte substitution code.
* Resolve COMPMID-5211
Change-Id: I5c1f5273455999bb35f94c76a8afb4290e728858
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7843
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
* Resolves COMPMID-5211
Change-Id: I560ab2992c6089774c7ebee3538847905521607d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7840
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Based on ArmNN work: IVGCVSW-6960
Device not found running MLTS with Support Library
Change-Id: I8cb8acc30c8a4afa60bc1cf80eb5b6b2c43dfdc1
Signed-off-by: Ohad Almagor <ohada@google.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7836
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fix integer overflow in substraction step.
* Fix incorrect vector when convert the result to qasymm8_signed.
Resolves: COMPMID-5389
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Id745f2eb2a1b0823b02b136560351b5f8fb85624
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7738
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* bf16 instrinsics should be used when __ARM_FEATURE_SVE_BF16 is present
* Fixed NDK14 compiler warning declaring copy ctor for Window explicitly
* Resolves MLCE-867
Change-Id: I84ac5f213d9700e2fda7da55d83bba7cf79ad52c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7728
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add OpenCL version 3 detection.
* Use -cl-std=CL3.0 build option to support non-uniform workgroup
when OpenCL 3 is detected and the feature is supported.
Resolves: COMPMID-5208
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ifd8cbae6b34228c07e761bcb94ee8f35bdf1bace
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7655
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The regression was caused by NUM_TILES_X passed at runtime.
Resolves COMPMID-5327
Change-Id: Id6ccd93784eda93af09f420c0d786050e2bbccd7
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7727
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|