Age | Commit message (Collapse) | Author |
|
Towards: COMPMID-6625, COMPMID-6627
Change-Id: I360dfdc48b429647e4e19d6216de310130d563d0
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11041
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Adnan AlSinan <adnan.alsinan@arm.com>
|
|
* Partially resolves MLCE-1191
Change-Id: I083c9c81dbb391dfba49518843fec2e8febae726
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11043
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
The code in convolver.hpp generates pointers into either the
appropriate point in the input activation tensor or the padding buffer
for each kernel point of each output point of the convolution. This is
done at runtime interspersed with the data transform and matrix
multiplication steps. As such, it can have a significant impact on
performance, particularly for low input channel counts.
This change improves the performance of this code by streamlining the
checks for out of range input points (which must be directed to the
padding buffer). The previous implementation checked all four borders
for every point. The revised code does the checks one at a time, and
for any failing check applies the result to as many output points as
possible without repeating the other checks.
Signed-off-by: David Mansell <David.Mansell@arm.com>
Change-Id: I36a4fa114b425c1bcba2be40acf36718522519f5
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11004
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
|
|
BF16 kernels are not expected to have the same tolerance/accuracy standards as full float kernels. The reference implementation is a standard floating point implementation, thus resulting in small mismatches.
We increase the tolerance of the MatMul BF16 tests, and add more tests to cover more shapes. Previously, the only tested bf16 kernel was a64_hybrid_fp32bf16fp32_mmla_4x24. With the inclusion of new shapes, heuristics also choose a64_hybrid_fp32bf16fp32_mmla_6x16 and stress this kernel as well, covering every implementation.
Resolves: COMPMID-6654
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I15342606912013c123b94c7e0ea2e6bbb25680d7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11014
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Debug mode built using -O3, this patch moves -O3 to a Release only
variable so that Debug builds with just -O0.
Change-Id: I1acca68514cd3682ccf98f9a1f39904f79933a2f
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10965
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6746
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: I96c158820469af3e54dca0c5909c888106eb1940
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11005
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6753
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: I80df0479eb4c7cc2c5380df708844cc9ffdd2aed
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11001
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* The tensor info objects created by calling create_tensor_info
is now solely owned by the context object. The user only receives
pointers to those objects.
- Internally pointers to tensor info objects are used in various
places. It's safer for dynamic fusion to manage these objects
directly rather than relying on the users.
- The validation test is updated to use the modified API.
* Make various changes in dynamic fusion API to make it more
friendly (e.g. making some of the objects moveable).
Partially resolves: COMPMID-6707
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ifee70e53c05f8e7b72bf9ef123701ff291c5ee80
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10990
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
CKW support is still experimental.
Change-Id: Iad98652b4b24acd918772edabc1b6bae4eeb259f
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10996
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: I4781da2121d515a1e7ea7863ac1483caa8f94c39
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10989
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Anitha Raj <Anitha.Raj@arm.com>
|
|
* Documented the use of the compiler directive .inst
* Updated the multi_isa section
* Resolves MLCE-1156
Change-Id: I6a04ac66bc244c3adc010e1f8545f2045992d6db
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10981
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* CONVERT_TO_TENSOR4D_STRUCT_NO_STEP is implemented and used
in some CL kernels in the way that causes divide-by-zero issue.
- Since the steps are all zeros, the issue might have been
ignored by the compiler.
Resolves: COMPMID-6795
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0fb38fc62d63671b8abefa39b3d9b3ca6f49c7fe
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10967
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [COMPMID-6799]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I47baeeea75f1d03609d1fa1e9a10d2f53d5694f7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10969
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Locks pointer before checking for validity to prevent race condition
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I6872b10d058ee7f3707ba641f44bb6116e26880a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10960
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The reorders supported at the moment are:
ab->BA4b4a
ab->BA8b4a
Co-Authored-By: David Mansell <David.Mansell@arm.com>
Change-Id: Ic466465629ce3bcdcee0089e251485b79b60e1f3
Signed-off-by: Renato Arantes <renato.arantes@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10775
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Remove a std::move flagged by -Wpessimizing-move
Resolves: COMPMID-6777
Change-Id: Ie082dc2eab0cb11e9a29f6f6fc98866306fd2cfa
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10957
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Suppress a false positive compiler warning caused by a bug in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104165
This issue is known to be reproducible in some versions of GCC 11, 12 and 13.
Remove a redundant std::move flagged by -Werror=redundant-move
Resolves: COMPMID-6777
Change-Id: I782e87b5e3df4c09195e67a37f49d122dc918224
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10950
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Comments-Addressed: <felixjohnny.thomasmathibalan@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Enables FP16 lut for logistic activation
- Adds LUTManager to re-use lut where appropriate.
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- For quantized RELU activation, de-quantization and re-quantization
is not required since comparison against the quantization bias is
only required.
Resolves: COMPMID-6340
Change-Id: I574bd220f3d0d893b7f7c4819a883e2a131f61f4
Signed-off-by: Sangwon Ha <sangwon.ha@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10916
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch calculates the output quantization info based on the inputs'
quantization information.
The previous approach was using the same quantization information for
input, weights and output.
Remove QSYMM8_PER_CHANNEL path from the fixture as there are no
related tests
Remove repeated shapes from the dataset now that we get rid of the
quantization info from the dataset.
Combine signed and unsigned SmallGEMMLowpFusedBatchedMatMulDataset
into one as they become identical
Resolves COMPMID-6481, COMPMID-6634
Change-Id: I9f5a20f4bb45c3e5adab388564135ae8a5c0a9ea
Signed-off-by: SiCong Li <sicong.li@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10680
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The issue appears when this kernel is used by convolution operators because the stride calculations consider only simple matrix multiplication.
In conv2d triggered runs, Rhs does not have the same dimension as Lhs and Dst. Also, cases where Lhs and Dst are interpreted as 3d, where their X and Y dimensions (in convolution sense) are collapsed into one.
Resolves: COMPMID-6764
Change-Id: If443e6eb8f7a5cca1acc58b37c598122a013e69b
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10913
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch adds adds the latest Gpus as Gpu Target and sets up kernel selection heuristics for MatMul to address some nightly issues.
Resolves: COMPMID-6766
Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The workaround is not relevant anymore as we update our memory debugging tools.
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Ib00e0ad9ba693f97fee87158dd03d3617dce9282
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10908
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
While writing this gemm kernel, code pieces, including validations were adapted from ClGemmMatrixMultiplyReshapedOnlyRhsKernel, and this validation should be about reinterpret_input_as_3d. This reveals a test gap for this kernel. There are currently no tests stressing this condition; but this is not going to be addressed as part of the bug ticket.
The corresponding snippet in ClGemmMatrixMultiplyReshapedOnlyRhsKernel is
if (gemm_info.reinterpret_input_as_3d)
{
ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) * src0->dimension(2) != m);
}
else
{
ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) != m);
}
Resolves: COMPMID-6757
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I4363effcaf2b43ff3674a3443058384338fb9714
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10891
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
This reverts commit 270576a9fbeeda5210483931388e62f9a1059dd9.
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Ia4e965156af46a9afd78819e90fd2a033a97fc2b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10888
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
This fix modifies some of the conversions done in the generate proposals kernel that causes DDK issues while compiling the kernel.
The issues are mostly related to conversion from i64 to fp16, and it doesn't affect fp32. Firstly, type identifier size_t is converted into unsigned int. But, this alone was compiling but causing mismatches, even in older devices, where it was passing before. Therefore, the fp16 conversion delayed until vector construction where the integers are now converted to fp32, and then fp16. This, although may not be ideal, seems like the best solution.
Resolves: COMPMID-6756
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Iee61216c908fe51431985b80c3653fc32add4741
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10879
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I18143be45eff2d5bbca9cc86e8c18f9f6bc3f119
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10883
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Icee8b38db1f219d66ac22a6e0980f4325fd21fbd
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10868
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
While writing this gemm kernel, code pieces, including validations were adapted from ClGemmMatrixMultiplyReshapedOnlyRhsKernel, and this validation should be about reinterpret_input_as_3d, which is not dealt with in this kernel. The mmul kernel only deals with reinterpret_output_as_3d, which is equivalent to depth_output_gemm3d != 0. This reveals a test gap for this kernel. There are currently no tests stressing this condition; but this is not going to be addressed as part of the bug ticket.
The corresponding snippet in ClGemmMatrixMultiplyReshapedOnlyRhsKernel is
if (gemm_info.reinterpret_input_as_3d)
{
ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) * src0->dimension(2) != m);
}
else
{
ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) != m);
}
Resolves: COMPMID-6757
Change-Id: I73b203594b22098a5374c1fac6969ee769969901
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10874
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Changes in filelist.json: moved fp16 code from common to fp16
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
with ENABLE_FP16_KERNELS.
* Resolves COMPMID-6755
Change-Id: I4da1c53d3f9e4734e5e67125265ab4e3fc0dcbe4
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10865
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Both macros ARM_COMPUTE_ENABLE_FP16 and ENABLE_FP16_KERNELS
must be declared to enable FP16
* The failure was caused by not compiling the validation suite with the same
definitions used to compile the library. ARM_COMPUTE_ENABLE_FP16 was missing
and the call from the test into error_on_unsupported_cpu_fp16() failed.
* Resolves COMPMID-6727
Change-Id: I278c813aef799d9d0e21e5323b2b8e9e45252d6c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10848
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The graph example has fixed quantization information given for certain layers, and some of the offsets exceed the 8-bit range for Int8 data type.
This shouldn't have been the case and the offsets should respect the 8-bit quantization specification laid out here: https://www.tensorflow.org/lite/performance/quantization_spec
However, the mechanism added in the helper function introduces robustness in case of such irregularities with little/no cost; and therefore added as a fix.
Resolves: COMPMID-6748
Change-Id: If39bf323382f109fa100ee2b87ce63cc7bc89759
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10858
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6728
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ic0682550a09db9aa420057a90ee65386e16e6034
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10853
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The function pointer for clImportMemoryARM should be loaded in a portable way as recommended by Khronos® as outlined here:
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#getting-opencl-api-extension-function-pointers
using clGetExtensionFunctionAddressForPlatform() call.
All extensions should ideally be loaded using the above mentioned function.
Resolves: COMPMID-6732
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I482b6bde721267d5e8c08301e5780d28a9c5ba85
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10852
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6736
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib887c56afcf481366f1fa9c9f456a43c27269e52
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10844
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-6622
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab.
Resolves COMPMID-6735
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each data while calculating the softmax, i.e.
softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) )
This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor.
It also introduces some other optimizations that will ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for already exponentiated inputs.
It removes the references to SVE and SVE2 implementations, and most of the associated files; but, it leaves the implementations as these may be used in the future.
Resolves: COMPMID-6500
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves COMPMID-6733
Change-Id: I7f0428719b5c0aa79a0b356c50bb801db16558e8
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10792
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6734
Change-Id: I0f0a7a312504b7cca4ed36263843ccd1a190c09e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10803
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Nikhil Raj Arm <nikhil.raj@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* This is the initial patch to start working on enabling fp16 in all
multi_isa builds. More changes are required in the way we register
the kernels using the macro REGISTER_FP16_NEON.
* In this patch we add the capability to build the fp16 files in listed in
filelist.json with the correct arch option to enable FP16
* This patch is required towards building an universal multi_isa binary
where fp16 is enable.
* Enable REGISTER_FP16_NEON macro for all builds by removing
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition.
The macro has to be used across all types of builds.
Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Moved NCHW kernels fp16 and fp32 to their corresponding files
src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and
src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp
* Changes in filelist.json to include the new fp16 and fp32 files
* Moved the template batch_normalization_nchw to impl.h as we
need to instantiate it from fp16.cpp and fp32.cpp
* Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that
prevented the FP16 kernel execution.
* Partially resolves MLCE-1102
Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Change-Id: I654f53e5b4e53abc69ce385f6c706293bf8f7198
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10784
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Moved fp16 and fp32 to their corresponding files
src/cpu/kernels/mul/generic/neon/fp16.cpp and
src/cpu/kernels/mul/generic/neon/fp32.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Partially resolves MLCE-1102
Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
Resolves ONCPUML-1331
This patch adds an option to make _custom_scheduler thread_local to
support usage of multiple schedulers handled outside of ACL.
It also adds num_threads() function to Scheduler which reverts to
querying CPUInfo if no scheduler has been set.
Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912
Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Stop building and linking to the legacy libarm_compute_core artifact.
This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality.
Resolves: COMPMID-6329
Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728
Reviewed-by: Anitha Raj <Anitha.Raj@arm.com>
Reviewed-by: Sang Won Ha <sangwon.ha@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Moved the template arm_compute::normalize_float to impl.h because
we need to instantiate it from both NENormalizationLayerKernel.cpp
and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in
NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that
the fp16 kernels can be compiled in for multi_isa builds
* Moved fp32 kernels to the corresponding file
src/cpu/kernels/norm_layer/generic/neon/fp32.cpp
* Partially resolves MLCE-1102
Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6677
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Performing the second pass in reverse order doesn't seem to work
reliably in some specific devices. This patch introduces
another approach to workaround the device issue.
Resolves: COMPMID-6669
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I591f05ff06f8439ebe4d32093441ae871a292f4c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10730
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolved COMPMID-6367
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Change-Id: I96f244811a81a4e278f0c5e47d5014229cad3a25
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10727
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|