Age | Commit message | Author
2024-01-18 | Update Documentation for 24.01 release (v24.01, branches/arm_compute_24_01) | Felix Thomasmathibalan
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Change-Id: I4781da2121d515a1e7ea7863ac1483caa8f94c39 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10989 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Anitha Raj <Anitha.Raj@arm.com>
2024-01-18 | Fix minor issue, clean lut code | Mohammed Suhail Munshi
Resolves: [COMPMID-6799] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I47baeeea75f1d03609d1fa1e9a10d2f53d5694f7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10969 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-01-12 | Fix potential threading issue in LUTManager | Mohammed Suhail Munshi
- Locks pointer before checking for validity to prevent race condition Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I6872b10d058ee7f3707ba641f44bb6116e26880a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10960 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
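A minimal sketch of the lock-before-check pattern described above, assuming the manager caches the table behind a std::weak_ptr (names are illustrative, not the actual LUTManager API):

    #include <memory>
    #include <mutex>
    #include <vector>

    struct LookupTable { std::vector<float> values; };

    class LutCache {
    public:
        std::shared_ptr<LookupTable> get() {
            std::lock_guard<std::mutex> guard(_mutex);
            // Lock the weak pointer first, then check the result. Checking
            // expired() and locking afterwards is a race: another thread could
            // drop the last shared_ptr in between the two calls.
            std::shared_ptr<LookupTable> lut = _cached.lock();
            if (!lut) {
                lut = std::make_shared<LookupTable>();
                _cached = lut;
            }
            return lut;
        }
    private:
        std::mutex _mutex;
        std::weak_ptr<LookupTable> _cached;
    };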
2024-01-12 | [ONCPUML-1387] Add ACL based reorder for f32 to bf16 data type conversion. | Renato Arantes
The reorders supported at the moment are: ab->BA4b4a and ab->BA8b4a. Co-Authored-By: David Mansell <David.Mansell@arm.com> Change-Id: Ic466465629ce3bcdcee0089e251485b79b60e1f3 Signed-off-by: Renato Arantes <renato.arantes@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10775 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-01-11 | Fix test compilation error on GCC 13.2 | Jakub Sujak
Remove a std::move flagged by -Wpessimizing-move Resolves: COMPMID-6777 Change-Id: Ie082dc2eab0cb11e9a29f6f6fc98866306fd2cfa Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10957 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
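For context, a hedged illustration of the kind of pattern -Wpessimizing-move flags (not the actual test code): wrapping a returned local in std::move disables copy elision, so the std::move is simply removed.

    #include <utility>
    #include <vector>

    std::vector<int> make_data_pessimizing() {
        std::vector<int> v(16, 0);
        // GCC warns roughly: "moving a local object in a return statement
        // prevents copy elision [-Wpessimizing-move]"
        return std::move(v);
    }

    std::vector<int> make_data_fixed() {
        std::vector<int> v(16, 0);
        return v; // NRVO or implicit move; no warning
    }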
2024-01-10 | Fix compilation error on GCC 13.2 | Jakub Sujak
Suppress a false positive compiler warning caused by a bug in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104165 This issue is known to be reproducible in some versions of GCC 11, 12 and 13. Remove a redundant std::move flagged by -Werror=redundant-move Resolves: COMPMID-6777 Change-Id: I782e87b5e3df4c09195e67a37f49d122dc918224 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10950 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-01-10 | Use look up table for fp16 activation | Mohammed Suhail Munshi
- Enables FP16 lut for logistic activation - Adds LUTManager to re-use lut where appropriate. Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I94667b63b452a8e58a1eb59cb0b5866178954523 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10864 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
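A hedged sketch of the idea behind an fp16 activation LUT (illustrative only, not the LUTManager implementation; assumes an AArch64 toolchain where __fp16 is available): fp16 has only 65536 bit patterns, so the logistic function can be precomputed for every possible input and applied as a table lookup.

    #include <array>
    #include <cmath>
    #include <cstdint>
    #include <cstring>

    // Precompute logistic(x) = 1 / (1 + exp(-x)) for every fp16 bit pattern.
    std::array<__fp16, 65536> build_logistic_lut() {
        std::array<__fp16, 65536> lut{};
        for (uint32_t bits = 0; bits < 65536; ++bits) {
            const uint16_t b16 = static_cast<uint16_t>(bits);
            __fp16 x;
            std::memcpy(&x, &b16, sizeof(x));
            lut[bits] = static_cast<__fp16>(1.0f / (1.0f + std::exp(-static_cast<float>(x))));
        }
        return lut;
    }

    // At run time the activation becomes a lookup keyed by the raw bits.
    inline __fp16 logistic_lut(const std::array<__fp16, 65536> &lut, __fp16 x) {
        uint16_t bits;
        std::memcpy(&bits, &x, sizeof(bits));
        return lut[bits];
    }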
2024-01-04 | Prevent RELU from being processed thru LUT in INT8 | Sangwon Ha
- For quantized RELU activation, de-quantization and re-quantization are not required, since only a comparison against the quantization bias is needed. Resolves: COMPMID-6340 Change-Id: I574bd220f3d0d893b7f7c4819a883e2a131f61f4 Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10916 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
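The observation above in code form, as a hedged sketch (illustrative, not the ACL kernel): in QASYMM8 a real value of 0 maps to the offset (zero point), so ReLU reduces to a max in the quantized domain and no de/re-quantization is needed.

    #include <algorithm>
    #include <cstdint>

    // relu(x_real) = max(x_real, 0); in the quantized domain, real 0 maps to
    // the offset, so the whole op is a comparison against it.
    inline uint8_t relu_qasymm8(uint8_t x_q, uint8_t zero_point) {
        return std::max(x_q, zero_point);
    }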
2024-01-04 | Implement dynamic quantization for GEMMLowp tests | SiCong Li
This patch calculates the output quantization info based on the inputs' quantization information. The previous approach was using the same quantization information for input, weights and output. Remove the QSYMM8_PER_CHANNEL path from the fixture as there are no related tests. Remove repeated shapes from the dataset now that the quantization info has been removed from the dataset. Combine the signed and unsigned SmallGEMMLowpFusedBatchedMatMulDataset into one as they become identical. Resolves COMPMID-6481, COMPMID-6634 Change-Id: I9f5a20f4bb45c3e5adab388564135ae8a5c0a9ea Signed-off-by: SiCong Li <sicong.li@arm.com> Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10680 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
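One plausible way to derive an output quantization from the input quantizations, as a hedged sketch (the actual heuristic used in the tests may differ): bound the real-valued accumulator and spread that range over the 8-bit output.

    #include <cstdint>

    struct QuantInfo {
        float   scale;
        int32_t offset;
    };

    // dst_real is a sum of K products; a deliberately loose symmetric bound on
    // its magnitude is K * 255 * 255 * scale_lhs * scale_rhs. Map that range
    // onto the 8-bit output.
    inline QuantInfo derive_output_qinfo(QuantInfo lhs, QuantInfo rhs, int K) {
        const float max_abs = static_cast<float>(K) * 255.0f * 255.0f * lhs.scale * rhs.scale;
        QuantInfo dst;
        dst.scale  = (2.0f * max_abs) / 255.0f; // spread [-max_abs, max_abs] over 256 steps
        dst.offset = 128;                       // midpoint for QASYMM8 (0 for QASYMM8_SIGNED)
        return dst;
    }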
2023-12-22 | Fix nightly issue caused by gemm_reshaped_only_rhs_mmul kernel | Gunes Bayir
The issue appears when this kernel is used by convolution operators, because the stride calculations consider only simple matrix multiplication. In conv2d-triggered runs, Rhs does not have the same dimensions as Lhs and Dst. There are also cases where Lhs and Dst are interpreted as 3d, with their X and Y dimensions (in the convolution sense) collapsed into one. Resolves: COMPMID-6764 Change-Id: If443e6eb8f7a5cca1acc58b37c598122a013e69b Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10913 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-22 | Add Mali™-G720 and Mali™-G620 as GpuTargets | Gunes Bayir
This patch adds the latest GPUs as GpuTargets and sets up kernel selection heuristics for MatMul to address some nightly issues. Resolves: COMPMID-6766 Change-Id: I29dbb08c5ecfb3fcd63230b0b1675ab557074aca Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10902 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-22 | Call std::round() directly in non Android™ and Bare Metal builds | Gunes Bayir
The workaround is no longer relevant as we have updated our memory debugging tools. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Ib00e0ad9ba693f97fee87158dd03d3617dce9282 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10908 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-15 | Fix nightly bug caused by not validating 3d cases for input tensor | Gunes Bayir
While writing this gemm kernel, code pieces, including validations, were adapted from ClGemmMatrixMultiplyReshapedOnlyRhsKernel, and this validation should be about reinterpret_input_as_3d. This reveals a test gap for this kernel: there are currently no tests stressing this condition, but this is not going to be addressed as part of the bug ticket. The corresponding snippet in ClGemmMatrixMultiplyReshapedOnlyRhsKernel is:

    if (gemm_info.reinterpret_input_as_3d)
    {
        ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) * src0->dimension(2) != m);
    }
    else
    {
        ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) != m);
    }

Resolves: COMPMID-6757 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I4363effcaf2b43ff3674a3443058384338fb9714 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10891 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2023-12-15 | Revert "Fix nightly bug caused by wrong validation in Gemm mmul kernel" | Gunes Bayir
This reverts commit 270576a9fbeeda5210483931388e62f9a1059dd9. Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Ia4e965156af46a9afd78819e90fd2a033a97fc2b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10888 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-14 | Fix validation error in CL generate proposals kernel | Gunes Bayir
This fix modifies some of the conversions done in the generate proposals kernel that cause DDK issues while compiling the kernel. The issues are mostly related to conversion from i64 to fp16 and do not affect fp32. First, the size_t type identifiers are converted to unsigned int. This alone compiled but caused mismatches, even on older devices where the kernel previously passed. Therefore, the fp16 conversion is delayed until vector construction, where the integers are first converted to fp32 and then to fp16. Although this may not be ideal, it seems like the best solution. Resolves: COMPMID-6756 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Iee61216c908fe51431985b80c3653fc32add4741 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10879 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
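A hedged C++ analogue of the staged conversion described above (the real fix lives in OpenCL C inside the kernel; this host-side analogue assumes an AArch64 toolchain with __fp16): narrow the 64-bit index to a 32-bit integer first, convert to fp32, and only convert to fp16 at the end.

    #include <cstddef>

    // Staged narrowing: size_t -> unsigned int -> float -> half. Converting a
    // 64-bit integer directly to fp16 is the kind of conversion that triggered
    // the DDK issue.
    inline __fp16 index_to_half(std::size_t i) {
        const unsigned int i32 = static_cast<unsigned int>(i);
        const float        f   = static_cast<float>(i32);
        return static_cast<__fp16>(f);
    }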
2023-12-14 | Update scripts requirements | Pablo Marquez Tello
Change-Id: I18143be45eff2d5bbca9cc86e8c18f9f6bc3f119 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10883 Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-14 | Fix Run Example in Validate Tests | Mohammed Suhail Munshi
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Icee8b38db1f219d66ac22a6e0980f4325fd21fbd Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10868 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-13 | Fix nightly bug caused by wrong validation in Gemm mmul kernel | Gunes Bayir
While writing this gemm kernel, code pieces, including validations, were adapted from ClGemmMatrixMultiplyReshapedOnlyRhsKernel, and this validation should be about reinterpret_input_as_3d, which is not dealt with in this kernel. The mmul kernel only deals with reinterpret_output_as_3d, which is equivalent to depth_output_gemm3d != 0. This reveals a test gap for this kernel: there are currently no tests stressing this condition, but this is not going to be addressed as part of the bug ticket. The corresponding snippet in ClGemmMatrixMultiplyReshapedOnlyRhsKernel is:

    if (gemm_info.reinterpret_input_as_3d)
    {
        ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) * src0->dimension(2) != m);
    }
    else
    {
        ARM_COMPUTE_RETURN_ERROR_ON(src0->dimension(1) != m);
    }

Resolves: COMPMID-6757 Change-Id: I73b203594b22098a5374c1fac6969ee769969901 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10874 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-12-12 | Winograd changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Changes in filelist.json: moved fp16 code from common to fp16
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC with ENABLE_FP16_KERNELS.
* Resolves COMPMID-6755

Change-Id: I4da1c53d3f9e4734e5e67125265ab4e3fc0dcbe4 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10865 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-11 | Fix nightly test failure | Pablo Marquez Tello
* Both macros ARM_COMPUTE_ENABLE_FP16 and ENABLE_FP16_KERNELS must be defined to enable FP16.
* The failure was caused by not compiling the validation suite with the same definitions used to compile the library: ARM_COMPUTE_ENABLE_FP16 was missing and the call from the test into error_on_unsupported_cpu_fp16() failed.
* Resolves COMPMID-6727

Change-Id: I278c813aef799d9d0e21e5323b2b8e9e45252d6c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10848 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
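A hedged sketch of how the two defines interact (illustrative only, not the exact ACL registrar macro): both ARM_COMPUTE_ENABLE_FP16 and ENABLE_FP16_KERNELS have to be defined, for the library and the test suite alike, for the fp16 entry points to be non-null.

    // Illustrative only, not the actual ACL definition.
    #if defined(ARM_COMPUTE_ENABLE_FP16) && defined(ENABLE_FP16_KERNELS)
    #define REGISTER_FP16_NEON(func_name) &(func_name)
    #else
    #define REGISTER_FP16_NEON(func_name) nullptr
    #endif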
2023-12-08 | Fix validation error in graph_ssd_mobilenet | Gunes Bayir
The graph example has fixed quantization information given for certain layers, and some of the offsets exceed the 8-bit range of the Int8 data type. This shouldn't have been the case, and the offsets should respect the 8-bit quantization specification laid out here: https://www.tensorflow.org/lite/performance/quantization_spec. However, the mechanism added in the helper function introduces robustness against such irregularities at little/no cost, and is therefore added as a fix. Resolves: COMPMID-6748 Change-Id: If39bf323382f109fa100ee2b87ce63cc7bc89759 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10858 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
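A hedged sketch of the kind of robustness check described above (illustrative; the actual helper in the graph example may differ): clamp a given quantization offset into the representable range of the target data type instead of failing.

    #include <algorithm>
    #include <cstdint>

    // QASYMM8_SIGNED offsets must fit in [-128, 127]; fixed qinfo taken from a
    // model description may violate that, so clamp rather than reject it.
    inline int32_t clamp_offset_int8(int32_t offset) {
        return std::max<int32_t>(-128, std::min<int32_t>(127, offset));
    }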
2023-12-08 | Adjust NEReduceMean test tolerance | SiCong Li
Resolves COMPMID-6728 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ic0682550a09db9aa420057a90ee65386e16e6034 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10853 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-08 | Fix unit tests failing in CL/UNIT/TensorAllocator | Gunes Bayir
The function pointer for clImportMemoryARM should be loaded in a portable way as recommended by Khronos® as outlined here: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#getting-opencl-api-extension-function-pointers using clGetExtensionFunctionAddressForPlatform() call. All extensions should ideally be loaded using the above mentioned function. Resolves: COMPMID-6732 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I482b6bde721267d5e8c08301e5780d28a9c5ba85 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10852 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
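A hedged sketch of the portable loading pattern referenced above (assumes the cl_arm_import_memory extension; the exact entry-point signature should be taken from CL/cl_ext.h, and the pointer type below is named locally for illustration):

    #include <CL/cl.h>
    #include <CL/cl_ext.h>

    // Local function-pointer type for the clImportMemoryARM entry point from
    // the cl_arm_import_memory extension.
    using import_memory_arm_fn = cl_mem (CL_API_CALL *)(
        cl_context, cl_mem_flags, const cl_import_properties_arm *, void *, size_t, cl_int *);

    import_memory_arm_fn load_import_memory(cl_platform_id platform) {
        // Khronos-recommended way to obtain extension entry points; do not rely
        // on the symbol being exported directly by the OpenCL library.
        return reinterpret_cast<import_memory_arm_fn>(
            clGetExtensionFunctionAddressForPlatform(platform, "clImportMemoryARM"));
    }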
2023-12-07 | Use the correct output qinfo for ssd_mobilenet QASYMM8_SIGNED | SiCong Li
Resolves COMPMID-6736 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib887c56afcf481366f1fa9c9f456a43c27269e52 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10844 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-12-07 | Optimize CPU depth-to-space | Viet-Hoa Do
Resolves: COMPMID-6622 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Ibac276618bdda125dcbb9c851c547f12739b15b4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10749 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-06 | Revert "thread_local _custom_scheduler" | Pablo Marquez Tello
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab. Resolves COMPMID-6735 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-12-05 | Optimize CpuSoftmaxKernel for axis=0 | Gunes Bayir
Implement a single kernel instead of having two consecutive ones. In the previous setup, one kernel was calculating the maximum value in the axis, and this maximum was being subtracted from each element while calculating the softmax, i.e. softmax(x_i) = exp(x_i - max) / sum_i( exp(x_i - max) ). This patch integrates these two stages into a single kernel for Neon™ for all data types. This will save some memory because we don't need to hold the max values in a separate auxiliary tensor. It also introduces some other optimizations that ease memory pressure when the data type is float/half, by using the dst tensor as temporary storage for the already exponentiated inputs. It removes the references to the SVE and SVE2 implementations and most of the associated files, but leaves the implementations in place as these may be used in the future. Resolves: COMPMID-6500 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: Icff9976d1214c4c6cbe15a62ca60b8a77d3784cc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10688 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
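A scalar sketch of the fused scheme described above (illustrative only, not the Neon™ kernel): the max, the exponentiation and the sum all live in one kernel, with dst used as scratch for the exponentiated values.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>

    // softmax(x_i) = exp(x_i - max) / sum_j exp(x_j - max), along one row.
    void softmax_row(const float *src, float *dst, std::size_t len) {
        float max_val = src[0];
        for (std::size_t i = 1; i < len; ++i) {
            max_val = std::max(max_val, src[i]);
        }
        float sum = 0.f;
        for (std::size_t i = 0; i < len; ++i) {
            dst[i] = std::exp(src[i] - max_val); // dst doubles as temporary storage
            sum += dst[i];
        }
        const float inv_sum = 1.f / sum;
        for (std::size_t i = 0; i < len; ++i) {
            dst[i] *= inv_sum;
        }
    }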
2023-12-04 | Fix bare metal build | Pablo Marquez Tello
* Resolves COMPMID-6733 Change-Id: I7f0428719b5c0aa79a0b356c50bb801db16558e8 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10792 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-30 | Fix driver build error | Pablo Marquez Tello
Resolves COMPMID-6734 Change-Id: I0f0a7a312504b7cca4ed36263843ccd1a190c09e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10803 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Nikhil Raj Arm <nikhil.raj@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-28 | Changes to enable FP16 in armv8a multi_isa | Pablo Marquez Tello
* This is the initial patch to start working on enabling fp16 in all multi_isa builds. More changes are required in the way we register the kernels using the macro REGISTER_FP16_NEON.
* In this patch we add the capability to build the fp16 files listed in filelist.json with the correct arch option to enable FP16.
* This patch is required towards building a universal multi_isa binary where fp16 is enabled.
* Enable the REGISTER_FP16_NEON macro for all builds by removing the __ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard from the macro definition. The macro has to be used across all types of builds.

Change-Id: I99f4c273f6ee04cad3c097e5e374200f48568fa9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10682 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27 | BatchNorm changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Moved NCHW kernels fp16 and fp32 to their corresponding files src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp16.cpp and src/cpu/kernels/fuse_batch_normalization/nchw/neon/fp32.cpp
* Changes in filelist.json to include the new fp16 and fp32 files
* Moved the template batch_normalization_nchw to impl.h as we need to instantiate it from fp16.cpp and fp32.cpp
* Pooling layer: removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC that prevented the FP16 kernel execution.
* Partially resolves MLCE-1102

Change-Id: Ia8c85e9ffb76c9e387f9ae2685e5df5e52c8dc27 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10777 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-27 | Check copyright for all files | Jakub Sujak
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Change-Id: I654f53e5b4e53abc69ce385f6c706293bf8f7198 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10784 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-27 | CpuMul changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Moved fp16 and fp32 to their corresponding files src/cpu/kernels/mul/generic/neon/fp16.cpp and src/cpu/kernels/mul/generic/neon/fp32.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Partially resolves MLCE-1102

Change-Id: I88f24cf034c11b55ff84644b182ba76c7cb94296 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10778 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2023-11-24 | thread_local _custom_scheduler | David Svantesson
Resolves ONCPUML-1331. This patch adds an option to make _custom_scheduler thread_local to support usage of multiple schedulers handled outside of ACL. It also adds a num_threads() function to Scheduler which reverts to querying CPUInfo if no scheduler has been set. Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
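A hedged sketch of the behaviour described above (names are illustrative, not the actual arm_compute::Scheduler API, and std::thread::hardware_concurrency stands in for the CPUInfo query): the custom scheduler pointer becomes thread_local, and num_threads() falls back to the CPU count when no scheduler has been set.

    #include <thread>

    class IScheduler {
    public:
        virtual ~IScheduler() = default;
        virtual unsigned int num_threads() const = 0;
    };

    class Scheduler {
    public:
        static void set_custom(IScheduler *s) { _custom_scheduler = s; }

        static unsigned int num_threads() {
            // Fall back to querying the CPU when no scheduler has been set yet.
            if (_custom_scheduler == nullptr) {
                return std::thread::hardware_concurrency();
            }
            return _custom_scheduler->num_threads();
        }

    private:
        // thread_local so that independent threads can drive the library with
        // their own externally managed schedulers.
        static thread_local IScheduler *_custom_scheduler;
    };

    thread_local IScheduler *Scheduler::_custom_scheduler = nullptr;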
2023-11-23 | Remove the legacy core library | Jakub Sujak
Stop building and linking to the legacy libarm_compute_core artifact. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. Users should link only to the main libarm_compute library for core functionality. Resolves: COMPMID-6329 Change-Id: Ife9d2c25d275e7c676deb09632ae461f697efde9 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10728 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-16 | NormalizationLayer changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Moved the template arm_compute::normalize_float to impl.h because we need to instantiate it from both NENormalizationLayerKernel.cpp and src/cpu/kernels/norm_layer/generic/neon/fp16.cpp
* Changes in filelist.json: added a new fp16.cpp file for the float16_t kernels
* Replaced the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in NENormalizationLayerKernel by ARM_COMPUTE_ENABLE_FP16 so that the fp16 kernels can be compiled in for multi_isa builds
* Moved fp32 kernels to the corresponding file src/cpu/kernels/norm_layer/generic/neon/fp32.cpp
* Partially resolves MLCE-1102

Change-Id: I3f2eb2ed0b6c7f68092b17872b85082fbb5f39e2 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10739 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-15 | Fix various coverity issues | SiCong Li
Resolves COMPMID-6677 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I99bf2385f6edc0836faacb31f5c66ed4fb051e40 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10729 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-15 | Fix device issue with CL softmax | Viet-Hoa Do
* Performing the second pass in reverse order doesn't seem to work reliably on some specific devices. This patch introduces another approach to work around the device issue. Resolves: COMPMID-6669 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I591f05ff06f8439ebe4d32093441ae871a292f4c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10730 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14 | Update comments to suppress doxygen warnings. | Anitha Raj
Resolved COMPMID-6367 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I96f244811a81a4e278f0c5e47d5014229cad3a25 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10727 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-14 | Update Release notes for 23.11 | Anitha Raj
Resolves COMPMID-6369 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I997a48c4e8efb67fbe53efd6fc498df4e6b41eea Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10726 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-13 | Update SONAME_VERSION in SConscript and VERSION in CMakeLists | Anitha Raj
Resolves COMPMID-6369 Change-Id: I67dd589cdc02070dafe6f000988e6abafd6c5d79 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10722 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-13 | Update README for 23.11 release | Anitha Raj
Resolves COMPMID-6369 Signed-off-by: Anitha Raj <anitha.raj@arm.com> Change-Id: I2a3dc604895321f71198ece1dfed3af3a25c0228 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10724 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-10 | Fix CpuGemmConv2d int8 segfault | SiCong Li
Bypass importing the original weights' memory into the reinterpreted_weights auxiliary tensor if another weight transformation path is selected (which would have freed the original weights and their tensor info). Resolves COMPMID-6635 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: Ib8a345c3ac542bc3745d6a67db822b55df37e827 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10698 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-10 | Update list of supported operators in documentation | Jakub Sujak
Resolves: COMPMID-6633 Change-Id: I1e78df468876ec3569fa46597734e7de328b06f4 Signed-off-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10663 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09 | Remove duplicate definitions of BF16 fixed format kernels. | David Mansell
Change-Id: Ie68b0a19040cc6b5bf47fca406989f39aa8d7b81 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10687 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2023-11-09 | Pooling changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute
* Changes in kernel CpuPool2dAssemblyWrapperKernel, replaced __ARM_FEATURE_FP16_VECTOR_ARITHMETIC by ENABLE_FP16_KERNELS to make sure the fp16 kernels are compiled in for multi_isa=1
* Partially resolves MLCE-1102

Change-Id: I327154ec5b1ddfb9f54d9096f00c35b3e05c678a Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10662 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2023-11-09 | DepthwiseConvolution changes to enable fp16 in armv8a multi_isa builds | Pablo Marquez Tello
* Changes in filelist.json moving fp16 file from common to fp16 attribute
* Removed the guard __ARM_FEATURE_FP16_VECTOR_ARITHMETIC in CpuDepthwiseConv2dAssemblyWrapperKernel to always create the assembly kernel
* Partially resolves MLCE-1102

Change-Id: I2f88d5e54a94042cfb3cb4ea0386338a7c444866 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10626 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08 | Document how to build ACL with LLVM+Clang toolchain | Gunes Bayir
Resolves: COMPMID-6471 Change-Id: I5add2af4292ff2eeafcde85f3bff8e98b2069b13 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10660 Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-08 | Optimize CpuGemmConv2d start-up time | SiCong Li
When the weights have no holes, we can replace CpuWeightsReshapeKernel with:
- Collapse by reinterpreting the weights' 3 spatial dimensions
- Perform CpuTranspose

For more details see the documentation in src/cpu/operators/CpuGemmConv2d.cpp. This is one optimization, since CpuTranspose performs better than CpuWeightsReshapeKernel. A second optimization is to fuse this transpose with other weight transformations (e.g. pretranspose_B_array in CpuGemmAssemblyDispatch). However, this second optimization depends on how the underlying gemm methods (the fallback path: CpuGemmMatrixMultiplyKernel, or the assembly path: CpuGemmAssemblyDispatch) choose to fuse the transpose. Therefore, this patch moves the transpose down from CpuGemmConv2d to the individual gemm operators where the fusion decision needs to be made, by passing an extra "transpose_b" flag to CpuGemm.

New transpose_b flag in different scopes (they are all the same, but with different names, because pretranspose_b has a different meaning in GemmAssemblyDispatch): GEMMInfo::pretranspose_B -> AsmGemmInfo::transpose_b

New auxiliary tensors holding the transposed b result:
- CpuGemm optimized path: CpuGemmAssemblyDispatch::PrePretransposedB
- CpuGemm fallback path: CpuGemm::PreTransposedRHS

Note that this patch does not yet include the second optimization (COMPMID-6595), but it prepares for it. Relates to COMPMID-6595 Resolves COMPMID-6499 Change-Id: I999a2da9da4b2b15369a3cc06d7872c86e0190ea Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10526 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
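A scalar sketch of the weight-path idea described above (illustrative only; the real code lives in src/cpu/operators/CpuGemmConv2d.cpp): when the weights are contiguous, the three spatial dimensions can be viewed as one collapsed dimension, after which a plain transpose produces the GEMM-ready layout.

    #include <cstddef>
    #include <vector>

    // Weights laid out contiguously as [ofm][kh][kw][ifm] (no padding/holes).
    // Reinterpreting them as a 2D matrix of shape [ofm][kh*kw*ifm] costs
    // nothing; the GEMM-ready form is its transpose, [kh*kw*ifm][ofm].
    std::vector<float> reshape_weights_via_transpose(const std::vector<float> &w,
                                                     std::size_t ofm, std::size_t kh,
                                                     std::size_t kw, std::size_t ifm) {
        const std::size_t k = kh * kw * ifm; // collapsed spatial x input-channel dim
        std::vector<float> out(k * ofm);
        for (std::size_t o = 0; o < ofm; ++o) {
            for (std::size_t i = 0; i < k; ++i) {
                out[i * ofm + o] = w[o * k + i];
            }
        }
        return out;
    }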
2023-11-07 | Update heuristic for MatMul Native U8 | Gian Marco Iodice
Resolves COMPMID-6479 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Change-Id: I13aa0ef944a75ba8b5e4df183d52df57b9aba90f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10659 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>