Age | Commit message | Author
12 days | Provide a wrapper class to expose cpu::CpuGemm | Ryo Suzuki
This wrapper allows us to utilize the functionality of CpuGemm without directly exposing the source code. Change-Id: I408630f52acd610c912e5c5fa02bfee5f884471e Signed-off-by: Ryo Suzuki <ryo.suzuki@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11607 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
12 days | Update CPU kernels and add mixed sign GEMM support | Michael Tyler
- Add support for mixed sign quantized convolution.
- Add support for mixed sign dequantized GEMM.
- Add SME FP16 GEMV kernel.
- Change SME vector length function to use RDSVL instead of static variable.
- Add GEMM dilation support internally (not exposed yet).
- Remove unused "get_default_activation_values" functions.
- Add SVE fixed format interleaved BF16 DOT kernel.
- Updates and optimizations to assembly kernels.
Resolves COMPMID-6926 Change-Id: I227f502502611d4cc4111c89e30c53ce94079544 Signed-off-by: Michael Tyler <michael.tyler@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11570 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-06-21 | Disable fix for long path on Windows(R) OS | Quoc Khanh Le
The TEMP file setup is currently unavailable on the Windows(R) operating system because the RANLIBCOM variable is missing. For now, restrict the fix to POSIX(TM) operating systems. Signed-off-by: Quoc Khanh Le <QuocKhanh.Le@arm.com> Change-Id: Ia347a488efea5eceba9a11bde88fda2dcf88c1d5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11743 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com>
2024-06-19 | Separate data type for accumulator in DConv3D test | Sangwon Ha
Resolves: COMPMID-6947 Signed-off-by: Sangwon Ha <sangwon.ha@arm.com> Change-Id: I7fcf4f41d2961edf1fdf05e8f0b538a94f75295a Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11710 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-06-19 | Fix build and link issue when using long path | Quoc Khanh Le
SCons may fail during the building or linking process if the path exceeds the maximum character limit. To address this, support for using TEMPFILE has been added to handle excessively long command line strings. Signed-off-by: Quoc Khanh Le <QuocKhanh.Le@arm.com> Change-Id: Ic94e7f087f6d044602bdc1fe3af0d0836cb22a3e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11590 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-06-17 | Update Errata | Ramy Elgammal
- Report fix of out of bound memory write for non-optimized FP16 GeMM kernel. Resolves: COMPMID-6904 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib06a5e6e70c9d86e422ab3b82a137ba46449f392 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11713 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-06-14 | Fix out-of-bound memory write | Viet-Hoa Do
* Non-optimized FP16 GeMM kernel has out-of-bound memory write.
  - This doesn't affect optimized assembly kernels.
  - This bug writes 1 extra FP16 value to the destination tensor.
Resolves: COMPMID-6904 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I26b8ebcd15680b25c97c4b7e331996f397692447 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11706 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-06-12 | Update documentation | Ramy Elgammal
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I46f936f3c503d4801c4dba85900cee00bc372683 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11690 Reviewed-by: Suhail M <MohammedSuhail.Munshi@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-06-10 | Enable FP16 in multi_isa+v8a | Pablo Marquez Tello
* Enable FP16 kernels in NEROIAlignLayerKernel, NEComputeAllAnchorsKernel, NEBoundingBoxTransformKernel, NEInstanceNormalizationLayerKernel, NEBatchNormalizationLayerKernel
* The FP16 kernels were disabled due to the use of __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
* Resolves MLCE-1305
Change-Id: Ib8dd3cad631667018b25db4ba76007dbfb4bf5a5 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11677 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-06-10 | Fixed illegal instruction in Softmax | Pablo Marquez Tello
* The softmax kernel is using SME2 instructions on non SME2 devices
* Resolves MLCE-1304
Change-Id: I9d7d94443e7c9df4e7c1a05eeef6838f530b357b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11676 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2024-05-23 | Fix OpenMP thread scheduling for large machines | Hamza Butt
Resolves ONCPUML-1648 and ONCPUML-1539 Signed-off-by: Hamza Butt <hamza.butt@arm.com> Change-Id: Ib70a4f8cef61c2979dfd265c0755c541930ee563 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11575 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-22 | Update documentation | Michael Kozlov
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com> Change-Id: I43d59bfbf932a37e7bda7dcf4f447f12237e0fa8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11612 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: <felixjohnny.thomasmathibalan@arm.com>
2024-05-22 | Use lookup table for Fp16 Tanh activation in hardware with SVE | Gunes Bayir
Resolves: COMPMID-6901 Change-Id: Idcd3f5f5d90f4073aaf116c0586e46013fbd64f7 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11605 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-22 | Replace explicit version numbers in the Readme with placeholders | Gunes Bayir
The placeholders will be replaced only in the release branches instead of main. This will also help with commit automation. Partially Resolves: COMPMID-7020 Change-Id: I6d68dcef2f2d07181ce5d61892b10adbfd4cd575 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11538 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2024-05-21 | Fix issues with OpenMP scheduler little core exclusion. | Omar Al Khatib
1. Remove the unnecessary restriction that the exclusion only runs on systems with LITTLE, MID and BIG cores.
2. Allow override of the suggested number of threads in case the user sets the number of threads to a lower value.
Resolves [COMPMID-7014] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: Ifb76ef4454f38dd2e3e5781b5dfea07c044aeb74 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11604 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
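One possible reading of item 2 is that a user-requested thread count wins only when it is lower than the scheduler's suggestion. The sketch below illustrates that reading; `effective_num_threads` and the convention that `0` means "not set" are assumptions made for this example, not the library's API.

```cpp
#include <algorithm>

// Hypothetical helper (not part of ComputeLibrary).
// suggested:      thread count proposed by the scheduler after excluding LITTLE cores
// user_requested: thread count set by the user; 0 means "not set" in this sketch
unsigned int effective_num_threads(unsigned int suggested, unsigned int user_requested)
{
    if (user_requested == 0)
    {
        // No user preference: keep the scheduler's suggestion.
        return suggested;
    }
    // The user value overrides the suggestion only when it is lower.
    return std::min(suggested, user_requested);
}
```

How a user value above the suggestion is handled is not stated in the commit message; clamping it is a choice made for this sketch only.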
2024-05-17 | Update logic in the OpenMP scheduler to exclude LITTLE cores | Omar Al Khatib
On systems with BIG/MID/LITTLE cores, we need to exclude the LITTLE cores. This change updates CPUInfo to detect the number of LITTLE cores and sets num_threads to TOTAL_CORES - NUM_LITTLE. Resolves [COMPMID-7014] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I3e1772e5b64d1c45304860be43233b7e5dd8dba1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11565 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
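The thread-count rule described in this message is plain subtraction. A minimal sketch follows, assuming the core counts are already known; `suggested_num_threads` is a hypothetical name, and the fallback for a system that reports only LITTLE cores is an assumption of this example rather than documented behaviour.

```cpp
// Hypothetical helper (not part of ComputeLibrary).
// Returns TOTAL_CORES - NUM_LITTLE, as described in the commit message.
unsigned int suggested_num_threads(unsigned int total_cores, unsigned int num_little_cores)
{
    if (num_little_cores >= total_cores)
    {
        // Assumption for this sketch: if every core is LITTLE (or the counts are
        // inconsistent), fall back to all cores, and never return zero threads.
        return total_cores > 0 ? total_cores : 1;
    }
    return total_cores - num_little_cores;
}
```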
2024-05-17 | Fix linking error to fp16_run_dequantization_core() | Ramy Elgammal
Resolves: COMPMID-7063 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ife4d9f0b2644a649da45544b8789c51c15c9aebf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11574 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-05-16 | Fix nightly build error | Pablo Marquez Tello
Change-Id: I03fd3821d3636418f529f3395eceeaa00d02664b Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11562 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
2024-05-16 | Refactor Dequantize to enable FP16 kernel in v8a multi_isa builds | Ramy Elgammal
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> COMPMID-7058 Change-Id: I9c6d18a8fddaf335bcd1e8dd562fa3838c1ca4b2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11561 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2024-05-15 | Fix nightly build error | Pablo Marquez Tello
* Resolves COMPMID-7059 Change-Id: If77e579199720b7234298d2dc844d88c05989bf9 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11556 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-05-14 | Rework CpuQuantizeKernel to enable FP16 in multi_isa builds | Ramy Elgammal
Resolves: COMPMID-7054 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I68d125b81ad7f74b2594ccda8d6ec08beef1ebd7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11555 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-14 | Refactor arm_gemm to enable FP16 in all multi_isa builds | Pablo Marquez Tello
* Resolves MLCE-1285 Change-Id: I22a37972aefe1c0f04accbc798baa18358ed8959 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11552 Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-13 | Fix ReductionLayer FP16 for armv8a multi_isa builds | Ramy Elgammal
- Enable FP16 code when building multi_isa for armv8a architecture in order to run on higher architectures e.g. 8.2, 8.6.
- When this build runs on v8, validation stops it and flags that the architecture does not support FP16.
Resolves: COMPMID-7013 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I0d445e2fade31c1156d7a6e142edf2a7f84d3622 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11544 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-10 | Improve CPU extension detection on macos | Viet-Hoa Do
Resolves: COMPMID-7021 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I809bc6ecd2845dfe6ee5de20a902aea4d07f15a5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11540 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com>
2024-05-10 | ScatterND fix for scalar cases | Gunes Bayir
- Padding with batched scalar cases is unsupported, adds checks.
- Adds tests for scalar cases, without padding.
Resolves: [COMPMID-7015] Change-Id: Ib9cf5db990420ff4b442d003ef9424e365bee86d Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11536 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-08 | Make quantization rounding consistent | Jonathan Deakin
In NEQuantizeLayer for QASYMM8_SIGNED, the rounding was inconsistent between the unrolled loop and the leftover loop, which meant identical values (e.g. 0.5) at different indices of a Tensor could round to different values (0 or 1 in this case). We have changed vcvtaq to vcvtnq to round to the nearest, with ties to even. This matches the default fegetround setting, so it is a sensible default. Relates-to: COMPMID-6994 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Change-Id: I8e7ecb1b8dbdd3e887697a92046af99ed33fc78f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11532 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
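The two rounding behaviours named in this message can be reproduced in scalar C++. The sketch below is not the NEON code: it uses `std::round` for round-to-nearest with ties away from zero (the `vcvtaq` behaviour) and a hand-written tie check for ties-to-even (the `vcvtnq` behaviour). The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Round to nearest, ties away from zero (scalar analogue of vcvtaq).
int32_t round_ties_away(float x)
{
    return static_cast<int32_t>(std::round(x));
}

// Round to nearest, ties to even (scalar analogue of vcvtnq).
// Written out by hand so it does not depend on the current floating-point rounding mode.
int32_t round_ties_even(float x)
{
    float r = std::round(x);
    if (std::fabs(x - std::trunc(x)) == 0.5f)
    {
        // Exact tie: choose the even neighbour.
        r = 2.0f * std::round(x / 2.0f);
    }
    return static_cast<int32_t>(r);
}
```

With these definitions, 0.5 maps to 1 under ties-away and to 0 under ties-to-even, which is the discrepancy the commit message describes between the unrolled loop and the leftover loop.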
2024-05-08 | Add SME2 implementation of Softmax for QASYMM8 and QASYMM8_SIGNED. | Omar Al Khatib
Resolves: [COMPMID-6917] Change-Id: Id8b96efd29f6c61dd43a371341c6e1fe087953e9 Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11509 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-08 | Add batched indices support to Scatter GPU Implementation | Mohammed Suhail Munshi
Resolves: [COMPMID-6897] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I70b1c3c5f0de8484fcb6c3b0cc0d0d8c059b0f58 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11525 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-03 | arm_gemm: fix SVE check on fast mode kernels. | David Mansell
SVE BF16 kernels need to check for svebf16(), not just bf16(). Change-Id: I89494aac40166eba59719bed9822194a48ac282d Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11520 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
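The distinction can be illustrated with a small feature-gating sketch. Only the accessor names `bf16()` and `svebf16()` come from the commit message; the `CpuFeatures` struct, its fields, and `pick_bf16_path` are hypothetical stand-ins and do not reflect the actual arm_gemm types.

```cpp
// Hypothetical feature container for illustration only.
struct CpuFeatures
{
    bool bf16_flag;    // BF16 instructions available
    bool svebf16_flag; // BF16 instructions available in SVE

    bool bf16() const { return bf16_flag; }
    bool svebf16() const { return svebf16_flag; }
};

enum class Bf16Path { None, Neon, Sve };

// An SVE BF16 kernel is eligible only when svebf16() holds;
// bf16() alone is not enough to select it.
Bf16Path pick_bf16_path(const CpuFeatures &features)
{
    if (features.svebf16())
    {
        return Bf16Path::Sve;
    }
    if (features.bf16())
    {
        return Bf16Path::Neon;
    }
    return Bf16Path::None;
}
```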
2024-05-02 | Change reorder implementation to be vector length agnostic for OHWIo8 reorder | Radu Salavat
As the reorder kernel is called with WeightFormat OHWIo8 for hardware that does not support it e.g. vector length 128, adapt the test case and add kernel implementation for this edge case. This fixes the mismatching values that appear when OHWIo8 fixture was run with 128 vector length. Resolves: ONCPUML-1523, COMPMID-6281 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: Iaa1a3b486d1725a2d6031051aa544082c1bbe913 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11421 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-05-01 | New SME2 heuristics. | David Mansell
Change-Id: I69aa973e61df950060807a31230a1edd91add498 Signed-off-by: David Mansell <David.Mansell@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11514 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-04-30 | Add fp16 and integer data type support for ScatterNd in Gpu | Gunes Bayir
Resolves: COMPMID-6899 Change-Id: I3743f2c9e5c21e1ec9f4c81d08c148666afad33a Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11505 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Sang Won Ha <sangwon.ha@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-29 | Disable SME2 Gemmlowp s8f32 kernel selection in case results needs to be accumulated | Gunes Bayir
Similar to https://review.mlplatform.org/c/ml/ComputeLibrary/+/11500, s8f32 kernels do not support accumulate mode. This patch modifies the kernel selection and also adds more tests to stress these test cases better. Partially Resolves: COMPMID-6995 Change-Id: I40e19446c012eb7334e4511e254cce0d635aa234 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11503 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Radu Salavat <radu.salavat@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-26 | Disable SME2 Gemm kernel selection in case results needs to be accumulated | Gunes Bayir
SME2 kernels use a different accumulation buffer and destination tensor is not copied to this buffer as initial value, thus causing mismatches. This patch modifies the kernel selection algorithm such that it does not select SME2 kernels if accumulation is required. Resolves: COMPMID-6995 Change-Id: I82da3cba41729f938a046f26b41b63ff5716c02d Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11500 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
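The selection rule described here amounts to one extra filter in a candidate search. The sketch below is a simplified illustration: `KernelCandidate`, `select_kernel`, and the kernel names in the usage note are hypothetical and do not mirror the real arm_gemm selection tables.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a GEMM kernel candidate.
struct KernelCandidate
{
    std::string name;
    bool        uses_sme2;
};

// Returns the name of the first eligible candidate, or an empty string if none is.
// SME2 candidates are skipped when the CPU lacks SME2 or when the caller needs the
// result accumulated into the existing destination values.
std::string select_kernel(const std::vector<KernelCandidate> &candidates, bool cpu_has_sme2, bool accumulate)
{
    for (const auto &candidate : candidates)
    {
        if (candidate.uses_sme2 && (!cpu_has_sme2 || accumulate))
        {
            continue;
        }
        return candidate.name;
    }
    return std::string();
}
```

Given a list with an SME2 candidate ahead of a non-SME2 one, requesting accumulation on an SME2-capable CPU selects the second entry.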
2024-04-25 | Add update/index/output (m+1)/2d/(m+n) support for CLScatter | Gunes Bayir
Resolves: COMPMID-6894, COMPMID-6896 Change-Id: I9d29fd3701a7e0f28d83f81a6c42a7234c2587c3 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11477 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-25 | Move s32 to f32 conversion in reference layers from quantization to dequantization | Radu Salavat
Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: Ib17946b526d35deeca94b5d2f163b92101e313c4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11420 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-25 | Add memory stress tests for per channel quantized convolution | Gunes Bayir
Partially Resolves: MLCE-1255 Change-Id: Ibadcfedd43530232c65f05e571bc8b4568a63e67 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11499 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-25 | Add padding to the shift and multipliers buffers | Pablo Marquez Tello
* All per-channel requantizing hybrid assembly kernels require these buffers to be padded.
* Resolves MLCE-1255
Change-Id: I892b8ee9b31e079189ec72f3fc6da4ce5efda974 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11491 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
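The message does not say how much padding the kernels need. As a general illustration of the idea, the sketch below rounds a per-channel buffer up to a multiple of a block size and zero-fills the tail, so a kernel that reads whole blocks never reads past the allocation. `pad_to_multiple` and the block size are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of `values` whose length is rounded up to a
// multiple of `block`, with the extra elements set to zero.
std::vector<int32_t> pad_to_multiple(const std::vector<int32_t> &values, std::size_t block)
{
    if (block == 0)
    {
        // Nothing sensible to round to; return the input unchanged.
        return values;
    }
    const std::size_t padded_size = (values.size() + block - 1) / block * block;
    std::vector<int32_t> padded(values);
    padded.resize(padded_size, 0);
    return padded;
}
```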
2024-04-24 | Fix compiler error in the validation tests | Pablo Marquez Tello
* Building with openmp=1 cppthreads=0 caused a linker error in the validation suite Change-Id: I16d8a49e9190cd1288237d82583a0034e20a9f38 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11483 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-22 | Multi-Dimensional and Batched Scatter Reference and Dataset Implementation. | Mohammed Suhail Munshi
Resolves: [COMPMID-6893, COMPMID-6895, COMPMID-6898] Change-Id: I355f46aeba2213cd8d067cac7643d8d96e713c93 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11430 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-22 | Scatter GPU Kernel Implementation for 1D tensors. | Mohammed Suhail Munshi
Resolves: [COMPMID-6891, COMPMID-6892] Change-Id: I5b094fff1bff4c4c59cc44f7d6beab0e40133d8e Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11394 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-17 | Update documentation for 24.04 release | Michael Kozlov
Change-Id: Ifec7015ad5712d8b84d65203a5fa21cbefcb04ad Signed-off-by: Michael Kozlov <michael.kozlov@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11438 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: <felixjohnny.thomasmathibalan@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-16 | Fix v7 test failure when core matmul result is dequantized into fp32 | Gunes Bayir
Partially Resolves: ONCPUML-1444, MLINFSW-439 Change-Id: Ic7498d6944df2848f3e82eaf4e11cc5cb6ef5754 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11424 Reviewed-by: Anitha Raj <Anitha.Raj@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-04-16 | fix compilation errors on linux with gcc12 | Sunita Nadampalli
Signed-off-by: Sunita Nadampalli <nadampal@amazon.com> Change-Id: I21eca31d97d6e2ca8279adb9db65f11540e72689 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11396 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2024-04-15 | Add s8f32 kernels and dynamic QuantizationInfo | Jonathan Deakin
- Add support for QASYMM8_SIGNED*QASYMM8_SIGNED->F32 in CpuGemmLowpMatrixMultiplyCore
- Add s8f32 kernel using existing s8->s32 kernels with a new DequantizeFloat OutputStage, the structure is similar to Requantize32 but the opposite way around.
- Add SME s8f32 kernels with integrated support for DequantizeFloat.
- Add scale to CpuGemmLowpOffsetContributionKernel.
- Add virtual dequantize scale to gemm_common, only implemented for gemm_interleaved.
- Update year to 2024 in generate_build_files.
- Add dynamic flag to QuantizationInfo which signals to operators that it can change after configuration
- Add support for dynamic quantization in NEGEMMLowpMatrixMultiplyCore
- Add dynamic quantization fixture by extending GEMMLowpGenericMatrixMultiplyCoreValidationFixture
- Add GEMMLowpDequantizedMatrixMultiplyValidationFixture
- Store k (number of cols of A) rather than k_offset in the offset contribution kernels so that we can recompute it when the other offsets change
relates to: ONCPUML-1444 MLINFSW-439 Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Co-authored-by: David Mansell <David.Mansell@arm.com> Change-Id: I58a3acf2c09289a303e52eea6b336a696a5bc8da Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11022 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
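The s8 -> s32 -> f32 flow and the role of k in the offset contribution can be shown on a single dot product. The sketch below assumes the usual asymmetric convention real = scale * (q - offset); `dequantized_dot` is a hypothetical scalar reference, not the library kernel, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference. Computes sum((a - a_offset) * (b - b_offset)) in s32
// via the expanded form, then converts to f32 with the product of the two scales.
float dequantized_dot(const std::vector<int8_t> &a, const std::vector<int8_t> &b,
                      int32_t a_offset, int32_t b_offset, float a_scale, float b_scale)
{
    const int32_t k = static_cast<int32_t>(a.size()); // number of accumulated terms
    int32_t acc   = 0;                                 // raw s8 * s8 -> s32 accumulator
    int32_t sum_a = 0;
    int32_t sum_b = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
        sum_a += a[i];
        sum_b += b[i];
    }
    // Offset contribution. The last term depends on k and both offsets, so keeping k
    // (rather than a precomputed product) lets it be recomputed when offsets change.
    const int32_t corrected = acc - b_offset * sum_a - a_offset * sum_b + k * a_offset * b_offset;
    // Dequantize: s32 -> f32.
    return static_cast<float>(corrected) * a_scale * b_scale;
}
```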
2024-04-15 | Add guarding for accumulation validation test in aarch32 | Radu Salavat
Partially Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: I681df5e9c399996fbc7dc362b906af151588ca44 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11416 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2024-04-12 | Runtime checks for bf16 fixed format tests | David Svantesson-Yeung
Add checks for bf16 support for bf16 fixed format tests. This ensures tests pass in multi_isa setting where library was compiled with bf16 support, even on systems that do not support bf16. Also adds runtime check to GEMMConvolutionLayer/Float/BFLOAT16/RunSmall. Resolves: COMPMID-6922 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Change-Id: Ic0f09ba34b5a2c64be8bfc848a4457a6b1c4d1c3 Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11408 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2024-04-12 | Accumulation in Cpu Gemm kernels is not supported for quantized kernels in aarch32. | Radu Salavat
This patch guards the relevant tests. Partially Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Change-Id: I8eed80db4b522185c3c50c13f0f701aa48961057 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11410 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-11 | Add SME2 implementation of softmax for FP16 | Gunes Bayir
In addition to the softmax kernel, this patch fixes minor issues in the fp32 implementation. Resolves: COMPMID-6920 Change-Id: Ibbd9f0af5f2a93fba0e92d72ba437279c34149d3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11402 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2024-04-11 | Add in place summation to CPU GEMM kernels | Radu Salavat
Instead of dispatching the sum postop for GEMM kernels to a separate kernel + add, that requires an extra destination sized allocation, plus 3 extra load/stores per element, just do it in the GEMM kernel. Resolves: ONCPUML-1442 Signed-off-by: Radu Salavat <radu.salavat@arm.com> Co-authored-by: Milos Puzovic <milos.puzovic@arm.com> Change-Id: I7a1f2da3300875fa1ac88b705a34390969518077 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11298 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
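The idea of summing inside the GEMM can be sketched with a naive reference loop. This is an illustration under stated assumptions, not the optimized kernel: `gemm_reference` and its row-major layout are choices made for the example. With `accumulate` set, each output element starts from the value already in `dst`, so no temporary destination-sized buffer or separate add pass is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major reference: dst (m x n) = a (m x k) * b (k x n),
// optionally added on top of the existing contents of dst.
void gemm_reference(const std::vector<float> &a, const std::vector<float> &b, std::vector<float> &dst,
                    std::size_t m, std::size_t n, std::size_t k, bool accumulate)
{
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            // In-place summation: seed the accumulator with the destination value.
            float acc = accumulate ? dst[i * n + j] : 0.0f;
            for (std::size_t p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            dst[i * n + j] = acc;
        }
    }
}
```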