path: root/src/cpu/operators/CpuDepthwiseConv2dAssemblyDispatch.cpp
Age | Commit message | Author
2024-02-12 | Fix parallel depthwise perf regression from 2db938c | Jonathan Deakin
Incorrect conditional meant that we were parallelizing over batches when we should have been parallelizing over rows.
Relates to: ONCPUML-1443 COMPMID-6875
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Change-Id: I61d43bb2a94e8a6887d4cc5d1ae2ebb03295dff7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11120
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
2024-02-07 | Parallelize CPU depthwise over batch if only 1 row | Jonathan Deakin
This patch also fixes a bug where the split dimension was wrong in CpuDepthwiseConv2dAssemblyDispatch::run. It was set to DimY, which is cols, but it should have been DimZ. This was rarely an issue in practice because typically the number of cols are greater than the number of threads anyway.
Relates to: ONCPUML-1443
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: Ifed2fce22ddeb7cd77e6a6ae1083694427f91e04
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11083
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
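The split-dimension choice described in this commit and in the 2024-02-12 follow-up can be illustrated with a minimal sketch. It is not the code from CpuDepthwiseConv2dAssemblyDispatch::run: the `SplitDim` enum and `choose_split_dim` function are hypothetical names, and the index values assume a tensor whose dimensions are ordered channels, cols, rows, batches.

```cpp
#include <cstddef>

// Hypothetical stand-ins for window dimension indices, assuming the order
// channels (0), cols (1), rows (2), batches (3). Not the library's definitions.
enum class SplitDim : std::size_t
{
    Cols    = 1,
    Rows    = 2,
    Batches = 3
};

// Choose which dimension to divide among threads.
// Rows are the default; batches are used only when there is a single
// output row, so that splitting over rows would leave threads idle.
SplitDim choose_split_dim(std::size_t output_rows, std::size_t batches)
{
    if (output_rows == 1 && batches > 1)
    {
        return SplitDim::Batches;
    }
    return SplitDim::Rows;
}
```

Under these assumptions, a conditional that is accidentally inverted would pick `Batches` for ordinary multi-row inputs, which is the kind of mistake the 2024-02-12 commit message describes.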
2023-12-06 | Revert "thread_local _custom_scheduler" | Pablo Marquez Tello
This reverts commit ded5b182675e3166e947a8eb637b5b1e925816ab.
Resolves COMPMID-6735
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I9b69ca1ec80a671171d3f52081c4b8c61a676617
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10838
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <felixjohnny.thomasmathibalan@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-11-24 | thread_local _custom_scheduler | David Svantesson
Resolves ONCPUML-1331
This patch adds an option to make _custom_scheduler thread_local to support usage of multiple schedulers handled outside of ACL. It also adds num_threads() function to Scheduler which reverts to querying CPUInfo if no scheduler has been set.
Change-Id: Iff706165d8d091895331a5bb3a76f6cabe048912
Signed-off-by: David Svantesson-Yeung <david.svantesson-yeung@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10748
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
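The idea in this commit message (a per-thread scheduler pointer, plus a thread-count query that falls back when no scheduler is set) can be sketched roughly as follows. This is an illustration only, not the patch itself: `IThreadCountSource`, `FixedScheduler`, `custom_scheduler` and `current_num_threads` are hypothetical names, and `std::thread::hardware_concurrency()` is swapped in for the CPUInfo query mentioned in the message.

```cpp
#include <thread>

// Hypothetical minimal scheduler interface; not the library's IScheduler.
struct IThreadCountSource
{
    virtual ~IThreadCountSource() = default;
    virtual unsigned num_threads() const = 0;
};

// Hypothetical scheduler that reports a fixed thread count.
struct FixedScheduler : IThreadCountSource
{
    explicit FixedScheduler(unsigned n) : _n(n) {}
    unsigned num_threads() const override { return _n; }
    unsigned _n;
};

// One pointer per thread, so each caller thread can install its own scheduler.
thread_local IThreadCountSource *custom_scheduler = nullptr;

// Return the installed scheduler's thread count, or fall back to the
// hardware thread count (assumption: stands in for a CPUInfo query).
unsigned current_num_threads()
{
    if (custom_scheduler != nullptr)
    {
        return custom_scheduler->num_threads();
    }
    const unsigned hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : 1u; // hardware_concurrency() may return 0 if unknown
}
```

Because the pointer is `thread_local`, a scheduler installed on one thread is not visible from another. Note that this change was reverted by the 2023-12-06 commit listed above.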
2023-09-28 | Apply clang-format on repository | Felix Thomasmathibalan
Code is formatted as per a revised clang format configuration file (not part of this delivery). Version 14.0.6 is used.
Exclusion List:
- files with .cl extension
- files that are not strictly C/C++ (e.g. Android.bp, Sconscript ...)
And the following directories
- compute_kernel_writer/validation/
- tests/
- include/
- src/core/NEON/kernels/convolution/
- src/core/NEON/kernels/arm_gemm/
- src/core/NEON/kernels/arm_conv/
- data/
There will be a follow up for formatting of .cl files and the files under tests/ and compute_kernel_writer/validation/.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Change-Id: Ib7eb1fcf4e7537b9feaefcfc15098a804a3fde0a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10391
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2023-07-04 | Depthwise channel pre-multiplication | Michael Tyler
Resolves: COMPMID-6337
Change-Id: Ie9097b3f56e8071426c621386a5988bd7f7e8ef2
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9852
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01 | Updateable weights in depthwise convolution | Milos Puzovic
Check whether weights are defined as constant, if they are not constant then do not pack them if they are already packed so that they can be updated.
Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I73447e31e3660b05f8f40e04ea4ea2003eb9b802
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8539
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
2021-09-16 | Provide logging for configure functions in all cpu operators | ramelg01
Partially Resolves: COMPMID-4718
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I02eabdd6bce8cd561ab2fdfd644a686a3762b817
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6253
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-25 | Move CPU/GPU files from Core/Runtime to the respective backend folders | Georgios Pinitas
Legacy structure contained two libraries core/runtime with two backends in each. We reduce the core/runtime libraries to a single library thus merging the backend files
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>