aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/NEON
AgeCommit message (Collapse)Author
2023-02-01Remove fixed format strides hackJonathan Deakin
- Remove hack in CpuGemmAssemblyDispatch.cpp which tried to guess strides for fixed format kernels. Instead, expect that strides will have been correctly set on weights externally - Update fixed format test fixtures to set the strides - If the fixed format uses fast math mode, then weights should be of type BFLOAT16. Change the validation logic to accept this. Resolves: [ONCPUML-1131] Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2 Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-30Skip upsampling for deconvolution when not neededAnnop Wongwathanarat
If the input tensor's stride is 1 and the kernel size is 1x1, skip upsampling step and pass the input tensor pointer for convolution directly. Partially resolve: [ONCPUML-1137] Change-Id: I9de9444ff99cf35d44a51ccbe0fa6facc1035d27 Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8994 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2023-01-20Add enable_fast_math for NEDeconvolutionLayerAnnop Wongwathanarat
Resolves: [ONCPUML-1128] Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com> Change-Id: I287a71222d3f0289d8cccfcb15383b0a930a55e6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8952 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-21Optimize MeanReduce by integer acc. and removing upfront dequant.Omar Al Khatib
Resolves: [COMPMID-5466] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I68af0bb54580bebd2ace1fba30aa73f7f68a4dbb Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8804 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-09-16Optimize Quantized/Integer Bilinear Scale for Neon™Gunes Bayir
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC. This patch - Reduces the memory footprint by disabling precomputation of indices and weights when they're not used - Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8) - Adds S8(Int8) Bilinear Scale for Border mode REPLICATE - Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation - Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset Resolves: COMPMID-5453, COMPMID-5454 Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250 Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-09-09Optimize FP32/16 Bilinear Scale Kernel for Neon™Gunes Bayir
This patch removes index and weight pre-computations where it's not used and reduces some calculations inside the inner-most loop of Scale. Resolves: COMPMID-5452 Change-Id: Ie149b1b76a90a8cb659ada0f97aef78caf69932f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8220 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-08-03[ONCPUML-968] Fixed format kernel support in additional APIsMilos Puzovic
Implements required plumbing in order to be able to ask and execute fixed format kernels from NEFullyConnected, NEGEMM and NEGEMMConv2d. These APIs are used to accelerate oneDNN primitives (inner product, matrix multiplication and indirect GEMM respectively) and without changes it would not be possible to call fixed format kernels from those oneDNN primitives. Change-Id: I27534f0491ce28d0ccb98c19f318bd33dcdf2ff5 Signed-off-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7999 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-26Fix for inclusion of "arm_gemm" from src into "Types.h" from coreRamy Elgammal
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat when needed through two map function. - Moved to_string(WeightFormat) to TypePrinter.h Resolves: COMPMID-5415 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Sicong Li <sicong.li@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-19[ONCPUML-951] Variable weight support for Convolution.Francesco Petrogalli
API changes for NEGEMMConvolutionLayer and CpuGemmConv2d Built with: scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \ build=native -j32 Werror=false validation_tests=1 build_dir=opt \ standalone=1 asserts=1 experimental_fixed_format_kernels=1 . Tested with: ./build/opt/tests/arm_compute_validation Hardware where the test executable was run: Neoverse N1 Test coverage: * NEGEMMConvolutionLayer, CpuGemmConv2d * NHWC (the only one supported by the fixed-format kernels) * F16, F32 * Shapes: RunSmall Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99 Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-07-14Integrate new winograd APIs from MLTechramelg01
Resolves: COMPMID-5400 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894 Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-05-06QLSTM add support for different qinfoMike Kelly
* Resolves MLCE-604 Signed-off-by: Mike Kelly <mike.kelly@arm.com> Change-Id: Ice3d6f361588f1a6bd0bff301c27b0d063a5c014 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7529 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-21NEQLSTM: perform type conversion in prepare method.Pablo Marquez Tello
* When input_to_forget_weights is QASYMM8_SIGNED, the conversion to QSYMM8 is done in the prepare method * Partially resolves MLCE-604 Change-Id: Iddadbc50d77381542451ac4e46de49b2706bc88c Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7441 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-04-01Add CPU Pool3d FP16/32 implementationAdnan AlSinan
- Add implementation for the CPU pooling 3d layer. - NDHWC data layout support - Support FP32/FP16. - Add Pool3d to the operator list. - Fix CL Pool3d kernel comments to generate the operator list. Resolves: COMPMID-4671 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: I92478a154beb12541525b648ed3dd5a58c8f27fa Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7311 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> (cherry picked from commit 572659a0e5dd1086b1c7d16fe331ff73d2acd93a)
2022-03-29NEQLSTM: Add support for QASYMM8_SIGNED for input_to_forget_weightsPablo Marquez Tello
* QLSTM only supports QSYMM8 for the argument input_to_forget_weights * We add support for QASYMM8_SIGNED by dequantizing and requantizing to QSYMM8 * Resolves COMPMID-5184 Change-Id: I1cae18d81dafdb7ae722b520a1354cf4a56b9606 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7321 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> (cherry picked from commit 187a041dedf8e9db0c9e0652f13f8639dca880f3)
2022-03-24QLSTM add support for different qinfo in weightsPablo Marquez Tello
Resolves COMPMID-5185 Change-Id: I61e1453e8851ab84c1cadc10587ebd23fd94799e Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7330 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-16Fixed threshould argument order in NE/CL/LSTMPablo Marquez Tello
* Fixed hardcoded LOGISTIC activation in ACL reference * Partially resolves MLCE-60 * Resolves COMPMID-5139 Change-Id: I50e75339084ea53bf75acf18aa3e5cdafcf34c15 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7150 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-14Port MaxUnpoolingLayer kernel and add KernelSelect vaidation testDana Zlotnik
Resolves COMPMID-4958 Change-Id: Ibed5155f2e3ece46635f6ea9617bf11cefc402b1 Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7028 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-02-09Remove deprecated remap functions.Adnan AlSinan
- Remove CLRemapKernel. - Remove NERemapKernel. Partially resolves COMPMID-4984 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-11-15Initialise quantization info in NEPadLayerPablo Marquez Tello
* Resolves COMPMID-4884 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Id28f0df242fe240c70f22e3ad55e4729ab1e40fe Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6641 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-10-18DirectConv3d support refineSheri Zhang
- Decouple data support of CpuDirectConv3dKernel - Update documentation for Conv3d Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: I1d94aa28f821f45a1a3d39cc3335c8faeee89f0d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6453 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-10-15Conv3d supportSheri Zhang
* Add CpuDirectConv3d support for fp32 and fp16 * Dilation is not supported * Need decouple Partially resolve: COMPMID-4661 Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: Ib1865b9ff328b684d131512b1baf77bc2f10318f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6430 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2021-10-04Provide logging for configure functions in all CPP functionsramelg01
- Moving impl of CPPSplit template to src/runtime/CPP to allow including of Log.h from src/common. - Fix logging of vector<ITensor*> to print contained tensor's info not their ptrs. Partially-Resovles: COMPMID-4718 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Idec81665b2a7c0cfae5248803109c6e2edc520a1 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6362 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-09-22Provide logging for configure functions in all NEON functionsramelg01
Partially Resolves: COMPMID-4718 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I655268c57fa126d9c99981c49d345a3aac75646e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6286 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com>
2021-08-25Move CPU/GPU files from Core/Runtime to the respective backend foldersGeorgios Pinitas
Legacy structure contained two libraries core/runtime with two backends in each. We reduce the core/runtime libraries to a single library thus merging the backend files Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-08-20Rename [Cl|Cpu]GemmConvolution to [Cl|Gpu]GemmConv2dGeorgios Pinitas
Renaming the gemm-based convolution operators to accomodate for new operators with higher convolution dimensonality Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Id2f2cf11404221f0e87baa0e5d08ad5d63eaf78e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6113 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-19Address comments on avoiding releasing weights if used by multiple functionsGiorgio Arena
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: I0b59c5326f5fcbc322fbeb864197ea999de6bd56 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6112 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-08-13Avoid releasing weights if they are used by multiple functionsGeorgios Pinitas
Resolves: COMPMID-4769 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Iccadcbd68b0fd84ed3bf212e358a4ea944084a40 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349845 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6107 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-08-13Ensure correct transformed matrices are used in CpuGemmConvolutionGeorgios Pinitas
Resolves: COMPMID-4763 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Iae2e093cfb7d2c7172603897afe1c6a2e5d1caa3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349725 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6101 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2021-08-12Ensure that correct transformed matrices are used in CpuFullyConnectedGeorgios Pinitas
Execution pack of CpuFullyConnected was altered explicitly with local objects that were getting out of scope. Leading to incorrect results or memory related issues. Track transformed weights and register the weights matrix explicitly during execution honoring the object lifetime scope. Resolves: COMPMID-4762, COMPMID-4764 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I53449c377fb1cfccdf5e6f9505d963518748c318 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349345 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6092 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-30Reintroduce implementation of NEConvolutionLayer::get_convolution_methodMichele Di Giorgio
Issue caused by the porting of the function to the new API. The method will call down to the new CpuConv2d implementation. Change-Id: I650ad1f17c8b89a637b589e452ca785b5d14e975 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6027 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
2021-07-29Port NEConvolutionLayerMichalis Spyrou
Resolves: COMPMID-4507 Change-Id: I9557026ec0052b5585994f7a1300a14565c976d0 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5964 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-07-28Reduce binary footprint of CpuConvertFullyConnectedWeightsKernelMichele Di Giorgio
Binary size reduction for this kernel is almost 50%. Also remove unused NEConvertFullyConnectedWeightsManaged. Change-Id: Ia46a1342a0737397b4aac2578d963c2ebb7446e3 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6011 Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-07-23Port NEFullyConnectedLayer to memory injecting interfaceMichele Di Giorgio
Resolves: COMPMID-4501 Change-Id: Ib61b3d06974009e501b3fb86467735427e13a94a Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5931 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2021-07-22Expose fast_math mode for GEMM through BFloat16Georgios Pinitas
Resolves: COMPMID-4641 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I7ccc663b2692d40c370794caa906b5be8fd25a32 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5977 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-22Update GEMM assembly kernelsGeorgios Pinitas
- Introduce Fp32 kernels with internal calculations in Bfloat16 when fast_mode is enabled - Improve kernel selection heuristics Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: I68a9e7e862b6fd2721b46e0d7cc791091c4ab279 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5965 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-16Avoid multiple Rhs matrix transformation on ClGemmGeorgios Pinitas
ClWinogradConv2d was performing Rhs transformation on every step impacting the performance. Adds scope logging support through ARM_COMPUTE_LOG_MSG_WITH_FUNCNAME Resolves: COMPMID-4596 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ib329d3bc8d8aa21abae9fabfe61de35cc84d4819 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5925 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-15Disabled DirectConv method for NCHW and kernel > 15Pablo Marquez Tello
* We prefer GEMM method for kernel size > 15 when data layout is NCHW because direct convolution does not support this. * Resolves COMPMID-4581 Change-Id: Ie18cc96bc6b446fd59e8c8ebb10c3af5ca02c3bb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5935 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-15Port NEGEMMConvolutionLayerManuel Bottini
Details: port NEWeightsReshapeKernel to CpuWeightsReshapeKernel port NEGEMMConvolutionLayer to CpuGEMMConvolutionLayer Resolves: COMPMID-4509 Change-Id: I3c7051e2c3f6d808a7ccb898aad70e5b221b9dc3 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5938 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-13Port NEWinogradConvolutionLayerMichalis Spyrou
Rename to CpuWinogradConv2d Allow memory to be injected externally Change-Id: I1f0a26ea533e326a7c63df86e708895c31752a39 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5926 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-09Port NECol2ImKernelManuel Bottini
Resolves: COMPMID-4511 Change-Id: Id6335cb23ef22bba02083498025da0ecb1647714 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5898 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-08Port NEGEMMLowp Part 2Manuel Bottini
Details: Extend NEConvertQuantizedSignednessKernel Port NEGEMMInterleave4x4Kernel to CpuGemmInterleave4x4Kernel Port NEGEMMTranspose1xWKernel to CpuGemmTranspose1xWKernel Port NEGEMMLowpMatrixAReductionKernel to CpuGemmLowpMatrixAReductionKernel Port NEGEMMLowpMatrixBReductionKernel to CpuGemmLowpMatrixBReductionKernel Port NEGEMMLowpOffsetContributionOutputStageKernel to CpuGemmLowpOffsetContributionOutputStageKernel Port NEGEMMLowpOffsetContributionKernel to CpuGemmLowpOffsetContributionKernel Resolves: COMPMID-4403 Change-Id: I3227f052f25e7b41d073bbea1da8a881fcd78b8e Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5875 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2021-07-06Port NEIm2ColKernelManuel Bottini
Resolves: COMPMID-4510 Change-Id: Ia3e588f599449d975dabad4afafb2974dd44d0ad Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5899 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-07-02Port NEGEMM to memory injecting interface (Part 3)Michele Di Giorgio
- Complete porting of NEGEMM to the new API Resolves: COMPMID-4402 Change-Id: I14904102b25332dbb4fc048d45dca068a15b6eca Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5890 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Port NEGEMM to memory injecting interface (Part 2)Michele Di Giorgio
- Port NEGEMMMatrixMultiplyKernel to the new API Partially resolves: COMPMID-4402 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Change-Id: I52b67055dc24bb3a417d6ec5aeeee86e21b74320 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5873 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-29Port NEGEMM to memory injecting interface (Part 1)Michele Di Giorgio
- Start porting NEGEMM to the new API - Port NEGEMMInterleave4x4Kernel to the new API - Port NEGEMMMatrixAdditionKernel to the new API - Port NEGEMMTranspose1xWKernel to the new API - Remove padding from NEGEMMMatrixAdditionKernel - Remove unused INESimpleKernel and ICPPSimpleKernel Partially resolves: COMPMID-4402 Change-Id: I63edadddfe00a54586e5384d6a0211db25ae9042 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5857 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-25Port NEGEMMConv2d to memory injecting interfaceMichele Di Giorgio
Resolves: COMPMID-4506, COMPMID-4570 Change-Id: I6d37a06da141f1fcfcaa8525322a319cb0234791 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5824 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-23Create core library using high priority operatorsMichalis Spyrou
A smaller core library is created using a subset of the operators. Changed the structure of filelist.json in order to include more information about the kernels and make the selection easier. Resolves: COMPMID-4514 Change-Id: I079ca7d8e64346174eebdd13b834e1dd4dc36ca2 Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5786 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-22Port NEGEMMLowp Part 1Manuel Bottini
Details: Port NEGEMMLowpQuantizeDownInt32ScaleKernel to CpuGemmLowpQuantizeDownInt32ScaleKernel Port NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel to CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel Port NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel to CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel Port NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel to CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel Port NEGEMMLowpOutputStage functions to CpuGemmLowpOutputStage operators Partially Resolves: COMPMID-4403 Change-Id: I6d5f45e43f35d731d564ed3b5c0e804d2a318fb1 Signed-off-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5833 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-06-22Add FP16 support to CLRemapFreddie Liardet
Add FP16 support to CLRemap when data layout is NHWC. Add relevant tests for FP16 and validation. Update NERemap function level to be consistent with CLRemap. Add depreciation notice for uint_8 only function level methods. Resolves: COMPMID-4335 Signed-off-by: Freddie Liardet <frederick.liardet@arm.com> Change-Id: If05f06801aef7a169b73ff1ebe760a42f11ca05c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5816 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2021-06-18Integrate improved CPU depthwise convolution kernelsMichele Di Giorgio
* Replace assembly kernels for depthwise convolution with more optimized ones. * Add int8 assembly kernels. * Fix implicit padding on optimized kernels Resolves: COMPMID-3867, COMPMID-4361 Change-Id: I0b0867e05f61be4f368f62190d55e14d0ab3ebf2 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5622 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>