aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-09Use heuristics for setting dynamic fusion direct conv2d tile sizesRamy Elgammal
Resolves: COMPMID-5735 Change-Id: I9958413b69c5052cfa205dd0e9457cc4953aaf35 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/474818 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8724 Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Optimize CPU base-e exponential function on FP32Viet-Hoa Do
Resolves: COMPMID-5664 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I4182752e213aade19005ee984a488c2490453f8f Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8747 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-12-09Implement the OpenCL kernel to compute the indirect convolutionGian Marco Iodice
- Implement indirect convolution kernel - Add operator support - Add test Resolves COMPMID-5709 Change-Id: I9272304163471a5a40da7fdec204599f3c1d8e32 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-30Fix build error for unused variables in data type specific buildsGunes Bayir
The fp32 mws selection variables are guarded because they are flagged as unused when data type support does not include fp32. Resolves: COMPMID-5761 Change-Id: I7ac783e3d5ca51868b562ab879d03e02140b51a1 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8707 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-29Adding GpuAdd to dynamic fusion operatorsRamy Elgammal
- Provide support for Add operator - Auto initialize the destination tensor before testing fusion in conv2d and elementwise binary ops. Resolves: COMPMID-5518 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: Ibd815020f02b57f88eea7c2921bdcf98605d99c5 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8617 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Integrate SME2 kernelsViet-Hoa Do
* Add SME/SME2 detection. * Integrate SME2 implementation for: - Normal convolution - Winograd - Depthwise convolution - Pooling Resolves: COMPMID-5700 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I2f1ca1d05f8cfeee9309ed1c0a36096a4a6aad5c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8692 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-28Implement FP32/16 Depthwise Conv2d operator in dynamic fusionGunes Bayir
This patch adds Depthwise Conv2d operator into dynamic fusion interface and adds the associated tests. Resolves: COMPMID-5517 Change-Id: I385c94dff7fd40c72b8337ef797e508df4499a82 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8678 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-25Implement address precalculation for indirect conv2d - OpenCLGian Marco Iodice
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel) - Implement OpenCL kernel (indirect_convolution.cl) - Add test Resolves COMPMID-5708 Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-23ONCPUML-1072: Remove double definition of get_mws for Mul kernelfadara01
Signed-off-by: fadara01 <fadi.arafeh@arm.com> Change-Id: Ieaa2fa4a6a69e7e0a48633967dabe91c786b42b7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8682 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-22Remove dynamic fusion prototype with tests and examplesSiCong Li
Public headers of the new experimental dynamic fusion can be found in arm_compute/dynamic_fusion/ New examples on how to use the interface can be found in tests/validation/dynamic_fusion/gpu/Integration.cpp Resolves COMPMID-5683 Change-Id: I7ccb902a227fb487562df15fc3c30118d1d95bbd Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8671 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-22ONCPUML-1072: Tuned MWS values (for N1, V1) for binary operators used by oneDNNFadi Arafeh
Added approximate values for MWS for the following binary operators: Add, Sub, Mul, Min, Max, Div Change-Id: I5c4c75511129982a3f44c038ee272f09598469de Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/459609 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Signed-off-by: fadara01 <fadi.arafeh@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8392 Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-22space-to-depth shape calculator fixRamy Elgammal
Resolves: ARMCL-592 Change-Id: If036a2b6d2842db2ed750e9d350bfaa3f7a76c9e Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8668 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-18Add num_threads_to_use to OMPScheduler based on workload sizecfRod
Fixes benchdnn test failures in ONCPUML-1104 when num_threads is greater than workload size. Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com> Change-Id: Ic351a3ab5b548aa1843042a053130b02d0f1d40e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8655 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-17Fix documentation about BF16 accelerationViet-Hoa Do
* Fix the heading and the code block. Resolves: COMPMID-5546 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I60162b0e0aaf2a71a70e517aaeb8c75dd82d8dd9 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8652 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix release notes for 22.11Viet-Hoa Do
Partially resolves: COMPMID-5548 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0f59bb1eaed45aef565d8cc435b59c45dc07f240 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8639 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix regression caused by mws in ActivationLayerMohammed Suhail Munshi
- Regression is caused by the small default mws in ActivationLayer - Syncronization of threads takes longer than the workload on small sized tensors. - Size 1536 is chosen arbitrarily based on the size of tensors in benchmarked networks Resolves: [COMPMID-5655] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I02e865b578399e75484f471e67806dd4cf7502c0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/468454 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8615 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fix GemmLowp BatchMatMul Tests to use quantized OutputsMohammed Suhail Munshi
- Fix includes int8/uint8 quantized inputs - Bias S32 value is limited to better allow detection of mismatches in gemmlowp kernel Resolves: [COMPMID-5659] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ie9cca430c6ab66253fe1d5252bd2c5396c7f38cf Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8514 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Fixed Arm NN unit test failure caused by quantised multiplication patch.Omar Al Khatib
Resolves : [COMPMID-5698] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I21ce976473e0e8807c14989e98e68aca69c7f1f3 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8603 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-15Add release notes for 22.11Viet-Hoa Do
Partially resolves: COMPMID-5548 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I8d26dc871dfa83af2cf06e4e3997f5331235ff36 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8607 Reviewed-by: SiCong Li <sicong.li@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Optimize Transposed Convolution for CL backend (FP32/16)Gunes Bayir
This patch optimizes transposed convolution for CL backend by rewriting it in a single kernel instead of three (flip_kernel + upsample + conv). The new kernel skips the upsampling step which reduces the input space of convolution by stride_x * stride_y, resulting in significant performance improvement. It also skips the kernel flipping by traversing the weights accordingly, thus reduces the memory footprint. Resolves: COMPMID-5676 Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Change-Id: I8a333212dc7c5f7f0597aa58b0d56d44814baa14 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8588 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Update READMEViet-Hoa Do
* Use 22.11 for all links. * Make markdown table code more aligned. Partially resolves: COMPMID-5548 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I1fd41fc4df3e97573cbfe8c3bb033c0a6dcbfb5c Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8608 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-14Optimize T_QUANTIZE8_ASYMMETRIC for Mali™ G52Pablo Marquez Tello
* Resolves MLCE-842 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Change-Id: Iae0521b25a5e6c9cc8046830f397d523dfbcc66e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8542 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-10Fix compiler warnings in dynamic fusionSiCong Li
Remove conflicting Padding2D from the unused comparison operator in the prototpye Resolve unused variables in release mode Resolves COMPMID-5683 Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I19d74c57e51e6cf64003ddcbc74227608bb866d2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8590 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-09Fix CPU multiplication layer threading overheadViet-Hoa Do
* When the tensors are reinterpreted as 1D, any thing smaller than 10KB won't be splitted into different thread. Resolves: COMPMID-5630 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icff7089e37c85c8b325f099008a080a5805d36a2 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8581 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-08SVE Hard-Swish via Lookup table for quantized inputPablo Marquez Tello
* Datatype supported qasymm8 and qasymm8_signed. * Resolves COMPMID-5211 Change-Id: Ib161af3af5ad11ec6b46de0bf4fbac172d2525fb Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7729 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-07Optimize CPU mul layer on quantized dataOmar Al Khatib
Resolves : [COMPMID-5461] Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com> Change-Id: I89b99d267c32b00ef44f9bb6e7c714dfe4a0d29d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8420 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-04Fix compiler warnings in dynamic fusionSiCong Li
Resolves: COMPMID-5686 Change-Id: I608c359583c44f2f04f29faddd1c6b38a381de60 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8562 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-04Update SONAME_VERSION in SConscriptViet-Hoa Do
Partially resolves: COMPMID-5548 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: If12e7f02b8ff485cf856a6cee6773be54165ccae Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8557 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-03Fix activation block in gemm.clGian Marco Iodice
- Replace VEC_SIZE with N0. VEC_SIZE was used in the old gemm kernel and not used anymore in the existing ones Resolves COMPMID-5678 Change-Id: Ia770200b9d6e24c51c57347e4634fb8eadd10385 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8556 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-02Add Dynamic Fusion GpuConv2d FP32/FP16 testcaseRamy Elgammal
Resolves: COMPMID-5511 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I0ac0acbf1de7da09f18f7b457307ec3cc99deb3b Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8546 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-02Partially Revert "Add threshold for floating-point SOFT_RELU activation"Gunes Bayir
Revert the range removal in tests for soft relu and bring the former implementation in CL backend back. Resolves: COMPMID-5677 Change-Id: I35d5ac03a134299041ce97aabc9fff2d4380d09f Signed-off-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8551 Reviewed-by: Milos Puzovic <milos.puzovic@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Fix fixed-point quantized additionViet-Hoa Do
* The range check condition is incorrect. The maximum value of 8-bit quantized data is 255, not 1023. * Use 256 to make the check slightly stricter than it should. Resolves: COMPMID-5458 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5c071680531574b409a7ce71a732f6480caa10d8 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8537 Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Updateable weights in depthwise convolutionMilos Puzovic
Check whether weights are defined as constant, if they are not constant then do not pack them if they are already packed so that they can be updated. Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I73447e31e3660b05f8f40e04ea4ea2003eb9b802 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8539 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Add threshold for floating-point SOFT_RELU activationMilos Puzovic
Added missing threshold for calculating SOFT_RELU when SVE and CL implementations are used. As a result removed from the testing bounds for input values that were set to be in the interval [-40, 40]. Resolves: COMPMID-5658 Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com> Change-Id: I3d14df60125e36e4eb85aeb222f4fb0cc5741521 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8536 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Add check for Batch Matmul in GemmAssemblyDispatchMohammed Suhail Munshi
Relates to : COMPMID-5507 Change-Id: Ia2c4ea153ac2524ffa6b2a9c10f3a0318a8a67a1 Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8509 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Rewrite dynamic fusionSiCong Li
The new version introduces the following major changes: * Change public interface to simplify and standardize the user experience - Use the term "Workload" uniformly - Simplify operator interface to be a set of static methods: validate_op(), create_op() * Separate the kernel writing into its own component (template_writer). This is to allow the co-development of GpuKernelWriter, and to allow easy replacement once GpuKernelWriter is mature. * Optimize the core fusion algorithm used by the component graph. The details can be found in GpuKernelComponentGraph::fuse() * Use Gpu instead of Cl prefixes for most of the Workload interfaces (except for runtime and kernel components, which have to be language specific) This allows the potential extension to other Gpu langauges in the future. * Refactor runtime memory interface so that auxiliary tensor handling is separate from the user tensor passing. This is because the former is less stable and may require extension in the future. * Hide source code object from the user as it is not required at the moment * Deprecate the old prototype entirely by disabling it in SCons build Resolves COMPMID-5510, COMPMID-5512, COMPMID-5513 Change-Id: If69d2362856f2de4503546b7b6cf48a525cf3079 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8406 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-11-01Rework direct convolution heuristic on OpenCLGian Marco Iodice
Resolves COMPMID-5634 Change-Id: I075de70d509d0c4430b4bcf3f218384e237a3a56 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/453708 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8473 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
2022-10-27Fix fixed-point quantized additionViet-Hoa Do
* Use the same rounding function for the left-over part with the vectorized part. Resolves: COMPMID-5640 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I07450b2a43390b77539b78cd5d3e6772bdc38548 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8520 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-25Fix compiler warningPablo Marquez Tello
* Resolved MLCE-946 Change-Id: I7a2b8d068bbc810602dd0959e8c99d87a323152f Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8513 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-24Add FP16 tanh based on rational approximationJonathan Deakin
Use rational approximation with optimised coefficients to calculate tanh_f16. Method is ~2.5x faster than previous and has lower relative and absolute error. This will fix https://github.com/ARM-software/ComputeLibrary/issues/1002 Credit to George Steed for suggesting use of vcageq instead of min+max Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com> Change-Id: Id70da3aab666c68b0d798266a837b59c00937bf7 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8480 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-21Fix mapfile generation in ClangPablo Marquez Tello
* Resolves COMPMID-5654 Change-Id: I2654f5178b4400abf333a9b7ef5c9a239ce4ef73 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8507 Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Update reinterpret tensor as 1D for CPU addViet-Hoa Do
* Use the same implementation as other layers. Resolves: COMPMID-5108 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5a50259b398b71ca1f61b5ee8daa539bf8263fac Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8501 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-20Add test in GEMMLowp for batch matmulMohammed Suhail Munshi
- Adds tests for batched matrix multiplication - Bugfix for issue : 3d tensors input tensors with offsets in GemmLowp results in mismatches Resolves : [COMPMID-5507] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: I68e036fbca642c1841dd4321033045aadc8f5636 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/461298 Comments-Addressed: bsgcomp <bsgcomp@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8482 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-19Fix FFTConvolutionLayer testViet-Hoa Do
* Do not change the tensor info after configure stage. * By fixing this, the 1D optimization for activation layer can be applied to all data types and tensor layout. Resolves: COMPMID-5644 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-12Add scons option to generate Map files.Pablo Marquez Tello
* Resolves MLCE-942 Change-Id: I4e61e3712c7bc7e711b1fa280d7c3e73d08a57e3 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8407 Benchmark: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2022-10-12Optimize Neon™ Logistic ActivationMohammed Suhail Munshi
- Use a 1d execution window to improve memory access pattern. Resolves: [COMPMID-5465] Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com> Change-Id: Ida30669ffa06eb002ca43a6edf15e25a6eaad2f6 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8344 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gunes Bayir <gunes.bayir@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-12Adding documentation section explaining how BF16 is usedRamy Elgammal
Resolves: COMPMID-5494 Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com> Change-Id: I8f512745855b8ca21181a9ab21323bfff6aeb866 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/458884 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com> Comments-Addressed: bsgcomp <bsgcomp@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8391 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Benchmark: Arm Jenkins <bsgcomp@arm.com>
2022-10-10Use https to embed MathJax to documentationViet-Hoa Do
Partially resolves: COMPMID-5642 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: Icbea9de88d9c55494e782a08bb17aeeb2a9e592d Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8385 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2022-10-10Fix LUT-based activation layerViet-Hoa Do
* Use the window instead of the tensor shape to determine the number of elements in the x-dimension. * Remove the LUT implementation in 32-bit build. Resolves: COMPMID-5641 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I0a79aa38d8f6a105ad01785bd94571f5a2ecb348 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8380 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
2022-10-07Workaround CL compiler issue on FP16Viet-Hoa Do
Resolves: COMPMID-5600 Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com> Change-Id: I5196d1639c48d0b8a116d47ed1d6c7334dc8f41e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8374 Benchmark: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Jakub Sujak <jakub.sujak@arm.com> Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>