Age | Commit message (Collapse) | Author |
|
* vfma is an extension on armv7a and it can be enabled
with -mfpu=neon-vfpv4
* Resolves MLCE-1079
Change-Id: Id455c39ee4feb8d3cdc4515c8307eb8a5d6e093b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9795
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Move some maths-related things from Utils.h to new Math.h header in utils/math.
Move some routines used for Tensor shape validation to Validate.h
Change-Id: I8ce89fe03ec3ae1b61d1a80c282b8b91eea0cfb3
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524783
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9743
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Split some of the larger types with inlined code into their own
header files, so that the implementation of them needn't be included
everywhere.
Change-Id: Id3ec2d42efbd33cedb55705a5a24e1b90c8b7a01
Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/524782
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9757
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed the BF16 code related to the linker error for armv7a
* Resolves COMPMID-6288
Change-Id: I2dcedf5c0ba684f8e31865985899bf00b9390c9e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9736
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: <michael.tyler@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6291
Change-Id: Ibe5da8cfcf6d7fd994ddf7759efd0e773accdeb2
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9731
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6023
Change-Id: I868975d14c4f98af6716726feda22405a6a4c891
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9686
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-6151
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: I0e8c957f3460633c32ef57be0cdc44a53b8c3e88
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9553
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Updates a64_transpose_interleave_16 transform, syncing with current
version in gemm-linux. These fixes ensure that zero padding is done on out of range columns.
Partially resolves ONCPUML-1232
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Change-Id: Ic2d30b622e4a3b0099d6f07037336a2aaaa64e13
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9550
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Adds Reorder kernel exposing blocking reorders from arm_gemm
Resolves ONCPUML-1232
Change-Id: I42bf4166311fe1771565134d3ed7039fc8e30230
Signed-off-by: David Svantesson <david.svantesson@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9500
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* If the index is out-of-bound, both CPU and GPU implementations
of the gather layer will output 0.
Resolves: COMPMID-5964
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ib029b3acfb31452f2097c8c75448fb2697cfa332
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9487
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5988
Change-Id: I93e78edf31c9eec8242ccbb8c3c768f46a7c7c38
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9456
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is required for the case where rhs (B) is dynamic and needs to be
pretransposed in every run.
In a multi-threaded setting, this means the previously single-threaded
pretranspose_B_array would become the bottleneck
Resolves COMPMID-5896
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Id508c46992188a0f76a505152931d4955d04c16d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9455
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- If input value to round has the decimal point beyond the fraction
part (23 bit) then simply return the rounded value = input value.
Resolves: COMPMID-6025
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I1994e49a9bca7daeaeec7681aec099c63a97b53f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9473
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Deprecate dynamic block shape interface
- Iterate over output window instead of input window for simpler
implementation and better performance
- Add cropping support and cropping tests
Resolves COMPMID-5918
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ifea0f5f7760ffd0f4d5d4f3a5ae8d14d0b98b790
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9378
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed namespace arm_compute::utils::requires to fix the build error
‘requires’ is a keyword in C++20 [-Wc++20-compat]
* Added missing includes for cstdint.h
* Resolves MLCE-1040
Change-Id: I08842a273a4422f8e9b10daded680f521efe26e0
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9388
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add quantized unary elementwise in CPU using LUT.
* Widen the input data range of the test suite.
- Fix CPU exponential function overflow/underflow range.
- Fix saturation issue of CL round operator.
Resolves: COMPMID-5763
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I41445de2b4a33ec6b01e0ab701516c240c852d0b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9367
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use a vector to represent the (static) block shape instead of an N-D
Tensor. The previous use of ND Tensor as block shape was wrong, not
adhering to the specification, and non-functional (only first dim was
used anyway).
* The fixture now accepts a static block shape, because the dynamic
case is not properly implemented and will be deprecated for now.
* Fix an assertion error in reference implementation.
Partially resolves COMPMID-5918
Change-Id: I5221e52ccc05e7c1249dec3a42426f954a73729a
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9357
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Omar Al Khatib <omar.alkhatib@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* This patch adds support for rounding modes in vmlaq_qasymm8_signed
which is used to compute Relu for quantized types
* Partially resolves MLCE-1018
Change-Id: I2a267b84745430e1ffe92b8bc79828a39332db18
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9354
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
threading mode.
Resolves: COMPMID-5844
Change-Id: Iceb0018114bbca2bfdac4d4406936f9b260539e9
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9070
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
|
|
Resolves: COMPMID-5966
Change-Id: Ic0d694493178da029a297643855bd0cff01b174f
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9302
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* The shape of input and indices tensors, and the gather axis
can be any number, as long as these are valid and the output
tensor doesn't have more dimensions than the library supports.
* Update the reference code to be more generic and straightforward.
* Add necessary test cases.
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Resolves: COMPMID-5919
Change-Id: Ic7e2032777aa97ecc147f61d5388528697508ab1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9199
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The SME kernels for quantized int8/uint8 GEMMs erroneously required that
maxthreads==1 before they could be selected. This resulted in them not
being available on multi-thread runs. Remove that restriction.
Resolves COMPMID-5962
Change-Id: Ia7933d0c66020b5e2981604ca97ff7ead95ec14e
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9274
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Dividing scale by number of elements causes accuracy loss due to limitations in float datatype and truncation to int
- Adds rounding after division on aarch64 to negate this.
Resolves: [COMPMID-5839]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I54ef0f7e56f39da1fa5f30378f551b5ca419a61d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/492456
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9110
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5805
Change-Id: Idf720bbb136474810086f5089c5ed23b3f79835a
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9081
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
|
|
* Resolve COMPMID-5689
Change-Id: I81a3791ad054db59562b76d1c729f2b2168aee8b
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Signed-off-by: Andrew Mundy <andrew.mundy@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Linker errors caused by the declarations of the DWC functions not matching
the functions implementation. Changed the functions declaration to
match the implementation.
* Partially resolves MLCE-996
Change-Id: Ie6458c80bc425deaa6c239828b9f4a2a6646f503
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9056
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5110]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I3889a79c311b697c56d7369305c862433e856487
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8903
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This reverts commit 3c59f01c209d2732a15d97d65565ead964787a8b.
Resolves: COMPMID-5817
Change-Id: Ie2443a21854a95db1e3d0cafa2121c0187a5e237
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8974
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5805
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Change-Id: I250f64531e209625e4ff176dd5a552c1c34bc484
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8909
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed header files in arm_conv/depthwise
* Resolves MLCE-990
Change-Id: Iacddd80e2d83ff0fbafb817014f90c5bc80dab3c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8946
Reviewed-by: Andrew Mundy <Andrew.mundy@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Removed BF16 validation tests for DepthConvert
* Revert back to using inline assembly to convert to/from BF16
* Resolves COMPMID-5800
Change-Id: I803b2ad19ead297417f780c97c5b724cca6b394c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8929
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I85731bb688864a29b95adc729083e0c8e2ab61f8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8885
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves: COMPMID-5794
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I275d0401be978e86507990bdb7dc5b1538a108d8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8884
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [COMPMID-5466]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I68af0bb54580bebd2ace1fba30aa73f7f68a4dbb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8804
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5664
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ica2fd82645d95bd64226a1950a013d8a9b9035eb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8833
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fixes various mismatches when converting FP32 to BF16 and
BF16 to FP32
* Fixed segfault when trying logging=1 and trying to log BF16
* Resolves MLCE-979
Change-Id: Ie517d0b7411b4e3a7fecdee588f0e073d290625a
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8830
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5664
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I4182752e213aade19005ee984a488c2490453f8f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8747
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add SME/SME2 detection.
* Integrate SME2 implementation for:
- Normal convolution
- Winograd
- Depthwise convolution
- Pooling
Resolves: COMPMID-5700
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I2f1ca1d05f8cfeee9309ed1c0a36096a4a6aad5c
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8692
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5698]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I21ce976473e0e8807c14989e98e68aca69c7f1f3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8603
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves : [COMPMID-5461]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I89b99d267c32b00ef44f9bb6e7c714dfe4a0d29d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8420
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use the same rounding function for the left-over part with the
vectorized part.
Resolves: COMPMID-5640
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I07450b2a43390b77539b78cd5d3e6772bdc38548
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8520
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Use rational approximation with optimised coefficients
to calculate tanh_f16. Method is ~2.5x faster than previous
and has lower relative and absolute error.
This will fix https://github.com/ARM-software/ComputeLibrary/issues/1002
Credit to George Steed for suggesting use of vcageq instead of min+max
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Change-Id: Id70da3aab666c68b0d798266a837b59c00937bf7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8480
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Use fixed-point arithmetic where possible.
* Various optimization for the FP32-based implementation.
This implementation is kept as the fall-back solution
in case of unrealistic quantization parameters that exceed
the range of fixed-point solution.
Resolves: COMPMID-5458
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I221d2d3801ecaae4fe0b7cf6ae8ef00ca3743665
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8317
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves MLCE-924
Change-Id: I3cc3d30893c2ee0865eacafdc1d9ba3d5b876d32
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8326
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
SVE merges for interleaved kernels were not guarding bias reads with the
correct predicates, leading to overreads and crashes in some cases. Fix to
use the appropriate predicate.
Resolves: COMPMID-5627
Change-Id: Ib049531c4a3bea56e90623b7b9f0d6a7ab4db2c8
Signed-off-by: David Mansell <David.Mansell@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8315
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC.
This patch
- Reduces the memory footprint by disabling precomputation of indices and weights when they're not used
- Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8)
- Adds S8(Int8) Bilinear Scale for Border mode REPLICATE
- Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation
- Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset
Resolves: COMPMID-5453, COMPMID-5454
Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
OpenCL implementation uses built in erf.
NEON implementation requires new vectorized erf.
Uses the following approximation:
erf(x) = 1 - 1 / (1 + a1x + a2x^2 + a3x^3 + a4x^4)^4
a1 = 0.278393, a2 = 0.230389, a3 = 0.000972, a4 = 0.078108
From https://en.wikipedia.org/wiki/Error_function#Numerical_approximations
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I2d3964b2c26a4334166b17135f9104bc6324fad2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7921
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implements LayerNorm for qasymm8 tensors.
Uses uint8x16 loads and stores.
Summation is performed in integer arithmetic (vpaddl)
Normalization is performed in float32 before requantizing back to int8.
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I2407c8b34717fb47adab98791bd76fb8a3c62f4a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7922
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* For 3x3 kernel, only choose the implementation with larger tile
size if the input tensor is larger than the tile.
Resolves: COMPMID-5467
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I2cf95ddb25f477cb05da3b3501e0afe9548fc33a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8022
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Minor tweaks and test for running fixed format kernels with BF16
operations when specified by the user.
Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|