Age | Commit message | Author |
|
Resolves: COMPMID-5917
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves COMPMID-5918, COMPMID-5865
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This is so that we can leverage fixed-format kernels when
using the GEMM convolution method.
Partially resolves: [ONCPUML-1129]
Change-Id: I61ffa74f5cd9d75579dbc1f9aa187371f855e932
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9248
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Skip the upsampling step for deconvolution when the input strides are 1,
regardless of kernel size. This is achieved by setting the correct
padding for the unit-stride convolution.
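A minimal 1D sketch of why the upsampling step can be skipped, assuming a single channel and zero padding. The function names are hypothetical and are not Compute Library APIs. With stride 1, a transposed convolution with padding p should match an ordinary stride-1 convolution of the input with the kernel flipped and padding k - 1 - p, so no upsampled intermediate is needed.

```python
def deconv1d_reference(x, w, pad):
    """Stride-1 transposed convolution: scatter each input into the output."""
    n, k = len(x), len(w)
    full = [0.0] * (n + k - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i + j] += xi * wj
    # Padding on a transposed convolution crops the full output on both sides.
    return full[pad:len(full) - pad]


def deconv1d_as_conv(x, w, pad):
    """Same result via a plain stride-1 convolution with adjusted padding."""
    k = len(w)
    conv_pad = k - 1 - pad  # assumes pad <= k - 1
    padded = [0.0] * conv_pad + list(x) + [0.0] * conv_pad
    flipped = w[::-1]
    out_len = len(padded) - k + 1
    return [
        sum(padded[o + j] * flipped[j] for j in range(k))
        for o in range(out_len)
    ]
```

Both functions take the padding of the transposed convolution; only the second one translates it into the padding of the equivalent unit-stride convolution.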
Resolve: [ONCPUML-1183]
Change-Id: Ief88f9fe30f6f56d3358e3cf6a506ab8b5691f18
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9134
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and has an intermediate output after the first Add. It is supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types.
The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel).
The inputs are
- input1 : nD tensor [X, Y, Z, W, ..]
- input2 : nD tensor [X, Y, Z, W, ..]
- add_coef : 1D tensor [X]
- mul_coef : 1D tensor [X]
The outputs are
- out1 : nD tensor (intermediate output) [X, Y, Z, W, ..]
- out2 : nD tensor (final output) [X, Y, Z, W, ..]
The operation can be summarized as follows:
out1 <- input1 + input2
out2 <- Act(out1 * mul_coef + add_coef)
The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr.
This operator is provided so that Residual network patterns can be fused, saving computation by reducing traffic to and from memory.
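The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tensors are flat lists with the X dimension varying fastest, the function names are hypothetical, and quantized data types are not modelled.

```python
def relu(v):
    return max(0.0, v)


def add_mul_add(input1, input2, mul_coef, add_coef, act=lambda v: v, keep_out1=True):
    """out1 = input1 + input2; out2 = act(out1 * mul_coef + add_coef)."""
    x_len = len(mul_coef)  # coefficients are 1D, one value per X index
    out1 = [a + b for a, b in zip(input1, input2)]
    out2 = [
        act(s * mul_coef[i % x_len] + add_coef[i % x_len])
        for i, s in enumerate(out1)
    ]
    # Skipping the intermediate output mirrors passing a nullptr for out1.
    return (out1 if keep_out1 else None), out2
```

For example, `add_mul_add(a, b, mul, add, act=relu, keep_out1=False)` returns only the final output, which is the case where the intermediate result never needs to be stored.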
Resolves: COMPMID-5463
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove hack in CpuGemmAssemblyDispatch.cpp which tried to guess
strides for fixed format kernels. Instead, expect that strides will
have been correctly set on weights externally
- Update fixed format test fixtures to set the strides
- If the fixed format uses fast math mode, then weights should be of
type BFLOAT16. Change the validation logic to accept this.
Resolves: [ONCPUML-1131]
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
If the input tensor's stride is 1 and the kernel size is 1x1,
skip the upsampling step and pass the input tensor pointer to the
convolution directly.
Partially resolve: [ONCPUML-1137]
Change-Id: I9de9444ff99cf35d44a51ccbe0fa6facc1035d27
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8994
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [ONCPUML-1128]
Signed-off-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
Change-Id: I287a71222d3f0289d8cccfcb15383b0a930a55e6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8952
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: [COMPMID-5466]
Signed-off-by: Omar Al Khatib <omar.alkhatib@arm.com>
Change-Id: I68af0bb54580bebd2ace1fba30aa73f7f68a4dbb
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8804
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC.
This patch
- Reduces the memory footprint by disabling precomputation of indices and weights when they're not used
- Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8)
- Adds S8(Int8) Bilinear Scale for Border mode REPLICATE
- Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjusts the heuristics to choose the Neon™ implementation
- Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset
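For readers unfamiliar with the operator, here is a small sketch of bilinear scaling with REPLICATE border handling. It assumes a single-channel float image stored as a list of rows and half-pixel-centre sampling; both are assumptions for illustration, and the function name is hypothetical. It does not reflect the optimized kernels in this patch.

```python
import math


def bilinear_replicate(src, out_h, out_w):
    in_h, in_w = len(src), len(src[0])
    sy, sx = in_h / out_h, in_w / out_w

    def at(y, x):
        # REPLICATE border: out-of-range coordinates read the nearest edge pixel.
        y = min(max(y, 0), in_h - 1)
        x = min(max(x, 0), in_w - 1)
        return src[y][x]

    out = []
    for oy in range(out_h):
        fy = (oy + 0.5) * sy - 0.5  # source row coordinate (assumed sampling policy)
        y0 = math.floor(fy)
        wy = fy - y0
        row = []
        for ox in range(out_w):
            fx = (ox + 0.5) * sx - 0.5
            x0 = math.floor(fx)
            wx = fx - x0
            top = at(y0, x0) * (1 - wx) + at(y0, x0 + 1) * wx
            bot = at(y0 + 1, x0) * (1 - wx) + at(y0 + 1, x0 + 1) * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out
```

The index and weight pairs (`x0`, `wx`, `y0`, `wy`) are computed on the fly here, which is the alternative to precomputing them in separate buffers.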
Resolves: COMPMID-5453, COMPMID-5454
Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch removes index and weight pre-computations where they are not used and reduces some calculations inside the innermost loop of Scale.
Resolves: COMPMID-5452
Change-Id: Ie149b1b76a90a8cb659ada0f97aef78caf69932f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8220
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Implements the plumbing required to request and execute
fixed-format kernels from NEFullyConnected, NEGEMM and NEGEMMConv2d.
These APIs are used to accelerate oneDNN primitives (inner product, matrix
multiplication and indirect GEMM respectively), and without these changes it
would not be possible to call fixed-format kernels from those oneDNN
primitives.
Change-Id: I27534f0491ce28d0ccb98c19f318bd33dcdf2ff5
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7999
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat
when needed through two map functions.
- Moved to_string(WeightFormat) to TypePrinter.h
Resolves: COMPMID-5415
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Sicong Li <sicong.li@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
API changes for NEGEMMConvolutionLayer and CpuGemmConv2d
Built with:
scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \
build=native -j32 Werror=false validation_tests=1 build_dir=opt \
standalone=1 asserts=1 experimental_fixed_format_kernels=1 .
Tested with:
./build/opt/tests/arm_compute_validation
Hardware where the test executable was run:
Neoverse N1
Test coverage:
* NEGEMMConvolutionLayer, CpuGemmConv2d
* NHWC (the only one supported by the fixed-format kernels)
* F16, F32
* Shapes: RunSmall
Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99
Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5400
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ib4428436dd7a6e40d8b2d8a2f8dac1b079154551
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7894
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves MLCE-604
Signed-off-by: Mike Kelly <mike.kelly@arm.com>
Change-Id: Ice3d6f361588f1a6bd0bff301c27b0d063a5c014
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7529
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* When input_to_forget_weights is QASYMM8_SIGNED, the conversion
to QSYMM8 is done in the prepare method
* Partially resolves MLCE-604
Change-Id: Iddadbc50d77381542451ac4e46de49b2706bc88c
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7441
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add implementation for the CPU pooling 3d layer.
- NDHWC data layout support
- Support FP32/FP16.
- Add Pool3d to the operator list.
- Fix CL Pool3d kernel comments to generate the operator list.
Resolves: COMPMID-4671
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I92478a154beb12541525b648ed3dd5a58c8f27fa
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7311
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
(cherry picked from commit 572659a0e5dd1086b1c7d16fe331ff73d2acd93a)
|
|
* QLSTM only supports QSYMM8 for the argument input_to_forget_weights
* We add support for QASYMM8_SIGNED by dequantizing and requantizing to QSYMM8
* Resolves COMPMID-5184
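A rough sketch of the dequantize-then-requantize step, assuming the usual affine convention real = scale * (q - offset) for QASYMM8_SIGNED and real = scale * q for QSYMM8. The function names, the rounding mode and the clamping range are assumptions for illustration, not the library's implementation.

```python
def dequantize_qasymm8_signed(q, scale, offset):
    """Asymmetric signed 8-bit values back to real numbers."""
    return [scale * (v - offset) for v in q]


def quantize_qsymm8(real, scale):
    """Symmetric 8-bit quantization: no offset, clamped to the int8 range."""
    return [max(-128, min(127, round(r / scale))) for r in real]


def requantize_to_qsymm8(q, scale, offset, sym_scale):
    real = dequantize_qasymm8_signed(q, scale, offset)
    return quantize_qsymm8(real, sym_scale)
```

How the symmetric scale is chosen is left to the caller in this sketch.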
Change-Id: I1cae18d81dafdb7ae722b520a1354cf4a56b9606
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7321
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
(cherry picked from commit 187a041dedf8e9db0c9e0652f13f8639dca880f3)
|
|
Resolves COMPMID-5185
Change-Id: I61e1453e8851ab84c1cadc10587ebd23fd94799e
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7330
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fixed hardcoded LOGISTIC activation in ACL reference
* Partially resolves MLCE-60
* Resolves COMPMID-5139
Change-Id: I50e75339084ea53bf75acf18aa3e5cdafcf34c15
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7150
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-4958
Change-Id: Ibed5155f2e3ece46635f6ea9617bf11cefc402b1
Signed-off-by: Dana Zlotnik <dana.zlotnik@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7028
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove CLRemapKernel.
- Remove NERemapKernel.
Partially resolves COMPMID-4984
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ia61f9ac7447695d81178701cf0e9b7625a91eccc
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7056
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Resolves COMPMID-4884
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: Id28f0df242fe240c70f22e3ad55e4729ab1e40fe
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6641
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Decouple data support of CpuDirectConv3dKernel
- Update documentation for Conv3d
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: I1d94aa28f821f45a1a3d39cc3335c8faeee89f0d
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6453
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add CpuDirectConv3d support for fp32 and fp16
* Dilation is not supported
* Decoupling is still needed
Partially resolve: COMPMID-4661
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: Ib1865b9ff328b684d131512b1baf77bc2f10318f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6430
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
|
|
- Moving impl of CPPSplit template to src/runtime/CPP to allow
including of Log.h from src/common.
- Fix logging of vector<ITensor*> to print contained tensor's info not their ptrs.
Partially-Resolves: COMPMID-4718
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Idec81665b2a7c0cfae5248803109c6e2edc520a1
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6362
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially Resolves: COMPMID-4718
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I655268c57fa126d9c99981c49d345a3aac75646e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6286
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
|
|
The legacy structure contained two libraries, core and runtime, with two
backends in each.
We reduce the core/runtime libraries to a single library, thus merging
the backend files.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I69545765fe7a730368105cdbd067d3135ec7a174
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6155
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Renaming the gemm-based convolution operators to accommodate new
operators with higher convolution dimensionality
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Id2f2cf11404221f0e87baa0e5d08ad5d63eaf78e
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6113
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: I0b59c5326f5fcbc322fbeb864197ea999de6bd56
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6112
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4769
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Iccadcbd68b0fd84ed3bf212e358a4ea944084a40
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349845
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6107
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4763
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Iae2e093cfb7d2c7172603897afe1c6a2e5d1caa3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349725
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6101
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
|
|
The execution pack of CpuFullyConnected was altered explicitly with local
objects that were going out of scope, leading to incorrect results or
memory-related issues.
Track transformed weights and register the weights matrix explicitly
during execution honoring the object lifetime scope.
Resolves: COMPMID-4762, COMPMID-4764
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I53449c377fb1cfccdf5e6f9505d963518748c318
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/349345
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6092
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Issue caused by the porting of the function to the new API. The method
will call down to the new CpuConv2d implementation.
Change-Id: I650ad1f17c8b89a637b589e452ca785b5d14e975
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6027
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
|
|
Resolves: COMPMID-4507
Change-Id: I9557026ec0052b5585994f7a1300a14565c976d0
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5964
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Binary size reduction for this kernel is almost 50%.
Also remove unused NEConvertFullyConnectedWeightsManaged.
Change-Id: Ia46a1342a0737397b4aac2578d963c2ebb7446e3
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6011
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-4501
Change-Id: Ib61b3d06974009e501b3fb86467735427e13a94a
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5931
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Resolves: COMPMID-4641
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I7ccc663b2692d40c370794caa906b5be8fd25a32
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5977
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Introduce Fp32 kernels with internal calculations in Bfloat16 when
fast_mode is enabled
- Improve kernel selection heuristics
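To illustrate what computing in BFloat16 means numerically, the sketch below rounds FP32 inputs to BFloat16 before multiplying. It is an assumption-laden illustration: the helper names are hypothetical, rounding is round-to-nearest-even, NaN handling is omitted, and accumulation happens in ordinary Python floats rather than in whatever precision the actual kernels use.

```python
import struct


def fp32_to_bf16_bits(value):
    """Round an FP32 value to BFloat16 and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Rounding bias that sends ties to the even mantissa; NaN is not handled.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(bits16):
    """Widen a BFloat16 bit pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


def dot_bf16(a, b):
    """Dot product with both inputs rounded to BFloat16 first."""
    def rnd(v):
        return bf16_bits_to_fp32(fp32_to_bf16_bits(v))
    return sum(rnd(x) * rnd(y) for x, y in zip(a, b))
```

BFloat16 keeps the FP32 exponent and only 8 bits of mantissa precision, so results from such a mode can differ slightly from a pure FP32 computation.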
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I68a9e7e862b6fd2721b46e0d7cc791091c4ab279
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5965
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
ClWinogradConv2d was performing the Rhs transformation on every step,
impacting performance.
Adds scope logging support through ARM_COMPUTE_LOG_MSG_WITH_FUNCNAME
Resolves: COMPMID-4596
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: Ib329d3bc8d8aa21abae9fabfe61de35cc84d4819
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5925
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* We prefer GEMM method for kernel size > 15 when data layout is NCHW because
direct convolution does not support this.
* Resolves COMPMID-4581
Change-Id: Ie18cc96bc6b446fd59e8c8ebb10c3af5ca02c3bb
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5935
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Details:
port NEWeightsReshapeKernel to CpuWeightsReshapeKernel
port NEGEMMConvolutionLayer to CpuGEMMConvolutionLayer
Resolves: COMPMID-4509
Change-Id: I3c7051e2c3f6d808a7ccb898aad70e5b221b9dc3
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5938
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Rename to CpuWinogradConv2d
Allow memory to be injected externally
Change-Id: I1f0a26ea533e326a7c63df86e708895c31752a39
Signed-off-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5926
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Resolves: COMPMID-4511
Change-Id: Id6335cb23ef22bba02083498025da0ecb1647714
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5898
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Details:
Extend NEConvertQuantizedSignednessKernel
Port NEGEMMInterleave4x4Kernel to CpuGemmInterleave4x4Kernel
Port NEGEMMTranspose1xWKernel to CpuGemmTranspose1xWKernel
Port NEGEMMLowpMatrixAReductionKernel to CpuGemmLowpMatrixAReductionKernel
Port NEGEMMLowpMatrixBReductionKernel to CpuGemmLowpMatrixBReductionKernel
Port NEGEMMLowpOffsetContributionOutputStageKernel to CpuGemmLowpOffsetContributionOutputStageKernel
Port NEGEMMLowpOffsetContributionKernel to CpuGemmLowpOffsetContributionKernel
Resolves: COMPMID-4403
Change-Id: I3227f052f25e7b41d073bbea1da8a881fcd78b8e
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5875
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Resolves: COMPMID-4510
Change-Id: Ia3e588f599449d975dabad4afafb2974dd44d0ad
Signed-off-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5899
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Complete porting of NEGEMM to the new API
Resolves: COMPMID-4402
Change-Id: I14904102b25332dbb4fc048d45dca068a15b6eca
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5890
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Port NEGEMMMatrixMultiplyKernel to the new API
Partially resolves: COMPMID-4402
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Change-Id: I52b67055dc24bb3a417d6ec5aeeee86e21b74320
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5873
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Start porting NEGEMM to the new API
- Port NEGEMMInterleave4x4Kernel to the new API
- Port NEGEMMMatrixAdditionKernel to the new API
- Port NEGEMMTranspose1xWKernel to the new API
- Remove padding from NEGEMMMatrixAdditionKernel
- Remove unused INESimpleKernel and ICPPSimpleKernel
Partially resolves: COMPMID-4402
Change-Id: I63edadddfe00a54586e5384d6a0211db25ae9042
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5857
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|