Age | Commit message (Collapse) | Author |
|
Resolves: COMPMID-5945, COMPMID-5954
Change-Id: I7b27021d21f8e08c4896f6b1f595a75125064f9e
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9356
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5917
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I073067b490f2a1b96b81a037ea431c9a2e5c7503
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9322
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement opencl kernel for LHS transposed and RHS non-transposed
- Implement opencl kernel for LHS transposed and RHS transposed
- Add validation tests
Resolves: COMPMID-5953, COMPMID-5955
Change-Id: I55589acbffe86c44e29807574975978a1ec09bad
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9345
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5863
Change-Id: I9ff67face62826c1d335a6b941e8516be39bdac8
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/488768
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9225
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement ClNativeMatMulKernel class
- Implement opencl kernel for LHS non-transposed and RHS non-transposed
- Implement opencl kernel for LHS non-transposed and RHS transposed
- Add test fixture and dataset for matmul
- Implement transpose_tensor() for reference implementation to transpose high dimensional tensors
Resolves: COMPMID-5944, COMPMID-5951
Co-authored-by: Gunes Bayir <gunes.bayir@arm.com>
Co-authored-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I1d5b8978f41be27baddb3153ade880472141573f
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9333
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves COMPMID-5918, COMPMID-5865
Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ib3b01e7dc1c944184a4c038045bf0469fbb9ff45
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9321
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add sigmoid and tanh activation functions for dynamic fusion.
* Add corresponding tests, but both activation functions share
the same fixture implementation.
Resolves: COMPMID-5939
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I0aae0eaa18b746ce89680d2773c66e09b0f854ce
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9257
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add a max pooling implementation that returns kernel indices.
- Add a parameter in pooling info object to pick kernel indices impl.
- Add validation tests.
Resolves: [ONCPUML-1187]
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I485ef1604f676ee14d5f7f62d33699e49c38e4d3
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9192
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Dividing scale by number of elements causes accuracy loss due to limitations in float datatype and truncation to int
- Adds rounding after division on aarch64 to negate this.
Resolves: [COMPMID-5839]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I54ef0f7e56f39da1fa5f30378f551b5ca419a61d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/492456
Tested-by: bsgcomp <bsgcomp@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9110
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Fast math is enabled unexpectedly by convolution layer tests.
Resolves: COMPMID-5843
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: Ib3bc36d3f9070dbfb2c76146eecbb1ce0ee90626
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9137
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This is a fused operator that merges Add + Mul + Add [+ Relu-based-Activation] layers and have an intermediate output after the first Add. It's supported for FP16/32/QASYMM8/QASYMM8_SIGNED data types.
The subsequent Add and Mul are intended for scaling and the coefficients only have one dimension (per channel).
The inputs are
- input1 : nD tensor [X, Y, Z, W, ..]
- input2 : nD tensor [X, Y, Z, W, ..]
- add_coef : 1D tensor [X]
- mul_coef : 1D tensor [X]
The outputs are
- out1 : nD tensor (intermediate output) [X, Y, Z, W, ..]
- out2 : nD tensor (final output) [X, Y, Z, W, ..]
The operation can be summarized as follows:
out1 <- input1 + input2
out2 <- Act(out1 * mul_coef + add_coef)
The activation function can be Identity, Relu, Bounded Relu or Lower/Upper Bounded Relu. The intermediate output can be skipped by providing a nullptr.
The reason of providing this operator is to be able to fuse in case of Residual network patterns and save computations by reducing memory back and forward.
Resolves: COMPMID-5463
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8ef577aa623b036e9a9f655cc088493fd19a6109
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9055
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Remove hack in CpuGemmAssemblyDispatch.cpp which tried to guess
strides for fixed format kernels. Instead, expect that strides will
have been correctly set on weights externally
- Update fixed format test fixtures to set the strides
- If the fixed format uses fast math mode, then weights should be of
type BFLOAT16. Change the validation logic to accept this.
Resolves: [ONCPUML-1131]
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I0f18d8b86b0f639be25fd122fa06a591e90645f2
Signed-off-by: Jonathan Deakin <jonathan.deakin@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8985
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Note: we use a separate test fixture for Multiplication op instead of reusing ElementwiseBinaryFixture to avoid exposing the internal enum ElementwiseOp to the public utils/TypePrinters.h as required by the data test case macros to print the test data. We also do not consider modifying the enum ArithmeticOp in the standard interface to include MUL without an implementation. Future work should consider refactoring this test fixture into the ElementwiseBinaryFixture to reduce the total number of fixtures/code duplication.
Resolves: COMPMID-5779
Change-Id: I84207658ce0407095b028fca0ab7bfa2950255ec
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9013
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Return aux tensorInfo by get_aux_tensors() at runtime to init the aux
tensor with the right size.
- Keep softmax unfusable for this commit
- Hence, added Tensor3D to template writer arguments declaration, for sake of
keeping dynamic fusion softmax componenets' kernels matching their cl
counterparts.
Resolves: COMPMID-5523
Change-Id: I667f39545db925f667036ef448302c79a0330373
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/483924
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8986
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The new dynamic fusion API is introduced in the following patch:
https://review.mlplatform.org/c/ml/ComputeLibrary/+/8906
For each operator (except Conv2D, which is migrated in the above patch), we
- remove destination tensor from is_supported, validate and create calls
- make create_op return ITensorInfo* to the intermediate destination object
Affected operators:
- DepthwiseConv2D
- Cast
- Elementwise Ops
- Clamp
- Reshape
- Resize
Resolves: COMPMID-5777
Change-Id: Ib60ec8a5f081752808455d7a7d790f2ed0627059
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8991
Reviewed-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Dynamic-Fusion: Ramy Elgammal <ramy.elgammal@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Add direct conv2d tests as a separate fixture so that we can enable
future direct conv2d specific tests
* Move Conv2dAttributes to its own file
Partially resolves COMPMID-5736
Change-Id: I530649488faf3bbed1a4fc7d16a74063bfdf33db
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8928
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds Dynamic fusion PoolingLayer2D as Unfusable Operator
- Indices are not supported
- Adds tests for F32/F16 Datatypes
Resolves : [COMPMID-5520]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I0d112545eb9209c836bf9ea153069f8627531e0a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8893
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5522
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: If4e5736a2f7ff42e70276d7f4e0f3ebcb38414e6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8881
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Intermediate tensor info objects are not created by the user anymore. They're returned from create_op and reused. This will prevent allocation of the intermediate tensors in case of possible interface misuse.
- Sketch object handles intermediate tensor info pointers inside its implementation class via a unique pointer vector
- Conv2d operator is migrated into the new interface
Resolves: COMPMID-5776
Change-Id: I9422e3681eef4f2d2922f6d0a5d7786380837c6d
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8906
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5521
Change-Id: Id38a4ce18f9ea8805a151acb064e72795535d1a0
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8859
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* The output of the fused operator must be explicitly specified
using GpuOutput operator.
* Any temporary tensors used to connect the output of an operator
to the input of another operator will be marked as no-alloc
and won't be allocated as a tensor in the memory.
Resolves: COMPMID-5771
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5ae8e800f8f737db23a055a92b01c4f1d78c3bb8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8794
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Add the CLAMP activation function for GPU backend with generic activation Component and TemplateWriter modules.
CLAMP is internally implemented as LU_BOUNDED_RELU activation function with the alpha and beta variables swapped.
We do NOT consider in-place computation cases in this patch.
* CLAMP operator for GPU backend.
* Activation Component and TemplateWriter for CL backend.
* TemplateWriter generates tiled kernel code.
* Supported data types: F16, F32.
* Validation tests for CLAMP operation.
Resolves: COMPMID-5519
Change-Id: Ieb097d6b1e6a7ed2b882518e88314454efb402f6
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8762
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
The operator is migrated into dynamic fusion for all data types supported
Resolves: COMPMID-5693
Change-Id: I3c550d3d1cd04570f453beae678c3f60d4cb1a73
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8755
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Provide support for Add operator
- Auto initialize the destination tensor before testing fusion in conv2d
and elementwise binary ops.
Resolves: COMPMID-5518
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ibd815020f02b57f88eea7c2921bdcf98605d99c5
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8617
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch adds Depthwise Conv2d operator into dynamic fusion interface and adds the associated tests.
Resolves: COMPMID-5517
Change-Id: I385c94dff7fd40c72b8337ef797e508df4499a82
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8678
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Implement kernel (ClIndirectConv2dAddressPrecalculationKernel)
- Implement OpenCL kernel (indirect_convolution.cl)
- Add test
Resolves COMPMID-5708
Change-Id: If7408e37cbc6f9ad8506ff3334bc574e5d6763fb
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8661
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: ARMCL-592
Change-Id: If036a2b6d2842db2ed750e9d350bfaa3f7a76c9e
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8668
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Fix includes int8/uint8 quantized inputs
- Bias S32 value is limited to better allow detection of mismatches in gemmlowp kernel
Resolves: [COMPMID-5659]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ie9cca430c6ab66253fe1d5252bd2c5396c7f38cf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8514
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves: COMPMID-5511
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I0ac0acbf1de7da09f18f7b457307ec3cc99deb3b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8546
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Check whether weights are defined as constant, if they are not constant then do not pack them if they are already packed so that they can be updated.
Signed-off-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I73447e31e3660b05f8f40e04ea4ea2003eb9b802
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8539
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
* Do not change the tensor info after configure stage.
* By fixing this, the 1D optimization for activation layer can be
applied to all data types and tensor layout.
Resolves: COMPMID-5644
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I557f9bb84e5e456c28d6b423584887d7a3648ad4
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8470
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Mohmun02 <MohammedSuhail.Munshi@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
Resolves COMPMID-5632
Change-Id: I2bdbe69a610ca2510fbd74d5d412842679299762
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8365
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
multiplication with variable input tensors
Resolves: COMPMID-5506
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I8345a3b7a83ef46f9ec7a77197cc65c933ec9ac6
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8239
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces several performance optimizations regarding the Bilinear Scale operator with REPLICATE Border mode. Changes apply only to NHWC.
This patch
- Reduces the memory footprint by disabling precomputation of indices and weights when they're not used
- Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8(Uint8)
- Adds S8(Int8) Bilinear Scale for Border mode REPLICATE
- Removes Bilinear Scale SVE kernels for Quantized and Integer types and adjust the heuristics to choose the Neon™ implementation
- Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset
Resolves: COMPMID-5453, COMPMID-5454
Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Implements LayerNorm for qasymm8 tensors.
Uses uint8x16 loads and stores.
Summation is performed in integer arithmetic (vpaddl)
Normalization is performed in float32 before requantizing back to int8.
Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I2407c8b34717fb47adab98791bd76fb8a3c62f4a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7922
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
input tensors
- Add a test for CPU to batched matrix multiplication with variable input tensors
- Disable assembly kernel when using _reshape_b_only_on_first_run flag
Resolves COMPMID-5501
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: If96b182584617806a9dfe597dbfaf05241b123c2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8234
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
input tensors
Resolves : [COMPMID-5502]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Ida001dc597973f9180468737a3e32e5022e6baee
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/450342
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8224
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Minor tweaks and test for running fixed format kernels with BF16
operations when specified by the user.
Change-Id: Ic8167f67b86b1298da65e46cfebed9f3b86940e4
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8000
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
- Added arm_compute::WeightFormat and converted to/from arm_gemm::WeightFormat
when needed through two map function.
- Moved to_string(WeightFormat) to TypePrinter.h
Resolves: COMPMID-5415
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: I65f7942100bcd4dbf2c5cf6c07f26c8e1e3bf86e
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/438511
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Sicong Li <sicong.li@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7985
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces a GEMMLowp routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615
Resolves: COMPMID-5398
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I8d06453645688f3658b6c7c06f1ebc25a2505661
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7932
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
API changes for NEGEMMConvolutionLayer and CpuGemmConv2d
Built with:
scons neon=1 opencl=0 os=linux arch=armv8.2-a multi_isa=1 \
build=native -j32 Werror=false validation_tests=1 build_dir=opt \
standalone=1 asserts=1 experimental_fixed_format_kernels=1 .
Tested with:
./build/opt/tests/arm_compute_validation
Hardware where the test executable was run:
Neoverse N1
Test coverage:
* NEGEMMConvolutionLayer, CpuGemmConv2d
* NHWC (the only one supported by the fixed-format kernels)
* F16, F32
* Shapes: RunSmall
Change-Id: I4fd3e495a7cbf61210ea02d37440ba9652934e99
Signed-off-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7632
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
This patch introduces a GEMM routine that is optimized for Arm(R) Mali(TM)-G715 and Arm(R) Mali(TM)-G615
Resolves: COMPMID-5216
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Change-Id: I2e5d7806f5904347185bb3e250f73d73d6669dba
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7914
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
|
|
Partially resolves MLCE-739
Change-Id: Ice06a96d6a8a26b31e334ba4e697cd41d352b026
Signed-off-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7364
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ie65f9096a75610dc20ffbb25dc43fd2f632f2f03
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7530
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Added values to supports transpose operations prior to gemm.
Resolves: COMPMID-5072
Change-Id: Ia8bc39b3ded8a507c2314a4926f1f7809da03649
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7485
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Extends GEMMInfo class to include flags for transposing A and B.
- Extends GEMMLowp fixtrues to have an option for transposing A and B.
Resolves COMPMID-5075
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: If5e4b7e2b7b19ca30808a78a9641d8ba3f176a26
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7458
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
- Duplicate fixture in Pooling3dLayerFixture causes build errors
Resolves: COMPMID-4669
Change-Id: I5384941129feae6f90a18dce10677043e0e54718
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7419
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Adds Qasymm8 and Qasymm8_signed support to the 3d pool operator
Resolves: COMPMID-4669
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: I36038c2b7c4f36baf67f7aae801356890e104538
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/410496
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Sheri Zhang <sheri.zhang@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7391
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
|
|
- Add implementation for the CPU pooling 3d layer.
- NDHWC data layout support.
- Support QASYMM8/QASYMM8_SIGNED.
- Add Pooling helper file for Pool3d/2d common functions.
Resolves COMPMID-4668
Change-Id: Iadf042036b076099c2353d6e2fe9fc623bc263d8
Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7387
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
- For NDHWC layout
- For F16 and F32 data types
- Mixed Precision stil not supported
Resolves: COMPMID-4670
Signed-off-by: ramy.elgammal@arm.com
Change-Id: I0e14a13e4625569e8e5ee67e6033bd1efe0da469
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7262
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|