Age | Commit message (Collapse) | Author |
|
matrix of GEMM/GEMMLowp
Change-Id: I8c5fd4c8bcdffda1522c83158981ed92baa045f4
Reviewed-on: https://review.mlplatform.org/364
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I5a9d698f459fd7efdbe85d70d092e79e961cb4dd
Reviewed-on: https://review.mlplatform.org/368
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I10979793e22296f9ab091405509d809fd692ab89
Reviewed-on: https://review.mlplatform.org/365
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Matthew Bentham <matthew.bentham@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Change-Id: I26c00ed1941a20fb02fac7d6c03f06adf87fee4d
Reviewed-on: https://review.mlplatform.org/363
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Change-Id: I14cea30ffa9ca735941b559bb272b8c476814a34
Reviewed-on: https://review.mlplatform.org/338
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <Anthony.barbier@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
|
|
Change-Id: I827b26239043a9e90d26c2583122648d2a45303a
Reviewed-on: https://review.mlplatform.org/317
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Id0d4a07af24e2331161996083b0c1bab072bd405
Reviewed-on: https://review.mlplatform.org/322
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I9e6e43a5839d04c2e4b4552c05446efb0a5074cf
Reviewed-on: https://review.mlplatform.org/232
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I13f6e4c600f39355f69e015409bf30dafdc5e3aa
Reviewed-on: https://review.mlplatform.org/332
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
Change-Id: Ice653e48211053bd3cd20a693bd76de6b4efc370
Reviewed-on: https://review.mlplatform.org/270
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I7eae2e55cc0b0b7bbebb7617299daaca6f75f40c
Reviewed-on: https://review.mlplatform.org/292
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Adds support for Equal,NotEqual,Less,LessEqual,Greater,GreaterEqual
Change-Id: If0cdf4aae7f95c94709b195eee485f6663f45909
|
|
Change-Id: I2a18f0acea382960a8bc71a8f56928a5998f0dd6
|
|
Change-Id: I49b2e8b4200c9ed654736d9451e4ab9c073b4b10
|
|
Change-Id: I29e35024e29781a6b943b568abec9c73649215e6
|
|
Change-Id: I6ee2c0b670727fc808fa636c53ddfaec3a0036c9
|
|
Change-Id: I49f1d865f5e7562f1d80db849353a89ef77e6a9e
|
|
Output of Priorbox should be independent of the input
data layout and should always be in NCHW format
Change-Id: Ie80cd4e51c78945b158c0db1af1923bdf8d7ea7b
|
|
Change-Id: I95fdf5bd85becfe081f6ae587284f3b294681308
|
|
-Simplifies import memory interface
-Changes the used of void** handles with appropriate interfaces.
Change-Id: I5918c855c11f46352058864623336b352162a4b7
|
|
FP mixed precision support added to GEMM kernel used for fp16 winograd conv on Midgard GPUs
Change-Id: I1619beb025fc484a1ac9d3e528d785edabbc7ee6
|
|
Change-Id: I770b044b67d93510ef65e556905135b34be7ea0a
|
|
kernels
Change-Id: I98183f95814442b6f3dbb67a1bdae99df05b9b01
|
|
- Fixing a bug for which we did not scale the boxes before transforming them
- Adding the correct_transform_coords option to BoundingBoxTransformInfo
Change-Id: I40281254bcf87e7c8583c119e99562414fe59822
|
|
BoxWithNMSLimitKernel
COMPMID-1792: Accuracy issue in CLGenerateProposals
This patch does the following:
- Some fixes for GenerateProposals function and tests
- Adapting BoxWithNMSLimitKernel to only accept U32 tensors as keeps_size
- Update 3rdparty
- Adds a small tolerance for a GenerateProposals test
Change-Id: Ia8ec1cdfe941fe05003645e86deb9ea6a6044d74
|
|
Introduced F32 accumulation for F16 winograd gemm and output transform
WinogradConvolution will be available for F16 only if fast math flag is enabled
Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
|
|
With this patch we are able to dispatch a single GPU job also in case of
batched-flatten
Change-Id: I755e7af29d44b24f67fa04bad3c9b7646e8deefc
|
|
Change-Id: I2c2250669829e399fdc2363f729dc5e68d8aac17
|
|
Change-Id: I69e995973597ba3927d29e4f6ed5438560e53d77
|
|
Change-Id: Ib0798cc17496b7817f5b5769b25d98913a33a69d
|
|
Change-Id: I5bf5d751ec7c02d96c26a769f49d03ea23a248b7
|
|
Change-Id: Ie13a9eb6d417388b5de533bffa895796d9d2cf62
|
|
Change-Id: Ibab049f09413258c99335b7da6b151530a1bd136
|
|
and 8 tensors (Part 1)
Creating special cases for concatening 2 and 4 tensors.
Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
|
|
Removed gemmlowp_mm_bifrost_transposed_dot8 kernel as not used
Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
|
|
CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
Since we perform an element-wise operation, it is not necessary to pass the output_depth3d.
Change-Id: Ibfa07a0706e902acf59b444aa61e18a348162ea9
|
|
The issue was related to CLIm2Col when the number of input channels was less than
the number of elements processed by each thread.
The bug has been fixed in the validate_and_configure_window() function setting the correct number of elements accessed
in the output tensor.
Also fixed an issue GEMM3D when we have a single output channel
Change-Id: I094292d0c7662599c4a4c3916ec5f5821df5faef
|
|
Change-Id: I807ef84dbf893bd401dcac5c0fa3a4ee49aabc66
|
|
CLDepthWiseConvolutionLayer3x3Kernel
Change-Id: Ie274da79b15c03f86dfedc85bb721b3de34a0bb4
|
|
Commit 16121924 `COMPMID-1673: Collapse window in CLArithmeticAddition when one
operand is a vector` changed the number of elements processed per iteration to
8, but didn't update the quantized kernel to reflect that.
Change-Id: I49a2fbcee81f5bbc1b210b4a5c6d63b94eafdcec
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/156355
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Also added the test case reported by ArmNN.
Change-Id: I9fe9a1b4f74267a3346529f3a597b37486593c4a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155914
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
batches available.
Change-Id: Iad83df2a9116a7f350de83ec59b28cd8893c8d3a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155716
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I76e57af6608b55b6f59a5d06aecc30063ee4c3cc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155733
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
OpenCL
COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride
With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 %
Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride
but I have not seen any benefit (maybe because we have few arithemtic operation and we
do not have more load instructions). However Depthwise convolution has been improved by
30%
Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I051748502ca24b9952e7313524bbfd708162efb4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155166
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
COMPMID-1690: Add tests for NEPermute with PermutationVector dimension > 3
Change-Id: I4bfc6ff88cd46863c2e39975b5663c624db1a63d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155316
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: I9cb725a8052091469904ecc7cfffa4add9914ffb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155261
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
vector
When one of the operands is a vector, the kernel does a broadcast addition and
the window is not collapsed. This represent an issue because it leads to a lot
of enqueues that increases the time taken by the OpenCL driver. This patch
allows to collapse the window when one of the two operands is a vector.
Furthermore, it adds LWS tuner to the kernel.
It also changes the number of elements processed per iteration to 8 to make
better usage of the cache.
Change-Id: I5f09ab0ddcffb3b7f9326a987c79a997b2d7fa8c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155003
Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: Ic8312a5b6790aa7cd4468d42f08d557ad40e9441
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154570
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: Id964d9068e18aaa13ab8adcbf7a9375b034ea6c3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154651
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|