Age | Commit message | Author |
|
Change-Id: Ib0798cc17496b7817f5b5769b25d98913a33a69d
|
|
Change-Id: I5bf5d751ec7c02d96c26a769f49d03ea23a248b7
|
|
Change-Id: Ie13a9eb6d417388b5de533bffa895796d9d2cf62
|
|
Change-Id: Ibab049f09413258c99335b7da6b151530a1bd136
|
|
and 8 tensors (Part 1)
Creating special cases for concatenating 2 and 4 tensors.
Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
|
|
NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
Change-Id: I1d5bc4d24059917f9ddef0873dd3043b1f2320a8
|
|
Adds 0.5f after scaling AVG pooling to be able to round to nearest as
vcvtq_u32_f32 rounds towards zero.
Change-Id: I22ce78f9e628cf4184a317edabce47211ab09456
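The rounding change described in the commit above can be illustrated with a portable scalar sketch. This is not the kernel code from the commit: the helper names are hypothetical, and a plain C++ cast stands in for the truncating vector conversion. A float-to-unsigned conversion drops the fractional part, so adding 0.5f first gives round-to-nearest for non-negative values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: truncating conversion, as a float -> unsigned cast does.
std::uint32_t to_u32_truncate(float v)
{
    assert(v >= 0.0f); // sketch only covers non-negative inputs
    return static_cast<std::uint32_t>(v);
}

// Hypothetical helper: add 0.5f before truncating so the result is rounded
// to the nearest integer (valid for non-negative inputs only).
std::uint32_t to_u32_round_nearest(float v)
{
    assert(v >= 0.0f);
    return static_cast<std::uint32_t>(v + 0.5f);
}
```

For an averaged value such as 2.75f, the truncating helper returns 2 while the rounding helper returns 3.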
|
|
Removed the gemmlowp_mm_bifrost_transposed_dot8 kernel as it is not used.
Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
|
|
Increases the number of steps used to calculate invsqrt in L2 pooling by 1 to improve accuracy.
Change-Id: Ib938a963809b07c30d47ec0675abae75bc086986
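As a rough illustration of why one extra step helps, the sketch below assumes the inverse square root is refined with Newton-Raphson iterations starting from a coarse estimate. That assumption, the function names and the starting estimate are all hypothetical and not taken from the commit; the real kernel would use vector instructions rather than scalar code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: refine an estimate y of 1/sqrt(x) with `steps`
// Newton-Raphson iterations, y <- y * (1.5 - 0.5 * x * y * y).
float refine_invsqrt(float x, float estimate, int steps)
{
    assert(x > 0.0f && estimate > 0.0f && steps >= 0);
    float y = estimate;
    for(int i = 0; i < steps; ++i)
    {
        y = y * (1.5f - 0.5f * x * y * y);
    }
    return y;
}

// Absolute error of an approximation against the library square root.
float invsqrt_abs_error(float x, float approx)
{
    return std::fabs(approx - 1.0f / std::sqrt(x));
}
```

With x = 4 and a deliberately poor starting estimate of 0.4, each additional iteration reduces the error, which is the effect the commit relies on.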
|
|
Removes:
-sve_interleave_8way_block2_16bit
-sve_interleave_8way_block4_16bit
-sve_sgemm_3VLx8
Change-Id: I0aa35fe974d8e122937dfe8923ecf63ff5a52001
|
|
CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
Since we perform an element-wise operation, it is not necessary to pass the output_depth3d.
Change-Id: Ibfa07a0706e902acf59b444aa61e18a348162ea9
|
|
The issue was related to CLIm2Col when the number of input channels was less than
the number of elements processed by each thread.
The bug has been fixed in the validate_and_configure_window() function by setting the correct number of elements accessed
in the output tensor.
Also fixed an issue in GEMM3D when there is a single output channel.
Change-Id: I094292d0c7662599c4a4c3916ec5f5821df5faef
|
|
Change-Id: I86679adff556b6ffc9929b35cbf1b59b3958bdb1
|
|
Change-Id: I6d5f91579850906e1eb973ff6c5612195255e631
|
|
Change-Id: I807ef84dbf893bd401dcac5c0fa3a4ee49aabc66
|
|
Change-Id: If8fbd04d0817b9e654ffa9715879a2521de66963
|
|
Change-Id: I5aae537372bf797fbb2a2bae81038f8963b041a9
|
|
CLDepthWiseConvolutionLayer3x3Kernel
Change-Id: Ie274da79b15c03f86dfedc85bb721b3de34a0bb4
|
|
Commit 16121924 `COMPMID-1673: Collapse window in CLArithmeticAddition when one
operand is a vector` changed the number of elements processed per iteration to
8, but didn't update the quantized kernel to reflect that.
Change-Id: I49a2fbcee81f5bbc1b210b4a5c6d63b94eafdcec
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/156355
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Also added the test case reported by ArmNN.
Change-Id: I9fe9a1b4f74267a3346529f3a597b37486593c4a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155914
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
batches available.
Change-Id: Iad83df2a9116a7f350de83ec59b28cd8893c8d3a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155716
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I76e57af6608b55b6f59a5d06aecc30063ee4c3cc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155733
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
OpenCL
COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride
With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37%.
Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride
but I have not seen any benefit (maybe because we have few arithmetic operations and we
do not have more load instructions). However, Depthwise convolution has been improved by
30%.
Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I051748502ca24b9952e7313524bbfd708162efb4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155166
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
COMPMID-1690: Add tests for NEPermute with PermutationVector dimension > 3
Change-Id: I4bfc6ff88cd46863c2e39975b5663c624db1a63d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155316
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: I9cb725a8052091469904ecc7cfffa4add9914ffb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155261
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Manuel Bottini <manuel.bottini@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
strict-aliasing rules
Change-Id: I9e54d07cf1d77c14f124056d3724b49981bf3f97
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155292
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
NEWidthConcatenateLayerKernel works with 4D tensors too, hence the check has
been removed and tests have been added.
Change-Id: I73814cabe5fae975a44cc1a03b092c552497e57d
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155070
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
|
|
vector
When one of the operands is a vector, the kernel does a broadcast addition and
the window is not collapsed. This is an issue because it leads to a lot
of enqueues, which increase the time taken by the OpenCL driver. This patch
allows the window to be collapsed when one of the two operands is a vector.
Furthermore, it adds the LWS tuner to the kernel.
It also changes the number of elements processed per iteration to 8 to make
better use of the cache.
Change-Id: I5f09ab0ddcffb3b7f9326a987c79a997b2d7fa8c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155003
Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
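The effect of collapsing can be pictured with a toy model. This is only a sketch under stated assumptions: `Shape`, `launches_needed` and `collapse_from` are hypothetical names, not the library's window API, and the model assumes each kernel launch covers at most a fixed number of dimensions while the host loops over the rest. Because a vector operand broadcast along x is identical for every row, all dimensions above x can be merged into one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not the library's window class): dimension sizes, innermost (x) first.
using Shape = std::vector<std::size_t>;

// Number of separate launches if each launch covers the first `dims_per_launch`
// dimensions and the host iterates over the remaining ones.
std::size_t launches_needed(const Shape &shape, std::size_t dims_per_launch)
{
    std::size_t launches = 1;
    for(std::size_t d = dims_per_launch; d < shape.size(); ++d)
    {
        launches *= shape[d];
    }
    return launches;
}

// Merge every dimension from index `first` upwards into a single dimension.
Shape collapse_from(const Shape &shape, std::size_t first)
{
    assert(first < shape.size());
    Shape out(shape.begin(), shape.begin() + static_cast<Shape::difference_type>(first));
    std::size_t merged = 1;
    for(std::size_t d = first; d < shape.size(); ++d)
    {
        merged *= shape[d];
    }
    out.push_back(merged);
    return out;
}
```

In this model a {16, 8, 4, 32} tensor launched three dimensions at a time needs 32 launches, while the collapsed {16, 1024} shape needs a single one.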
|
|
Change-Id: Ic8312a5b6790aa7cd4468d42f08d557ad40e9441
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154570
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: Id964d9068e18aaa13ab8adcbf7a9375b034ea6c3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154651
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: I91865506166951b3bf7f06a0b2d4cde925cfefb6
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153447
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: Iae22554d5fe893fd22a000eab5bfd8275ea06eb3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154102
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: I146936c9e98b343496a4b61cdbadf0eaa38e885a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154008
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: Ibc0b1242804c2fdb183825406e3c78bd0d1d3564
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154368
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: Id974efad304c2513b8824a6561ad45ee60b9e7fb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153763
Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
COMPMID-1651: Fix QASYMM8 CLDeconvolutionLayer
This patch also extends the range of values used for testing Convolution and
Deconvolution to cover quantized [-1.0f, 1.0f].
Change-Id: I8b280669db67bb3ec25bf5d411c8f5954f5b0dab
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/149869
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: Id331199f569f52a37280a9ada5bf84694580b93c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/152843
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
|
|
(384496)
Mirroring CLGEMM behaviour to CLGEMMLowp
Change-Id: I308b54e2c0de131a5322b77e83e7454db498d692
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153175
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
The NEON and CL normalization layers were generating invalid results for
radius > 4.
Change-Id: I15d846405e6b3492fe44920bbf8cadceb4e5258f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153161
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Matteo Martincigh <matteo.martincigh@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: Ida71312bcf6dbd854f2ab1efc65f74910c79e152
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151510
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
|
|
Change-Id: I05d3447336ee0bf330e2a0c58fc6904be1db8f83
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/152626
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I4d9240924fe483d2dd127ad6a4ae6f8066f61bd1
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151893
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Andrew Mundy <andrew.mundy@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I9dd26b80025ea3a4c66f5f0bf41b7a98dd0d3aa4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/152549
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I5f2e6843526cb154176a5b113627d4f36c3a8edd
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150967
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: I4920e43059a713126f15493f38fe50f07d0a8c7f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151087
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Kernel size 5x5 layout NHWC.
Change-Id: Ia82ff211d1c954df228962b5c2c5ad8df7112449
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151740
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: Ie215daacd10477309dbf8af1bb2b05b7a0a8f203
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150773
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
COMPMID-1608 - (Nightly) CLGEMMConvolutionLayer QASYMM8 errors and mismatches
COMPMID-1609 - (Nightly) CLFullyConnectedLayer QASYMM8 mismatches
Change-Id: I84c0d4f468be892f437f9f38b964dc7dfb66663a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150869
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|
|
Change-Id: I62bbf510cc106a90ed2884be3c9c0c127da25898
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150681
Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Tested-by: bsgcomp <bsgcomp@arm.com>
|