Age | Commit message | Author |
|
This patch brings MAC utilisation up to 25% when both stride_x and stride_y are equal to 1.
Performance is reported on the following Confluence page:
https://confluence.arm.com/display/MLENG/Depthwise+convolution+3x3+FP32+performance%3A+ACL+18.02
Change-Id: Ida1b64be9a88805902a3d90194559b58eb1224a3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119068
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Removed double management of the same tensor object
Change-Id: Ibc74cd8c7bd199cd473ff68f692840cbf01b27b3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119119
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
|
|
The LWS hint has been applied to the optimized 1x1 and 3x3 cases
Change-Id: I6b4bfe2f9f7da627052336889b8a18d279fe2675
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119162
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
It's not safe to accumulate two u8xu8 results into a u16 accumulator.
This changes the kernel to use uadalp after every single multiply.
Correct the test fixture as well.
Change-Id: I011b90033c4673e55b843d079e3f7d185b1df330
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119096
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
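
The overflow described in this commit can be checked with a little arithmetic. The sketch below is a plain-Python model, not the kernel itself: `mul_u8`, `add_u16` and `uadalp_model` are hypothetical helpers, and `uadalp_model` assumes the usual semantics of the instruction (adjacent pairs of 16-bit lanes are added and accumulated into 32-bit lanes).

```python
U16_MASK = 0xFFFF
U32_MASK = 0xFFFFFFFF


def mul_u8(a, b):
    """Lane-wise u8 x u8 multiply; each product fits in 16 bits (max 255 * 255 = 65025)."""
    return [(x * y) & U16_MASK for x, y in zip(a, b)]


def add_u16(acc, prod):
    """Lane-wise add in 16-bit lanes, wrapping on overflow (the unsafe accumulation)."""
    return [(x + y) & U16_MASK for x, y in zip(acc, prod)]


def uadalp_model(acc32, prod16):
    """Model of a pairwise add-and-accumulate-long: two u16 lanes feed one u32 lane."""
    return [
        (acc32[i] + prod16[2 * i] + prod16[2 * i + 1]) & U32_MASK
        for i in range(len(acc32))
    ]


# Worst-case inputs: every lane holds 255.
a = [255] * 8
b = [255] * 8
prod = mul_u8(a, b)

# Unsafe: a second product is added into the same 16-bit lanes and wraps.
unsafe = add_u16(prod, prod)

# Safe: widen into 32-bit lanes after every single multiply.
safe = [0] * 4
safe = uadalp_model(safe, prod)
safe = uadalp_model(safe, prod)

print(unsafe[0])  # 64514, not the true sum 130050
print(safe[0])    # 260100 == 4 * 65025
```

With worst-case inputs a single product already uses almost the full 16-bit range, so there is no headroom for a second one; widening after each multiply removes that limit.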
|
|
Change-Id: I5a420da6a8041f9ff6d0811815f2fc74c85c56a8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119014
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I84a914c13b162c4f74321c9cafc30a18ad4ebbdb
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118797
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Introduced optimizations for 1x1, 3x3, 5x5 and 11x11
Change-Id: Ibb7f7a9fbec01a7684746ed8513634078126e452
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118107
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|
|
Change-Id: If6f3888a035b557a6c369efa22b56d6c8d3efbd3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118789
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
|
|
example
Change-Id: Ic639d51fb5dd4f78912a9b11abc7df79d205a22b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118843
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: I12d4af007c123b19925ceb5e3c84285e096bc13b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118718
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
|
|
Change-Id: Ia30ec2afce0aafcd39f41440efb972b18bbda9f8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118657
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I5296815cf04e5f805d6523196567b6c01715c8b5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118711
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Iebd2a8fece1af87c93d6795e176d8c37ca64bbf6
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118187
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I71f67789648ef05ccdedce77c7427bc0127b3a69
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116741
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ie4ac7f61675c1fb9b1748d6784fccb26f058832a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118635
Reviewed-by: Robert Hughes <robert.hughes@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I83d0f2bc8e0ebfdc0b60931f2c5acf0469caf886
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118696
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Arm Cortex-A55 FPGA with 8 CPUs with lots of flags).
Change-Id: I493fb1013c6c25d9b9c809705b1ee24abac1d8d1
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118456
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
1) Removed the example files winograd_layer.hpp/cpp
2) Templatized winograd transform kernels
Change-Id: I7045fa0b801b9d30a11275914aaa2dafd254aed2
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118332
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I14ff5e2964328d22c0bba5a77683e07f0c7920e9
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118389
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
issue on OpenGL ES
Change-Id: I7a8489bb0fddc72899ea165e414ee87bdbfb45b3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118106
Reviewed-by: Joel Liang <joel.liang@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I5366d11aefdb8f3ba7326ed7527eb216c4de0668
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118372
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ie480332e6e302edd406627e90be0d7df3e61dde5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118303
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
investigated
Change-Id: I5a69198bfd60d9cdd061f2db9838d9f0df9ecc23
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118454
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Also, added instrumentation to support generic tensor broadcasting for
NEON and CL backends.
Change-Id: I1bc5747a286e1a4b464c209067581e103d473b9a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114201
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Icbb43de7642e2b433d7471d70b9dbbde850989d3
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118197
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Change-Id: I2d3cc9668852a1ba414fc3148866df408f770dc8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118308
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
specific dataset
Change-Id: I227e90445715c3bd394e49930b010c0a5f5ca177
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118108
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Joel Liang <joel.liang@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: Ic1f215c1ae85ad5c516cc3600447a50bba77ebc1
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117668
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I38ae204632ae27c5fe7a0131462343397899868c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118120
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Fully Connected test names are not unique
Change-Id: Ie4654cc1cb4720c51a3114162043562d5cbc6d28
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118126
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I880ac3a1c3f5ea09ccefe27d9ee40bd60afcea2b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118056
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I0a7ea4cde1dbf8edd28908dfff80928ef7e996c4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117647
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Iff50adf2993bd69c2696a47559d6b2e0011fed87
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110177
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I80437f7ba6e4b8ec1fb145300a017b3688f3f2b6
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118086
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Some minor improvements in the test fixture, for example making sure
the values in the mapx and mapy tensors are in the range of [-5, in_width+5]
and [-5,in_height].
Tolerance was changed to 0, no mismatches expected.
Change-Id: I2fad06defb293bf9fdd1988799b19547c102dee5
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118044
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
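
A minimal sketch of how map values could be drawn inside the ranges quoted in this commit message. It is an assumption-laden illustration, not the fixture's code: `make_maps`, its parameters and the use of a seeded uniform distribution are all hypothetical, and the ranges are taken exactly as written above.

```python
import random


def make_maps(in_width, in_height, count, seed=0):
    """Return (mapx, mapy) lists with values limited to the quoted ranges."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    # mapx values lie in [-5, in_width + 5]
    mapx = [rng.uniform(-5, in_width + 5) for _ in range(count)]
    # mapy values lie in [-5, in_height], as stated in the message
    mapy = [rng.uniform(-5, in_height) for _ in range(count)]
    return mapx, mapy


mapx, mapy = make_maps(in_width=32, in_height=16, count=100)
```

Bounding the coordinates this way keeps most samples inside the image while still exercising a few out-of-bounds lookups near the borders.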
|
|
Change-Id: Ic460695b8a203c1080ea177b5463b48b07b70c4b
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118075
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Joel Liang <joel.liang@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I4f2cca52caf210fdb7d6bb7e9436ac51cb5088b4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/112398
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: Ia2874d30780cb597a6e5039120815f2368911e0c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118024
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
1) Updated to the latest code from the RSH repo.
2) Moved winograd transforms into kernels.
3) Added support for biases
Change-Id: I7f39f34a599b49d7d9b549cc10a4f4d4a8007ab8
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117474
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Change-Id: I33cf54e68f6c097ac58b6f16c3f9a720978f09cd
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117289
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Currently we output an array of timestamps: queued, submitted, start, end.
This patch instead outputs only end-start (i.e. the time it took to execute the kernel on the GPU).
Change-Id: Ic3c2b68128f6acd6bb018b7b3ead0b69dd5aca59
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117865
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Kevin Petit <kevin.petit@arm.com>
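
The change in reported output can be sketched as follows. `KernelTimestamps` and `execution_time` are hypothetical names used only for illustration, and the values are in whatever unit the profiler reports.

```python
from dataclasses import dataclass


@dataclass
class KernelTimestamps:
    """The four timestamps that were previously reported for each kernel."""
    queued: int
    submitted: int
    start: int
    end: int


def execution_time(ts: KernelTimestamps) -> int:
    """Time spent executing the kernel on the device, i.e. end - start."""
    return ts.end - ts.start


ts = KernelTimestamps(queued=100, submitted=150, start=400, end=1000)
print(execution_time(ts))  # 600
```

Queueing and submission delays (here 100 to 400) are deliberately left out, so the single number reflects only the kernel's run time.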
|
|
Change-Id: Iec82a91ad351cfe8d07d0976a24bd42f4703177a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116833
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
|
|
Change-Id: If8c1e0103ae2e3dfde3d0b9f23575c0e904c7f30
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117961
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
Refactored the console printer too (so that we can reuse the code if needed)
Change-Id: I16a0f70104f82f07cd59900b383038fa5a76e1bc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117858
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
|
|
Changed CLReductionOperationKernel: each kernel now computes
a 2D slice instead of a 1D one. This reduces the memory footprint
from around 1.6Gb for a 4k input image to a few Mb. The large footprint
was caused by the __local memory and was probably the cause of this bug.
Change-Id: I71ac71ff09b041c945a134177600f0f3475e48cf
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117835
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
it supports asymmetric padding
Add asymmetric padding support for NEPoolingLayer
Change-Id: Ia5cc660aeca636c3c45df4916a28974cc2b7f2f4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117275
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
|
|
This patch introduces a new GEMM that improves MAC utilisation
by 10% compared to the GEMM without reshape. However, this implementation
is not faster in all cases, as we need to take into account the time for
reshaping the matrices. For this reason, a heuristic that selects
the optimal GEMM to use has been added to the function. More information
about the heuristic implementation can be found at COMPMID-852.
With this new patch, GoogleNet, MobileNet, VGG16 and SqueezeNet can
improve their performance by 1.5x.
More information about the performance uplift can be found here:
https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.02
Change-Id: I024563c06b9aed02a211a974e452bae5c233b04c
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117140
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
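
The trade-off behind this commit (a faster GEMM that first has to pay for reshaping) can be sketched with a simple cost comparison. This is a purely illustrative rule, not the heuristic from COMPMID-852, whose details are not given here; `pick_gemm` and its timing parameters are hypothetical.

```python
def pick_gemm(t_plain: float, t_reshaped: float, t_reshape: float) -> str:
    """Pick the variant with the lower estimated total time.

    t_plain    -- estimated time of the GEMM without reshape
    t_reshaped -- estimated time of the GEMM on reshaped matrices
    t_reshape  -- estimated time spent reshaping the matrices
    """
    if t_reshape + t_reshaped < t_plain:
        return "reshaped"
    return "plain"


# The reshaped GEMM wins only when its gain outweighs the reshape cost.
print(pick_gemm(t_plain=10.0, t_reshaped=6.0, t_reshape=2.0))  # reshaped
print(pick_gemm(t_plain=10.0, t_reshaped=9.0, t_reshape=2.0))  # plain
```

A real selection rule would more likely work from matrix dimensions and target properties than from measured times; the comparison above only shows why a selection step is needed at all.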
|
|
when not running in a terminal
Change-Id: I4ec90803c5dc41b0cee05c36113ae3f189564d58
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117831
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
CustomConvolution (output S16)
Change-Id: Ic099336f558e994210a59e14ec0171fae68ccb80
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116663
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|
|
Change-Id: I25424481ddbbeb43f940cf51cef791e4fd83ea92
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117676
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Jenkins <bsgcomp@arm.com>
|