path: root/src
Age | Commit message | Author
2018-11-02 | COMPMID-873: Integrate RSH NEON Depthwise Convolution routine | Georgios Pinitas
Change-Id: Ida1e9a836bc518bfe5563e16bf7f92bde5fc13f7 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118472 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-876: Integrate RSH native GEMM kernel. | Georgios Pinitas
Change-Id: Iaae87e155fa673bf099c2bc21a7be072c5c08fc1 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119118 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-578: Implement FAST corners for CL/NEON | Abe Mbise
Change-Id: Ifa74e2bf05546de9a49aa185e22fba50438d8ad6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/113946 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-888 Valgrind invalid read in NEGEMVAArch64Kernel | Michalis Spyrou
Change-Id: I470f244718571e32ac55062b2b62fd0f6996efc6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118940 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | COMPMID-901 - Optimizing CLCol2ImKernel | Gian Marco
This patch makes col2im on OpenCL 2 times faster Change-Id: I8d90f5a72a050355ca1fd13433d8c2c26e5e33f5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119442 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | COMPMID-879: Investigate CL mismatches in Convolution S16 | Georgios Pinitas
Changes and simplifies the validation to divide the scale in integer format instead of double. Change-Id: Ib9156e9515e4e542391eeda11548f3d15613a0af Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119256 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
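The commit above does not show the validation code, so the following is only a minimal sketch of why the choice between integer and double division can matter for an S16 result. The helper names are hypothetical, and the round-to-nearest step in the floating-point path is an assumption made for illustration, not a statement about the library's reference implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers, not taken from the library.

// Clamp a 32-bit value into the int16_t range.
inline int16_t saturate_s16(int32_t v)
{
    const int32_t lo = std::numeric_limits<int16_t>::min();
    const int32_t hi = std::numeric_limits<int16_t>::max();
    return static_cast<int16_t>(std::min(std::max(v, lo), hi));
}

// Integer path: C++ integer division truncates towards zero.
inline int16_t scale_integer(int32_t sum, int32_t scale)
{
    return saturate_s16(sum / scale);
}

// Floating-point path: divide as double, then round to nearest
// (the rounding step is an assumption for this sketch).
inline int16_t scale_double_rounded(int32_t sum, int32_t scale)
{
    const double q = static_cast<double>(sum) / static_cast<double>(scale);
    return saturate_s16(static_cast<int32_t>(std::lround(q)));
}
```

With an accumulator of 7 and a scale of 2, the integer path yields 3 while the rounded floating-point path yields 4, which is the kind of one-off difference a zero-tolerance comparison would flag.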
2018-11-02 | COMPMID-828 - Add support for non square pool size - Part1 | Isabella Gottardi
Change-Id: Ib8100e7c659c49694c746fa3f36ce20f44f6929f Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117804 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-895 - Optimizing CLDepthwiseConvolution3x3Kernel | Gian Marco
This patch brings the MACs utilisation up to 25% when both stride_x and stride_y are equal to 1. Performance is reported on the following Confluence page: https://confluence.arm.com/display/MLENG/Depthwise+convolution+3x3+FP32+performance%3A+ACL+18.02 Change-Id: Ida1b64be9a88805902a3d90194559b58eb1224a3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119068 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-894: Segfault: neon_cnn on S5 neo | Georgios Pinitas
Removed double management of the same tensor object. Change-Id: Ibc74cd8c7bd199cd473ff68f692840cbf01b27b3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119119 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
2018-11-02 | COMPMID-765 - Added LWS hint in CLIm2Col | Gian Marco
The LWS hint has been applied for optimized cases 1x1 and 3x3 Change-Id: I6b4bfe2f9f7da627052336889b8a18d279fe2675 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119162 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-784: Added support for PADDING = SAME in Winograd layer. | Pablo Tello
Change-Id: I5a420da6a8041f9ff6d0811815f2fc74c85c56a8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119014 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-891 - Use OpenCL timer in CLTuner | Gian Marco
Change-Id: I84a914c13b162c4f74321c9cafc30a18ad4ebbdb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118797 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02 | COMPMID-855 - Optimizing im2col on OpenCL (DCHW) | Gian Marco
Introduced optimizations for 1x1, 3x3, 5x5 and 11x11 Change-Id: Ibb7f7a9fbec01a7684746ed8513634078126e452 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118107 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02 | COMPMID-754: Add validation method to CLPermute kernel | Michele Di Giorgio
Change-Id: If6f3888a035b557a6c369efa22b56d6c8d3efbd3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118789 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02 | COMPMID-874: Improve default number of threads choice in the Scheduler | Georgios Pinitas
Change-Id: Ia30ec2afce0aafcd39f41440efb972b18bbda9f8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118657 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-890: Valgrind: NEON Convolution Layer validation fails | Isabella Gottardi
Change-Id: I5296815cf04e5f805d6523196567b6c01715c8b5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118711 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-871: Remove vst4q/vld4q from NEActivationLayer. | Georgios Pinitas
Change-Id: Iebd2a8fece1af87c93d6795e176d8c37ca64bbf6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118187 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-787: Add CL support for broadcast multiply | Michele Di Giorgio
Change-Id: I71f67789648ef05ccdedce77c7427bc0127b3a69 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116741 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-765: Fix direct convolution output stage. | Georgios Pinitas
Change-Id: Ie4ac7f61675c1fb9b1748d6784fccb26f058832a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118635 Reviewed-by: Robert Hughes <robert.hughes@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-765: Fixed clangtidy warnings. | Pablo Tello
Change-Id: I83d0f2bc8e0ebfdc0b60931f2c5acf0469caf886 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118696 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-765: Fixed CPU detection code to read larger buffer (was failing on Arm Cortex-A55 FPGA with 8 CPUs with lots of flags). | Pablo Tello
Change-Id: I493fb1013c6c25d9b9c809705b1ee24abac1d8d1 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118456 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-784: Winograd transforms refactoring | Pablo Tello
1) Removed the example files winograd_layer.hpp/cpp 2) Templatized winograd transform kernels Change-Id: I7045fa0b801b9d30a11275914aaa2dafd254aed2 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118332 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-400: Implement the tensorshift kernel for fixing DC's alignment issue on OpenGL ES | Xinghang Zhou
Change-Id: I7a8489bb0fddc72899ea165e414ee87bdbfb45b3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118106 Reviewed-by: Joel Liang <joel.liang@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-394: GLES fails to compile Dropout kernel on S5 | steli01
Change-Id: Ie480332e6e302edd406627e90be0d7df3e61dde5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118303 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | IVGCVSW-863 Broadcast support in CL/NEON Arithmetic Add | Diego Lopez Recas
Also, added instrumentation to support generic tensor broadcasting for NEON and CL backends. Change-Id: I1bc5747a286e1a4b464c209067581e103d473b9a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/114201 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
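For readers unfamiliar with tensor broadcasting, the usual rule is that two shapes are compatible when, in every dimension, the extents are equal or one of them is 1. The sketch below computes the resulting shape under that rule. The function name and the use of `std::vector` for shapes are hypothetical and do not reflect the library's own shape classes.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not the library's API: computes the broadcast shape
// of two shapes, dimension 0 first. Missing higher dimensions count as 1.
inline std::vector<std::size_t> broadcast_shape(const std::vector<std::size_t> &a,
                                                const std::vector<std::size_t> &b)
{
    const std::size_t        rank = std::max(a.size(), b.size());
    std::vector<std::size_t> out(rank, 1);
    for(std::size_t i = 0; i < rank; ++i)
    {
        const std::size_t da = (i < a.size()) ? a[i] : 1;
        const std::size_t db = (i < b.size()) ? b[i] : 1;
        if(da != db && da != 1 && db != 1)
        {
            throw std::invalid_argument("shapes are not broadcast compatible");
        }
        // Whichever extent is not 1 wins; if both are equal, either works.
        out[i] = (da == 1) ? db : da;
    }
    return out;
}
```

For example, adding a tensor of shape {27, 13, 2} to one of shape {27, 1, 1} produces a result of shape {27, 13, 2}, with the second operand reused along the broadcast dimensions.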
2018-11-02 | COMPMID-866: Integrate SGEMV Neon Assembly from RSH | Michele Di Giorgio
Change-Id: Icbb43de7642e2b433d7471d70b9dbbde850989d3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118197 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02 | COMPMID-765: Allow RSH's code to not have default cases in their switches | Anthony Barbier
Change-Id: I2d3cc9668852a1ba414fc3148866df408f770dc8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118308 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | APPBROWSER-390,397,398: bugfix and fully connected validation issue on specific dataset | zhenglin
Change-Id: I227e90445715c3bd394e49930b010c0a5f5ca177 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118108 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-815: Fixed Winograd 5x5 padding bug. | Pablo Tello
Change-Id: I38ae204632ae27c5fe7a0131462343397899868c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118120 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-765 - Added third dimension for CLTuner | Gian Marco
Change-Id: I0a7ea4cde1dbf8edd28908dfff80928ef7e996c4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117647 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-588: Port Equalize Histogram to new validation | John Richardson
Change-Id: Iff50adf2993bd69c2696a47559d6b2e0011fed87 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110177 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-765 Fixed missing cast that was breaking the bare metal build | Anthony Barbier
Change-Id: I80437f7ba6e4b8ec1fb145300a017b3688f3f2b6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118086 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-837: Fixed remap tests failures in Valgrind. | Pablo Tello
Some minor improvements in the test fixture, for example making sure the values in the mapx and mapy tensors are in the range of [-5, in_width+5] and [-5,in_height]. Tolerance was changed to 0, no mismatches expected. Change-Id: I2fad06defb293bf9fdd1988799b19547c102dee5 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118044 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
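A test fixture can keep map values inside a known range by drawing them from a bounded uniform distribution. The sketch below shows one way to do that with the standard library. The helper name, the seed parameter and the flat `std::vector<float>` storage are assumptions for illustration and are not the fixture's actual code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical fixture helper: returns `count` pseudo-random values drawn
// uniformly from the half-open range [lo, hi), reproducible for a given seed.
inline std::vector<float> make_map(std::size_t count, float lo, float hi, unsigned int seed)
{
    std::mt19937                          gen(seed);
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float>                    values(count);
    for(float &v : values)
    {
        v = dist(gen);
    }
    return values;
}
```

Following the ranges quoted in the commit message, mapx could be filled with `make_map(n, -5.f, in_width + 5.f, seed)` and mapy with `make_map(n, -5.f, in_height, seed + 1)`, where `n`, `in_width`, `in_height` and `seed` are supplied by the test.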
2018-11-02 | APPBROWSER-395: Random error in FullyConnectedLayer | steli01
Change-Id: Ic460695b8a203c1080ea177b5463b48b07b70c4b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/118075 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Joel Liang <joel.liang@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | IVGCVSW-798 Add Softmax NEON support for QASYMM8 | Diego Lopez Recas
Change-Id: I4f2cca52caf210fdb7d6bb7e9436ac51cb5088b4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/112398 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-784: Added support for biases in WinogradLayer. | Pablo Tello
1) Updated to the latest code from the RSH repo. 2) Moved winograd transforms into kernels. 3) Added support for biases Change-Id: I7f39f34a599b49d7d9b549cc10a4f4d4a8007ab8 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117474 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-791: Generic Depthwise Convolution Layer NEON QASYMM8 | Georgios Pinitas
Change-Id: I33cf54e68f6c097ac58b6f16c3f9a720978f09cd Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117289 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-790 - NEON: Add QASYMM8 support to Convolution | Isabella Gottardi
Change-Id: Iec82a91ad351cfe8d07d0976a24bd42f4703177a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116833 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02 | COMPMID-765: Clangtidy warnings | Pablo Tello
Change-Id: If8c1e0103ae2e3dfde3d0b9f23575c0e904c7f30 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117961 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-834 Fix arm_compute_nightly_validation getting killed | Michalis Spyrou
Changed CLReductionOperationKernel: each kernel now computes a 2D slice instead of a 1D one. This reduces the memory footprint, which was caused by the __local memory and was probably the cause of this bug, from around 1.6Gb for a 4k input image to a few Mb. Change-Id: I71ac71ff09b041c945a134177600f0f3475e48cf Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117835 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-848 NEPoolingLayerKernel incorrectly reports it supports asymmetric padding | Michalis Spyrou
Add asymmetric padding support for NEPoolingLayer Change-Id: Ia5cc660aeca636c3c45df4916a28974cc2b7f2f4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117275 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-748 - Integrating optimized SGEMM for bifrost | Gian Marco
This patch introduces a new GEMM capable of improving the MAC utilisation by 10% compared to the GEMM without reshape. However, this implementation is not faster in all cases, as the time for reshaping the matrices has to be taken into account. For this reason, a heuristic to select the optimal GEMM has been added to the function. More information about the heuristic implementation can be found at COMPMID-852. With this patch, the performance of GoogleNet, MobileNet, VGG16 and SqueezeNet can improve by 1.5x. More information about the performance uplift can be found here: https://confluence.arm.com/display/MLENG/GEMM+FP32+performance%3A+ACL+18.02 Change-Id: I024563c06b9aed02a211a974e452bae5c233b04c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117140 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-860: Neon HGEMM integrated assembly kernel from RSH for Arm Cortex-A55r1. | Pablo Tello
Change-Id: I640ae54dcc4591915c7a539b27728f05b70cf0eb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117616 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-798 Add instrumentation to NEON kernels | Anthony Barbier
Change-Id: I9dbb090cac731d68bd98a7d1c8ab0e1cb0a5c911 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116746 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-816 - Optimizing CLGEMMLowpMatrixMultiplyCore - Part1 | Gian Marco
The performance improvements have been reported at the following confluence page: https://confluence.arm.com/display/MLENG/GEMMLowp+performance%3A+ACL+18.02 Config3 of McVail looks improved by 29x Change-Id: I8b203c0b75fc368f85cea863b7eed398fab3e79a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/115783 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-842: Add NEON QASYMM8 RELU Activation | Michele Di Giorgio
Change-Id: I7197d2ad7ac08112eba1570a257ad011b1ce0b75 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117404 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-858: Assert in ICLKernel on higher window dimensions moved to enqueue | Anthony Barbier
Change-Id: I49d501e82f5c69b6912cb9e5fa684a904c62ed8e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117409 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-841: Add CL QASYMM8 RELU Activation | Michele Di Giorgio
Change-Id: I8e0b7cad2f977942224d0116e8498bf9b2d6014d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117229 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | COMPMID-784: Doxygen fixes | Pablo Tello
Change-Id: I35f429fbf08dece7c759242c37e0a68b0851ce49 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/117231 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02 | APPBROWSER-377: GCConvolutionLayer support for FP16 | Stephen Li
Change-Id: I801b5e393a16a9f92c062826e6fcfd5982ca7bb3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/116584 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>