aboutsummaryrefslogtreecommitdiff
path: root/src/core
AgeCommit message (Collapse)Author
2018-11-19COMPMID-1644: NEDepthwiseConvolution for FP16 NHWCGeorgios Pinitas
Change-Id: I6e7dee8bd615a5eff01c523f208a218574ee5eab
2018-11-19COMPMID-1065 : Create documentation explaining how to add new functions / ↵Vidhya Sudhan Loganathan
kernels Change-Id: I98183f95814442b6f3dbb67a1bdae99df05b9b01
2018-11-16COMPMID-1451: Fixes for BoundingBoxTransformgiuros01
- Fixing a bug for which we did not scale the boxes before transforming them - Adding the correct_transform_coords option to BoundingBoxTransformInfo Change-Id: I40281254bcf87e7c8583c119e99562414fe59822
2018-11-16COMPMID-1451: (3RDPARTY_UPDATE) Fixes for GenerateProposals graph node and ↵Michele Di Giorgio
BoxWithNMSLimitKernel COMPMID-1792: Accuracy issue in CLGenerateProposals This patch does the following: - Some fixes for GenerateProposals function and tests - Adapting BoxWithNMSLimitKernel to only accept U32 tensors as keeps_size - Update 3rdparty - Adds a small tolerance for a GenerateProposals test Change-Id: Ia8ec1cdfe941fe05003645e86deb9ea6a6044d74
2018-11-16COMPMID-1266 : Add support for FP16 in CLWinogradConvolutionLayer: 5x5 kernelsVidhya Sudhan Loganathan
Introduced F32 accumulation for F16 winograd gemm and output transform WinogradConvolution will be available for F16 only if fast math flag is enabled Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
2018-11-16COMPMID-1785: Support for 4D tensor in CLFlattenLayerKernelGian Marco Iodice
With this patch we are able to dispatch a single GPU job also in case of batched-flatten Change-Id: I755e7af29d44b24f67fa04bad3c9b7646e8deefc
2018-11-16COMPMID-1791 CLPriorBox clCreateKernel errorMichalis Spyrou
Change-Id: I2c2250669829e399fdc2363f729dc5e68d8aac17
2018-11-16COMPMID-1461 SSD support: Create NEON PriorBoxMichalis Spyrou
Change-Id: I99e1c3939cfea4b9cb0ddfa313706f31b213ca89
2018-11-15COMPMID-1784: Fixed mapx/mapy access windowsAnthony Barbier
num_elems_processed was passed as a scale instead of a step Change-Id: I8c6d58fe4432f9f6beb31c0a1e02204c96775d98
2018-11-15COMPMID-1784 Fixed the padding checks in ↵Anthony Barbier
AccessWindowRectangle::update_window_if_needed() Change-Id: I56426cc9c9688a0aa0acdd439d5887c7ef208cd2 Note: The code to shrink the window hasn't been fixed yet.
2018-11-15COMPMID-1676: Change CLROIAlign interface to accept ROIs as tensorsManuel Bottini
Change-Id: I69e995973597ba3927d29e4f6ed5438560e53d77
2018-11-15COMPMID-1329: Add support for GenerateProposals operator in CLgiuros01
Change-Id: Ib0798cc17496b7817f5b5769b25d98913a33a69d
2018-11-14COMPMID-1462 SSD support: Create CL PriorBoxMichalis Spyrou
Change-Id: I5bf5d751ec7c02d96c26a769f49d03ea23a248b7
2018-11-14COMPMID-1638: Add BBoxTransform to the graph API (3RDPARTY_UPDATE)Manuel Bottini
Change-Id: Ie13a9eb6d417388b5de533bffa895796d9d2cf62
2018-11-14COMPMID-1781 Add channel support in CLL2NormalizationMichalis Spyrou
Change-Id: Ibab049f09413258c99335b7da6b151530a1bd136
2018-11-13COMPMID-1707: Create 3 special CLWidthConcatenate kernel to concatenate 2/4 ↵Michele Di Giorgio
and 8 tensors (Part 1) Creating special cases for concatening 2 and 4 tensors. Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
2018-11-13COMPMID-1751: Remove output_3d_depth from ↵Georgios Pinitas
NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint Change-Id: I1d5bc4d24059917f9ddef0873dd3043b1f2320a8
2018-11-09COMPMID-1451: Reduces accuracy issue in NEPoolingLayer for QASYMM8 NHWCGeorgios Pinitas
Adds 0.5f after scaling AVG pooling to be able to round to nearest as vcvtq_u32_f32 rounds towards zero. Change-Id: I22ce78f9e628cf4184a317edabce47211ab09456
2018-11-09COMPMID-1451 - Removed unused OpenCL kernel from gemmlowp.clGian Marco Iodice
Removed gemmlowp_mm_bifrost_transposed_dot8 kernel as not used Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
2018-11-09COMPMID-1451: Perform two steps in calculating sqrt in NEPoolingLayerGeorgios Pinitas
Increases the steps for calculating invsqrt used in L2 pool by 1 to increase accuracy. Change-Id: Ib938a963809b07c30d47ec0675abae75bc086986
2018-11-09COMPMID-1675: Remove unused SVE filesGeorgios Pinitas
Removes: -sve_interleave_8way_block2_16bit -sve_interleave_8way_block4_16bit -sve_sgemm_3VLx8 Change-Id: I0aa35fe974d8e122937dfe8923ecf63ff5a52001
2018-11-08COMPMID-1451: Removed output_depth3d from ↵Gian Marco Iodice
CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat Since we perform an element-wise operation, it is not necessary to pass the output_depth3d. Change-Id: Ibfa07a0706e902acf59b444aa61e18a348162ea9
2018-11-08COMPMID-1736: Fixed out-of-bound write in CLIm2ColGian Marco Iodice
The issue was related to CLIm2Col when the number of input channels was less than the number of elements processed by each thread. The bug has been fixed in the validate_and_configure_window() function setting the correct number of elements accessed in the output tensor. Also fixed an issue GEMM3D when we have a single output channel Change-Id: I094292d0c7662599c4a4c3916ec5f5821df5faef
2018-11-08COMPMID-1675: Add SVE supportGeorgios Pinitas
Change-Id: I86679adff556b6ffc9929b35cbf1b59b3958bdb1
2018-11-08COMPMID-1579: Add support for ChannelShuffle operator in NEONGeorgios Pinitas
Change-Id: I6d5f91579850906e1eb973ff6c5612195255e631
2018-11-08COMPMID-1776: Revert QuantizeDownStage to use fixed-pointGeorgios Pinitas
Change-Id: I807ef84dbf893bd401dcac5c0fa3a4ee49aabc66
2018-11-07COMPMID-1451: Fixed zerobuff sizes and clobbers in interleave transforms.Georgios Pinitas
Change-Id: If8fbd04d0817b9e654ffa9715879a2521de66963
2018-11-06COMPMID-1328 Add support for BoxWithNMSLimit operator in CPPMichalis Spyrou
Change-Id: I5aae537372bf797fbb2a2bae81038f8963b041a9
2018-11-06COMPMID-1703: Collapse the 4th dimensions in ↵Georgios Pinitas
CLDepthWiseConvolutionLayer3x3Kernel Change-Id: Ie274da79b15c03f86dfedc85bb721b3de34a0bb4
2018-11-02COMPMID-1739: Fix broadcast CLArithmeticAddition for QASYMM8Michele Di Giorgio
Commit 16121924 `COMPMID-1673: Collapse window in CLArithmeticAddition when one operand is a vector` changed the number of elements processed per iteration to 8, but didn't update the quantized kernel to reflect that. Change-Id: I49a2fbcee81f5bbc1b210b4a5c6d63b94eafdcec Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/156355 Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1712 CLPoolingLayer wrong results in QASYMM8Michalis Spyrou
Also added the test case reported by ArmNN. Change-Id: I9fe9a1b4f74267a3346529f3a597b37486593c4a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155914 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1699: Disable arithmetic operations in CLWinogradLayer when no ↵Georgios Pinitas
batches available. Change-Id: Iad83df2a9116a7f350de83ec59b28cd8893c8d3a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155716 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1704: Collapse the 4th dimension in CLPoolingLayerKernelGeorgios Pinitas
Change-Id: I76e57af6608b55b6f59a5d06aecc30063ee4c3cc Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155733 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-1413 - Improve the performance of GEMMLowp with 8 bit dot product on ↵Gian Marco Iodice
OpenCL COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 % Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride but I have not seen any benefit (maybe because we have few arithemtic operation and we do not have more load instructions). However Depthwise convolution has been improved by 30% Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1029: Collapse CLWinogradInputTransform/CLWinogradOutputTransformGeorgios Pinitas
Change-Id: I051748502ca24b9952e7313524bbfd708162efb4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155166 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1451: Fix CL/NEPermuteKernel PermuteVection checkIsabella Gottardi
COMPMID-1690: Add tests for NEPermute with PermutationVector dimension > 3 Change-Id: I4bfc6ff88cd46863c2e39975b5663c624db1a63d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155316 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1451: Fix inlines in cl helpersGeorgios Pinitas
Change-Id: I9cb725a8052091469904ecc7cfffa4add9914ffb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155261 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1530 error: dereferencing type-punned pointer will break ↵Michalis Spyrou
strict-aliasing rules Change-Id: I9e54d07cf1d77c14f124056d3724b49981bf3f97 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155292 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1681: (Nightly) NEWidthConcatenateLayer failsMichele Di Giorgio
NEWidthConcatenateLayerKernel works with 4D tensors too, hence the check has been removed and tests have been added. Change-Id: I73814cabe5fae975a44cc1a03b092c552497e57d Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155070 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com>
2018-11-02COMPMID-1673: Collapse window in CLArithmeticAddition when one operand is a ↵Michele Di Giorgio
vector When one of the operands is a vector, the kernel does a broadcast addition and the window is not collapsed. This represent an issue because it leads to a lot of enqueues that increases the time taken by the OpenCL driver. This patch allows to collapse the window when one of the two operands is a vector. Furthermore, it adds LWS tuner to the kernel. It also changes the number of elements processed per iteration to 8 to make better usage of the cache. Change-Id: I5f09ab0ddcffb3b7f9326a987c79a997b2d7fa8c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155003 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1451: Perform CLOutputStage using floats.Georgios Pinitas
Change-Id: Ic8312a5b6790aa7cd4468d42f08d557ad40e9441 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154570 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1451: Fuse activation in DepthwiseConvolution.Georgios Pinitas
Change-Id: Id964d9068e18aaa13ab8adcbf7a9375b034ea6c3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154651 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1327: Add support for BBoxTransform operator in CLgiuros01
Change-Id: I91865506166951b3bf7f06a0b2d4cde925cfefb6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153447 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1632 Add CLL2NormalizationLayer for NHWC and FP32Michalis Spyrou
Change-Id: Iae22554d5fe893fd22a000eab5bfd8275ea06eb3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154102 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1523: Fuse BN node with convolution.Georgios Pinitas
Change-Id: I146936c9e98b343496a4b61cdbadf0eaa38e885a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154008 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1667: Add 4D tensors support to CLWidthConcatenateLayerKernelMichele Di Giorgio
Change-Id: Ibc0b1242804c2fdb183825406e3c78bd0d1d3564 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154368 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1580 Implement ReduceMean in NEONMichalis Spyrou
Change-Id: Id974efad304c2513b8824a6561ad45ee60b9e7fb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153763 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1586: Add support for NHWC CLDeconvolutionLayerMichele Di Giorgio
COMPMID-1651: Fix QASYMM8 CLDeconvolutionLayer This patch also extends the range of values used for testing Convolution and Deconvolution to cover quantized [-1.0f, 1.0f]. Change-Id: I8b280669db67bb3ec25bf5d411c8f5954f5b0dab Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/149869 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1574 Implement ReduceMean in OpenCLMichalis Spyrou
Change-Id: Id331199f569f52a37280a9ada5bf84694580b93c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/152843 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1451: Reverting changes for CLGEMM and CLGEMMLowp previuosly done ↵Isabella Gottardi
(384496) Mirroring CLGEMM behaviour to CLGEMMLowp Change-Id: I308b54e2c0de131a5322b77e83e7454db498d692 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153175 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>