aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/cl_kernels
AgeCommit message (Collapse)Author
2018-11-13COMPMID-1707: Create 3 special CLWidthConcatenate kernel to concatenate 2/4 ↵Michele Di Giorgio
and 8 tensors (Part 1) Creating special cases for concatening 2 and 4 tensors. Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
2018-11-09COMPMID-1451 - Removed unused OpenCL kernel from gemmlowp.clGian Marco Iodice
Removed gemmlowp_mm_bifrost_transposed_dot8 kernel as not used Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
2018-11-08COMPMID-1451: Removed output_depth3d from ↵Gian Marco Iodice
CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat Since we perform an element-wise operation, it is not necessary to pass the output_depth3d. Change-Id: Ibfa07a0706e902acf59b444aa61e18a348162ea9
2018-11-06COMPMID-1703: Collapse the 4th dimensions in ↵Georgios Pinitas
CLDepthWiseConvolutionLayer3x3Kernel Change-Id: Ie274da79b15c03f86dfedc85bb721b3de34a0bb4
2018-11-02COMPMID-1739: Fix broadcast CLArithmeticAddition for QASYMM8Michele Di Giorgio
Commit 16121924 `COMPMID-1673: Collapse window in CLArithmeticAddition when one operand is a vector` changed the number of elements processed per iteration to 8, but didn't update the quantized kernel to reflect that. Change-Id: I49a2fbcee81f5bbc1b210b4a5c6d63b94eafdcec Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/156355 Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1712 CLPoolingLayer wrong results in QASYMM8Michalis Spyrou
Also added the test case reported by ArmNN. Change-Id: I9fe9a1b4f74267a3346529f3a597b37486593c4a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155914 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1699: Disable arithmetic operations in CLWinogradLayer when no ↵Georgios Pinitas
batches available. Change-Id: Iad83df2a9116a7f350de83ec59b28cd8893c8d3a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155716 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1704: Collapse the 4th dimension in CLPoolingLayerKernelGeorgios Pinitas
Change-Id: I76e57af6608b55b6f59a5d06aecc30063ee4c3cc Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155733 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-1413 - Improve the performance of GEMMLowp with 8 bit dot product on ↵Gian Marco Iodice
OpenCL COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 % Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride but I have not seen any benefit (maybe because we have few arithemtic operation and we do not have more load instructions). However Depthwise convolution has been improved by 30% Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1029: Collapse CLWinogradInputTransform/CLWinogradOutputTransformGeorgios Pinitas
Change-Id: I051748502ca24b9952e7313524bbfd708162efb4 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155166 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1451: Fix inlines in cl helpersGeorgios Pinitas
Change-Id: I9cb725a8052091469904ecc7cfffa4add9914ffb Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155261 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1673: Collapse window in CLArithmeticAddition when one operand is a ↵Michele Di Giorgio
vector When one of the operands is a vector, the kernel does a broadcast addition and the window is not collapsed. This represent an issue because it leads to a lot of enqueues that increases the time taken by the OpenCL driver. This patch allows to collapse the window when one of the two operands is a vector. Furthermore, it adds LWS tuner to the kernel. It also changes the number of elements processed per iteration to 8 to make better usage of the cache. Change-Id: I5f09ab0ddcffb3b7f9326a987c79a997b2d7fa8c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155003 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1451: Perform CLOutputStage using floats.Georgios Pinitas
Change-Id: Ic8312a5b6790aa7cd4468d42f08d557ad40e9441 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154570 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1451: Fuse activation in DepthwiseConvolution.Georgios Pinitas
Change-Id: Id964d9068e18aaa13ab8adcbf7a9375b034ea6c3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154651 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-1327: Add support for BBoxTransform operator in CLgiuros01
Change-Id: I91865506166951b3bf7f06a0b2d4cde925cfefb6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153447 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1632 Add CLL2NormalizationLayer for NHWC and FP32Michalis Spyrou
Change-Id: Iae22554d5fe893fd22a000eab5bfd8275ea06eb3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154102 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1523: Fuse BN node with convolution.Georgios Pinitas
Change-Id: I146936c9e98b343496a4b61cdbadf0eaa38e885a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154008 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1667: Add 4D tensors support to CLWidthConcatenateLayerKernelMichele Di Giorgio
Change-Id: Ibc0b1242804c2fdb183825406e3c78bd0d1d3564 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/154368 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1574 Implement ReduceMean in OpenCLMichalis Spyrou
Change-Id: Id331199f569f52a37280a9ada5bf84694580b93c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/152843 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1451: Fix NormalizationLayer accross width normalization.Georgios Pinitas
NEON and CL normalization layer was generating invalida results for radius > 4. Change-Id: I15d846405e6b3492fe44920bbf8cadceb4e5258f Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/153161 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Matteo Martincigh <matteo.martincigh@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1610: Fixed CLDirectConvolution mismatchesPablo Tello
Kernel size 5x5 layout NHWC. Change-Id: Ia82ff211d1c954df228962b5c2c5ad8df7112449 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/151740 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02[COMPMID-1331] Add support for RoIAlign operator in CLgiuros01
Change-Id: Ie215daacd10477309dbf8af1bb2b05b7a0a8f203 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150773 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-1607 - (Nightly) CLGEMMLowpMatrixMultiplyCore errors and mismatchesIsabella Gottardi
COMPMID-1608 - (Nightly) CLGEMMConvolutionLayer QASYMM8 errors and mismatches COMPMID-1609 - (Nightly) CLFullyConnectedLayer QASYMM8 mismatches Change-Id: I84c0d4f468be892f437f9f38b964dc7dfb66663a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150869 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-286: CL colour convert to U8Manuel Bottini
Change-Id: I62bbf510cc106a90ed2884be3c9c0c127da25898 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150681 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1519: Add support for 3D input/output in CLGEMMLowpOutputStageGeorgios Pinitas
Change-Id: I637add70310d2da4d82b236a6352af9d33be17a1 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/149706 Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1518: Add support for GEMM3D in CLGEMMLowpMatrixMultiplyCoreGeorgios Pinitas
Change-Id: Ib14ac821ee5d4aff80bd602cd3e76e7018abb5e6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/150268 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1598 : Fix compilation error in CLDepthwiseConvolutionQS8 kernelGeorgios Pinitas
Change-Id: I65eeb0cba2af462c6ef64a536ad263c407d62811 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/149609 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1588 Create UpsampleKernel for YOLOLayerMichalis Spyrou
Change-Id: Ic1f9e85306a0a0b1459c9f9aa35bd629deea1710 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148797 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1581: Collapse windowsGeorgios Pinitas
Change-Id: Iec56c9a96d9736a63f13b65efa33311950f20661 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148572 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1564: Add QASYMM8 on CLPixelwiseMultiplicationGeorgios Pinitas
Change-Id: I5f719f5b2915c18cd0ca6271db401152112863a6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148982 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Isabella Gottardi <isabella.gottardi@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
2018-11-02COMPMID-1554 Implementing Space to Batch on OpenCL - NHWCMichalis Spyrou
Change-Id: Ifa37a6758f79d0a6ca771dcfb4c55a5d96b452d0 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148892 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1568: Add support for QASYMM8 to CLNormalizePlanarYUVMichele Di Giorgio
Change-Id: Id7ea6e7f57179478e5ba0e9231274e98fa089590 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148028 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1566: Add broadcast to CLArithmeticSubtractionGeorgios Pinitas
Change-Id: I05d21f9a92013ecfd1128d12cf1561cfd6e5c5e9 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/147983 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02[COMPMID-1229] Implementing Pad on OpenCL -FP32/FP16Giuseppe Rossini
Change-Id: Ideead99410e5e0bda1035030af1bbcd0a65ea15e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/144792 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1507 Add support for QASYMM8 in CLScaleKernelMichalis Spyrou
Change-Id: I4a32e47e6d9152633668cf0e14db88fc8c26f7ea Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148167 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1549 Implementing Batch to Space on OpenCL - NHWCMichalis Spyrou
Change-Id: If7ae0a8b6255a10711365068d9fb153c71f09818 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/147751 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1584 - Collapse batch size in CLChannelShuffleLayerKernelGian Marco Iodice
COMPMID-1589 - Add support for NHWC to CLChannelShuffleLayerKernel Change-Id: I13936a5cd1659d01fdb10b346e90f0d72d79f1f1 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/148475 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1227 Implementing Space to Batch on OpenCLMichalis Spyrou
Change-Id: I6fd83d6584c56a4fd2470948f1987e23237c16d3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145577 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: bsgcomp <bsgcomp@arm.com>
2018-11-02COMPMID-1539 Implement YOLOLayer on CLGiorgio Arena
Change-Id: I332c0703e1399fca0c5b724529b54a28f49c88da Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/146842 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-1527 - Implementing ReorgLayer on OpenCLGian Marco Iodice
Also extended tests on NEON Change-Id: Icb0eced534e904ef807972dd3a31988f501bb02e Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/147095 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02COMPMID-1330: Add support for NormalizePlanarYUV operator in CLMichele Di Giorgio
Change-Id: Id0754b9e2bc3ef7ff2c4c21c3b89709588c41bd3 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/146637 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2018-11-02COMPMID-1266 : support for FP16 in CLWinogradConvolutionLayerVidhya Sudhan Loganathan
Added support for FP16 in CLWinogradConvolutionLayer: 5x5 kernels and 3x3 kernels(COMPMID-937) Change-Id: I0f394cbdc978dd04176416e9f612aca3986b09e6 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145537 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2018-11-02COMPMID-1451: Fix misssing comment in helpers.hGeorgios Pinitas
Change-Id: I30cb6b9b55fe762238ab402a28667eae9e2ab6a2 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/146530 Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02COMPMID-1332: Implement Slice for CLGeorgios Pinitas
Change-Id: I0dbc4fd7f640d31daa1970eb3da0e941cb771f2b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/146145 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02COMPMID-1218 Implementing Batch to Space on OpenCLMichalis Spyrou
Change-Id: I12ba4c0c35f086ea3f395970b85af5bf8f94850b Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145052 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02COMPMID-1235: Implements StridedSlice for CLGeorgios Pinitas
Change-Id: If2b44da31fae528c76be742b4b3a21fb0eb06b49 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145284 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02COMPMID-1433: Use Arm macro to check whether we support dot product instructionsGeorgios Pinitas
Change-Id: I70c0ee5adfac81dccae26b6756f424f4200ba584 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145990 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
2018-11-02COMPMID-1533 Implementing CLDepthWiseConvolutionLayer with FP16 (NHWC)Giorgio Arena
Change-Id: I46965aeb1fffba8cbf083cab7284c549b0e94d00 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/145334 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
2018-11-02COMPMID-1376: Add support for QASYMM8 in CLDeconvolutionLayerMichele Di Giorgio
Change-Id: I13ec79b6668e2b9559d3fa789ae0b51ab6975289 Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/139126 Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Tested-by: Jenkins <bsgcomp@arm.com>
2018-11-02COMPMID-1343: Add grouping support to CLCol2ImKernelMichele Di Giorgio
Change-Id: I5188a2163e7341f1915d98c21464fea13a9a7faf Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/143330 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>