path: root/src/core/CL/CLKernelLibrary.cpp
Age | Commit message | Author
2018-11-02 | COMPUTE-8024 Fixed the maximum OpenCL workgroup size | Abel Bernabeu
The maximum workgroup size depends on both the kernel and the device; it is not a property of the device alone. This patch fixes the case where a kernel is enqueued without a workgroup size and the default workgroup size is used instead. A previous patch introduced a maximum workgroup size that depended on the device but ignored the kernel. In OpenCL, the maximum workgroup size queried from the device is an upper bound on the maximum that can be queried for a given kernel running on that device. For some kernels the two values match, but for others the kernel-specific query returns a lower value (e.g. if the kernel uses a high number of registers).
Change-Id: I3bed6bde80ddc4f0ddb8f82c80903774aa1999b6
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/89471
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
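The device-versus-kernel distinction can be sketched in plain C++ without an OpenCL runtime. In a real program the device-wide limit would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE` and the per-kernel limit from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`; here both are passed in as plain numbers. The helpers `effective_max_wg_size` and `fit_local_size`, and the halve-the-largest-dimension policy, are hypothetical illustrations, not the library's code.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical helper: the binding limit is the per-kernel value
// (CL_KERNEL_WORK_GROUP_SIZE), which never exceeds the device-wide
// CL_DEVICE_MAX_WORK_GROUP_SIZE; taking the min makes that explicit.
std::size_t effective_max_wg_size(std::size_t device_max, std::size_t kernel_max)
{
    return std::min(device_max, kernel_max);
}

// Hypothetical helper: shrink a default local workgroup size until the
// product of its dimensions fits the limit, halving the largest dimension
// (the first one, on ties) at each step.
std::array<std::size_t, 3> fit_local_size(std::array<std::size_t, 3> lws, std::size_t max_wg_size)
{
    while (lws[0] * lws[1] * lws[2] > max_wg_size)
    {
        auto largest = std::max_element(lws.begin(), lws.end());
        if (*largest == 1)
        {
            break; // every dimension is already 1; nothing left to shrink
        }
        *largest /= 2;
    }
    return lws;
}
```

For example, if the device reports 256 but a register-heavy kernel reports 128, a default local size of 16x16x1 is reduced to 8x16x1, whereas checking only the device limit would have kept 16x16x1.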
2018-11-02 | COMPMID-452 CL Generic Depthwise Convolution implementation. | Giorgio Arena
Change-Id: I115e48fe6ce5e281f3791aa5d80fdc754cdd2b5e
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85082
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02 | COMPMID-522 - Added support for GlobalPooling in CLPoolingLayer and CLFlattening for 3D tensors | Gian Marco Iodice
Change-Id: Ifc7db1e4d4af322a4dcbfeb3e132e5c326596872
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86618
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02 | COMPMID-462: Implement TensorReshape for NEON and CL. | Georgios Pinitas
Change-Id: I11b39c2ceca26ade73822e29a384ef866ae05729
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/87707
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02 | COMPMID-448: Implement CL Quantization/Dequantization Layer. | Michele Di Giorgio
Change-Id: Id002e23a2ac48af3d245416dc6411d9a04a1e513
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81827
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02 | COMPMID-477 - Optimized CLDirectConvolution1x1 for Bifrost | Gian Marco Iodice
- Fixed bug in CLDirectConvolution3x3
Change-Id: Iaf34ef44f0b7bc02e66f3eb4452ff7a90ef83523
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86725
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-11-02 | COMPMID-476 L2 Normalization for CL | Michalis Spyrou
Change-Id: I88f87173645880eb823916c5d4ac884c372a4fb4
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83269
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02 | COMPMID-358 Implement OpenCL ROI Pooling | SiCong Li
* Implement OpenCL ROI Pooling
* Add CLROIPoolingLayer benchmarks
Change-Id: I8786d01d551850a1b4d599a48fabe3925e0a27d0
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79833
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02 | COMPMID-477 - Optimized batched case in CLConvolutionLayer | Gian Marco Iodice
Change-Id: I4ef18f49f1da0cb816aaa0762466b940792c15ed
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84162
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-513 Choose maximum local workgroup size at run time | steniu01
Change-Id: I9ab3cf6dc92a93b0ae5f746e078355e443b3a545
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84906
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-452 CL Depthwise Separable Convolution Layer kernel implementation, validation and benchmarking for 3x3xC depthwise filter and DataType::F32. | Giorgio Arena
Change-Id: I95c0c87709763cdbf58d0de66025eac86e30791b
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82768
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Steven Niu <steven.niu@arm.com>
2018-11-02 | COMPMID-477 - Optimizing Pooling 3x3 with stride_x <= 3 on OpenCL | Gian Marco Iodice
Change-Id: Ie000166307cdb5bfae00ebf84d35e49a6bfb9dbd
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83372
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Pablo Tello <pablo.tello@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02 | COMPMID-477 - Optimized Direct Convolution 3x3 and 5x5 (f32) for Bifrost. | Gian Marco Iodice
Each work-item computes 4x3 output elements for the 3x3 convolution and 4x2 output elements for the 5x5 convolution.
Change-Id: I6ebbaff8b7e971c1f90d5845c0b58d2a40f39df5
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84345
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
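To show what a per-work-item output tile means for the launch size, the sketch below computes how many work-items are needed to cover an output volume. It assumes the first tile dimension runs along x (width), the second along y (height), and that the z dimension spans output feature maps; `ceil_div`, `GlobalSize` and `direct_conv_gws` are hypothetical names, and any border or padding handling in the real kernels is ignored.

```cpp
#include <cstddef>

// Hypothetical helper: ceil(a / b) for positive integers.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b)
{
    return (a + b - 1) / b;
}

struct GlobalSize
{
    std::size_t x;
    std::size_t y;
    std::size_t z;
};

// Sketch, assuming each work-item writes an elems_x by elems_y tile
// (4x3 for the 3x3 kernel, 4x2 for the 5x5 kernel) of one output
// feature map, with one z slice per output feature map.
constexpr GlobalSize direct_conv_gws(std::size_t out_w, std::size_t out_h, std::size_t out_c,
                                     std::size_t elems_x, std::size_t elems_y)
{
    return GlobalSize{ceil_div(out_w, elems_x), ceil_div(out_h, elems_y), out_c};
}
```

Under these assumptions, a 56x56 output with 64 feature maps needs 14 x 19 x 64 work-items with the 4x3 tile and 14 x 28 x 64 with the 4x2 tile, instead of 56 x 56 x 64 with one element per work-item.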
2018-11-02 | COMPMID-478 Implement CL direct convolution 5x5 | steniu01
Change-Id: I4b975aff310cda9964d8c5dcee182d5d5c82741b
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83474
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-09-17 | COMPMID-472 : Implement Floor for CL and NEON. | Georgios Pinitas
Change-Id: I675a4545b1fe9ab665a07c834720bfe7ff589cee
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82527
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-09-17 | COMPMID-355 Implement CL DirectConvolution1x1 | SiCong Li
* Add FP16 to validation tests.
* Complete benchmark tests for CL and NEON Direct Convolution.
Change-Id: Ie73d8580832372db01b82b39786fd9c8be560090
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82014
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17 | COMPMID-455 - Optimizing CLIm2ColKernel | Gian Marco Iodice
Change-Id: Iee618948cc8f310ee9af2d786240e8120e4c6ab9
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81665
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-09-17 | COMPMID-355 Implement 3x3 CL direct convolution | steniu01
Change-Id: I1b44dc375045964e65557f0ead57a7c12d6bf097
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81418
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-09-17 | COMPMID-417 Checking CL non-uniform workgroup support at runtime. | steniu01
This ticket:
1. Adds support for checking at runtime whether non-uniform workgroups are supported.
2. Adds a helper function to check the CL version at runtime.
3. Adds a boolean to check whether CLScheduler's init has been called.
Change-Id: I6e6df8eb5cebfac7229aa406242bb183477fd191
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/80265
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
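Runtime checks of this kind might look roughly like the sketch below. It assumes the CL version is parsed from the `CL_DEVICE_VERSION` string, which the OpenCL specification formats as `OpenCL <major>.<minor> <vendor-specific information>`, and that non-uniform workgroup support is inferred either from OpenCL 2.x or from the Arm extension `cl_arm_non_uniform_work_group_size` listed in `CL_DEVICE_EXTENSIONS`. `CLVersion`, `parse_cl_version` and `supports_non_uniform_workgroups` are hypothetical names, and the strings are passed in directly rather than queried from a device.

```cpp
#include <sstream>
#include <string>

struct CLVersion
{
    int major = 0;
    int minor = 0;
};

// Parses "OpenCL <major>.<minor> <vendor info>"; returns 0.0 if the
// string does not have that shape.
CLVersion parse_cl_version(const std::string &device_version)
{
    const std::string prefix = "OpenCL ";
    if (device_version.compare(0, prefix.size(), prefix) != 0)
    {
        return CLVersion{};
    }
    std::istringstream in(device_version.substr(prefix.size()));
    CLVersion v;
    char dot = 0;
    if (!(in >> v.major >> dot >> v.minor) || dot != '.')
    {
        return CLVersion{};
    }
    return v;
}

// Sketch: non-uniform workgroups are core in OpenCL 2.x. OpenCL 3.0 made
// them optional again, so for other versions this only trusts the
// Arm extension string.
bool supports_non_uniform_workgroups(const std::string &device_version, const std::string &extensions)
{
    if (parse_cl_version(device_version).major == 2)
    {
        return true;
    }
    return extensions.find("cl_arm_non_uniform_work_group_size") != std::string::npos;
}
```

Keeping the parsing separate from the device query makes the version logic testable without a GPU, which is also why the sketch takes plain strings.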
2018-09-17 | COMPMID-434 - Port CLGEMM to support 16 bit fixed point | Gian Marco Iodice
Change-Id: I30aef3c7ecd1ee740c2a7f2ce65a63c7dcd66e49
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79630
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17 | COMPMID-418 Add check and fix comments after preprocessor conditions | Anthony Barbier
Change-Id: I1353fd652ee180e3931e58b4ce13d651a48c7e2c
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79567
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-09-17 | COMPMID-408 Create OpenCL complex math functions for 8 bit fixed point arithmetic. | Michalis Spyrou
Logarithm, inverse square root, exponential and multiplication for 8 bit fixed point arithmetic in OpenCL.
Change-Id: Ib976da7057242967c940df28ceebf39bc3ea3811
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78273
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
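As an illustration of the 8 bit fixed point arithmetic involved, a saturating QS8 multiplication can be sketched on the host as below. It assumes QS8 stores a real value x as round(x * 2^fixed_point_position) in an `int8_t`; `mul_sat_qs8` is a hypothetical name, and the round-half-up scheme shown is one common choice rather than the library's exact OpenCL code.

```cpp
#include <algorithm>
#include <cstdint>

// Sketch: the product of two QS8 values carries 2 * fixed_point_position
// fractional bits, so it is rounded, shifted back by fixed_point_position,
// and saturated to the int8_t range. Right-shifting a negative value is
// arithmetic on mainstream compilers (and guaranteed from C++20).
std::int8_t mul_sat_qs8(std::int8_t a, std::int8_t b, int fixed_point_position)
{
    const std::int32_t round_val = fixed_point_position > 0 ? (1 << (fixed_point_position - 1)) : 0;
    std::int32_t res = static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    res = (res + round_val) >> fixed_point_position;
    res = std::max<std::int32_t>(INT8_MIN, std::min<std::int32_t>(INT8_MAX, res));
    return static_cast<std::int8_t>(res);
}
```

With `fixed_point_position = 5`, 1.5 is stored as 48 and `mul_sat_qs8(48, 48, 5)` returns 72, i.e. 2.25; products outside the representable range clamp to 127 or -128 instead of wrapping.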
2018-09-17 | COMPMID-411 - Port CLGEMM to support 8 bit fixed point | Gian Marco Iodice
Change-Id: I6c8bd69ae9715e4d83d128b2162fc15aa5561afb
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78804
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-09-17 | COMPMID-423: Port CLSoftmaxLayer to QS8 | Georgios Pinitas
Change-Id: I759b7585656d018d7c864425118cd3ec2ca9b0eb
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78908
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17 | COMPMID-414 - Port CLConvolutionLayer to support 8 bit fixed point - CLGEMMMatrixAccumulateBiasesKernel | Gian Marco Iodice
Change-Id: Idba13b578dc564b8003ce2fa3392eea2af3ce806
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78664
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-09-17 | COMPMID-411 - Ported CLGEMMInterleave4x4Kernel and CLGEMMTranspose1xWKernel to support 8 bit fixed point | Gian Marco Iodice
Change-Id: If236c9047ed536e808a0ed26e97e1799ca938e03
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78529
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-09-17 | COMPMID-403: Add support for 7x7 pooling on CL. | Georgios Pinitas
Change-Id: I3c2c8d7e8e61d7737170cb1568900ce4ac337068
Reviewed-on: http://mpd-gerrit.cambridge.arm.com/78181
Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com>
Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-09-17 | COMPMID-344 Updated doxygen | Anthony Barbier
Change-Id: I32f7b84daa560e460b77216add529c8fa8b327ae