aboutsummaryrefslogtreecommitdiff
path: root/src/core/CL/kernels
AgeCommit message (Collapse)Author
2018-11-02IVGCVSW-601: support for asymetric padding in cl conv and depthwise convJaroslaw Rzepecki
Change-Id: I5c6c95091ae77dba96459c0640f9f6167a988c8c Reviewed-on: http://mpd-gerrit.cambridge.arm.com/91700 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02IVGCVSW-632 CL support for Softmax beta parameterPablo Palmier
Change-Id: I21da48d2f40aa900301235eaced54b7eb644b0b2 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/91307 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-569 - Implemented reference implementation and validation tests ↵Isabella Gottardi
(NEON and CL) for Median3x3 Change-Id: I7028f0bcc4a502261210f536ed604a7651ab6726 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/91332 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-618: Fix mismatches in CLDepthConvert OclgrindGeorgios Pinitas
Out-of-bounds float to integer conversion is implementation defined. Oclgrind converts to S32 and truncated while GPU converts S32 and clamps. We force to always SATURATE for float to int conversion. Change-Id: I82be9e8cdcc49b32adb8c0da064542b63f891666 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/91512 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02COMPMID-554 Add NodesMichalis Spyrou
- BatchNormalization - L2Normalize - Floor Change-Id: I03e06dea30e956f56a86f9c5642cd609c6696ad2 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/91364 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-417 - Added validation for FP16 CLBatchNormalizationLayerGian Marco Iodice
Change-Id: Icc6194a311af0e96978e6be2cc4c5da9d7fb0bcc Reviewed-on: http://mpd-gerrit.cambridge.arm.com/89493 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Steven Niu <steven.niu@arm.com>
2018-11-02COMPMID-417: Fix border and window in CLGEMMMatrixVectorMultiplyKernelGeorgios Pinitas
Change-Id: I2eacba2c87bce84b7f6b69a734ff775473f990bc Reviewed-on: http://mpd-gerrit.cambridge.arm.com/89401 Reviewed-by: Steven Niu <steven.niu@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-424 Implemented reference implementation and tests for WarpAffineIsabella Gottardi
Change-Id: I4924ab1de17adc3b880a5cc22f2497abbc8e221b Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85820 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Steven Niu <steven.niu@arm.com>
2018-11-02COMPIMID-523: Fix CLDepthwiseConvolution test.Georgios Pinitas
The specified output size of the failing test case was invalid. Additionally the kernel has been cleaned up and asserts have been added in case of invalid configurations. Change-Id: I198f3574f003b71968e4081a54cf102d748af5c1 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/88821 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Steven Niu <steven.niu@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-524 - Implemented CLTuner objectGian Marco
Change-Id: Idbdbecca1fc299ed042936119d90e2bed8db0938 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/87101 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-541: Fix padding in CLMinMaxLocationKernelMoritz Pflanzer
Change-Id: Ie17e3f14c428553d433da2a564e016bfac7749a9 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/88881 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02COMPMID-417 fix the depthwise conv bugsteniu01
Change-Id: Ica3c26d09f8009240467e0d3a12f585170fbcd44 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/88677 Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-515: L2 Pooling for FP32/FP16 in CL.Georgios Pinitas
Change-Id: I43641fa672f5905ca62edd1f63fc93e0cf7ea382 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85963 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-516 Increase tolerance rate of Scale, Conv, fully connected and GEMMsteniu01
This patch also fix the scale kernel issue where it was calcuated the scale factor inside the gpu but now in the CPU. The GPU and CPU gave different result for simple float division operation Change-Id: Ib6709cb6c41dcf4fc0fa4eb79e481430695bf40e Reviewed-on: http://mpd-gerrit.cambridge.arm.com/87266 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-11-02COMPMID-452 CL Generic Depthwise Convolution implementation.Giorgio Arena
Change-Id: I115e48fe6ce5e281f3791aa5d80fdc754cdd2b5e Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85082 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-522 - Added support for GlobalPooling in CLPoolingLayer and ↵Gian Marco Iodice
CLFlattening for 3D tensor Change-Id: Ifc7db1e4d4af322a4dcbfeb3e132e5c326596872 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86618 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-417: Add support for floats in scale.Georgios Pinitas
Change-Id: I7d714ba13861509080a89817f54e9d32da83e970 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86026 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-462: Implement TensorReshape for NEON and CL.Georgios Pinitas
Change-Id: I11b39c2ceca26ade73822e29a384ef866ae05729 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/87707 Reviewed-by: Pablo Tello <pablo.tello@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-417 Fix reduction kernel's __local buffer sizeMichalis Spyrou
Change-Id: If97a79d86b174b1d9b41360303d624e3b2d22001 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/87703 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-448: Implement CL Quantization/Dequantization Layer.Michele Di Giorgio
Change-Id: Id002e23a2ac48af3d245416dc6411d9a04a1e513 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81827 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-477 - Optimized CLDirectConvolution1x1 for BifrostGian Marco Iodice
- Fixed bug in CLDirectConvolution3x3 Change-Id: Iaf34ef44f0b7bc02e66f3eb4452ff7a90ef83523 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86725 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-11-02COMPMID-514 (3RDPARTY_UPDATE)(DATA_UPDATE) Add support to load .npy dataSiCong Li
* Add tensorflow_data_extractor script. * Incorporate 3rdparty npy reader libnpy. * Port AlexNet system test to validation_new. * Port LeNet5 system test to validation_new. * Update 3rdparty/ and data/ submodules. Change-Id: I156d060fe9185cd8db810b34bf524cbf5cb34f61 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84914 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-415 - Fixed bug in CLDepthConcatenateKernelGian Marco Iodice
Change-Id: Ieedb714cb3666504c175aa488505e0485778c589 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/86705 Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-424 Implemented reference implementation, new output valid region ↵Isabella Gottardi
and validation tests (NEON and CL) for Scale Change-Id: I056fa3588b807a97cacf0b8afaec56e37ffc92af Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83872 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-476 L2 Normalization for CLMichalis Spyrou
Change-Id: I88f87173645880eb823916c5d4ac884c372a4fb4 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83269 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-417: Fix invalid read in CL GEMM accumulate biasesMoritz Pflanzer
Change-Id: Ie7786a29faa0d98d8ad65c2333d0d6a1665340bc Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85635 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-358 Implement OpenCL ROI PoolingSiCong Li
* Implement OpenCL ROI Pooling * Add CLROIPoolingLayer benchmarks Change-Id: I8786d01d551850a1b4d599a48fabe3925e0a27d0 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79833 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-477 - Optimized batched case in CLConvolutionLayerGian Marco Iodice
Change-Id: I4ef18f49f1da0cb816aaa0762466b940792c15ed Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84162 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-417: Port PoolingLayer to new validation.Georgios Pinitas
Change-Id: I7f2f5f5f81ad9932661fc4c660bf90614288bc96 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/85270 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-452 CL Depthwise Separable Convolution Layer kernel implementation, ↵Giorgio Arena
validation and benchmarking for 3x3xC depthwise filter and DataType::F32. Change-Id: I95c0c87709763cdbf58d0de66025eac86e30791b Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82768 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Steven Niu <steven.niu@arm.com>
2018-11-02COMPMID-477 - Optimized CLNormalizationLayerGian Marco Iodice
CLPixelWiseMultiplication has been removed within the function Change-Id: Ibe7edd7921d5cef6ff68fdeeca89771129a8eaea Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84459 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02COMPMID-431 Port OpenCL pooling layer to use fixed pointsteniu01
Change-Id: I6a73cd6582097aaefa83588aad789bdefdc74406 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79967 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
2018-11-02COMPMID-477 - Optimizing Pooling 3x3 with stride_x <= 3 on OpenCLGian Marco Iodice
Change-Id: Ie000166307cdb5bfae00ebf84d35e49a6bfb9dbd Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83372 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-417: Cleanup CL FullyConnectedLayerMoritz Pflanzer
Change-Id: Ic7191be1f136c6aad4037cf2ec4bc6d7d0e440d3 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83713 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-417 - Fixed call of direct convolution 1x1 for bifrostGian Marco Iodice
Change-Id: Ic4e56e8881b8c66758e67c486514ec397cf43f8e Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84592 Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-477 - Optimized Direct Convolution 3x3 and 5x5 (f32) for Bifrost.Gian Marco Iodice
Each work-item computes 4x3 output elements in case of 3x3 convolution and 4x2 in case of 5x5 convolution Change-Id: I6ebbaff8b7e971c1f90d5845c0b58d2a40f39df5 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84345 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-11-02COMPMID-417: Add in-place support for batch-normalization.Georgios Pinitas
Change-Id: I4b0c9348f3bc2addc198a76fadd1b583abf42b60 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/84434 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-478 Implemnt CL direct convolution 5x5steniu01
Change-Id: I4b975aff310cda9964d8c5dcee182d5d5c82741b Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83474 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-11-02COMPMID-474 - Add support for QS8/QS16 DirectConvolution CLMichalis Spyrou
Change-Id: I537e4acbc02c8d880ff8630ea62223e0f1a1dda3 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82875 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Pablo Tello <pablo.tello@arm.com>
2018-11-02COMPMID-424 NEON/CL Harris Corners validation tests.Giorgio Arena
Change-Id: I82d2a73f515a8d45d16b9ddb702fea51ae05c82e Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79687 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-11-02COMPMID-417 - Fixed bug in CLCol2ImKernek related to the stride passed ↵Gian Marco Iodice
during the configuration Change-Id: I9818f72e5ddd0d21f6700c651fc968ff61507424 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83909 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Moritz Pflanzer <moritz.pflanzer@arm.com>
2018-11-02COMPMID-459 Collapse CL Im2col's higher dimensionssteniu01
Change-Id: I0ccc39cbcf6926e6810faf3fe264c4af7adc3f7b Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83070 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-11-02COMPMID-477 - Optimizing CLDirectConvolution 3x3 on OpenCL and added the ↵Gian Marco Iodice
auto configuration Change-Id: I3c8384dcbc9d7786943134bb658dafb35356d90d Reviewed-on: http://mpd-gerrit.cambridge.arm.com/83253 Reviewed-by: Steven Niu <steven.niu@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17COMPMID-415: Add autoconfigure to CLCol2ImKernelAnthony Barbier
Change-Id: I50c114d0c78d443a21bf43aa36a370474f0769ce Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82955 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2018-09-17COMPMID-417: Fix CLNormalization error issue.Georgios Pinitas
Change-Id: Ie538245ee0451e4cdb28120e80b9a65f56a07e7d Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82933 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Michele DiGiorgio <michele.digiorgio@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-09-17COMPMID-472 : Implement Floor for CL and NEON.Georgios Pinitas
Change-Id: I675a4545b1fe9ab665a07c834720bfe7ff589cee Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82527 Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
2018-09-17COMPMID-443 collapse higher dimension for CL col2im kernelsteniu01
Change-Id: I99d41c7c95b8d4e3cd5c1685c68936b6a2db4192 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81885 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17COMPMID-417 NEON/CL MeanStdDev bugfix using FillBorderKernelGiorgio Arena
Change-Id: Ic48ba7f69783d0e1e80611264e2bc67d1732436e Reviewed-on: http://mpd-gerrit.cambridge.arm.com/81293 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17COMPMID-355 Implement CL DirectConvolution1x1SiCong Li
* Add FP16 to validation tests. * Complete benchmark tests for CL and NEON Direct Convolution. Change-Id: Ie73d8580832372db01b82b39786fd9c8be560090 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/82014 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>
2018-09-17COMPMID-438: Add support for floating point Min-Max Location layer.Michele Di Giorgio
Change-Id: I84ae564a40fc7320a6f94a84d53906ba51404f51 Reviewed-on: http://mpd-gerrit.cambridge.arm.com/79797 Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Tested-by: Kaizen <jeremy.johnson+kaizengerrit@arm.com>