ComputeLibrary.git -

Age	Commit message (Collapse)	Author
2021-07-25	Reorganize the kernels into nhwc, nchw and common folders	Adnan AlSinan
	The Following kernels have been split into nchw/nhwc kernels files: - batchnormalization_layer - batch_to_space - channel_shuffle - depth_to_space - dequantization_layer - im2col - normalization_layer - normalize_planar_yuv_layer - normalize_planar_yuv_layer_quantized - pooling_layer - pooling_layer_quantized - remap - reorg_layer - scale - scale_quantized - space_to_batch - space_to_depth - upsample_layer - winograd_filter_transform - winograd_input_transform - winograd_output_transform The following kernels have been moved to nchw folder: - direct_convolution1x1 - direct_convolution3x3 - direct_convolution5x5 - direct_convolution_quantized - prior_box_layer The following kernels have been moved to nhwc folder: - direct_convolution - dwc_native_fp_nhwc - dwc_native_quantized_nhwc The following kernels have been removed: - sobel_filter While the rest kerenls have been moved to the common folder. Partially resolves COMPMID-4453 Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com> Change-Id: Ic327ac935687ec351c610c65a3c6357f364a5a58 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5919 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2021-05-12	Fix 'ARM_DOT_K0XN0' macro redefined	Giorgio Arena
	Resolve COMPMID-4339 Change-Id: Ieacd288ca8f69e04d88692a093325a24198ebe3e Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5623 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-12-02	Remove support for (NE/CL)LocallyConnectedLayer	Georgios Pinitas
	Remove out-of-date and unmaintained LocallyConnectedLayer for both NEON and OpenCL. Resolves: COMPMID-3924 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Change-Id: Ia61398ed8cfa3876f41c1b342c4a80d1cca0ca83 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4634 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-11-23	Fix output address calculation in GEMM OpenCL kernels	Michele Di Giorgio
	Resolves COMPMID-3977 Change-Id: I222e0d1726993e54699646323820fc4ae53ab520 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4530 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-11-05	COMPMID-3730 Remove padding from CLGEMMMatrixMultiplyKernel Patch1	SiCong Li
	* Remove default definition for STORE_BLOCK_BOUNDARY_AWARE to avoid elusive bugs * Clean up gemm_mm_interleaved* and gemm_mm_floating_point* kernels * Relocate to gemm_v1.cl to avoid clashing with new kernels * Rename compile time arguments to conform with the established terminology(MNKB), and to facilitate the use of STORE_BLOCK_BOUNDARY_AWARE Change-Id: Ia85c746b2536cad87257a79685b459b5d2f9a1be Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4329 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-10-26	COMPMID-3925: Dispatch CLGEMM with no padding y requirement	Gian Marco Iodice
	- Add has_pad_y flag in GEMMKernelInfo - Skip reinterpret as 3D in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel if has_pad_y = false - Add test to validate CLGEMMMatrixMultiplyReshapedOnlyRHSkernel with had_pad_y = false/true - Configure two variants of CLGEMMMatrixMultiplyReshapedOnlyRHSKernel to run with has_pad_y = false/true in CLGEMM - Check if the lhs/dst tensors have pad y. If not, run CLGEMMMatrixMultiplyReshapedOnlyRHSKernel without padding requirement Change-Id: I68bb43389789736d676b899ac7c77fd9138babaf Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4248 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-10-21	COMPMID-3712 Remove OpenCL padding: CLDepthwiseConvolutionLayer3x3NHWCKernel ↵	Giorgio Arena
	FP16/32 Removed unused N from partial block loading macro Created utility to assert change in padding Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Change-Id: Ifdd30c66dbf5f2842c6b2d939000613d5011708e Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4192 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-10-12	COMPMID-3826 ArmNN Nightly failing for CL	Giorgio Arena
	Change-Id: I09f557b5cecafc669e12764e8592457212168d62 Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4131 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-08-12	COMPMID-3337: Remove write paddings in both axes from ↵	Gian Marco Iodice
	CLGEMMMatrixMultiplyReshapedKernel - Change the interface of STORE_BLOCK_BOUNDARY_AWARE passing the conditions on Y and X rather than the X/ coordinates. This allows to use the macro with both GEMM reshaped and GEMM reshaped rhs only - Remove padding from the output tensor of CLGEMMMatrixMultiplyReshapedKernel - Add tests for validating the zero padding requirement Change-Id: I13263cc71ce065c5be34ed198def320dd5823495 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3712 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-08-11	COMPMID-3335: Remove x/y-axis padding from CLGEMMReshapeLHSMatrixKernel	Gian Marco Iodice
	- Remove padding requirement for the input tensor of CLGEMMReshapeLHSMatrixKernel - Add utility function to load a boundary aware 2d tensor from buffer - Extend validation for validating the zero padding requirement Change-Id: I0ac6b1b517d75fd56998f406e0cce97b40918ce1 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3701 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: SiCong Li <sicong.li@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-07-21	COMPMID-3331 Remove y load padding from ↵	SiCong Li
	CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLGEMMMatrixMultiplyNativeKernel Resolves: COMPMID-3333, COMPMID-3334 * Implement an "overlap load, but don't overlap store" strategy: - Change STORE_BLOCK_BOUNDARY_AWARE so that the partial block in y dimension is placed at the beginning instead of at the end. - Implement 3 auxiliary functions to calculate the lhs, bias and dst addresses, taking into account the potential partial block in y dimension. * Remove y load padding from Lhs and Bias tensors in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLGEMMMatrixMultiplyNativeKernel * Modify config tests to assert zero-padding in new dimensions Change-Id: I8f8585c7c0f543d720c2c91b885417c7dad35af4 Signed-off-by: SiCong Li <sicong.li@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3576 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-07-20	COMPMID-3532: Align data type support between doxygen and implementation - CL	Michele Di Giorgio
	Also removes some unused code. Change-Id: I85687c40999c3cdf9e6fccfcd020b0901a9515fe Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3581 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-07-13	COMPMID-3338 COMPMID-3336 COMPMID-3584	SiCong Li
	COMPMID-3338 Remove store padding in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel COMPMID-3336 Remove store padding in CLGEMMMatrixMultiplyNativeKernel COMPMID-3584 Fix VSTORE to correctly deal with scalar case * Implement STORE_BLOCK_BOUNDARY_AWARE, as part of the COMPMID-3332 investigation, with the following substantial changes: - Separate STORE_BLOCK_PARTIAL, STORE_ROW_PARTIAL and VSTORE_PARTIAL so that this change does not affect kernels not using STORE_BLOCK_BOUNDARY_AWARE. - Revamp vstore_ext_n to vstore_partial_n, and enhance VSTORE_PARTIAL to correctly handle both vector and scalar cases * Remove the store padding (dst tensor) in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLGEMMMatrixMultiplyNativeKernel * Add configuration tests to check no padding is added by the configuration. Signed-off-by: SiCong Li <sicong.li@arm.com> Change-Id: I4f0907867979d8dacedd03b4bcbd2fb19e4f1602 Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3522 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2020-07-09	COMPMID-3324: Adjusting capitalization of Arm copyright claim to reflect Arm ↵	Michele Di Giorgio
	preferred presentation Change-Id: Ib7dcfcbb24b408999dfae366b9da396485aacf78 Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3525 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-06-29	COMPMID-3322: Add cl_image support for GEMMReshapedOnlyRHS NT	Gian Marco Iodice
	COMPMID-3323: Add cl_image support for GEMMReshapedOnlyRHS T - Added support for cl_image in CLGEMMMatrixMultiplyReshapedInlyRHSKernel (both NT and T kernels) - Extended the tests for the validating rhs_info.export_to_cl_image = true - Updated doxygen documentation in CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.h Change-Id: If253794323aac072d84a4d8680b9a2339ab7ad92 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3437 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-06-26	COMPMID-3560: Fix F16 performance regression (OpenCL)	Gian Marco Iodice
	The performance regression was caused by a change in the interface of the OpenCL kernels gemm_mm_reshaped_lhs_* Change-Id: I030df4975dc040886c17e71710a27137b50edd9b Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3465 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2020-06-18	COMPMID-3320: Add cl_image support for GEMMReshaped T_NT	Gian Marco Iodice
	COMPMID-3321: Add cl_image support for GEMMReshaped NT_T - Added support for cl_image in CLGEMMMatrixMultiplyReshapedKernel (both NT and T kernels) - Extended the tests for the validating rhs_info.export_to_cl_image = true - Added utility macros in OpenCL to load data from a OpenCL image object - Updated doxygen documentation in CLGEMMMatrixMultiplyReshapedKernel.h - Updated doxygen documentation in CLGEMMReshapeRHSMatrixKernel.h Change-Id: I953b10e4ef205d1b76dcbc366e5a91fd5a8e1d5c Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3329 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2020-05-14	COMPMID-3290: Test improvement for CLGEMMMatrixMultiplyReshapedOnlyRHSKernel	Sheri Zhang
	Signed-off-by: Sheri Zhang <sheri.zhang@arm.com> Change-Id: I7335ee07f777087e06ca26f762b2b5e3668362ab Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3175 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Sang-Hoon Park <sang-hoon.park@arm.com>
2019-10-16	COMPMID-2741: [CL] GEMMMatrixMultiplyReshaped clCreateKernel Error	Georgios Pinitas
	Using the accumulator type during the activation in gemm for mixed precision operations as some OpenCL compiler versions complain for mixing float data types for primitive built-ins. Change-Id: I1c4b394c57962aeebb541de572614a7e3ee3f223 Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-on: https://review.mlplatform.org/c/2080 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-09-30	COMPMID-2571: Add mixed-precision support in CLGEMMReshaped for FP16	Gian Marco Iodice
	Change-Id: I5ba90d4de4594ed784c7230aa6b10503be67c001 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1991 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-09-26	COMPMID-2571: Add support for FP16 in CLGEMMReshaped - part 1	Gian Marco Iodice
	Change-Id: I8adb8850cc5ade49ebc1dbf63401f03d5ecad708 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1983 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-09-20	COMPMID-2675: Fix arguments passed at compile time for GEMM - OpenCL	Gian Marco Iodice
	Change-Id: I47b84a6f815492e24771d488aa8b29d14e572f40 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1956 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-09-02	COMPMID-1965 Extend CLGEMMMatrixMultiplyReshapedKernel to support transposed ↵	Giorgio Arena
	LHS (t) and not-transpose RHS Change-Id: I437a00d7213fefd6f4365071b46174d44df8b85c Signed-off-by: Giorgio Arena <giorgio.arena@arm.com> Reviewed-on: https://review.mlplatform.org/c/1677 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-07-23	COMPMID-1979: Fuse Activation Function in CLGEMM - part 3	Gian Marco Iodice
	Fused beta*bias in in the old cl gemm kernels Fused activation function in the old cl gemm kernels Change-Id: I695fb9189e6d4792010abd256784624982d17d79 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1587 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-07-17	COMPMID-1979: Fuse Activation Function in CLGEMM - part 2	Gian Marco Iodice
	Fuse activation function in: CLGEMMMatrixMultiplyNativeKernel CLGEMMMatrixMultiplyReshapedKernel CLGEMMMatrixMultiplyReshapedOnlyRHSKernel Change-Id: I033ace2bdc58903594c9f31175e4b23c4b559f6f Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1565 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
2019-06-24	COMPMID-2172: Fuse bias addition with CLGEMMMatrixMultiplyNativeKernel	Gian Marco Iodice
	Change-Id: I714b92ec001fc71172719b67fb66d490538b6948 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1399 Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-06-20	COMPMID-2053: Fuse bias addition with CLGEMMMatrixMultiplyReshapedKernel	Gian Marco Iodice
	Change-Id: I5bfd38c94a6fd18a1cba2104f7e1b04e7bef6ec2 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1359 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-06-04	COMPMID-2171: Fuse bias addition with CLGEMMMatrixMultiplyReshapedOnlyRHSKernel	Georgios Pinitas
	Change-Id: I1d1e1f28fe7022309d72900893e8368820ca0f89 Signed-off-by: giuros01 <giuseppe.rossini@arm.com> Reviewed-on: https://review.mlplatform.org/c/1259 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-05-20	COMPMID-2338: Remove CLGEMMInterleave4x4 and CLGEMMTranspose1xW	Gian Marco Iodice
	Change-Id: I527fc97eac51308de601e5d1d50e75e4d89c5ee5 Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/1158 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-05-17	COMPMID-2093: Implement CLGEMMNative	giuros01
	Change-Id: I347130f6b5ae8d08b7c5c101b523b158565874a1 Signed-off-by: giuros01 <giuseppe.rossini@arm.com> Reviewed-on: https://review.mlplatform.org/c/1114 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-05-16	COMPMID-2041: Create GEMM helper file for OpenCL.	Usama Arif
	Change-Id: I7203d7e4d5540536b5e6638c81b26a955aa70f5c Signed-off-by: Usama Arif <usama.arif@arm.com> Reviewed-on: https://review.mlplatform.org/c/1144 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Tested-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2019-04-01	COMPMID-2002: Implement CLGEMMLowpMatrixMultiplyReshapedOnlyRHS - Transposed	Gian Marco Iodice
	Change-Id: I3907d151107766dc34749fe5710d7219e810b39f Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/875 Comments-Addressed: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2019-03-20	COMPMID-2043: Add support for "dummy threads" in CLGEMMReshaped	Gian Marco Iodice
	Change-Id: I89403b97503fbb99f6a32f5d62b8c535ab26a7be Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/877 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
2019-03-12	COMPMID-1964: Implement CLGEMMMatrixMultiplyReshapedOnlyRHS - Not transposed	Gian Marco Iodice
	Change-Id: I6b7f7a406fcb8d64adb221a24d7983e11bcb391e Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/846 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2019-03-08	COMPMID-2000: Implement CLGEMMMatrixMultiplyReshapedOnlyRHS - Transposed	Gian Marco Iodice
	Change-Id: I364c7ec5a43ad391a73429489802b0e679ee0c6e Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-on: https://review.mlplatform.org/c/732 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2019-02-07	COMPMID-1706: Fuse the bias addition within CLGEMM	Michele Di Giorgio
	Change-Id: I378f2023f4fa010f195f76716ac07aa86279bfae Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-on: https://review.mlplatform.org/280 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
2019-01-24	COMPMID-1900: Nightly issue with GEMMReshapeLHSMatrix	Gian Marco Iodice
	Change-Id: I01656785fe58e6157469f6c6e66bb7eec53380f1 Reviewed-on: https://review.mlplatform.org/563 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-01-21	COMPMID-1899: Fix NaN issue in CLGEMMMatrixMultiplyReshapedKernel	Gian Marco Iodice
	Change-Id: Ide950b46c4d41de230c272c7044a03f4f9f237ed Reviewed-on: https://review.mlplatform.org/548 Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com> Reviewed-by: Giuseppe Rossini <giuseppe.rossini@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2019-01-15	COMPMID-1687: Optimize CLGEMMMatrixMultiplyKernel (part 1)	Gian Marco Iodice
	Extended CLGEMMMatrixMultiplyReshapedKernel to support more parameters Change-Id: I4a27f986e3fe2dd071a4ccba5cfa0565f3db39ad Reviewed-on: https://review.mlplatform.org/495 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2019-01-09	COMPMID-1837 : Implement REPEAT utility macro on OpenCL	Vidhya Sudhan Loganathan
	Change-Id: I2b0dbfe7d430a8d0f62eb906f0334b16cde9e45b Reviewed-on: https://review.mlplatform.org/457 Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-12-20	COMPMID-1858: Fix boundary check in gemm_reshape_rhs_matrix_t and ↵	Gian Marco Iodice
	gemm_reshape_rhs_matrix_nt Change-Id: Ic935f1947f3f3a60455a8108fcf6f313258abe51 Reviewed-on: https://review.mlplatform.org/425 Tested-by: Anthony Barbier <Anthony.barbier@arm.com> Reviewed-by: Manuel Bottini <manuel.bottini@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2018-12-19	COMPMID-1834: Add transpose support to CLGEMMReshapeLHSMatrixKernel	Gian Marco Iodice
	Change-Id: I913a7297a0c34a05b1d37eab1489b430423700e8 Reviewed-on: https://review.mlplatform.org/417 Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2018-12-17	COMPMID-1710: Fixing gemm_mm_reshaped_lhs_nt_rhs_t with REINTERPRET_OUTPUT_AS_3D	Gian Marco Iodice
	Change-Id: I9af1f7263c6e71e38af97f3112d35044cf60ddf0 Reviewed-on: https://review.mlplatform.org/403 Reviewed-by: Anthony Barbier <Anthony.barbier@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2018-12-14	COMPMID-1687: Optimize CLGEMMMatrixMultiplyKernel for Mali-G76 - Part1	Gian Marco Iodice
	The current implementation is limited just to FP32 Change-Id: I185ab57e483e879d7c301e9cc3033efc8b41e244 Reviewed-on: https://review.mlplatform.org/389 Reviewed-by: Anthony Barbier <Anthony.barbier@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com> Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
2018-12-11	COMPMID-1775: Implement CLGEMMReshapeRHSMatrixKernel to reshape the RHS ↵	Gian Marco Iodice
	matrix of GEMM/GEMMLowp Change-Id: I77f2bfcc5d170bcc2428a2f27104942c1ec877d7 Reviewed-on: https://review.mlplatform.org/375 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2018-12-10	COMPMID-1774: Implement CLGEMMReshapeLHSMatrixKernel to reshape the LHS ↵	Gian Marco Iodice
	matrix of GEMM/GEMMLowp Change-Id: I8c5fd4c8bcdffda1522c83158981ed92baa045f4 Reviewed-on: https://review.mlplatform.org/364 Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com> Tested-by: Arm Jenkins <bsgcomp@arm.com>
2018-11-20	COMPMID-1801 : (Nightly) CLWinogradConvolutionLayer FP16 mismatches	Vidhya Sudhan Loganathan
	FP mixed precision support added to GEMM kernel used for fp16 winograd conv on Midgard GPUs Change-Id: I1619beb025fc484a1ac9d3e528d785edabbc7ee6
2018-11-16	COMPMID-1266 : Add support for FP16 in CLWinogradConvolutionLayer: 5x5 kernels	Vidhya Sudhan Loganathan
	Introduced F32 accumulation for F16 winograd gemm and output transform WinogradConvolution will be available for F16 only if fast math flag is enabled Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
2018-11-02	COMPMID-1413 - Improve the performance of GEMMLowp with 8 bit dot product on ↵	Gian Marco Iodice
	OpenCL COMPMID-1424 - Add dot product support for CLDepthwise QASYMM8 3x3 NHWC non-unit stride With this patch we are able to improve the performance of MobileNet v1-qasymm8 by 37 % Tried to use the dot product instruction in CLDepthwise QASYMM8 3x3 NHWC non-unit stride but I have not seen any benefit (maybe because we have few arithemtic operation and we do not have more load instructions). However Depthwise convolution has been improved by 30% Change-Id: Id768a99c2e53a04276707e427af5d0ec93419ada Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/155082 Tested-by: bsgcomp <bsgcomp@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2018-11-02	COMPMID-1276 - Allow GEMM to work with 3D input tensor	Gian Marco Iodice
	Skipped im2col in CLGEMMConvolutionLayer for 1x1 convolutions with NHWC data layout Change-Id: I894e6b952ed8605e8f3ffc0ffc25c24730d4664c Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/141909 Tested-by: Jenkins <bsgcomp@arm.com> Reviewed-by: Anthony Barbier <anthony.barbier@arm.com> Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>