path: root/src/gpu
Age | Commit message | Author
2023-05-05 | Connect CLMatMul function to quantized kernels and resolve NE BatchMatMul int... | Jakub Sujak
2023-05-04 | Implement OpenCL MatMul heuristic for Arm® Mali™-G710 | Gian Marco Iodice
2023-05-02 | Fix export_to_cl_image issue in the fp16 GeMM implementation | Gian Marco Iodice
2023-05-02 | Add fp16 GeMM heuristic for Arm® Mali™-G710 | Gian Marco Iodice
2023-04-27 | Add quantized CL MatMul kernel for LHS NT, RHS T | Jakub Sujak
2023-04-26 | Change fp16 GeMM heuristic for Arm® Mali™-G77 | Gian Marco Iodice
2023-04-26 | Improve Winograd performance on OpenCL | Gian Marco Iodice
2023-04-20 | Implement CL kernel for a native batched matmul Quantized - LHS transposed, R... | Omar Al Khatib
2023-04-17 | Add quantized CL MatMul kernels for Lhs NT/T, Rhs NT | Gunes Bayir
2023-04-14 | Align naming convention of ClMatMul | Jakub Sujak
2023-04-04 | Support dynamic weights for Fully Connected layers on GPU | Jakub Sujak
2023-04-03 | Implement MatMul Function | Ramy Elgammal
2023-03-24 | Work around CLScale compiler-specific issue | SiCong Li
2023-03-24 | Add Texture Pipe Support for Matmul Lhs T/NT Rhs NT kernels | Gunes Bayir
2023-03-20 | Implement OpenCL MatMul for Lhs T Rhs T/NT FP32/16 | Gunes Bayir
2023-03-17 | Implementation of RSQRT for quantized int8 | Ramy Elgammal
2023-03-17 | Implement OpenCL MatMul for Lhs NT Rhs T/NT FP32/16 | Ramy Elgammal
2023-03-06 | Fix LWS search space used by CLTuner | SiCong Li
2023-02-28 | Add an option to use lowest for max-pooling | Adnan AlSinan
2023-01-18 | Add broadcast batched matmul validation cases | SiCong Li
2023-01-17 | Fix ClGemm crashes on unsupported data types | SiCong Li
2023-01-10 | Fix CL DirectConvolutionLayer validate tests | SiCong Li
2023-01-10 | Extend cl image support to input and output tensors | Gian Marco Iodice
2022-12-29 | Optimize CL Scale/Resize Quantized by removing (de)quant. code | Gunes Bayir
2022-12-29 | Update the ClConv2d heuristic | Gian Marco Iodice
2022-12-29 | Extend Transposed Conv. for tiles with N0>1 | Gunes Bayir
2022-12-23 | Make CLReshape kernel window based on dst instead of src | Ramy Elgammal
2022-12-14 | Optimize Transposed Convolution for CL backend (Quantized) | Gunes Bayir
2022-12-13 | Add CLAMP operator to Dynamic Fusion interface | Jakub Sujak
2022-12-12 | Fix build error resulting from incorrect header path | Jakub Sujak
2022-12-09 | Use heuristics for setting dynamic fusion direct conv2d tile sizes | Ramy Elgammal
2022-12-09 | Implement the OpenCL kernel to compute the indirect convolution | Gian Marco Iodice
2022-11-25 | Implement address precalculation for indirect conv2d - OpenCL | Gian Marco Iodice
2022-11-22 | Remove dynamic fusion prototype with tests and examples | SiCong Li
2022-11-14 | Optimize Transposed Convolution for CL backend (FP32/16) | Gunes Bayir
2022-11-01 | Rework direct convolution heuristic on OpenCL | Gian Marco Iodice
2022-10-06 | Rework DepthwiseConvolution heuristic on OpenCL | Gian Marco Iodice
2022-10-06 | Improve start-up time in gemmlowp reshaped rhs only. | Adnan AlSinan
2022-10-04 | Update GEMM reshaped rhs only heuristic | Gian Marco Iodice
2022-10-03 | Force CL kernel compilation with 64 registers | Viet-Hoa Do
2022-09-16 | Fix validation in validate_image2d_support_on_rhs | Gian Marco Iodice
2022-09-09 | Rework heuristic in ClConv2d | Gian Marco Iodice
2022-09-09 | Add a macro guard in all OpenCL kernels in gemmlowp.cl | Gian Marco Iodice
2022-09-02 | Enable Winograd-based conv2d when IFM>=8 on Gpu | Gian Marco Iodice
2022-08-17 | Revert "Fix performance regression in ClConv2D" | Ramy Elgammal
2022-08-16 | Fix performance regression in ClConv2D | Gian Marco Iodice
2022-08-11 | Fix performance regression in Conv2D on OpenCL | Adnan AlSinan
2022-08-11 | Disable unsafe FP optimizations in Winograd Output Transform | Gunes Bayir
2022-08-05 | Fix LeNet-f16 convolution regression | Adnan AlSinan
2022-07-22 | Add GemmLowp MMUL Reshaped Only Rhs Support for QASYMM8/QASYMM8_SIGNED | Freddie Liardet