path: root/docs/user_guide/release_version_and_change_log.dox
Diffstat (limited to 'docs/user_guide/release_version_and_change_log.dox')
-rw-r--r-- docs/user_guide/release_version_and_change_log.dox | 442
1 files changed, 372 insertions, 70 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index 3ffa11b045..ca8092797f 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -1,5 +1,5 @@
///
-/// Copyright (c) 2017-2021 Arm Limited.
+/// Copyright (c) 2017-2024 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
@@ -37,9 +37,311 @@ If there is more than one release in a month then an extra sequential number is
v17.04 (First release of April 2017)
@note We're aiming at releasing one major public release with new features per quarter. All releases in between will only contain bug fixes.
+@note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.
@section S2_2_changelog Changelog
+v24.05 Public major release
+ - Add @ref CLScatter operator for FP32/16, S32/16/8, U32/16/8 data types
+
+v24.04 Public major release
+ - Add Bfloat16 data type support for @ref NEMatMul.
+ - Add support for SoftMax in SME2 for FP32 and FP16.
+ - Add support for in place accumulation to CPU GEMM kernels.
+ - Add low-precision Int8 * Int8 -> FP32 CPU GEMM which dequantizes after multiplication
+ - Add is_dynamic flag to QuantizationInfo to signal to operators that it may change after configuration
+ - Performance optimizations:
+ - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+ - Optimize @ref NEConvolutionLayer for input tensor size > 1e7 bytes and weight tensor height > 7
+ - Optimize @ref NESoftmaxLayer for axis != 0 by natively supporting higher axes up to axis 3.
+
+v24.02.1 Public patch release
+ - Fix performance regression in fixed-format kernels
+ - Fix compile and runtime errors in arm_compute_validation for Windows on Arm (WoA)
+
+v24.02 Public major release
+ - Replace template writer with compute kernel writer in dynamic fusion.
+ - Performance optimizations:
+ - Parallelize @ref NEDepthwiseConvolutionLayer over batches if there is only 1 row
+
+v24.01 Public major release
+ - Remove the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+ You should link only to the main `libarm_compute` library for core functionality.
+ - Expand GPUTarget list with Mali™ G720 and G620.
+ - Optimize CPU activation functions using LUT-based implementation:
+ - Sigmoid function for FP16.
+ - New features
+ - Add support for FP16 in all multi_isa builds.
+ - Performance optimizations:
+ - Optimize @ref NESoftmaxLayer
+ - Optimize @ref NEDepthToSpaceLayer.
+
+v23.11 Public major release
+ - New features
+ - Add support for input data type U64/S64 in CLCast and NECast.
+ - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer
+ - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface:
+ - @ref experimental::dynamic_fusion::GpuCkwResize
+ - @ref experimental::dynamic_fusion::GpuCkwPool2d
+ - @ref experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
+ - @ref experimental::dynamic_fusion::GpuCkwMatMul
+ - Add support for OpenCL™ command buffer with mutable dispatch extension.
+ - Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82.
+ - Add support for negative axis values and inverted axis values in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse.
+ - Add new OpenCL™ kernels:
+ - @ref opencl::kernels::ClMatMulLowpNativeMMULKernel support for QASYMM8 and QASYMM8_SIGNED, with batch support
+ - Performance optimizations:
+ - Optimize @ref cpu::CpuReshape
+ - Optimize @ref opencl::ClTranspose
+ - Optimize @ref NEStackLayer
+ - Optimize @ref CLReductionOperation.
+ - Optimize @ref CLSoftmaxLayer.
+ - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+ - Reduce CPU Overhead by optimal flushing of CL kernels.
+ - Deprecate support for Bfloat16 in @ref cpu::CpuCast.
+ - Support for U32 axis in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse will be deprecated in 24.02.
+ - Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.
+ - Update OpenCL™ API headers to v2023.04.17
+
+v23.08 Public major release
+ - Deprecate the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+ Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library for core functionality.
+ - New features
+ - Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
+ - Add multi-sketch support for dynamic fusion.
+ - Break up arm_compute/core/Types.h and utils/Utils.h a bit to reduce unused code in each inclusion of these headers.
+ - Add Fused Activation to CLMatMul.
+ - Implement FP32/FP16 @ref opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
+ - Use MatMul in fully connected layer with dynamic weights when supported.
+ - Optimize CPU depthwise convolution with channel multiplier.
+ - Add support in CpuCastKernel for conversion of S64/U64 to F32.
+ - Add new OpenCL™ kernels:
+ - @ref opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support
+ - Enable transposed convolution with non-square kernels on CPU and GPU.
+ - Add support for input data type U64/S64 in CLCast.
+ - Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
+ - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
+ - @ref experimental::dynamic_fusion::GpuCkwActivation
+ - @ref experimental::dynamic_fusion::GpuCkwCast
+ - @ref experimental::dynamic_fusion::GpuCkwDirectConv2d
+ - @ref experimental::dynamic_fusion::GpuCkwElementwiseBinary
+ - @ref experimental::dynamic_fusion::GpuCkwStore
+ - Various optimizations and bug fixes.
+
+v23.05.1 Public patch release
+ - Enable CMake and Bazel option to build multi_isa without FP16 support.
+ - Fix compilation error in NEReorderLayer (aarch64 only).
+ - Disable invalid (false-negative) validation test with CPU scale layer on FP16.
+ - Various bug fixes
+
+v23.05 Public major release
+ - New features:
+ - Add new Arm® Neon™ kernels / functions:
+ - @ref NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+ - NEReorderLayer (aarch64 only)
+ - Add new OpenCL™ kernels / functions:
+ - @ref CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+ - Add support for the multiple dimensions in the indices parameter for both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer.
+ - Add support for dynamic weights in @ref CLFullyConnectedLayer and @ref NEFullyConnectedLayer for all data types.
+ - Add support for cropping in the Arm® Neon™ and OpenCL™ implementations of the BatchToSpace Layer for all data types.
+ - Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™.
+ - Implement RSQRT for quantized data types on OpenCL™.
+ - Add FP16 depthwise convolution kernels for SME2.
+ - Performance optimizations:
+ - Improve CLTuner exhaustive mode tuning time.
+ - Deprecate dynamic block shape in @ref NEBatchToSpaceLayer and @ref CLBatchToSpaceLayer.
+ - Various optimizations and bug fixes.
+
+v23.02.1 Public patch release
+ - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
+ - Fixes for experimental CPU only Bazel and CMake builds.
+
+v23.02 Public major release
+ - New features:
+ - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
+ - Add the following operators to the experimental dynamic fusion API:
+ - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
+ - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
+ - Add new CPU operator AddMulAdd for float and quantized types.
+ - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
+ - Add experimental support for CPU only Bazel and CMake builds.
+ - Performance optimizations:
+ - Optimize CPU base-e exponential functions for FP32.
+ - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
+ - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
+ - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
+ - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
+ - Update the heuristic for CLDepthwiseConvolutionNative kernel.
+ - Add new optimized OpenCL kernel to compute indirect convolution:
+ - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
+ - Add new optimized OpenCL kernel to compute transposed convolution:
+ - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
+ - Update recommended/minimum NDK version to r20b.
+ - Various optimizations and bug fixes.
+
+v22.11 Public major release
+ - New features:
+ - Add new experimental dynamic fusion API.
+ - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
+ - Add CPU MeanStdDevNorm for QASYMM8.
+ - Add CPU and GPU GELU activation function for FP32 and FP16.
+ - Add CPU swish activation function for FP32 and FP16.
+ - Performance optimizations:
+ - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
+ - Optimize CPU activation functions using LUT-based implementation:
+ - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
+ - Hard swish function for QASYMM8_SIGNED.
+ - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
+ - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
+ - Optimize GPU depthwise convolution kernel and heuristic.
+ - Optimize GPU Conv2d heuristic.
+ - Optimize CPU MeanStdDevNorm for FP16.
+ - Optimize CPU tanh activation function for FP16 using rational approximation.
+ - Improve GPU GeMMLowp start-up time.
+ - Various optimizations and bug fixes.
+
+v22.08 Public major release
+ - Various bug fixes.
+ - Disable unsafe FP optimizations causing accuracy issues in:
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink
+ - @ref CLDepthwiseConvolutionLayerNativeKernel
+ - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
+ - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
+ - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
+ - Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
+ - Extend the direct convolution 2d interface to configure the block size.
+ - Update ClConv2D heuristic to use direct convolution.
+ - Use official Khronos® OpenCL extensions:
+ - Add cl_khr_integer_dot_product extension support.
+ - Add support of OpenCL 3.0 non-uniform workgroup.
+ - Cpu performance optimizations:
+ - Add LUT-based implementation of Hard Swish and Leaky ReLU activation function for aarch64 build.
+ - Optimize Add layer by considering the input tensors as 1D array.
+ - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
+ - Add new winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
+ - Add experimental support for native builds for Windows® on Arm™.
+ - Build flag interpretation change: arch=armv8.6-a now translates to -march=armv8.6-a CXX flag instead of -march=armv8.2-a + explicit selection of feature extensions.
+ - Build flag change: toolchain_prefix, compiler_prefix:
+ - Use empty string "" to suppress any prefixes.
+ - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
+ - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
+ - The default behaviour when prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
+ - armv7a with Android build will no longer be tested or maintained.
+
+v22.05 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Add support for NDK r23b.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - New Arm® Neon™ kernels / functions:
+ - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
+ - New OpenCL kernels / functions:
+ - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
+ - Improve the start-up times for the following OpenCL kernels:
+ - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
+ - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
+ - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
+ - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+ - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+ - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
+ - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
+ - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
+ - @ref NEFuseBatchNormalizationKernel
+ - @ref NEL2NormalizeLayerKernel
+
+v22.02 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Update A510 arm_gemm cpu Kernels.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - Improve the start-up time for the following OpenCL kernels:
+ - @ref CLScale
+ - @ref CLGEMM
+ - @ref CLDepthwiseConvolutionLayer
+ - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove functions:
+ - CLRemap
+ - NERemap
+ - Remove padding from OpenCL kernels:
+ - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove padding from Cpu kernels:
+ - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+ - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
+ - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
+ - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
+ - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
+ - @ref NEBoundingBoxTransformKernel
+ - @ref NECropKernel
+ - @ref NEComputeAllAnchorsKernel
+ - @ref NEInstanceNormalizationLayerKernel
+ - NEMaxUnpoolingLayerKernel
+ - @ref NEMeanStdDevNormalizationKernel
+ - @ref NERangeKernel
+ - @ref NEROIAlignLayerKernel
+ - @ref NESelectKernel
+
+v21.11 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+ - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
+ - Improve performance of Softmax on GPU for Uint8/Int8
+ - New OpenCL kernels / functions:
+ - @ref CLConv3D
+ - New Arm® Neon™ kernels / functions:
+ - @ref NEConv3D
+ - Support configurable build by a selected subset of operator list
+ - Support MobileBert on Neon™ backend
+ - Improve operator/function logging
+ - Remove padding from OpenCL kernels:
+ - ClPool2dKernel
+ - ClScaleKernel
+ - ClGemmMatrixMultiplyReshapedKernel
+ - Remove padding from Cpu kernels:
+ - CpuPool2dKernel
+ - Remove Y padding from OpenCL kernels:
+ - ClGemmMatrixMultiplyKernel
+ - ClGemmReshapedRHSMatrixKernel
+ - Remove legacy GeMM kernels in gemm_v1.cl
+
+v21.08 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+ - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used
+ - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures
+ - Add dynamic weights support in Fully connected layer (CPU/GPU)
+ - Various performance optimizations for floating-point data types (CPU/GPU)
+ - Add a reduced core library build arm_compute_core_v2
+ - Expose Operator API
+ - Support fat binary build for armv8.2-a via fat_binary build flag
+ - Add CPU discovery capabilities
+ - Add data type f16 support for:
+ - CLRemapKernel
+ - Port the following functions to stateless API:
+ - @ref CLConvolutionLayer
+ - @ref CLFlattenLayer
+ - @ref CLFullyConnectedLayer
+ - @ref CLGEMM
+ - @ref CLGEMMConvolutionLayer
+ - @ref CLGEMMLowpMatrixMultiplyCore
+ - @ref CLWinogradConvolutionLayer
+ - @ref NEConvolutionLayer
+ - @ref NEFlattenLayer
+ - @ref NEFullyConnectedLayer
+ - @ref NEGEMM
+ - @ref NEGEMMConv2d
+ - @ref NEGEMMConvolutionLayer
+ - @ref NEGEMMLowpMatrixMultiplyCore
+ - @ref NEWinogradConvolutionLayer
+ - Remove the following functions:
+ - CLWinogradInputTransform
+ - Remove CLCoreRuntimeContext
+ - Remove ICPPSimpleKernel
+ - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h
+
v21.05 Public major release
- Various bug fixes.
- Various optimisations.
@@ -62,7 +364,7 @@ v21.05 Public major release
- @ref NEDeconvolutionLayer
- Remove padding from OpenCL kernels:
- @ref CLL2NormalizeLayerKernel
- - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
- @ref CLNormalizationLayerKernel
- @ref CLNormalizePlanarYUVLayerKernel
- @ref opencl::kernels::ClMulKernel
@@ -153,7 +455,7 @@ v21.05 Public major release
- CLThreshold
- CLWarpAffine
- CLWarpPerspective
-
+
v21.02 Public major release
- Various bug fixes.
- Various optimisations.
@@ -165,8 +467,8 @@ v21.02 Public major release
- @ref NEActivationLayer
- @ref NEArithmeticAddition
- @ref NEBatchNormalizationLayerKernel
- - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
- - @ref cpu::kernels::CpuLogits1DMaxKernel
+ - cpu::kernels::CpuLogits1DSoftmaxKernel
+ - cpu::kernels::CpuLogits1DMaxKernel
- @ref cpu::kernels::CpuElementwiseUnaryKernel
- Remove padding from OpenCL kernels:
- CLDirectConvolutionLayerKernel
@@ -227,7 +529,7 @@ v20.11 Public major release
- @ref CLLogSoftmaxLayer
- GCSoftmaxLayer
- New OpenCL kernels / functions:
- - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
- @ref CLLogicalNot
- @ref CLLogicalAnd
- @ref CLLogicalOr
@@ -238,40 +540,40 @@ v20.11 Public major release
- Removed padding from Arm® Neon™ kernels:
- NEComplexPixelWiseMultiplicationKernel
- NENonMaximaSuppression3x3Kernel
- - @ref NERemapKernel
- - @ref NEGEMMInterleave4x4Kernel
+ - NERemapKernel
+ - NEGEMMInterleave4x4Kernel
- NEDirectConvolutionLayerKernel
- NEScaleKernel
- NELocallyConnectedMatrixMultiplyKernel
- - @ref NEGEMMLowpOffsetContributionKernel
- - @ref NEGEMMTranspose1xWKernel
+ - NEGEMMLowpOffsetContributionKernel
+ - NEGEMMTranspose1xWKernel
- NEPoolingLayerKernel
- NEConvolutionKernel
- NEDepthwiseConvolutionLayerNativeKernel
- - @ref NEGEMMLowpMatrixMultiplyKernel
- - @ref NEGEMMMatrixMultiplyKernel
+ - NEGEMMLowpMatrixMultiplyKernel
+ - NEGEMMMatrixMultiplyKernel
- NEDirectConvolutionLayerOutputStageKernel
- @ref NEReductionOperationKernel
- - @ref NEGEMMLowpMatrixAReductionKernel
- - @ref NEGEMMLowpMatrixBReductionKernel
+ - NEGEMMLowpMatrixAReductionKernel
+ - NEGEMMLowpMatrixBReductionKernel
- Removed padding from OpenCL kernels:
- CLBatchConcatenateLayerKernel
- CLElementwiseOperationKernel
- @ref CLBatchNormalizationLayerKernel
- CLPoolingLayerKernel
- CLWinogradInputTransformKernel
- - @ref CLGEMMLowpMatrixMultiplyNativeKernel
- - @ref CLGEMMLowpMatrixAReductionKernel
- - @ref CLGEMMLowpMatrixBReductionKernel
- - @ref CLGEMMLowpOffsetContributionOutputStageKernel
- - @ref CLGEMMLowpOffsetContributionKernel
+ - CLGEMMLowpMatrixMultiplyNativeKernel
+ - CLGEMMLowpMatrixAReductionKernel
+ - CLGEMMLowpMatrixBReductionKernel
+ - CLGEMMLowpOffsetContributionOutputStageKernel
+ - CLGEMMLowpOffsetContributionKernel
- CLWinogradOutputTransformKernel
- - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
- @ref CLFuseBatchNormalizationKernel
- @ref CLDepthwiseConvolutionLayerNativeKernel
- CLDepthConvertLayerKernel
- CLCopyKernel
- - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
- CLActivationLayerKernel
- CLWinogradFilterTransformKernel
- CLWidthConcatenateLayerKernel
@@ -281,11 +583,11 @@ v20.11 Public major release
- CLLogits1DNormKernel
- CLHeightConcatenateLayerKernel
- CLGEMMMatrixMultiplyKernel
- - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
- - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
- - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
- CLDepthConcatenateLayerKernel
- - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
- CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
@@ -520,7 +822,7 @@ v20.08 Public major release
- New OpenCL kernels / functions:
- @ref CLMaxUnpoolingLayerKernel
- New Arm® Neon™ kernels / functions:
- - @ref NEMaxUnpoolingLayerKernel
+ - NEMaxUnpoolingLayerKernel
- New graph example:
- graph_yolov3_output_detector
- GEMMTuner improvements:
@@ -567,7 +869,7 @@ v20.08 Public major release
The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0.
Only axis 0 is supported.
- The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
- - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only)
+ - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
- This change allows to use @ref CLGEMMConvolutionLayer without extra padding for the input and output.
- Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
- Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
@@ -583,9 +885,9 @@ v20.05 Public major release
- Updated recommended gcc version to Linaro 6.3.1.
- Added Bfloat16 type support
- Added Bfloat16 support in:
- - @ref NEWeightsReshapeKernel
- - @ref NEConvolutionLayerReshapeWeights
- - @ref NEIm2ColKernel
+ - NEWeightsReshapeKernel
+ - NEConvolutionLayerReshapeWeights
+ - NEIm2ColKernel
- NEIm2Col
- NEDepthConvertLayerKernel
- @ref NEDepthConvertLayer
@@ -596,9 +898,9 @@ v20.05 Public major release
- @ref CLDeconvolutionLayer
- @ref CLDirectDeconvolutionLayer
- @ref CLGEMMDeconvolutionLayer
- - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
- - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
- - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleKernel
+ - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
- @ref CLReductionOperation
- @ref CLReduceMean
- @ref NEScale
@@ -609,7 +911,7 @@ v20.05 Public major release
- @ref NEReduceMean
- @ref NEArgMinMaxLayer
- @ref NEDeconvolutionLayer
- - @ref NEGEMMLowpQuantizeDownInt32ScaleKernel
+ - NEGEMMLowpQuantizeDownInt32ScaleKernel
- @ref CPPBoxWithNonMaximaSuppressionLimit
- @ref CPPDetectionPostProcessLayer
- @ref CPPPermuteKernel
@@ -639,9 +941,9 @@ v20.05 Public major release
- Removed NEDepthwiseConvolutionLayerOptimized
- Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
- @ref NEWinogradConvolutionLayer
- - @ref NEWinogradLayerTransformInputKernel
- - @ref NEWinogradLayerTransformOutputKernel
- - @ref NEWinogradLayerTransformWeightsKernel
+ - CpuWinogradConv2dTransformInputKernel
+ - CpuWinogradConv2dTransformOutputKernel
+ - CpuWinogradConv2dTransformWeightsKernel
- Added CLCompileContext
- Added Arm® Neon™ GEMM kernel with 2D window support
@@ -655,9 +957,9 @@ v20.02 Public major release
- @ref CLDepthwiseConvolutionLayer
- CLDepthwiseConvolutionLayer3x3
- @ref CLGEMMConvolutionLayer
- - @ref CLGEMMLowpMatrixMultiplyCore
- - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
- - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+ - CLGEMMLowpMatrixMultiplyCore
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLGEMMLowpMatrixMultiplyNativeKernel
- @ref NEActivationLayer
- NEComparisonOperationKernel
- @ref NEConvolutionLayer
@@ -680,10 +982,10 @@ v20.02 Public major release
- @ref NESplit
- New OpenCL kernels / functions:
- @ref CLFill
- - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- New Arm® Neon™ kernels / functions:
- @ref NEFill
- - @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+ - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- Deprecated Arm® Neon™ functions / interfaces:
- CLDepthwiseConvolutionLayer3x3
- NEDepthwiseConvolutionLayerOptimized
@@ -800,7 +1102,7 @@ v19.08 Public major release
- NEBatchConcatenateLayerKernel
- @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
- NEDepthwiseConvolutionLayerNativeKernel
- - @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+ - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
- @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
- @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
- New OpenCL kernels / functions:
@@ -848,7 +1150,7 @@ v19.05 Public major release
- @ref NEFFTDigitReverseKernel
- @ref NEFFTRadixStageKernel
- @ref NEFFTScaleKernel
- - @ref NEGEMMLowpOffsetContributionOutputStageKernel
+ - NEGEMMLowpOffsetContributionOutputStageKernel
- NEHeightConcatenateLayerKernel
- @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
- @ref NEFFT1D
@@ -861,7 +1163,7 @@ v19.05 Public major release
- @ref CLFFTDigitReverseKernel
- @ref CLFFTRadixStageKernel
- @ref CLFFTScaleKernel
- - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+ - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
- CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
- CLHeightConcatenateLayerKernel
- @ref CLDirectDeconvolutionLayer
@@ -953,7 +1255,7 @@ v19.02 Public major release
- @ref CLRangeKernel / @ref CLRange
- @ref CLUnstack
- @ref CLGatherKernel / @ref CLGather
- - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+ - CLGEMMLowpMatrixMultiplyReshapedKernel
- New CPP kernels / functions:
- @ref CPPDetectionOutputLayer
- @ref CPPTopKV / @ref CPPTopKVKernel
@@ -1020,7 +1322,7 @@ v18.11 Public major release
- Added the validate method in:
- @ref NEDepthConvertLayer
- @ref NEFloor / @ref CLFloor
- - @ref NEGEMMMatrixAdditionKernel
+ - NEGEMMMatrixAdditionKernel
- @ref NEReshapeLayer / @ref CLReshapeLayer
- @ref CLScale
- Added new examples:
@@ -1032,10 +1334,10 @@ v18.11 Public major release
- CLWidthConcatenateLayer
- CLFlattenLayer
- @ref CLSoftmaxLayer
- - Add dot product support for @ref CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
+ - Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
- Add SVE support
- Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
- - Fuses activation in @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
+ - Fuses activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
- Added NHWC data layout support to:
- @ref CLChannelShuffleLayer
- @ref CLDeconvolutionLayer
@@ -1045,7 +1347,7 @@ v18.11 Public major release
- NEDepthwiseConvolutionLayer3x3Kernel
- CLPixelWiseMultiplicationKernel
- Added FP16 support to the following kernels:
- - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+ - CLDepthwiseConvolutionLayer3x3NHWCKernel
- NEDepthwiseConvolutionLayer3x3Kernel
- @ref CLNormalizePlanarYUVLayerKernel
- @ref CLWinogradConvolutionLayer (5x5 kernel)
@@ -1064,7 +1366,7 @@ v18.08 Public major release
- @ref CLDirectConvolutionLayer
- @ref CLConvolutionLayer
- @ref CLScale
- - @ref CLIm2ColKernel
+ - CLIm2ColKernel
- New Arm® Neon™ kernels / functions:
- @ref NERNNLayer
- New OpenCL kernels / functions:
@@ -1171,9 +1473,9 @@ v18.02 Public major release
- Added name() method to all kernels.
- Added support for Winograd 5x5.
- NEPermuteKernel / @ref NEPermute
- - @ref NEWinogradLayerTransformInputKernel / NEWinogradLayer
- - @ref NEWinogradLayerTransformOutputKernel / NEWinogradLayer
- - @ref NEWinogradLayerTransformWeightsKernel / NEWinogradLayer
+ - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
+ - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
+ - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
- Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
- New GLES kernels / functions:
- GCTensorShiftKernel / GCTensorShift
@@ -1242,13 +1544,13 @@ v17.12 Public major release
- arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
- arm_compute::NEHGEMMAArch64FP16Kernel
- NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
- - @ref NEGEMMLowpOffsetContributionKernel / @ref NEGEMMLowpMatrixAReductionKernel / @ref NEGEMMLowpMatrixBReductionKernel / @ref NEGEMMLowpMatrixMultiplyCore
- - @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+ - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
+ - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
- NEWinogradLayer / NEWinogradLayerKernel
- New OpenCL kernels / functions
- - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
- - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+ - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
+ - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
- New graph nodes for Arm® Neon™ and OpenCL
- graph::BranchLayer
@@ -1280,13 +1582,13 @@ v17.09 Public major release
- NEDequantizationLayerKernel / @ref NEDequantizationLayer
- NEFloorKernel / @ref NEFloor
- @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
- - NEQuantizationLayerKernel @ref NEMinMaxLayerKernel / @ref NEQuantizationLayer
+ - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
- @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
- @ref NEReductionOperationKernel / @ref NEReductionOperation
- NEReshapeLayerKernel / @ref NEReshapeLayer
- New OpenCL kernels / functions:
- - @ref CLDepthwiseConvolutionLayer3x3NCHWKernel @ref CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
+ - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
- CLDequantizationLayerKernel / CLDequantizationLayer
- CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
- CLFlattenLayer
@@ -1294,7 +1596,7 @@ v17.09 Public major release
- CLGEMMTranspose1xW
- CLGEMMMatrixVectorMultiplyKernel
- @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
- - CLQuantizationLayerKernel @ref CLMinMaxLayerKernel / @ref CLQuantizationLayer
+ - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
- @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
- @ref CLReductionOperationKernel / @ref CLReductionOperation
- CLReshapeLayerKernel / @ref CLReshapeLayer
@@ -1307,13 +1609,13 @@ v17.06 Public major release
- Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
- Added @ref OMPScheduler (OpenMP) scheduler for Neon
- Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
- - User can specify his own scheduler by implementing the @ref IScheduler interface.
+ - User can specify their own scheduler by implementing the @ref IScheduler interface.
- New OpenCL kernels / functions:
- @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
- CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
- CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection
- CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
- - @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
+ - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
- New C++ kernels:
- CPPDetectionWindowNonMaximaSuppressionKernel
- New Arm® Neon™ kernels / functions:
@@ -1321,7 +1623,7 @@ v17.06 Public major release
- NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
- NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
- NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
- - @ref NEWeightsReshapeKernel / @ref NEConvolutionLayerReshapeWeights
+ - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights
v17.05 Public bug fixes release
- Various bug fixes
@@ -1362,9 +1664,9 @@ v17.03.1 First Major public release of the sources
- @ref NENormalizationLayerKernel / @ref NENormalizationLayer
- NETransposeKernel / @ref NETranspose
- NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
- - @ref NEIm2ColKernel, @ref NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
+ - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
- NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
- - @ref NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
+ - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
v17.03 Sources preview
- New OpenCL kernels / functions:
@@ -1377,15 +1679,15 @@ v17.03 Sources preview
- CLLaplacianPyramid, CLLaplacianReconstruct
- New Arm® Neon™ kernels / functions:
- NEActivationLayerKernel / @ref NEActivationLayer
- - GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
+ - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
- NEPoolingLayerKernel / @ref NEPoolingLayer
v17.02.1 Sources preview
- New OpenCL kernels / functions:
- CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
- CLPoolingLayerKernel / @ref CLPoolingLayer
- - @ref CLIm2ColKernel, @ref CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
- - @ref CLRemapKernel / @ref CLRemap
+ - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
+ - CLRemapKernel / CLRemap
- CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
- CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
- CLNonLinearFilterKernel / CLNonLinearFilter
@@ -1412,4 +1714,4 @@ v16.12 Binary preview release
- Original release
*/
-} // namespace arm_compute
\ No newline at end of file
+} // namespace arm_compute