Diffstat (limited to 'docs/user_guide/release_version_and_change_log.dox')
-rw-r--r-- | docs/user_guide/release_version_and_change_log.dox | 442 |
1 files changed, 372 insertions, 70 deletions
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index 3ffa11b045..ca8092797f 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -1,5 +1,5 @@
 ///
-/// Copyright (c) 2017-2021 Arm Limited.
+/// Copyright (c) 2017-2024 Arm Limited.
 ///
 /// SPDX-License-Identifier: MIT
 ///
@@ -37,9 +37,311 @@ If there is more than one release in a month then an extra sequential number is
 v17.04 (First release of April 2017)
 
 @note We're aiming at releasing one major public release with new features per quarter. All releases in between will only contain bug fixes.
+@note Starting from release 22.05, 'master' branch is no longer being used, it has been replaced by 'main'. Please update your clone jobs accordingly.
 
 @section S2_2_changelog Changelog
 
+v24.05 Public major release
+ - Add @ref CLScatter operator for FP32/16, S32/16/8, U32/16/8 data types
+
+v24.04 Public major release
+ - Add Bfloat16 data type support for @ref NEMatMul.
+ - Add support for SoftMax in SME2 for FP32 and FP16.
+ - Add support for in place accumulation to CPU GEMM kernels.
+ - Add low-precision Int8 * Int8 -> FP32 CPU GEMM which dequantizes after multiplication
+ - Add is_dynamic flag to QuantizationInfo to signal to operators that it may change after configuration
+ - Performance optimizations:
+   - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+   - Optimize @ref NEConvolutionLayer for input tensor size > 1e7 bytes and weight tensor height > 7
+   - Optimize @ref NESoftmaxLayer for axis != 0 by natively supporting higher axes up to axis 3.
+
+v24.02.1 Public patch release
+ - Fix performance regression in fixed-format kernels
+ - Fix compile and runtime errors in arm_compute_validation for Windows on Arm(WoA)
+
+v24.02 Public major release
+ - Replace template writer with compute kernel writer in dynamic fusion.
+ - Performance optimizations:
+   - Parallelize @ref NEDepthwiseConvolutionLayer over batches if there is only 1 row
+
+v24.01 Public major release
+ - Remove the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+   You should link only to the main `libarm_compute` library for core functionality.
+ - Expand GPUTarget list with Mali™ G720 and G620.
+ - Optimize CPU activation functions using LUT-based implementation:
+   - Sigmoid function for FP16.
+ - New features
+   - Add support for FP16 in all multi_isa builds.
+ - Performance optimizations:
+   - Optimize @ref NESoftmaxLayer
+   - Optimize @ref NEDepthToSpaceLayer.
+
+v23.11 Public major release
+ - New features
+   - Add support for input data type U64/S64 in CLCast and NECast.
+   - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer
+   - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface:
+     - @ref experimental::dynamic_fusion::GpuCkwResize
+     - @ref experimental::dynamic_fusion::GpuCkwPool2d
+     - @ref experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
+     - @ref experimental::dynamic_fusion::GpuCkwMatMul
+   - Add support for OpenCL™ comand buffer with mutable dispatch extension.
+   - Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82.
+   - Add support for negative axis values and inverted axis values in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse.
+   - Add new OpenCL™ kernels:
+     - @ref opencl::kernels::ClMatMulLowpNativeMMULKernel support for QASYMM8 and QASYMM8_SIGNED, with batch support
+ - Performance optimizations:
+   - Optimize @ref cpu::CpuReshape
+   - Optimize @ref opencl::ClTranspose
+   - Optimize @ref NEStackLayer
+   - Optimize @ref CLReductionOperation.
+   - Optimize @ref CLSoftmaxLayer.
+   - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
+   - Reduce CPU Overhead by optimal flushing of CL kernels.
+ - Deprecate support for Bfloat16 in @ref cpu::CpuCast.
+ - Support for U32 axis in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse will be deprecated in 24.02.
+ - Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.
+ - Update OpenCL™ API headers to v2023.04.17
+
+v23.08 Public major release
+ - Deprecate the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
+   Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library for core functionality.
+ - New features
+   - Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
+   - Add multi-sketch support for dynamic fusion.
+   - Break up arm_compute/core/Types.h and utils/Utils.h a bit to reduce unused code in each inclusion of these headers.
+   - Add Fused Activation to CLMatMul.
+   - Implement FP32/FP16 @ref opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
+   - Use MatMul in fully connected layer with dynamic weights when supported.
+   - Optimize CPU depthwise convolution with channel multiplier.
+   - Add support in CpuCastKernel for conversion of S64/U64 to F32.
+   - Add new OpenCL™ kernels:
+     - @ref opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support
+   - Enable transposed convolution with non-square kernels on CPU and GPU.
+   - Add support for input data type U64/S64 in CLCast.
+   - Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
+   - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
+     - @ref experimental::dynamic_fusion::GpuCkwActivation
+     - @ref experimental::dynamic_fusion::GpuCkwCast
+     - @ref experimental::dynamic_fusion::GpuCkwDirectConv2d
+     - @ref experimental::dynamic_fusion::GpuCkwElementwiseBinary
+     - @ref experimental::dynamic_fusion::GpuCkwStore
+ - Various optimizations and bug fixes.
+
+v23.05.1 Public patch release
+ - Enable CMake and Bazel option to build multi_isa without FP16 support.
+ - Fix compilation error in NEReorderLayer (aarch64 only).
+ - Disable invalid (false-negative) validation test with CPU scale layer on FP16.
+ - Various bug fixes
+
+v23.05 Public major release
+ - New features:
+   - Add new Arm® Neon™ kernels / functions:
+     - @ref NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+     - NEReorderLayer (aarch64 only)
+   - Add new OpenCL™ kernels / functions:
+     - @ref CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
+   - Add support for the multiple dimensions in the indices parameter for both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer.
+   - Add support for dynamic weights in @ref CLFullyConnectedLayer and @ref NEFullyConnectedLayer for all data types.
+   - Add support for cropping in the Arm® Neon™ and OpenCL™: implementations of the BatchToSpace Layer for all data types.
+   - Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™.
+   - Implement RSQRT for quantized data types on OpenCL™.
+   - Add FP16 depthwise convolution kernels for SME2.
+ - Performance optimizations:
+   - Improve CLTuner exhaustive mode tuning time.
+ - Deprecate dynamic block shape in @ref NEBatchToSpaceLayer and @ref CLBatchToSpaceLayer.
+ - Various optimizations and bug fixes.
+
+v23.02.1 Public patch release
+ - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
+ - Fixes for experimental CPU only Bazel and CMake builds.
+
+v23.02 Public major release
+ - New features:
+   - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
+   - Add the following operators to the experimental dynamic fusion API:
+     - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
+   - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
+   - Add new CPU operator AddMulAdd for float and quantized types.
+   - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
+   - Add experimental support for CPU only Bazel and CMake builds.
+ - Performance optimizations:
+   - Optimize CPU base-e exponential functions for FP32.
+   - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
+   - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
+   - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
+   - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
+   - Update the heuristic for CLDepthwiseConvolutionNative kernel.
+   - Add new optimized OpenCL kernel to compute indirect convolution:
+     - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
+   - Add new optimized OpenCL kernel to compute transposed convolution:
+     - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
+ - Update recommended/minimum NDK version to r20b.
+ - Various optimizations and bug fixes.
+
+v22.11 Public major release
+ - New features:
+   - Add new experimental dynamic fusion API.
+   - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
+   - Add CPU MeanStdDevNorm for QASYMM8.
+   - Add CPU and GPU GELU activation function for FP32 and FP16.
+   - Add CPU swish activation function for FP32 and FP16.
+ - Performance optimizations:
+   - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
+   - Optimize CPU activation functions using LUT-based implementation:
+     - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
+     - Hard swish function for QASYMM8_SIGNED.
+   - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
+   - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
+   - Optimize GPU depthwise convolution kernel and heuristic.
+   - Optimize GPU Conv2d heuristic.
+   - Optimize CPU MeanStdDevNorm for FP16.
+   - Optimize CPU tanh activation function for FP16 using rational approximation.
+   - Improve GPU GeMMLowp start-up time.
+ - Various optimizations and bug fixes.
+
+v22.08 Public major release
+ - Various bug fixes.
+ - Disable unsafe FP optimizations causing accuracy issues in:
+   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv3dKernel \endlink
+   - @ref CLDepthwiseConvolutionLayerNativeKernel
+ - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
+ - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
+ - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
+ - Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
+ - Extend the direct convolution 2d interface to configure the block size.
+ - Update ClConv2D heuristic to use direct convolution.
+ - Use official Khronos® OpenCL extensions:
+   - Add cl_khr_integer_dot_product extension support.
+   - Add support of OpenCL 3.0 non-uniform workgroup.
+ - Cpu performance optimizations:
+   - Add LUT-based implementation of Hard Swish and Leaky ReLU activation function for aarch64 build.
+   - Optimize Add layer by considering the input tensors as 1D array.
+ - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
+ - Add new winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
+ - Add experimental support for native builds for Windows® on Arm™.
+ - Build flag interpretation change: arch=armv8.6-a now translates to -march=armv8.6-a CXX flag instead of march=armv8.2-a + explicit selection of feature extensions.
+ - Build flag change: toolchain_prefix, compiler_prefix:
+   - Use empty string "" to suppress any prefixes.
+   - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
+   - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
+   - The default behaviour when prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
+ - armv7a with Android build will no longer be tested or maintained.
+
+v22.05 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Add support for NDK r23b.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - New Arm® Neon™ kernels / functions :
+   - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
+ - New OpenCL kernels / functions :
+   - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
+ - Improve the start-up times for the following OpenCL kernels:
+   - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
+   - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
+   - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
+   - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+   - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
+   - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
+   - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
+   - @ref NEFuseBatchNormalizationKernel
+   - @ref NEL2NormalizeLayerKernel
+
+v22.02 Public major release
+ - Various bug fixes.
+ - Various optimizations.
+ - Update A510 arm_gemm cpu Kernels.
+ - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
+ - Improve the start-up time for the following OpenCL kernels:
+   - @ref CLScale
+   - @ref CLGEMM
+   - @ref CLDepthwiseConvolutionLayer
+   - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
+   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove functions:
+   - CLRemap
+   - NERemap
+ - Remove padding from OpenCL kernels:
+   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
+ - Remove padding from Cpu kernels:
+   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
+ - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
+   - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
+   - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
+   - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
+   - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
+   - @ref NEBoundingBoxTransformKernel
+   - @ref NECropKernel
+   - @ref NEComputeAllAnchorsKernel
+   - @ref NEInstanceNormalizationLayerKernel
+   - NEMaxUnpoolingLayerKernel
+   - @ref NEMeanStdDevNormalizationKernel
+   - @ref NERangeKernel
+   - @ref NEROIAlignLayerKernel
+   - @ref NESelectKernel
+
+v21.11 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+   - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
+   - Improve performance of Softmax on GPU for Uint8/Int8
+ - New OpenCL kernels / functions:
+   - @ref CLConv3D
+ - New Arm® Neon™ kernels / functions:
+   - @ref NEConv3D
+ - Support configurable build by a selected subset of operator list
+ - Support MobileBert on Neon™ backend
+ - Improve operator/function logging
+ - Remove padding from OpenCL kernels:
+   - ClPool2dKernel
+   - ClScaleKernel
+   - ClGemmMatrixMultiplyReshapedKernel
+ - Remove padding from Cpu kernels:
+   - CpuPool2dKernel
+ - Remove Y padding from OpenCL kernels:
+   - ClGemmMatrixMultiplyKernel
+   - ClGemmReshapedRHSMatrixKernel
+ - Remove legacy GeMM kernels in gemm_v1.cl
+
+v21.08 Public major release
+ - Various bug fixes.
+ - Various optimizations:
+   - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used
+   - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures
+   - Add dynamic weights support in Fully connected layer (CPU/GPU)
+   - Various performance optimizations for floating-point data types (CPU/GPU)
+ - Add a reduced core library build arm_compute_core_v2
+ - Expose Operator API
+ - Support fat binary build for arm8.2-a via fat_binary build flag
+ - Add CPU discovery capabilities
+ - Add data type f16 support for:
+   - CLRemapKernel
+ - Port the following functions to stateless API:
+   - @ref CLConvolutionLayer
+   - @ref CLFlattenLayer
+   - @ref CLFullyConnectedLayer
+   - @ref CLGEMM
+   - @ref CLGEMMConvolutionLayer
+   - @ref CLGEMMLowpMatrixMultiplyCore
+   - @ref CLWinogradConvolutionLayer
+   - @ref NEConvolutionLayer
+   - @ref NEFlattenLayer
+   - @ref NEFullyConnectedLayer
+   - @ref NEGEMM
+   - @ref NEGEMMConv2d
+   - @ref NEGEMMConvolutionLayer
+   - @ref NEGEMMLowpMatrixMultiplyCore
+   - @ref NEWinogradConvolutionLayer
+ - Remove the following functions:
+   - CLWinogradInputTransform
+ - Remove CLCoreRuntimeContext
+ - Remove ICPPSimpleKernel
+ - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h
+
 v21.05 Public major release
  - Various bug fixes.
  - Various optimisations.
@@ -62,7 +364,7 @@ v21.05 Public major release
    - @ref NEDeconvolutionLayer
  - Remove padding from OpenCL kernels:
    - @ref CLL2NormalizeLayerKernel
-   - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+   - CLDepthwiseConvolutionLayer3x3NHWCKernel
    - @ref CLNormalizationLayerKernel
    - @ref CLNormalizePlanarYUVLayerKernel
    - @ref opencl::kernels::ClMulKernel
@@ -153,7 +455,7 @@ v21.05 Public major release
    - CLThreshold
    - CLWarpAffine
    - CLWarpPerspective
-
+
 v21.02 Public major release
  - Various bug fixes.
  - Various optimisations.
@@ -165,8 +467,8 @@ v21.02 Public major release
    - @ref NEActivationLayer
    - @ref NEArithmeticAddition
    - @ref NEBatchNormalizationLayerKernel
-   - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
-   - @ref cpu::kernels::CpuLogits1DMaxKernel
+   - cpu::kernels::CpuLogits1DSoftmaxKernel
+   - cpu::kernels::CpuLogits1DMaxKernel
    - @ref cpu::kernels::CpuElementwiseUnaryKernel
  - Remove padding from OpenCL kernels:
    - CLDirectConvolutionLayerKernel
@@ -227,7 +529,7 @@ v20.11 Public major release
    - @ref CLLogSoftmaxLayer
    - GCSoftmaxLayer
  - New OpenCL kernels / functions:
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
    - @ref CLLogicalNot
    - @ref CLLogicalAnd
    - @ref CLLogicalOr
@@ -238,40 +540,40 @@ v20.11 Public major release
  - Removed padding from Arm® Neon™ kernels:
    - NEComplexPixelWiseMultiplicationKernel
    - NENonMaximaSuppression3x3Kernel
-   - @ref NERemapKernel
-   - @ref NEGEMMInterleave4x4Kernel
+   - NERemapKernel
+   - NEGEMMInterleave4x4Kernel
    - NEDirectConvolutionLayerKernel
    - NEScaleKernel
    - NELocallyConnectedMatrixMultiplyKernel
-   - @ref NEGEMMLowpOffsetContributionKernel
-   - @ref NEGEMMTranspose1xWKernel
+   - NEGEMMLowpOffsetContributionKernel
+   - NEGEMMTranspose1xWKernel
    - NEPoolingLayerKernel
    - NEConvolutionKernel
    - NEDepthwiseConvolutionLayerNativeKernel
-   - @ref NEGEMMLowpMatrixMultiplyKernel
-   - @ref NEGEMMMatrixMultiplyKernel
+   - NEGEMMLowpMatrixMultiplyKernel
+   - NEGEMMMatrixMultiplyKernel
    - NEDirectConvolutionLayerOutputStageKernel
    - @ref NEReductionOperationKernel
-   - @ref NEGEMMLowpMatrixAReductionKernel
-   - @ref NEGEMMLowpMatrixBReductionKernel
+   - NEGEMMLowpMatrixAReductionKernel
+   - NEGEMMLowpMatrixBReductionKernel
  - Removed padding from OpenCL kernels:
    - CLBatchConcatenateLayerKernel
    - CLElementwiseOperationKernel
    - @ref CLBatchNormalizationLayerKernel
    - CLPoolingLayerKernel
    - CLWinogradInputTransformKernel
-   - @ref CLGEMMLowpMatrixMultiplyNativeKernel
-   - @ref CLGEMMLowpMatrixAReductionKernel
-   - @ref CLGEMMLowpMatrixBReductionKernel
-   - @ref CLGEMMLowpOffsetContributionOutputStageKernel
-   - @ref CLGEMMLowpOffsetContributionKernel
+   - CLGEMMLowpMatrixMultiplyNativeKernel
+   - CLGEMMLowpMatrixAReductionKernel
+   - CLGEMMLowpMatrixBReductionKernel
+   - CLGEMMLowpOffsetContributionOutputStageKernel
+   - CLGEMMLowpOffsetContributionKernel
    - CLWinogradOutputTransformKernel
-   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+   - CLGEMMLowpMatrixMultiplyReshapedKernel
    - @ref CLFuseBatchNormalizationKernel
    - @ref CLDepthwiseConvolutionLayerNativeKernel
    - CLDepthConvertLayerKernel
    - CLCopyKernel
-   - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+   - CLDepthwiseConvolutionLayer3x3NHWCKernel
    - CLActivationLayerKernel
    - CLWinogradFilterTransformKernel
    - CLWidthConcatenateLayerKernel
@@ -281,11 +583,11 @@ v20.11 Public major release
    - CLLogits1DNormKernel
    - CLHeightConcatenateLayerKernel
    - CLGEMMMatrixMultiplyKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
-   - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
    - CLDepthConcatenateLayerKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
  - Removed OpenCL kernels / functions:
    - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
    - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
@@ -520,7 +822,7 @@ v20.08 Public major release
  - New OpenCL kernels / functions:
    - @ref CLMaxUnpoolingLayerKernel
  - New Arm® Neon™ kernels / functions:
-   - @ref NEMaxUnpoolingLayerKernel
+   - NEMaxUnpoolingLayerKernel
  - New graph example:
    - graph_yolov3_output_detector
  - GEMMTuner improvements:
@@ -567,7 +869,7 @@ v20.08 Public major release
    The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0.
    Only axis 0 is supported.
  - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
- - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only)
+ - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
    - This change allows to use @ref CLGEMMConvolutionLayer without extra padding for the input and output.
    - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
    - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
@@ -583,9 +885,9 @@ v20.05 Public major release
  - Updated recommended gcc version to Linaro 6.3.1.
  - Added Bfloat16 type support
  - Added Bfloat16 support in:
-   - @ref NEWeightsReshapeKernel
-   - @ref NEConvolutionLayerReshapeWeights
-   - @ref NEIm2ColKernel
+   - NEWeightsReshapeKernel
+   - NEConvolutionLayerReshapeWeights
+   - NEIm2ColKernel
    - NEIm2Col
    - NEDepthConvertLayerKernel
    - @ref NEDepthConvertLayer
@@ -596,9 +898,9 @@ v20.05 Public major release
    - @ref CLDeconvolutionLayer
    - @ref CLDirectDeconvolutionLayer
    - @ref CLGEMMDeconvolutionLayer
-   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+   - CLGEMMLowpMatrixMultiplyReshapedKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
    - @ref CLReductionOperation
    - @ref CLReduceMean
    - @ref NEScale
@@ -609,7 +911,7 @@ v20.05 Public major release
    - @ref NEReduceMean
    - @ref NEArgMinMaxLayer
    - @ref NEDeconvolutionLayer
-   - @ref NEGEMMLowpQuantizeDownInt32ScaleKernel
+   - NEGEMMLowpQuantizeDownInt32ScaleKernel
    - @ref CPPBoxWithNonMaximaSuppressionLimit
    - @ref CPPDetectionPostProcessLayer
    - @ref CPPPermuteKernel
@@ -639,9 +941,9 @@ v20.05 Public major release
  - Removed NEDepthwiseConvolutionLayerOptimized
  - Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
    - @ref NEWinogradConvolutionLayer
-   - @ref NEWinogradLayerTransformInputKernel
-   - @ref NEWinogradLayerTransformOutputKernel
-   - @ref NEWinogradLayerTransformWeightsKernel
+   - CpuWinogradConv2dTransformInputKernel
+   - CpuWinogradConv2dTransformOutputKernel
+   - CpuWinogradConv2dTransformWeightsKernel
  - Added CLCompileContext
  - Added Arm® Neon™ GEMM kernel with 2D window support
 
@@ -655,9 +957,9 @@ v20.02 Public major release
    - @ref CLDepthwiseConvolutionLayer
    - CLDepthwiseConvolutionLayer3x3
    - @ref CLGEMMConvolutionLayer
-   - @ref CLGEMMLowpMatrixMultiplyCore
-   - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
-   - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+   - CLGEMMLowpMatrixMultiplyCore
+   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+   - CLGEMMLowpMatrixMultiplyNativeKernel
    - @ref NEActivationLayer
    - NEComparisonOperationKernel
    - @ref NEConvolutionLayer
@@ -680,10 +982,10 @@ v20.02 Public major release
    - @ref NESplit
  - New OpenCL kernels / functions:
    - @ref CLFill
-   - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+   - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
  - New Arm® Neon™ kernels / functions:
    - @ref NEFill
-   - @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+   - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
  - Deprecated Arm® Neon™ functions / interfaces:
    - CLDepthwiseConvolutionLayer3x3
    - NEDepthwiseConvolutionLayerOptimized
@@ -800,7 +1102,7 @@ v19.08 Public major release
    - NEBatchConcatenateLayerKernel
    - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
    - NEDepthwiseConvolutionLayerNativeKernel
-   - @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
+   - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
    - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
    - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
  - New OpenCL kernels / functions:
@@ -848,7 +1150,7 @@ v19.05 Public major release
    - @ref NEFFTDigitReverseKernel
    - @ref NEFFTRadixStageKernel
    - @ref NEFFTScaleKernel
-   - @ref NEGEMMLowpOffsetContributionOutputStageKernel
+   - NEGEMMLowpOffsetContributionOutputStageKernel
    - NEHeightConcatenateLayerKernel
    - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
    - @ref NEFFT1D
@@ -861,7 +1163,7 @@ v19.05 Public major release
    - @ref CLFFTDigitReverseKernel
    - @ref CLFFTRadixStageKernel
    - @ref CLFFTScaleKernel
-   - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
    - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
    - CLHeightConcatenateLayerKernel
    - @ref CLDirectDeconvolutionLayer
@@ -953,7 +1255,7 @@ v19.02 Public major release
    - @ref CLRangeKernel / @ref CLRange
    - @ref CLUnstack
    - @ref CLGatherKernel / @ref CLGather
-   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+   - CLGEMMLowpMatrixMultiplyReshapedKernel
  - New CPP kernels / functions:
    - @ref CPPDetectionOutputLayer
    - @ref CPPTopKV / @ref CPPTopKVKernel
@@ -1020,7 +1322,7 @@ v18.11 Public major release
  - Added the validate method in:
    - @ref NEDepthConvertLayer
    - @ref NEFloor / @ref CLFloor
-   - @ref NEGEMMMatrixAdditionKernel
+   - NEGEMMMatrixAdditionKernel
    - @ref NEReshapeLayer / @ref CLReshapeLayer
    - @ref CLScale
  - Added new examples:
@@ -1032,10 +1334,10 @@ v18.11 Public major release
    - CLWidthConcatenateLayer
    - CLFlattenLayer
    - @ref CLSoftmaxLayer
- - Add dot product support for @ref CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
+ - Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
  - Add SVE support
  - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
- - Fuses activation in @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
+ - Fuses activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
  - Added NHWC data layout support to:
    - @ref CLChannelShuffleLayer
    - @ref CLDeconvolutionLayer
@@ -1045,7 +1347,7 @@ v18.11 Public major release
    - NEDepthwiseConvolutionLayer3x3Kernel
    - CLPixelWiseMultiplicationKernel
  - Added FP16 support to the following kernels:
-   - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
+   - CLDepthwiseConvolutionLayer3x3NHWCKernel
    - NEDepthwiseConvolutionLayer3x3Kernel
    - @ref CLNormalizePlanarYUVLayerKernel
    - @ref CLWinogradConvolutionLayer (5x5 kernel)
@@ -1064,7 +1366,7 @@ v18.08 Public major release
    - @ref CLDirectConvolutionLayer
    - @ref CLConvolutionLayer
    - @ref CLScale
-   - @ref CLIm2ColKernel
+   - CLIm2ColKernel
  - New Arm® Neon™ kernels / functions:
    - @ref NERNNLayer
  - New OpenCL kernels / functions:
@@ -1171,9 +1473,9 @@ v18.02 Public major release
  - Added name() method to all kernels.
  - Added support for Winograd 5x5.
    - NEPermuteKernel / @ref NEPermute
-   - @ref NEWinogradLayerTransformInputKernel / NEWinogradLayer
-   - @ref NEWinogradLayerTransformOutputKernel / NEWinogradLayer
-   - @ref NEWinogradLayerTransformWeightsKernel / NEWinogradLayer
+   - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
+   - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
+   - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
    - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
  - New GLES kernels / functions:
    - GCTensorShiftKernel / GCTensorShift
@@ -1242,13 +1544,13 @@ v17.12 Public major release
    - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
    - arm_compute::NEHGEMMAArch64FP16Kernel
    - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
-   - @ref NEGEMMLowpOffsetContributionKernel / @ref NEGEMMLowpMatrixAReductionKernel / @ref NEGEMMLowpMatrixBReductionKernel / @ref NEGEMMLowpMatrixMultiplyCore
-   - @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+   - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
+   - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
    - NEWinogradLayer / NEWinogradLayerKernel
 
  - New OpenCL kernels / functions
-   - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
-   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+   - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
+   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
 
  - New graph nodes for Arm® Neon™ and OpenCL
    - graph::BranchLayer
@@ -1280,13 +1582,13 @@ v17.09 Public major release
    - NEDequantizationLayerKernel / @ref NEDequantizationLayer
    - NEFloorKernel / @ref NEFloor
    - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
-   - NEQuantizationLayerKernel @ref NEMinMaxLayerKernel / @ref NEQuantizationLayer
+   - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
    - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
    - @ref NEReductionOperationKernel / @ref NEReductionOperation
    - NEReshapeLayerKernel / @ref NEReshapeLayer
 
  - New OpenCL kernels / functions:
-   - @ref CLDepthwiseConvolutionLayer3x3NCHWKernel @ref CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
+   - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
    - CLDequantizationLayerKernel / CLDequantizationLayer
    - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
    - CLFlattenLayer
@@ -1294,7 +1596,7 @@ v17.09 Public major release
    - CLGEMMTranspose1xW
    - CLGEMMMatrixVectorMultiplyKernel
    - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
-   - CLQuantizationLayerKernel @ref CLMinMaxLayerKernel / @ref CLQuantizationLayer
+   - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
    - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
    - @ref CLReductionOperationKernel / @ref CLReductionOperation
    - CLReshapeLayerKernel / @ref CLReshapeLayer
@@ -1307,13 +1609,13 @@ v17.06 Public major release
  - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
  - Added @ref OMPScheduler (OpenMP) scheduler for Neon
  - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
- - User can specify his own scheduler by implementing the @ref IScheduler interface.
+ - User can specify their own scheduler by implementing the @ref IScheduler interface.
  - New OpenCL kernels / functions:
    - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
    - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
    - CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection
    - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
-   - @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
+   - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
  - New C++ kernels:
    - CPPDetectionWindowNonMaximaSuppressionKernel
  - New Arm® Neon™ kernels / functions:
@@ -1321,7 +1623,7 @@ v17.06 Public major release
    - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
    - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
    - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
-   - @ref NEWeightsReshapeKernel / @ref NEConvolutionLayerReshapeWeights
+   - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights
 
 v17.05 Public bug fixes release
  - Various bug fixes
@@ -1362,9 +1664,9 @@ v17.03.1 First Major public release of the sources
    - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
    - NETransposeKernel / @ref NETranspose
    - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
-   - @ref NEIm2ColKernel, @ref NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
+   - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
    - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
-   - @ref NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
+   - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
 
 v17.03 Sources preview
  - New OpenCL kernels / functions:
@@ -1377,15 +1679,15 @@ v17.03 Sources preview
    - CLLaplacianPyramid, CLLaplacianReconstruct
  - New Arm® Neon™ kernels / functions:
    - NEActivationLayerKernel / @ref NEActivationLayer
-   - GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
+   - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
    - NEPoolingLayerKernel / @ref NEPoolingLayer
 
 v17.02.1 Sources preview
  - New OpenCL kernels / functions:
    - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
    - CLPoolingLayerKernel / @ref CLPoolingLayer
-   - @ref CLIm2ColKernel, @ref CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
-   - @ref CLRemapKernel / @ref CLRemap
+   - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
+   - CLRemapKernel / CLRemap
    - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
    - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
    - CLNonLinearFilterKernel / CLNonLinearFilter
@@ -1412,4 +1714,4 @@ v16.12 Binary preview release
  - Original release
 
 */
-} // namespace arm_compute
\ No newline at end of file
+} // namespace arm_compute