Diffstat (limited to 'docs')
-rw-r--r--  docs/00_introduction.dox     | 136
-rw-r--r--  docs/01_library.dox          |  24
-rw-r--r--  docs/02_tests.dox            |   2
-rw-r--r--  docs/04_adding_operator.dox  |  16
-rw-r--r--  docs/06_functions_list.dox   |   2
-rw-r--r--  docs/07_errata.dox           |   2
-rw-r--r--  docs/ComputeLibrary.dir      |  32
7 files changed, 107 insertions, 107 deletions
diff --git a/docs/00_introduction.dox b/docs/00_introduction.dox
index 45824b1f42..389757e9d9 100644
--- a/docs/00_introduction.dox
+++ b/docs/00_introduction.dox
@@ -27,12 +27,12 @@ namespace arm_compute
@tableofcontents
-The Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies.
+The Computer Vision and Machine Learning library is a set of functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Several builds of the library are available using various configurations:
- - OS: Linux, Android, macOS or bare metal.
- - Architecture: armv7a (32bit) or arm64-v8a (64bit)
- - Technology: NEON / OpenCL / GLES_COMPUTE / NEON and OpenCL and GLES_COMPUTE
+ - OS: Android or Linux.
+ - Architecture: armv7a (32bit) or arm64-v8a (64bit).
+ - Technology: Neon / OpenCL / Neon and OpenCL.
- Debug / Asserts / Release: Use a build with asserts enabled to debug your application and enable extra validation. Once you are sure your application works as expected you can switch to a release build of the library for maximum performance.
@section S0_1_contact Contact / Support
@@ -161,11 +161,11 @@ v20.11 Public major release
- @ref CLLogicalNot
- @ref CLLogicalAnd
- @ref CLLogicalOr
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NELogicalNot
- @ref NELogicalAnd
- @ref NELogicalOr
- - Removed padding from NEON kernels:
+ - Removed padding from Neon kernels:
- @ref NEComplexPixelWiseMultiplicationKernel
- @ref NENonMaximaSuppression3x3Kernel
- @ref NERemapKernel
@@ -315,7 +315,7 @@ v20.11 Public major release
- CLWarpAffineKernel
- CLWarpPerspective
- CLWarpPerspectiveKernel
- - Deprecated NEON kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - Deprecated Neon kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- NELocallyConnectedLayer
- NELocallyConnectedMatrixMultiplyKernel
- NEAbsoluteDifference
@@ -449,7 +449,7 @@ v20.08 Public major release
- @ref CLScaleKernel
- New OpenCL kernels / functions:
- @ref CLMaxUnpoolingLayerKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEMaxUnpoolingLayerKernel
- New graph example:
- graph_yolov3_output_detector
@@ -485,7 +485,7 @@ v20.08 Public major release
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Removed NEON kernels / functions:
+ - Removed Neon kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- NEGEMMMatrixAccumulateBiasesKernel
- Deprecated functions / interfaces:
@@ -551,7 +551,7 @@ v20.05 Public major release
- New OpenCL kernels / functions:
- @ref CLQLSTMLayer
- @ref CLQLSTMLayerNormalizationKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEQLSTMLayer
- @ref NEQLSTMLayerNormalizationKernel
- Added HARD_SWISH support in:
@@ -560,20 +560,20 @@ v20.05 Public major release
- Deprecated OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Deprecated NEON kernels / functions:
+ - Deprecated Neon kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- Removed CPP kernels / functions:
- CPPFlipWeightsKernel
- Removed PoolingLayerInfo constructors without Data Layout.
- Removed CLDepthwiseConvolutionLayer3x3
- Removed NEDepthwiseConvolutionLayerOptimized
- - Added support for Winograd 3x3,4x4 on NEON FP16:
+ - Added support for Winograd 3x3,4x4 on Neon FP16:
- @ref NEWinogradConvolutionLayer
- @ref NEWinogradLayerTransformInputKernel
- @ref NEWinogradLayerTransformOutputKernel
- @ref NEWinogradLayerTransformWeightsKernel
- Added CLCompileContext
- - Added NEON GEMM kernel with 2D window support
+ - Added Neon GEMM kernel with 2D window support
v20.02.1 Maintenance release
- Added Android-NN build script.
@@ -611,14 +611,14 @@ v20.02 Public major release
- New OpenCL kernels / functions:
- @ref CLFill
- CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEFill
- @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - Deprecated NEON functions / interfaces:
+ - Deprecated Neon functions / interfaces:
- CLDepthwiseConvolutionLayer3x3
- NEDepthwiseConvolutionLayerOptimized
- PoolingLayerInfo constructors without Data Layout.
- - Added support for quantization with multiplier greater than 1 on NEON and CL.
+ - Added support for quantization with multiplier greater than 1 on Neon and CL.
- Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
- Added the ability to build bootcode for bare metal.
- Added support for generating synthetic QASYMM8 graphs.
@@ -643,7 +643,7 @@ v19.11 Public major release
- CLDepthwiseSeparableConvolutionLayer
- CLDepthwiseVectorToTensorKernel
- CLDirectConvolutionLayerOutputStageKernel
- - Deprecated NEON kernels / functions:
+ - Deprecated Neon kernels / functions:
- NEDepthwiseWeightsReshapeKernel
- NEDepthwiseIm2ColKernel
- NEDepthwiseSeparableConvolutionLayer
@@ -654,7 +654,7 @@ v19.11 Public major release
- @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
OpenCL kernels / functions)
- @ref CLLogSoftmaxLayer
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
- @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
- @ref NEDetectionPostProcessLayer
@@ -693,8 +693,8 @@ v19.11 Public major release
- Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
- Improved performance for CL Inception V3 - FP16.
- Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
- - Improved NEON performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
- - Improved NEON performance for MobileNet-SSD by improving the output detection performance.
+ - Improved Neon performance by enabling the fusion of batch normalization with convolution and depth-wise convolution layers.
+ - Improved Neon performance for MobileNet-SSD by improving the output detection performance.
- Optimized @ref CLPadLayer.
- Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
- Reduced memory consumption by implementing weights sharing.
@@ -710,7 +710,7 @@ v19.08.1 Public maintenance release
v19.08 Public major release
- Various bug fixes.
- Various optimisations.
- - Deprecated NEON functions
+ - Deprecated Neon functions
- NEDepthConcatenateLayer
- NEWidthConcatenateLayer
- Deprecated OpenCL kernels / functions
@@ -718,7 +718,7 @@ v19.08 Public major release
- CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
- CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
- CLWidthConcatenateLayer
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEAbsLayer
- @ref NECast
- @ref NEElementwisePower
@@ -757,15 +757,15 @@ v19.08 Public major release
- Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
- Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
- Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
- - Re-factored the depthwise convolution layer kernel on NEON for generic cases
- - Added an optimized depthwise convolution layer kernel for 5x5 filters (NEON only)
+ - Re-factored the depthwise convolution layer kernel on Neon for generic cases
+ - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only)
- Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
- Altered @ref QuantizationInfo interface to support per-channel quantization.
- The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate future optimizations.
- The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate future optimizations.
- Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
- Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
- - Optimized the NEON assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
+ - Optimized the Neon assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
v19.05 Public major release
- Various bug fixes.
@@ -822,7 +822,7 @@ v19.05 Public major release
- Add support for QASYMM8 in NEArithmeticSubtractionKernel.
- Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
- Add support for QASYMM8 in NEDeconvolution.
- - Add support for DequantizationLayer for NEON/CL.
+ - Add support for DequantizationLayer for Neon/CL.
- Add support for dilation in CLDepthwiseConvolution.
- Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
- Optimize CLDeconvolution.
@@ -898,7 +898,7 @@ v19.02 Public major release
- @ref NESoftmaxLayer
- Fused activation in @ref CLWinogradConvolutionLayer
- Extended @ref NEPermute to support more cases
- - Added NEON/SVE GEMM Hybrid kernels
+ - Added Neon/SVE GEMM Hybrid kernels
- Added u8 and s8 hybrid assembly kernels
- Introduced GEMM strategy name in NEGEMMAssemblyWrapper
- Improved @ref CLTuner
@@ -1012,7 +1012,7 @@ v18.05 Public major release
- Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
- Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in neon functions.
- Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
- - Moved neon assembly kernels to the folder src/core/NEON/kernels/arm_gemm.
+ - Moved Neon assembly kernels to the folder src/core/NEON/kernels/arm_gemm.
- Improved doxygen documentation.
- Improved memory management for layer's transitions.
- Added support for NHWC data layout in tensors.
@@ -1043,7 +1043,7 @@ v18.05 Public major release
- Port mobilenet example to NHWC data layout.
- Enabled Winograd method in @ref CLConvolutionLayer.
- Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
- Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/NEON/kernels/arm_gemm.
- Added memory manager support in GLES functions.
- Major refactoring of the graph API.
- Added GLES backend in the graph API.
@@ -1053,7 +1053,7 @@ v18.05 Public major release
- Replaced NEDeconvolutionLayerUpsampleKernel with @ref NEScaleKernel in @ref NEDeconvolutionLayer.
- Added fast maths flag in @ref CLConvolutionLayer.
- Added new tests and benchmarks in validation and benchmark frameworks
- - Merge Activation layer with Convolution Layer (NEON. CL, GLES)
+ - Merged Activation layer with Convolution Layer (Neon, CL, GLES)
- Added support for OpenCL 2.0 SVM
- Added support to import memory in OpenCL tensors.
- Added the prepare() method to perform any one-off pre-processing before running the function.
@@ -1072,7 +1072,7 @@ v18.03 Public maintenance release
- Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
v18.02 Public major release
- - Various NEON / OpenCL / GLES optimisations.
+ - Various Neon / OpenCL / GLES optimisations.
- Various bug fixes.
- Changed default number of threads on big.LITTLE systems.
- Refactored examples and added:
@@ -1097,7 +1097,7 @@ v18.02 Public major release
- Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
- New OpenCL kernels / functions:
- CLDirectConvolutionLayerOutputStageKernel
- - New NEON kernels / functions
+ - New Neon kernels / functions
- Added name() method to all kernels.
- Added support for Winograd 5x5.
- NEPermuteKernel / @ref NEPermute
@@ -1124,9 +1124,9 @@ v18.01 Public maintenance release
- @ref GCGEMMInterleave4x4Kernel
- @ref GCGEMMTranspose1xWKernel
- @ref GCIm2ColKernel
- - Refactored NEON Winograd (NEWinogradLayerKernel)
+ - Refactored Neon Winograd (NEWinogradLayerKernel)
- Added @ref NEDirectConvolutionLayerOutputStageKernel
- - Added QASYMM8 support to the following NEON kernels:
+ - Added QASYMM8 support to the following Neon kernels:
- NEDepthwiseConvolutionLayer3x3Kernel
- @ref NEFillBorderKernel
- NEPoolingLayerKernel
@@ -1141,7 +1141,7 @@ v17.12 Public major release
- Introduced logging interface
- Introduced opencl timer
- Reworked GEMMLowp interface
- - Added new NEON assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added new Neon assembly kernels for GEMMLowp, SGEMM and HGEMM
- Added validation method for most Machine Learning kernels / functions
- Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
- Added sgemm example for OpenCL
@@ -1168,7 +1168,7 @@ v17.12 Public major release
- @ref GCLogits1DMaxKernel / @ref GCLogits1DShiftExpSumKernel / @ref GCLogits1DNormKernel / @ref GCSoftmaxLayer
- @ref GCTransposeKernel / @ref GCTranspose
- - New NEON kernels / functions
+ - New Neon kernels / functions
- arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
- arm_compute::NEHGEMMAArch64FP16Kernel
- NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
@@ -1180,7 +1180,7 @@ v17.12 Public major release
- @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
- - New graph nodes for NEON and OpenCL
+ - New graph nodes for Neon and OpenCL
- graph::BranchLayer
- graph::DepthConvertLayer
- graph::DepthwiseConvolutionLayer
@@ -1204,8 +1204,8 @@ v17.09 Public major release
- Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
- Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
- New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
- - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both NEON and OpenCL.
- - New NEON kernels / functions:
+ - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Neon and OpenCL.
+ - New Neon kernels / functions:
- arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
- @ref NEDequantizationLayerKernel / @ref NEDequantizationLayer
- NEFloorKernel / @ref NEFloor
@@ -1231,12 +1231,12 @@ v17.09 Public major release
v17.06 Public major release
- Various bug fixes
- - Added support for fixed point 8 bit (QS8) to the various NEON machine learning kernels.
+ - Added support for fixed point 8 bit (QS8) to the various Neon machine learning kernels.
- Added unit tests and benchmarks (AlexNet, LeNet)
- Added support for sub tensors.
- Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
- - Added @ref OMPScheduler (OpenMP) scheduler for NEON
- - Added @ref SingleThreadScheduler scheduler for NEON (For bare metal)
+ - Added @ref OMPScheduler (OpenMP) scheduler for Neon
+ - Added @ref SingleThreadScheduler scheduler for Neon (For bare metal)
- Users can specify their own scheduler by implementing the @ref IScheduler interface.
- New OpenCL kernels / functions:
- @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
@@ -1246,7 +1246,7 @@ v17.06 Public major release
- @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
- New C++ kernels:
- @ref CPPDetectionWindowNonMaximaSuppressionKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
- NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
- @ref NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
@@ -1284,11 +1284,11 @@ v17.04 Public bug fixes release
v17.03.1 First Major public release of the sources
- Renamed the library to arm_compute
- - New CPP target introduced for C++ kernels shared between NEON and CL functions.
+ - New CPP target introduced for C++ kernels shared between Neon and CL functions.
- New padding calculation interface introduced and ported most kernels / functions to use it.
- New OpenCL kernels / functions:
- CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NENormalizationLayerKernel / @ref NENormalizationLayer
- @ref NETransposeKernel / @ref NETranspose
- NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
@@ -1305,7 +1305,7 @@ v17.03 Sources preview
- @ref CLLKTrackerInitKernel, @ref CLLKTrackerStage0Kernel, @ref CLLKTrackerStage1Kernel, @ref CLLKTrackerFinalizeKernel / @ref CLOpticalFlow
- @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
- @ref CLLaplacianPyramid, @ref CLLaplacianReconstruct
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- NEActivationLayerKernel / @ref NEActivationLayer
- GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
- NEPoolingLayerKernel / @ref NEPoolingLayer
@@ -1319,7 +1319,7 @@ v17.02.1 Sources preview
- @ref CLGaussianPyramidHorKernel, @ref CLGaussianPyramidVertKernel / @ref CLGaussianPyramid, @ref CLGaussianPyramidHalf, @ref CLGaussianPyramidOrb
- @ref CLMinMaxKernel, @ref CLMinMaxLocationKernel / @ref CLMinMaxLocation
- @ref CLNonLinearFilterKernel / @ref CLNonLinearFilter
- - New NEON FP16 kernels (Requires armv8.2 CPU)
+ - New Neon FP16 kernels (Requires armv8.2 CPU)
- @ref NEAccumulateWeightedFP16Kernel
- @ref NEBox3x3FP16Kernel
- @ref NENonMaximaSuppression3x3FP16Kernel
@@ -1331,7 +1331,7 @@ v17.02 Sources preview
- @ref CLDerivativeKernel / @ref CLChannelExtract
- @ref CLFastCornersKernel / @ref CLFastCorners
- @ref CLMeanStdDevKernel / @ref CLMeanStdDev
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- HOG / SVM: @ref NEHOGOrientationBinningKernel, @ref NEHOGBlockNormalizationKernel, @ref NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / @ref NEHOGDescriptor, @ref NEHOGDetector, @ref NEHOGGradient, @ref NEHOGMultiDetection
- @ref NENonLinearFilterKernel / @ref NENonLinearFilter
- Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
@@ -1524,11 +1524,11 @@ To see the build options available simply run ```scons -h```:
@b arch: The x86_32 and x86_64 targets can only be used with neon=0 and opencl=1.
@b os: Choose the operating system you are targeting: Linux, Android or bare metal.
-@note bare metal can only be used for NEON (not OpenCL), only static libraries get built and NEON's multi-threading support is disabled.
+@note Bare metal can only be used for Neon (not OpenCL); only static libraries get built, and Neon's multi-threading support is disabled.
@b build: you can either build directly on your device (native) or cross compile from your desktop machine (cross-compile). In both cases make sure the compiler is available in your path.
-@note If you want to natively compile for 32bit on a 64bit ARM device running a 64bit OS then you will have to use cross-compile too.
+@note If you want to natively compile for 32bit on a 64bit Arm device running a 64bit OS then you will have to use cross-compile too.
There is also an 'embed_only' option which will generate all the .embed files for the OpenCL kernels and / or OpenGLES compute shaders. This might be useful if using a different build system to compile the library.
@@ -1536,7 +1536,7 @@ In addition the option 'compress_kernels' will compress the embedded OpenCL ker
@b Werror: If you are compiling using the same toolchains as the ones used in this guide then there shouldn't be any warnings, and therefore you should be able to keep Werror=1. If the library fails to build with a different compiler version because of warnings interpreted as errors then, if you are sure the warnings are not important, you can try building with Werror=0 (but please do report the issue on GitHub).
-@b opencl / @b neon / @b gles_compute: Choose which SIMD technology you want to target. (NEON for ARM Cortex-A CPUs or OpenCL / GLES_COMPUTE for ARM Mali GPUs)
+@b opencl / @b neon / @b gles_compute: Choose which SIMD technology you want to target. (Neon for Arm Cortex-A CPUs or OpenCL / GLES_COMPUTE for Arm Mali GPUs)
@b embed_kernels: For OpenCL / GLES_COMPUTE only: set embed_kernels=1 if you want the OpenCL / GLES_COMPUTE kernels to be built in the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL / GLES_COMPUTE kernel files by calling CLKernelLibrary::init() / GCKernelLibrary::init(). By default the path is set to "./cl_kernels" / "./cs_shaders".
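For illustration only, a minimal sketch of that init() call when the library was built with embed_kernels=0 (assuming the default OpenCL context/device are acceptable and the kernel sources live in the default "./cl_kernels" folder):

@code{.cpp}
// Hedged sketch: point the library at the folder containing the .cl kernel
// files before configuring any OpenCL function.
arm_compute::CLKernelLibrary::get().init("./cl_kernels/", cl::Context::getDefault(), cl::Device::getDefault());
@endcode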
@@ -1564,11 +1564,11 @@ Example:
@b mali: Enable the collection of Mali hardware counters to measure execution time in benchmark tests. (Your device needs to have a Mali driver that supports it)
-@b openmp Build in the OpenMP scheduler for NEON.
+@b openmp: Build in the OpenMP scheduler for Neon.
@note Only works when building with g++ not clang++
-@b cppthreads Build in the C++11 scheduler for NEON.
+@b cppthreads: Build in the C++11 scheduler for Neon.
@sa Scheduler::set
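For example, to build with the OpenMP scheduler instead of the C++11 thread pool (an illustrative combination of the two flags above, for a Linux arm64-v8a target):

scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=arm64-v8a openmp=1 cppthreads=0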
@@ -1582,12 +1582,12 @@ In order to use this option, the external tests directory must have the followin
│   ├── CL
│   ├── datasets
│   ├── fixtures
│   └── NEON
└── validation
    ├── CL
    ├── datasets
    ├── fixtures
    └── NEON
Then, build the library with `external_tests_dir=<PATH_TO_EXTERNAL_TESTS_DIR>`.
@@ -1600,7 +1600,7 @@ For Linux, the library was successfully built and tested using the following Lin
- gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf
- gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
-To cross-compile the library in debug mode, with NEON only support, for Linux 32bit:
+To cross-compile the library in debug mode, with Neon only support, for Linux 32bit:
scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv7a
@@ -1612,12 +1612,12 @@ To cross-compile the library in asserts mode, with GLES_COMPUTE only support, fo
scons Werror=1 -j8 debug=0 asserts=1 neon=0 opencl=0 gles_compute=1 embed_kernels=1 os=linux arch=arm64-v8a
-You can also compile the library natively on an ARM device by using <b>build=native</b>:
+You can also compile the library natively on an Arm device by using <b>build=native</b>:
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=arm64-v8a build=native
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=native
-@note g++ for ARM is mono-arch, therefore if you want to compile for Linux 32bit on a Linux 64bit platform you will have to use a cross compiler.
+@note g++ for Arm is mono-arch, therefore if you want to compile for Linux 32bit on a Linux 64bit platform you will have to use a cross compiler.
For example on a 64bit Debian based system you would have to install <b>g++-arm-linux-gnueabihf</b>
@@ -1637,11 +1637,11 @@ The examples get automatically built by scons as part of the build process of th
@note The following command lines assume the arm_compute libraries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built libraries with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.
-To cross compile a NEON example for Linux 32bit:
+To cross compile a Neon example for Linux 32bit:
arm-linux-gnueabihf-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute -larm_compute_core -o neon_convolution
-To cross compile a NEON example for Linux 64bit:
+To cross compile a Neon example for Linux 64bit:
aarch64-linux-gnu-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -L. -larm_compute -larm_compute_core -o neon_convolution
@@ -1679,21 +1679,21 @@ i.e. to cross compile the "graph_lenet" example for Linux 64bit:
@note If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, arm_compute, arm_compute_core
-To compile natively (i.e directly on an ARM device) for NEON for Linux 32bit:
+To compile natively (i.e. directly on an Arm device) for Neon for Linux 32bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -larm_compute -larm_compute_core -o neon_convolution
-To compile natively (i.e directly on an ARM device) for NEON for Linux 64bit:
+To compile natively (i.e. directly on an Arm device) for Neon for Linux 64bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o neon_convolution
(notice the only difference with the 32 bit command is that we don't need the -mfpu option)
-To compile natively (i.e directly on an ARM device) for OpenCL for Linux 32bit or Linux 64bit:
+To compile natively (i.e. directly on an Arm device) for OpenCL for Linux 32bit or Linux 64bit:
g++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o cl_convolution -DARM_COMPUTE_CL
-To compile natively (i.e directly on an ARM device) for GLES for Linux 32bit or Linux 64bit:
+To compile natively (i.e. directly on an Arm device) for GLES for Linux 32bit or Linux 64bit:
g++ examples/gc_absdiff.cpp utils/Utils.cpp -I. -Iinclude/ -L. -larm_compute -larm_compute_core -std=c++14 -DARM_COMPUTE_GC -Iinclude/linux/ -o gc_absdiff
@@ -1768,7 +1768,7 @@ Here is a guide to <a href="https://developer.android.com/ndk/guides/standalone_
@subsubsection S3_3_1_library How to build the library?
-To cross-compile the library in debug mode, with NEON only support, for Android 32bit:
+To cross-compile the library in debug mode, with Neon only support, for Android 32bit:
CXX=clang++ CC=clang scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=android arch=armv7a
@@ -1788,7 +1788,7 @@ The examples get automatically built by scons as part of the build process of th
Once you've got your Android standalone toolchain built and added to your path you can do the following:
-To cross compile a NEON example:
+To cross compile a Neon example:
#32 bit:
arm-linux-androideabi-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_arm -static-libstdc++ -pie
@@ -1850,7 +1850,7 @@ And finally to run the example:
For example:
adb shell /data/local/tmp/graph_lenet --help
-In this case the first argument of LeNet (like all the graph examples) is the target (i.e 0 to run on NEON, 1 to run on OpenCL if available, 2 to run on OpenCL using the CLTuner), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.
+In this case the first argument of LeNet (like all the graph examples) is the target (i.e. 0 to run on Neon, 1 to run on OpenCL if available, 2 to run on OpenCL using the CLTuner), the second argument is the path to the folder containing the npy files for the weights, and finally the third argument is the number of batches to run.
@subsection S3_4_macos Building for macOS
@@ -1874,7 +1874,7 @@ Download linaro for <a href="https://releases.linaro.org/components/toolchain/bi
@subsubsection S3_5_1_library How to build the library?
-To cross-compile the library with NEON support for baremetal arm64-v8a:
+To cross-compile the library with Neon support for baremetal arm64-v8a:
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=bare_metal arch=arm64-v8a build=cross_compile cppthreads=0 openmp=0 standalone=1
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 742a246582..848b060e9f 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -38,10 +38,10 @@ The Core library is a low level collection of algorithms implementations, it is
The Runtime library is a very basic wrapper around the Core library which can be used for quick prototyping, it is basic in the sense that:
- It allocates images and tensors by using standard malloc().
-- It multi-threads NEON code in a very basic way using a very simple pool of threads.
+- It multi-threads Neon code in a very basic way using a very simple pool of threads.
- For OpenCL it uses the default CLScheduler command queue for all mapping operations and kernels.
-For maximum performance, it is expected that the users would re-implement an equivalent to the runtime library which suits better their needs (With a more clever multi-threading strategy, load-balancing between NEON and OpenCL, etc.)
+For maximum performance, users are expected to re-implement an equivalent of the runtime library that better suits their needs (with a smarter multi-threading strategy, load-balancing between Neon and OpenCL, etc.)
@section S4_1_2 Data-type and Data-layout support
@@ -62,7 +62,7 @@ where N = batches, C = channels, H = height, W = width
@section S4_1_3 Fast-math support
Compute Library supports different types of convolution methods, fast-math flag is only used for the Winograd algorithm.
-When the fast-math flag is enabled, both NEON and CL convolution layers will try to dispatch the fastest implementation available, which may introduce a drop in accuracy as well. The different scenarios involving the fast-math flag are presented below:
+When the fast-math flag is enabled, both Neon and CL convolution layers will try to dispatch the fastest implementation available, which may also introduce a drop in accuracy. The different scenarios involving the fast-math flag are presented below:
- For FP32:
- no-fast-math: Only supports Winograd 3x3,3x1,1x3,5x1,1x5,7x1,1x7
- fast-math: Supports Winograd 3x3,3x1,1x3,5x1,1x5,7x1,1x7,5x5,7x7
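As a hedged illustration of how the flag is passed (the tensors below are placeholders assumed to be already initialised; enable_fast_math is the trailing boolean of the convolution configure() methods):

@code{.cpp}
NEConvolutionLayer conv;
// Passing enable_fast_math = true lets the layer pick the faster (possibly
// less accurate) Winograd variants listed above when they are supported.
conv.configure(&src, &weights, &bias, &dst,
               PadStrideInfo(1, 1, 1, 1),
               WeightsInfo(), Size2D(1U, 1U), ActivationLayerInfo(),
               true /* enable_fast_math */);
@endcode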
@@ -116,7 +116,7 @@ kernel.run( q, max_window ); // Enqueue the kernel to process the full window on
q.finish();
@endcode
-NEON / CPP kernels:
+Neon / CPP kernels:
@code{.cpp}
//Create a kernel object:
@@ -131,7 +131,7 @@ kernel.run( max_window ); // Run the kernel on the full window
@subsection S4_2_3 Multi-threading
-The previous section shows how to run a NEON / CPP kernel in the current thread, however if your system has several CPU cores, you will probably want the kernel to use several cores. Here is how this can be done:
+The previous section shows how to run a Neon / CPP kernel in the current thread; however, if your system has several CPU cores, you will probably want the kernel to use several of them. Here is how this can be done:
@code{.cpp}
ThreadInfo info;
@@ -181,7 +181,7 @@ The previous section shows how to run a NEON / CPP kernel in the current thread,
}
@endcode
-This is a very basic implementation which was originally used in the NEON runtime library by all the NEON functions.
+This is a very basic implementation which was originally used in the Neon runtime library by all the Neon functions.
@sa CPPScheduler
@@ -228,11 +228,11 @@ In order to block until all the jobs in the CLScheduler's command queue are done
For example:
@snippet cl_events.cpp OpenCL events
-@subsection S4_4_2_cl_neon OpenCL / NEON interoperability
+@subsection S4_4_2_cl_neon OpenCL / Neon interoperability
-You can mix OpenCL and NEON kernels and functions. However it is the user's responsibility to handle the mapping/unmapping of OpenCL objects, for example:
+You can mix OpenCL and Neon kernels and functions. However, it is the user's responsibility to handle the mapping/unmapping of OpenCL objects, for example:
-@snippet neoncl_scale_median_gaussian.cpp NEON / OpenCL Interop
+@snippet neoncl_scale_median_gaussian.cpp Neon / OpenCL Interop
@sa main_neoncl_scale_median_gaussian
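As a minimal sketch of the pattern (the tensor name cl_tmp is hypothetical), mapping makes a CL buffer visible to the CPU and unmapping hands it back to OpenCL:

@code{.cpp}
// cl_tmp was produced by an OpenCL function; map it (blocking) so that Neon /
// plain C++ code can safely access the buffer.
cl_tmp.map(true);

// ... run a Neon function or CPU code that reads / writes cl_tmp ...

// Unmap before enqueuing further OpenCL work that touches the tensor.
cl_tmp.unmap();
@endcode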
@@ -256,7 +256,7 @@ You have 3 types of @ref BorderMode :
- @ref BorderMode::REPLICATE : Neighbor pixels outside of the image are treated as having the same value as the closest valid pixel.
- @ref BorderMode::CONSTANT : Neighbor pixels outside of the image are treated as having the same constant value. (The user can choose what this value should be).
-Moreover both OpenCL and NEON use vector loads and stores instructions to access the data in buffers, so in order to avoid having special cases to handle for the borders all the images and tensors used in this library must be padded.
+Moreover, both OpenCL and Neon use vector load and store instructions to access the data in buffers, so in order to avoid having special cases to handle for the borders, all the images and tensors used in this library must be padded.
@subsubsection padding Padding
@@ -483,7 +483,7 @@ conv2.run();
The implemented @ref TensorAllocator and @ref CLTensorAllocator objects provide an interface capable of importing existing memory to a tensor as backing memory.
-A simple NEON example can be the following:
+A simple Neon example can be the following:
@code{.cpp}
// External backing memory
void* external_ptr = ...;
@@ -550,6 +550,6 @@ Consequently, this will allow finer control of these services among pipelines wh
This feature introduces some changes to our API.
All the kernels/functions will now accept a Runtime Context object which will allow the function to use the mentioned services.
-Finally, we will try to adapt our code-base progressively to use the new mechanism but will continue supporting the legacy mechanism to allow a smooth transition. Changes will apply to all our three backends: NEON, OpenCL and OpenGL ES.
+Finally, we will try to adapt our code-base progressively to use the new mechanism but will continue supporting the legacy mechanism to allow a smooth transition. Changes will apply to all our three backends: Neon, OpenCL and OpenGL ES.
*/
} // namespace arm_compute
diff --git a/docs/02_tests.dox b/docs/02_tests.dox
index c46e1f5663..0aee8e59d8 100644
--- a/docs/02_tests.dox
+++ b/docs/02_tests.dox
@@ -371,7 +371,7 @@ To run the OpenCL precommit validation tests:
LD_LIBRARY_PATH=. ./arm_compute_validation --mode=precommit --filter="^CL.*"
-To run the NEON precommit benchmark tests with PMU and Wall Clock timer in miliseconds instruments enabled:
+To run the Neon precommit benchmark tests with the PMU and Wall Clock timer (in milliseconds) instruments enabled:
LD_LIBRARY_PATH=. ./arm_compute_benchmark --mode=precommit --filter="^NEON.*" --instruments="pmu,wall_clock_timer_ms" --iterations=10
diff --git a/docs/04_adding_operator.dox b/docs/04_adding_operator.dox
index f311fb4d51..1b4b575964 100644
--- a/docs/04_adding_operator.dox
+++ b/docs/04_adding_operator.dox
@@ -71,12 +71,12 @@ Similarly, all common functions that process shapes, like calculating output sha
@subsection S4_1_2_add_kernel Add a kernel
-As we mentioned at the beginning, the kernel is the implementation of the operator or algorithm partially using a specific programming language related to the backend we want to use. Adding a kernel in the library means implementing the algorithm in a SIMD technology like NEON or OpenCL. All kernels in Compute Library must implement a common interface IKernel or one of the specific subinterfaces.
+As we mentioned at the beginning, the kernel is the implementation of the operator or algorithm, written (at least partially) in a programming language specific to the backend we want to use. Adding a kernel in the library means implementing the algorithm in a SIMD technology like Neon or OpenCL. All kernels in Compute Library must implement a common interface, IKernel, or one of the specific subinterfaces.
IKernel is the common interface for all the kernels in the core library; it contains the main methods to configure and run the kernel itself, such as window(), which returns the maximum window the kernel can be executed on, or is_parallelisable(), which indicates whether or not the kernel is parallelizable. If the kernel is parallelizable then the window returned by the window() method can be split into sub-windows which can then be run in parallel; otherwise, only the window returned by window() can be passed to the run method.
There are specific interfaces for OpenCL and Neon: @ref ICLKernel, INEKernel (using INEKernel = @ref ICPPKernel).
- @ref ICLKernel is the common interface for all the OpenCL kernels. It implements the inherited methods and adds all the methods necessary to configure the CL kernel, such as setting/returning the Local-Workgroup-Size hint, adding single, array or tensor arguments, and setting the targeted GPU architecture according to the CL device. All these methods are used during the configuration and the run of the operator.
-- INEKernel inherits from @ref IKernel as well and it's the common interface for all kernels implemented in NEON, it adds just the run and the name methods.
+- INEKernel inherits from @ref IKernel as well and is the common interface for all kernels implemented in Neon; it adds just the run and the name methods.
There are two other implementations of @ref IKernel called @ref ICLSimpleKernel and INESimpleKernel; they are the interfaces for simple kernels that have just one input tensor and one output tensor.
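As a hedged sketch (ExampleKernel is hypothetical, not a kernel from the library), a minimal Neon kernel built on these interfaces looks roughly like this:

@code{.cpp}
class ExampleKernel : public INEKernel
{
public:
    const char *name() const override { return "ExampleKernel"; }
    void configure(const ITensor *src, ITensor *dst)
    {
        _src = src;
        _dst = dst;
        // Register the maximum window the kernel can be executed on
        INEKernel::configure(calculate_max_window(*src->info()));
    }
    void run(const Window &window, const ThreadInfo &info) override
    {
        ARM_COMPUTE_UNUSED(info);
        // Process only the (sub-)window passed in: if is_parallelisable()
        // returns true, the scheduler may split the maximum window into
        // sub-windows and run them on different threads.
    }

private:
    const ITensor *_src{ nullptr };
    ITensor       *_dst{ nullptr };
};
@endcode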
Creating a new kernel implies adding new files:
@@ -120,10 +120,10 @@ For OpenCL:
@snippet src/core/gpu/cl/kernels/ClReshapeKernel.cpp ClReshapeKernel Kernel
The run will call the function defined in the .cl file.
-For the NEON backend case:
+For the Neon backend case:
@snippet src/core/cpu/kernels/CpuReshapeKernel.cpp NEReshapeLayerKernel Kernel
-In the NEON case, there is no need to add an extra file and we implement the kernel in the same NEReshapeLayerKernel.cpp file.
+In the Neon case, there is no need to add an extra file and we implement the kernel in the same NEReshapeLayerKernel.cpp file.
If the tests are already in place, the new kernel can be tested using the existing tests by adding the configure and run of the kernel to the compute_target() in the fixture.
@@ -137,13 +137,13 @@ If the tests are already in place, the new kernel can be tested using the existi
- (sub[n].start() - max[n].start()) % max[n].step() == 0
- (sub[n].end() - sub[n].start()) % max[n].step() == 0
-@ref CPPScheduler::schedule provides a sample implementation that is used for NEON kernels.
-%Memory management is the other aspect that the runtime layer is supposed to handle. %Memory management of the tensors is abstracted using TensorAllocator. Each tensor holds a pointer to a TensorAllocator object, which is used to allocate and free the memory at runtime. The implementation that is currently supported in Compute Library allows memory blocks, required to be fulfilled for a given operator, to be grouped together under a @ref MemoryGroup. Each group can be acquired and released. The underlying implementation of memory groups vary depending on whether NEON or CL is used. The memory group class uses memory pool to provide the required memory. It also uses the memory manager to manage the lifetime and a IPoolManager to manage the memory pools registered with the memory manager.
+@ref CPPScheduler::schedule provides a sample implementation that is used for Neon kernels.
+%Memory management is the other aspect that the runtime layer is supposed to handle. %Memory management of the tensors is abstracted using TensorAllocator. Each tensor holds a pointer to a TensorAllocator object, which is used to allocate and free the memory at runtime. The implementation that is currently supported in Compute Library allows memory blocks, required to be fulfilled for a given operator, to be grouped together under a @ref MemoryGroup. Each group can be acquired and released. The underlying implementation of memory groups varies depending on whether Neon or CL is used. The memory group class uses a memory pool to provide the required memory. It also uses the memory manager to manage the lifetime and an IPoolManager to manage the memory pools registered with the memory manager.
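A hedged sketch of the memory-group pattern described above (assuming a function that owns a MemoryGroup _memory_group and an intermediate tensor _tmp):

@code{.cpp}
// At configure time: register the intermediate tensor with the group, then
// allocate it so that its backing memory is provided by the group's pool.
_memory_group.manage(&_tmp);
_tmp.allocator()->allocate();

// At run time: acquire the group's memory for the duration of this scope.
MemoryGroupResourceScope scope_mg(_memory_group);
@endcode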
We have seen the various interfaces for a kernel in the core library; the same file structure design exists in the runtime module. IFunction is the base class for all the functions; it has two child interfaces, ICLSimpleFunction and INESimpleFunction, that are used as base classes for functions which call a single kernel.
-The new operator has to implement %validate(), configure() and run(), these methods will call the respective function in the kernel considering that the multi-threading is used for the kernels which are parallelizable, by default std::thread::hardware_concurrency() threads are used. For NEON function can be used CPPScheduler::set_num_threads() to manually set the number of threads, whereas for OpenCL kernels all the kernels are enqueued on the queue associated with CLScheduler and the queue is then flushed.
+The new operator has to implement %validate(), configure() and run(); these methods call the respective function in the kernel. Multi-threading is used for the kernels which are parallelizable; by default, std::thread::hardware_concurrency() threads are used. For Neon functions, CPPScheduler::set_num_threads() can be used to manually set the number of threads, whereas for OpenCL all the kernels are enqueued on the queue associated with CLScheduler and the queue is then flushed.
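For instance (illustrative), to cap the Neon backend at four worker threads before running a function:

@code{.cpp}
CPPScheduler::get().set_num_threads(4); // default: std::thread::hardware_concurrency()
@endcode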
For the runtime functions, there is an extra method implemented: prepare(). This method prepares the function for the run; it does all the heavy operations that are done only once (reshaping the weights, releasing memory that is no longer necessary after the reshape, etc.). The prepare method can be called standalone, or it runs as part of the first call to run() if it hasn't been called before; either way, the function is then marked as prepared.
The files we add are:
@@ -214,7 +214,7 @@ void CLAddReshapeLayer::run()
@endcode
-For NEON:
+For Neon:
@code{.cpp}
using namespace arm_compute;
diff --git a/docs/06_functions_list.dox b/docs/06_functions_list.dox
index 61712a29f5..96dce94a89 100644
--- a/docs/06_functions_list.dox
+++ b/docs/06_functions_list.dox
@@ -29,7 +29,7 @@ namespace arm_compute
@tableofcontents
-@section S6_1 NEON functions
+@section S6_1 Neon functions
- @ref IFunction
- @ref INESimpleFunction
diff --git a/docs/07_errata.dox b/docs/07_errata.dox
index 994b8c5bd7..7436f14bbc 100644
--- a/docs/07_errata.dox
+++ b/docs/07_errata.dox
@@ -42,7 +42,7 @@ namespace arm_compute
- Mali DDK r1p0 - r8p0, and
- Linux kernel >= 4.4
-- On Android with arm64-v8a/arm64-v8.2-a architecture, NEON validation tests can fail when compiled using Android Ndk
+- On Android with arm64-v8a/arm64-v8.2-a architecture, Neon validation tests can fail when compiled using Android NDK
>= r18b in debug mode (https://github.com/android/ndk/issues/1135).
- Versions Affected: >= v19.11
- OSs Affected: Android
diff --git a/docs/ComputeLibrary.dir b/docs/ComputeLibrary.dir
index 7733e531cd..8b77ed4f02 100644
--- a/docs/ComputeLibrary.dir
+++ b/docs/ComputeLibrary.dir
@@ -64,15 +64,15 @@
*/
/** @dir src/core/NEON
- * @brief NEON backend core: kernels and utilities.
+ * @brief Neon backend core: kernels and utilities.
*/
/** @file src/core/NEON/NEKernels.h
- * @brief Includes all the NEON kernels at once
+ * @brief Includes all the Neon kernels at once
*/
/** @dir src/core/NEON/kernels
- * @brief Folder containing all the NEON kernels
+ * @brief Folder containing all the Neon kernels
*/
/** @dir arm_compute/core/utils
@@ -100,7 +100,7 @@
*/
/** @dir arm_compute/graph/backends/NEON
- * @brief NEON specific operations
+ * @brief Neon specific operations
*/
/** @dir arm_compute/graph/detail
@@ -160,7 +160,7 @@
*/
/** @file arm_compute/runtime/CPP/CPPScheduler.h
- * @brief Basic pool of threads to execute CPP/NEON code on several cores in parallel.
+ * @brief Basic pool of threads to execute CPP/Neon code on several cores in parallel.
*/
/** @dir arm_compute/runtime/CPP/functions
@@ -188,15 +188,15 @@
*/
/** @dir arm_compute/runtime/NEON
- * @brief NEON backend runtime interface.
+ * @brief Neon backend runtime interface.
*/
/** @file arm_compute/runtime/NEON/NEFunctions.h
- * @brief Includes all the NEON functions at once.
+ * @brief Includes all the Neon functions at once.
*/
/** @dir arm_compute/runtime/NEON/functions
- * @brief Folder containing all the NEON functions.
+ * @brief Folder containing all the Neon functions.
*/
/** @dir arm_compute/runtime/OMP
@@ -223,8 +223,8 @@
* -# cl_*.cpp --> OpenCL examples
* -# gc_*.cpp --> GLES compute shaders examples
* -# graph_*.cpp --> Graph examples
- * -# neoncl_*.cpp --> NEON / OpenCL interoperability examples
- * -# neon_*.cpp --> NEON examples
+ * -# neoncl_*.cpp --> Neon / OpenCL interoperability examples
+ * -# neon_*.cpp --> Neon examples
*/
/** @dir examples/gemm_tuner
@@ -252,11 +252,11 @@
*/
/** @dir src/core/NEON/wrapper
- * @brief NEON wrapper used to simplify code
+ * @brief Neon wrapper used to simplify code
*/
/** @file src/core/NEON/wrapper/traits.h
- * @brief Traits defined on NEON vectors
+ * @brief Traits defined on Neon vectors
*/
/** @file src/core/NEON/wrapper/wrapper.h
@@ -264,7 +264,7 @@
*/
/** @dir src/core/NEON/wrapper/intrinsics
- * @brief NEON intrinsics wrappers
+ * @brief Neon intrinsics wrappers
*/
/** @dir src/core/NEON/wrapper/scalar
@@ -300,7 +300,7 @@
*/
/** @dir tests/NEON
- * @brief NEON accessors.
+ * @brief Neon accessors.
*/
/** @dir tests/benchmark
@@ -316,7 +316,7 @@
*/
/** @dir tests/benchmark/NEON
- * @brief NEON benchmarking tests.
+ * @brief Neon benchmarking tests.
*/
/** @dir tests/benchmark_examples
@@ -352,7 +352,7 @@
*/
/** @dir tests/validation/NEON
- * @brief NEON validation tests.
+ * @brief Neon validation tests.
*/
/** @dir tests/validation/reference