Diffstat (limited to 'docs/00_introduction.dox')
1 files changed, 68 insertions, 68 deletions
diff --git a/docs/00_introduction.dox b/docs/00_introduction.dox
index 45824b1f42..389757e9d9 100644
--- a/docs/00_introduction.dox
+++ b/docs/00_introduction.dox
@@ -27,12 +27,12 @@ namespace arm_compute
-The Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies.
+The Computer Vision and Machine Learning library is a set of functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Several builds of the library are available using various configurations:
- - OS: Linux, Android, macOS or bare metal.
- - Architecture: armv7a (32bit) or arm64-v8a (64bit)
- - Technology: NEON / OpenCL / GLES_COMPUTE / NEON and OpenCL and GLES_COMPUTE
+ - OS: Android or Linux.
+ - Architecture: armv7a (32bit) or arm64-v8a (64bit).
+ - Technology: Neon / OpenCL / Neon and OpenCL.
- Debug / Asserts / Release: Use a build with asserts enabled to debug your application and enable extra validation. Once you are sure your application works as expected you can switch to a release build of the library for maximum performance.
@section S0_1_contact Contact / Support
@@ -161,11 +161,11 @@ v20.11 Public major release
- @ref CLLogicalNot
- @ref CLLogicalAnd
- @ref CLLogicalOr
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NELogicalNot
- @ref NELogicalAnd
- @ref NELogicalOr
- - Removed padding from NEON kernels:
+ - Removed padding from Neon kernels:
- @ref NEComplexPixelWiseMultiplicationKernel
- @ref NENonMaximaSuppression3x3Kernel
- @ref NERemapKernel
@@ -315,7 +315,7 @@ v20.11 Public major release
- CLWarpAffineKernel
- CLWarpPerspective
- CLWarpPerspectiveKernel
- - Deprecated NEON kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
+ - Deprecated Neon kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- NELocallyConnectedLayer
- NELocallyConnectedMatrixMultiplyKernel
- NEAbsoluteDifference
@@ -449,7 +449,7 @@ v20.08 Public major release
- @ref CLScaleKernel
- New OpenCL kernels / functions:
- @ref CLMaxUnpoolingLayerKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEMaxUnpoolingLayerKernel
- New graph example:
- graph_yolov3_output_detector
@@ -485,7 +485,7 @@ v20.08 Public major release
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Removed NEON kernels / functions:
+ - Removed Neon kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- NEGEMMMatrixAccumulateBiasesKernel
- Deprecated functions / interfaces:
@@ -551,7 +551,7 @@ v20.05 Public major release
- New OpenCL kernels / functions:
- @ref CLQLSTMLayer
- @ref CLQLSTMLayerNormalizationKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEQLSTMLayer
- @ref NEQLSTMLayerNormalizationKernel
- Added HARD_SWISH support in:
@@ -560,20 +560,20 @@ v20.05 Public major release
- Deprecated OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- - Deprecated NEON kernels / functions:
+ - Deprecated Neon kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- Removed CPP kernels / functions:
- CPPFlipWeightsKernel
- Removed PoolingLayerInfo constructors without Data Layout.
- Removed CLDepthwiseConvolutionLayer3x3
- Removed NEDepthwiseConvolutionLayerOptimized
- - Added support for Winograd 3x3,4x4 on NEON FP16:
+ - Added support for Winograd 3x3,4x4 on Neon FP16:
- @ref NEWinogradConvolutionLayer
- @ref NEWinogradLayerTransformInputKernel
- @ref NEWinogradLayerTransformOutputKernel
- @ref NEWinogradLayerTransformWeightsKernel
- Added CLCompileContext
- - Added NEON GEMM kernel with 2D window support
+ - Added Neon GEMM kernel with 2D window support
v20.02.1 Maintenance release
- Added Android-NN build script.
@@ -611,14 +611,14 @@ v20.02 Public major release
- New OpenCL kernels / functions:
- @ref CLFill
- CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEFill
- @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
- - Deprecated NEON functions / interfaces:
+ - Deprecated Neon functions / interfaces:
- CLDepthwiseConvolutionLayer3x3
- NEDepthwiseConvolutionLayerOptimized
- PoolingLayerInfo constructors without Data Layout.
- - Added support for quantization with multiplier greater than 1 on NEON and CL.
+ - Added support for quantization with multiplier greater than 1 on Neon and CL.
- Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
- Added the ability to build bootcode for bare metal.
- Added support for generating synthetic QASYMM8 graphs.
@@ -643,7 +643,7 @@ v19.11 Public major release
- CLDepthwiseSeparableConvolutionLayer
- CLDepthwiseVectorToTensorKernel
- CLDirectConvolutionLayerOutputStageKernel
- - Deprecated NEON kernels / functions:
+ - Deprecated Neon kernels / functions:
- NEDepthwiseWeightsReshapeKernel
- NEDepthwiseIm2ColKernel
- NEDepthwiseSeparableConvolutionLayer
@@ -654,7 +654,7 @@ v19.11 Public major release
- @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
OpenCL kernels / functions)
- @ref CLLogSoftmaxLayer
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
- @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
- @ref NEDetectionPostProcessLayer
@@ -693,8 +693,8 @@ v19.11 Public major release
- Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
- Improved performance for CL Inception V3 - FP16.
- Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
- - Improved NEON performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
- - Improved NEON performance for MobileNet-SSD by improving the output detection performance.
+ - Improved Neon performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
+ - Improved Neon performance for MobileNet-SSD by improving the output detection performance.
- Optimized @ref CLPadLayer.
- Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
- Reduced memory consumption by implementing weights sharing.
@@ -710,7 +710,7 @@ v19.08.1 Public maintenance release
v19.08 Public major release
- Various bug fixes.
- Various optimisations.
- - Deprecated NEON functions
+ - Deprecated Neon functions
- NEDepthConcatenateLayer
- NEWidthConcatenateLayer
- Deprecated OpenCL kernels / functions
@@ -718,7 +718,7 @@ v19.08 Public major release
- CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
- CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
- CLWidthConcatenateLayer
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEAbsLayer
- @ref NECast
- @ref NEElementwisePower
@@ -757,15 +757,15 @@ v19.08 Public major release
- Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
- Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
- Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
- - Re-factored the depthwise convolution layer kernel on NEON for generic cases
- - Added an optimized depthwise convolution layer kernel for 5x5 filters (NEON only)
+ - Re-factored the depthwise convolution layer kernel on Neon for generic cases
+ - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only)
- Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
- Altered @ref QuantizationInfo interface to support per-channel quantization.
- The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate for future optimizations.
- The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate for future optimizations.
- Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
- Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
- - Optimized the NEON assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
+ - Optimized the Neon assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
v19.05 Public major release
- Various bug fixes.
@@ -822,7 +822,7 @@ v19.05 Public major release
- Add support for QASYMM8 in NEArithmeticSubtractionKernel.
- Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
- Add support for QASYMM8 NEDeconvolution.
- - Add support for DequantizationLayer for NEON/CL.
+ - Add support for DequantizationLayer for Neon/CL.
- Add support for dilation in CLDepthwiseConvolution.
- Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
- Optimize CLDeconvolution.
@@ -898,7 +898,7 @@ v19.02 Public major release
- @ref NESoftmaxLayer
- Fused activation in @ref CLWinogradConvolutionLayer
- Extented @ref NEPermute to support more cases
- - Added NEON/SVE GEMM Hybrid kernels
+ - Added Neon/SVE GEMM Hybrid kernels
- Added u8 and s8 hybrid assembly kernels
- Introduced GEMM strategy name in NEGEMMAssemblyWrapper
- Improved @ref CLTuner
@@ -1012,7 +1012,7 @@ v18.05 Public major release
- Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
- Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in neon functions.
- Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
- - Moved neon assembly kernels to the folder src/core/NEON/kernels/arm_gemm.
+ - Moved Neon assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
- Improved doxygen documentation.
- Improved memory management for layer's transitions.
- Added support for NHWC data layout in tensors.
@@ -1043,7 +1043,7 @@ v18.05 Public major release
- Port mobilenet example to NHWC data layout.
- Enabled Winograd method in @ref CLConvolutionLayer.
- Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
- - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/NEON/kernels/arm_gemm.
+ - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
- Added memory manager support in GLES functions.
- Major refactoring of the graph API.
- Added GLES backend in the graph API.
@@ -1053,7 +1053,7 @@ v18.05 Public major release
- Replaced NEDeconvolutionLayerUpsampleKernel with @ref NEScaleKernel in @ref NEDeconvolutionLayer.
- Added fast maths flag in @ref CLConvolutionLayer.
- Added new tests and benchmarks in validation and benchmark frameworks
- - Merge Activation layer with Convolution Layer (NEON. CL, GLES)
+ - Merge Activation layer with Convolution Layer (Neon, CL, GLES)
- Added support to OpenCL 2.0 SVM
- Added support to import memory in OpenCL tensors.
- Added the prepare() method to perform any one off pre-processing before running the function.
@@ -1072,7 +1072,7 @@ v18.03 Public maintenance release
- Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
v18.02 Public major release
- - Various NEON / OpenCL / GLES optimisations.
+ - Various Neon / OpenCL / GLES optimisations.
- Various bug fixes.
- Changed default number of threads on big LITTLE systems.
- Refactored examples and added:
@@ -1097,7 +1097,7 @@ v18.02 Public major release
- Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
- New OpenCL kernels / functions:
- CLDirectConvolutionLayerOutputStageKernel
- - New NEON kernels / functions
+ - New Neon kernels / functions
- Added name() method to all kernels.
- Added support for Winograd 5x5.
- NEPermuteKernel / @ref NEPermute
@@ -1124,9 +1124,9 @@ v18.01 Public maintenance release
- @ref GCGEMMInterleave4x4Kernel
- @ref GCGEMMTranspose1xWKernel
- @ref GCIm2ColKernel
- - Refactored NEON Winograd (NEWinogradLayerKernel)
+ - Refactored Neon Winograd (NEWinogradLayerKernel)
- Added @ref NEDirectConvolutionLayerOutputStageKernel
- - Added QASYMM8 support to the following NEON kernels:
+ - Added QASYMM8 support to the following Neon kernels:
- NEDepthwiseConvolutionLayer3x3Kernel
- @ref NEFillBorderKernel
- NEPoolingLayerKernel
@@ -1141,7 +1141,7 @@ v17.12 Public major release
- Introduced logging interface
- Introduced opencl timer
- Reworked GEMMLowp interface
- - Added new NEON assembly kernels for GEMMLowp, SGEMM and HGEMM
+ - Added new Neon assembly kernels for GEMMLowp, SGEMM and HGEMM
- Added validation method for most Machine Learning kernels / functions
- Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
- Added sgemm example for OpenCL
@@ -1168,7 +1168,7 @@ v17.12 Public major release
- @ref GCLogits1DMaxKernel / @ref GCLogits1DShiftExpSumKernel / @ref GCLogits1DNormKernel / @ref GCSoftmaxLayer
- @ref GCTransposeKernel / @ref GCTranspose
- - New NEON kernels / functions
+ - New Neon kernels / functions
- arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
- arm_compute::NEHGEMMAArch64FP16Kernel
- NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
@@ -1180,7 +1180,7 @@ v17.12 Public major release
- @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
- - New graph nodes for NEON and OpenCL
+ - New graph nodes for Neon and OpenCL
- graph::BranchLayer
- graph::DepthConvertLayer
- graph::DepthwiseConvolutionLayer
@@ -1204,8 +1204,8 @@ v17.09 Public major release
- Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
- Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
- New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
- - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both NEON and OpenCL.
- - New NEON kernels / functions:
+ - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Neon and OpenCL.
+ - New Neon kernels / functions:
- arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
- @ref NEDequantizationLayerKernel / @ref NEDequantizationLayer
- NEFloorKernel / @ref NEFloor
@@ -1231,12 +1231,12 @@ v17.09 Public major release
v17.06 Public major release
- Various bug fixes
- - Added support for fixed point 8 bit (QS8) to the various NEON machine learning kernels.
+ - Added support for fixed point 8 bit (QS8) to the various Neon machine learning kernels.
- Added unit tests and benchmarks (AlexNet, LeNet)
- Added support for sub tensors.
- Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
- - Added @ref OMPScheduler (OpenMP) scheduler for NEON
- - Added @ref SingleThreadScheduler scheduler for NEON (For bare metal)
+ - Added @ref OMPScheduler (OpenMP) scheduler for Neon
+ - Added @ref SingleThreadScheduler scheduler for Neon (For bare metal)
- User can specify his own scheduler by implementing the @ref IScheduler interface.
- New OpenCL kernels / functions:
- @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
@@ -1246,7 +1246,7 @@ v17.06 Public major release
- @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights
- New C++ kernels:
- @ref CPPDetectionWindowNonMaximaSuppressionKernel
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
- NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
- @ref NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
@@ -1284,11 +1284,11 @@ v17.04 Public bug fixes release
v17.03.1 First Major public release of the sources
- Renamed the library to arm_compute
- - New CPP target introduced for C++ kernels shared between NEON and CL functions.
+ - New CPP target introduced for C++ kernels shared between Neon and CL functions.
- New padding calculation interface introduced and ported most kernels / functions to use it.
- New OpenCL kernels / functions:
- CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- @ref NENormalizationLayerKernel / @ref NENormalizationLayer
- @ref NETransposeKernel / @ref NETranspose
- NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
@@ -1305,7 +1305,7 @@ v17.03 Sources preview
- @ref CLLKTrackerInitKernel, @ref CLLKTrackerStage0Kernel, @ref CLLKTrackerStage1Kernel, @ref CLLKTrackerFinalizeKernel / @ref CLOpticalFlow
- @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
- @ref CLLaplacianPyramid, @ref CLLaplacianReconstruct
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- NEActivationLayerKernel / @ref NEActivationLayer
- GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM
- NEPoolingLayerKernel / @ref NEPoolingLayer
@@ -1319,7 +1319,7 @@ v17.02.1 Sources preview
- @ref CLGaussianPyramidHorKernel, @ref CLGaussianPyramidVertKernel / @ref CLGaussianPyramid, @ref CLGaussianPyramidHalf, @ref CLGaussianPyramidOrb
- @ref CLMinMaxKernel, @ref CLMinMaxLocationKernel / @ref CLMinMaxLocation
- @ref CLNonLinearFilterKernel / @ref CLNonLinearFilter
- - New NEON FP16 kernels (Requires armv8.2 CPU)
+ - New Neon FP16 kernels (Requires armv8.2 CPU)
- @ref NEAccumulateWeightedFP16Kernel
- @ref NEBox3x3FP16Kernel
- @ref NENonMaximaSuppression3x3FP16Kernel
@@ -1331,7 +1331,7 @@ v17.02 Sources preview
- @ref CLDerivativeKernel / @ref CLChannelExtract
- @ref CLFastCornersKernel / @ref CLFastCorners
- @ref CLMeanStdDevKernel / @ref CLMeanStdDev
- - New NEON kernels / functions:
+ - New Neon kernels / functions:
- HOG / SVM: @ref NEHOGOrientationBinningKernel, @ref NEHOGBlockNormalizationKernel, @ref NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / @ref NEHOGDescriptor, @ref NEHOGDetector, @ref NEHOGGradient, @ref NEHOGMultiDetection
- @ref NENonLinearFilterKernel / @ref NENonLinearFilter
- Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
@@ -1524,11 +1524,11 @@ To see the build options available simply run ```scons -h```:
@b arch: The x86_32 and x86_64 targets can only be used with neon=0 and opencl=1.
@b os: Choose the operating system you are targeting: Linux, Android or bare metal.
-@note bare metal can only be used for NEON (not OpenCL), only static libraries get built and NEON's multi-threading support is disabled.
+@note bare metal can only be used for Neon (not OpenCL), only static libraries get built and Neon's multi-threading support is disabled.
@b build: you can either build directly on your device (native) or cross compile from your desktop machine (cross-compile). In both cases make sure the compiler is available in your path.
-@note If you want to natively compile for 32bit on a 64bit ARM device running a 64bit OS then you will have to use cross-compile too.
+@note If you want to natively compile for 32bit on a 64bit Arm device running a 64bit OS then you will have to use cross-compile too.
There is also an 'embed_only' option which will generate all the .embed files for the OpenCL kernels and / or OpenGLES compute shaders. This might be useful if using a different build system to compile the library.
@@ -1536,7 +1536,7 @@ In addittion the option 'compress_kernels' will compress the embedded OpenCL ker
@b Werror: If you are compiling using the same toolchains as the ones used in this guide then there shouldn't be any warning and therefore you should be able to keep Werror=1. If with a different compiler version the library fails to build because of warnings interpreted as errors then, if you are sure the warnings are not important, you might want to try to build with Werror=0 (But please do report the issue on Github).
-@b opencl / @b neon / @b gles_compute: Choose which SIMD technology you want to target. (NEON for ARM Cortex-A CPUs or OpenCL / GLES_COMPUTE for ARM Mali GPUs)
+@b opencl / @b neon / @b gles_compute: Choose which SIMD technology you want to target. (Neon for Arm Cortex-A CPUs or OpenCL / GLES_COMPUTE for Arm Mali GPUs)
@b embed_kernels: For OpenCL / GLES_COMPUTE only: set embed_kernels=1 if you want the OpenCL / GLES_COMPUTE kernels to be built in the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL / GLES_COMPUTE kernel files by calling CLKernelLibrary::init() / GCKernelLibrary::init(). By default the path is set to "./cl_kernels" / "./cs_shaders".
@@ -1564,11 +1564,11 @@ Example:
@b mali: Enable the collection of Mali hardware counters to measure execution time in benchmark tests. (Your device needs to have a Mali driver that supports it)
-@b openmp Build in the OpenMP scheduler for NEON.
+@b openmp Build in the OpenMP scheduler for Neon.
@note Only works when building with g++ not clang++
-@b cppthreads Build in the C++11 scheduler for NEON.
+@b cppthreads Build in the C++11 scheduler for Neon.
@sa Scheduler::set
@@ -1582,12 +1582,12 @@ In order to use this option, the external tests directory must have the followin
│   ├── CL
│   ├── datasets
│   ├── fixtures
- │   └── NEON
+ │   └── Neon
└── validation
   ├── CL
    ├── datasets
    ├── fixtures
-     └── NEON
+     └── Neon
Then, build the library with `external_tests_dir=<PATH_TO_EXTERNAL_TESTS_DIR>`.
@@ -1600,7 +1600,7 @@ For Linux, the library was successfully built and tested using the following Lin
- gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf
- gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
-To cross-compile the library in debug mode, with NEON only support, for Linux 32bit:
+To cross-compile the library in debug mode, with Neon only support, for Linux 32bit:
scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv7a
@@ -1612,12 +1612,12 @@ To cross-compile the library in asserts mode, with GLES_COMPUTE only support, fo
scons Werror=1 -j8 debug=0 asserts=1 neon=0 opencl=0 gles_compute=1 embed_kernels=1 os=linux arch=arm64-v8a
-You can also compile the library natively on an ARM device by using <b>build=native</b>:
+You can also compile the library natively on an Arm device by using <b>build=native</b>:
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=arm64-v8a build=native
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=native
-@note g++ for ARM is mono-arch, therefore if you want to compile for Linux 32bit on a Linux 64bit platform you will have to use a cross compiler.
+@note g++ for Arm is mono-arch, therefore if you want to compile for Linux 32bit on a Linux 64bit platform you will have to use a cross compiler.
For example on a 64bit Debian based system you would have to install <b>g++-arm-linux-gnueabihf</b>
@@ -1637,11 +1637,11 @@ The examples get automatically built by scons as part of the build process of th
@note The following command lines assume the arm_compute libraries are present in the current directory or in the system library path. If this is not the case you can specify the location of the pre-built libraries with the compiler option -L. When building the OpenCL example the commands below assume that the CL headers are located in the include folder where the command is executed.
-To cross compile a NEON example for Linux 32bit:
+To cross compile a Neon example for Linux 32bit:
arm-linux-gnueabihf-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute -larm_compute_core -o neon_convolution
-To cross compile a NEON example for Linux 64bit:
+To cross compile a Neon example for Linux 64bit:
aarch64-linux-gnu-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -L. -larm_compute -larm_compute_core -o neon_convolution
@@ -1679,21 +1679,21 @@ i.e. to cross compile the "graph_lenet" example for Linux 64bit:
@note If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, arm_compute, arm_compute_core
-To compile natively (i.e directly on an ARM device) for NEON for Linux 32bit:
+To compile natively (i.e directly on an Arm device) for Neon for Linux 32bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -larm_compute -larm_compute_core -o neon_convolution
-To compile natively (i.e directly on an ARM device) for NEON for Linux 64bit:
+To compile natively (i.e directly on an Arm device) for Neon for Linux 64bit:
g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o neon_convolution
(notice the only difference with the 32 bit command is that we don't need the -mfpu option)
-To compile natively (i.e directly on an ARM device) for OpenCL for Linux 32bit or Linux 64bit:
+To compile natively (i.e directly on an Arm device) for OpenCL for Linux 32bit or Linux 64bit:
g++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o cl_convolution -DARM_COMPUTE_CL
-To compile natively (i.e directly on an ARM device) for GLES for Linux 32bit or Linux 64bit:
+To compile natively (i.e directly on an Arm device) for GLES for Linux 32bit or Linux 64bit:
g++ examples/gc_absdiff.cpp utils/Utils.cpp -I. -Iinclude/ -L. -larm_compute -larm_compute_core -std=c++14 -DARM_COMPUTE_GC -Iinclude/linux/ -o gc_absdiff
@@ -1768,7 +1768,7 @@ Here is a guide to <a href="https://developer.android.com/ndk/guides/standalone_
@subsubsection S3_3_1_library How to build the library ?
-To cross-compile the library in debug mode, with NEON only support, for Android 32bit:
+To cross-compile the library in debug mode, with Neon only support, for Android 32bit:
CXX=clang++ CC=clang scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=android arch=armv7a
@@ -1788,7 +1788,7 @@ The examples get automatically built by scons as part of the build process of th
Once you've got your Android standalone toolchain built and added to your path you can do the following:
-To cross compile a NEON example:
+To cross compile a Neon example:
#32 bit:
arm-linux-androideabi-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_arm -static-libstdc++ -pie
@@ -1850,7 +1850,7 @@ And finally to run the example:
For example:
adb shell /data/local/tmp/graph_lenet --help
-In this case the first argument of LeNet (like all the graph examples) is the target (i.e 0 to run on NEON, 1 to run on OpenCL if available, 2 to run on OpenCL using the CLTuner), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.
+In this case the first argument of LeNet (like all the graph examples) is the target (i.e 0 to run on Neon, 1 to run on OpenCL if available, 2 to run on OpenCL using the CLTuner), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.
@subsection S3_4_macos Building for macOS
@@ -1874,7 +1874,7 @@ Download linaro for <a href="https://releases.linaro.org/components/toolchain/bi
@subsubsection S3_5_1_library How to build the library ?
-To cross-compile the library with NEON support for baremetal arm64-v8a:
+To cross-compile the library with Neon support for baremetal arm64-v8a:
scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=bare_metal arch=arm64-v8a build=cross_compile cppthreads=0 openmp=0 standalone=1