author    Viet-Hoa Do <viet-hoa.do@arm.com>    2022-11-16 16:11:45 +0000
committer Viet-Hoa Do <viet-hoa.do@arm.com>    2022-11-17 10:16:36 +0000
commit 38ac410b14678c90cf1a2e8922ab3572b42d1c77 (patch)
tree   2c85d1e461017e6fbc465649ce686d754aa25421
parent 293ab603ab3c091167577524ba78cd52fef0a7f6 (diff)
Fix documentation about BF16 acceleration
* Fix the heading and the code block.

Resolves: COMPMID-5546
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I60162b0e0aaf2a71a70e517aaeb8c75dd82d8dd9
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8652
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
 docs/user_guide/library.dox | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/docs/user_guide/library.dox b/docs/user_guide/library.dox
index b95e0bace3..0501322254 100644
--- a/docs/user_guide/library.dox
+++ b/docs/user_guide/library.dox
@@ -54,14 +54,16 @@ When the fast-math flag is enabled, both Arm® Neon™ and CL convolution layers
- no-fast-math: No Winograd support
- fast-math: Supports Winograd 3x3,3x1,1x3,5x1,1x5,7x1,1x7,5x5,7x7
-@section BF16 acceleration
-
-- Required toolchain: android-ndk-r23-beta5 or later
-- To build for BF16: "neon" flag should be set "=1" and "arch" has to be "=armv8.6-a", "=armv8.6-a-sve", or "=armv8.6-a-sve2" using following command:
-- scons arch=armv8.6-a-sve neon=1 opencl=0 extra_cxx_flags="-fPIC" benchmark_tests=0 validation_tests=0 validation_examples=1 os=android Werror=0 toolchain_prefix=aarch64-linux-android29
-- To enable BF16 acceleration when running FP32 "fast-math" has to be enabled and that works only for Neon convolution layer using cpu gemm.
- In this scenario on CPU: the CpuGemmConv2d kernel performs the conversion from FP32, type of input tensor, to BF16 at block level to exploit the arithmetic capabilities dedicated to BF16. Then transforms back to FP32, the output
- tensor type.
+@section bf16_acceleration BF16 acceleration
+
+Required toolchain: android-ndk-r23-beta5 or later.
+
+To build for BF16: the "neon" flag must be set to 1 and "arch" must be "armv8.6-a", "armv8.6-a-sve", or "armv8.6-a-sve2". For example:
+
+ scons arch=armv8.6-a-sve neon=1 opencl=0 extra_cxx_flags="-fPIC" benchmark_tests=0 validation_tests=0 validation_examples=1 os=android Werror=0 toolchain_prefix=aarch64-linux-android29
+
+To enable BF16 acceleration when running FP32, "fast-math" has to be enabled, and this works only for the Neon convolution layer using CPU GEMM.
+In this scenario, on CPU, the CpuGemmConv2d kernel converts the input tensor from FP32 to BF16 at block level to exploit the arithmetic capabilities dedicated to BF16, then converts the result back to FP32, the output tensor type.
@section architecture_thread_safety Thread-safety
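
For context, below is a minimal sketch (not part of the patch above) of how fast-math is requested through Compute Library's public Neon runtime API, so that an FP32 convolution can take the BF16 path on an armv8.6-a build. The tensor shapes, strides and padding are illustrative assumptions, not values from the documentation:

    // Minimal sketch: an FP32 NEConvolutionLayer with fast-math enabled.
    // Shapes, strides and padding are illustrative, not from the patch.
    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/runtime/NEON/functions/NEConvolutionLayer.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    int main()
    {
        Tensor src, weights, biases, dst;

        // FP32 tensors: a 224x224x3 input, 16 filters of 3x3x3, and the output.
        src.allocator()->init(TensorInfo(TensorShape(224U, 224U, 3U), 1, DataType::F32));
        weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 3U, 16U), 1, DataType::F32));
        biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));
        dst.allocator()->init(TensorInfo(TensorShape(224U, 224U, 16U), 1, DataType::F32));

        // enable_fast_math = true requests the fast-math path; on an
        // armv8.6-a build this is what lets CpuGemmConv2d convert FP32
        // blocks to BF16 internally, as described in the section above.
        NEConvolutionLayer conv;
        conv.configure(&src, &weights, &biases, &dst,
                       PadStrideInfo(1, 1, 1, 1),   // stride 1x1, padding 1x1
                       WeightsInfo(),
                       Size2D(1U, 1U),              // no dilation
                       ActivationLayerInfo(),       // no fused activation
                       true);                       // enable_fast_math

        src.allocator()->allocate();
        weights.allocator()->allocate();
        biases.allocator()->allocate();
        dst.allocator()->allocate();

        conv.run();
        return 0;
    }

On a build or CPU without BF16 support, enable_fast_math is only a hint, and the convolution is expected to fall back to the plain FP32 path.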