author Vidhya Sudhan Loganathan <vidhyasudhan.loganathan@arm.com> 2019-04-29 11:44:11 +0100
committer VidhyaSudhan Loganathan <vidhyasudhan.loganathan@arm.com> 2019-05-03 11:01:01 +0000
commit dc5d34319a673f6cbcd346a0c7046fb7fd0106ec
tree 421e8b7c944c958f4fbcdf4273594de1ba1afd07 /docs/00_introduction.dox
parent a788c2f7b143731704cdbc6a7f0016e4f38896d9
COMPMID-2144 : Update documentation for the new OpenCL tuner
Change-Id: I9af45c25b80339daba52a45ca394bb9dbf80bade
Signed-off-by: Vidhya Sudhan Loganathan <vidhyasudhan.loganathan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1034
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Diffstat (limited to 'docs/00_introduction.dox')
-rw-r--r-- docs/00_introduction.dox 2
1 file changed, 1 insertion, 1 deletion
diff --git a/docs/00_introduction.dox b/docs/00_introduction.dox
index d0f599f505..d0e9183386 100644
--- a/docs/00_introduction.dox
+++ b/docs/00_introduction.dox
@@ -1278,7 +1278,7 @@ SVM allocations are supported for all the underlying allocations in Compute Libr
 The OpenCL tuner, a.k.a. CLTuner, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels by tuning the Local-Workgroup-Size (LWS).
The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported or exported from/to a file.
-The OpenCL tuner performs a brute-force approach: it runs the same OpenCL kernel for a range of local workgroup sizes and keep the local workgroup size of the fastest run to use in subsequent calls to the kernel.
+The OpenCL tuner runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. It supports three modes of tuning with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found. In Exhaustive mode, it searches all the supported values of LWS. This mode takes the longest time to tune and is the most likely to find the optimal LWS. Normal mode searches a subset of LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode. Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good as the default LWS value. The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
In order for the performance numbers to be meaningful you must disable the GPU power management and set it to a fixed frequency for the entire duration of the tuning phase.
 If you wish to know more about LWS and its important role in improving the GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms" available at the following link: