author Vidhya Sudhan Loganathan <vidhyasudhan.loganathan@arm.com> 2019-04-29 11:44:11 +0100
committer VidhyaSudhan Loganathan <vidhyasudhan.loganathan@arm.com> 2019-05-03 11:01:01 +0000
commit dc5d34319a673f6cbcd346a0c7046fb7fd0106ec (patch)
tree 421e8b7c944c958f4fbcdf4273594de1ba1afd07
parent a788c2f7b143731704cdbc6a7f0016e4f38896d9 (diff)
COMPMID-2144 : Update documentation for the new OpenCL tuner
Change-Id: I9af45c25b80339daba52a45ca394bb9dbf80bade
Signed-off-by: Vidhya Sudhan Loganathan <vidhyasudhan.loganathan@arm.com>
Reviewed-on: https://review.mlplatform.org/c/1034
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
-rw-r--r-- docs/00_introduction.dox 2
-rw-r--r-- docs/01_library.dox 2
2 files changed, 2 insertions, 2 deletions
diff --git a/docs/00_introduction.dox b/docs/00_introduction.dox
index d0f599f505..d0e9183386 100644
--- a/docs/00_introduction.dox
+++ b/docs/00_introduction.dox
@@ -1278,7 +1278,7 @@ SVM allocations are supported for all the underlying allocations in Compute Libr
The OpenCL tuner, a.k.a. CLTuner, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels tuning the Local-Workgroup-Size (LWS).
The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported or exported from/to a file.
-The OpenCL tuner performs a brute-force approach: it runs the same OpenCL kernel for a range of local workgroup sizes and keep the local workgroup size of the fastest run to use in subsequent calls to the kernel.
+The OpenCL tuner runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. It supports three modes of tuning with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found. In the Exhaustive mode, it searches all the supported values of LWS. This mode takes the longest time to tune and is the most likely to find the optimal LWS. Normal mode searches a subset of LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode. Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good or better than the default LWS value. The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
In order for the performance numbers to be meaningful you must disable the GPU power management and set it to a fixed frequency for the entire duration of the tuning phase.
If you wish to know more about LWS and the important role on improving the GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms available at the following link:
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 67adf9cc4d..359ca4794a 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -461,7 +461,7 @@ However, there is no universal rule regarding which LWS is best for a given kern
When the @ref CLTuner is enabled ( Target = 2 for the graph examples), the first time an OpenCL kernel is executed the Compute Library will try to run it with a variety of LWS values and will remember which one performed best for subsequent runs. At the end of the run the @ref graph::Graph will try to save these tuning parameters to a file.
-However this process takes quite a lot of time, which is why it cannot be enabled all the time.
+However this process takes quite a lot of time, which is why it cannot be enabled all the time. @ref CLTuner supports three modes of tuning with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found. In the Exhaustive mode, it searches all the supported values of LWS. This mode takes the longest time to tune and is the most likely to find the optimal LWS. Normal mode searches a subset of LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode. Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good or better than the default LWS value. The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
But, when the @ref CLTuner is disabled ( Target = 1 for the graph examples), the @ref graph::Graph will try to reload the file containing the tuning parameters, then for each executed kernel the Compute Library will use the fine tuned LWS if it was present in the file or use a default LWS value if it's not.
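
Illustrative note on the three tuning modes described in the added text: the sketch below is not the Compute Library implementation and does not use its API. It is a minimal, self-contained model, assuming hypothetical names (`TunerMode`, `candidates`, `tune`) and hypothetical candidate sets, of how a tuner can trade search time against the quality of the LWS it finds. The timing callback stands in for running and profiling a real OpenCL kernel.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <vector>

// A local workgroup size as (x, y, z).
using Lws = std::array<unsigned, 3>;

// Hypothetical mode names mirroring the Exhaustive / Normal / Rapid description.
enum class TunerMode { Exhaustive, Normal, Rapid };

// Assumed candidate sets, chosen only for illustration:
//   Exhaustive: every (x, y) with 1 <= x, y <= 16
//   Normal:     powers of two up to 16 in x and y
//   Rapid:      powers of two up to 4 in x and y
// The sets are nested, so a slower mode never does worse than a faster one.
std::vector<Lws> candidates(TunerMode mode)
{
    std::vector<Lws> out;
    const bool     powers_only = (mode != TunerMode::Exhaustive);
    const unsigned limit       = (mode == TunerMode::Rapid) ? 4u : 16u;
    for (unsigned x = 1; x <= limit; x = powers_only ? x * 2 : x + 1)
    {
        for (unsigned y = 1; y <= limit; y = powers_only ? y * 2 : y + 1)
        {
            out.push_back(Lws{ x, y, 1u });
        }
    }
    return out;
}

// Time every candidate and keep the fastest. The search starts from the
// default LWS, so the result is never slower than the default.
Lws tune(TunerMode mode, const Lws &default_lws,
         const std::function<double(const Lws &)> &time_kernel)
{
    assert(time_kernel && "a timing callback is required");
    Lws    best      = default_lws;
    double best_time = time_kernel(default_lws);
    for (const Lws &lws : candidates(mode))
    {
        const double t = time_kernel(lws);
        if (t < best_time)
        {
            best_time = t;
            best      = lws;
        }
    }
    return best;
}
```

In this model Exhaustive evaluates 256 candidates, Normal 25 and Rapid 9, which reflects the ordering of tuning times stated in the documentation text. The actual search spaces used by the library may differ.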