COMPMID-988: Update documentation regarding example arguments and CLTuner

Change-Id: Iab30694e8c20156d42bdb06ce42d3e641b328df4
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/122996
Reviewed-by: Michalis Spyrou <michalis.spyrou@arm.com>
Tested-by: Anthony Barbier <anthony.barbier@arm.com>
author: Anthony Barbier <anthony.barbier@arm.com> 2018-03-02 11:49:33 +0000
committer: Anthony Barbier <anthony.barbier@arm.com> 2018-11-02 16:48:33 +0000
commit: 3762e74da2eac34476d204cec360d1a0b6729307 (patch)
tree: 4c807068fe2995802479def941455345c56e8ef9 /docs/01_library.dox
parent: 317fa7f2a770c179692c20e10ebb9fe2dcb6c624 (diff)
download: ComputeLibrary-3762e74da2eac34476d204cec360d1a0b6729307.tar.gz
1 files changed, 17 insertions, 0 deletions
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 20d057c2c9..e3f673df82 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -366,5 +366,22 @@ mm->finalize();                // Finalize memory manager (Object lifetime check
 conv1.run();
 conv2.run();
 @endcode
+
+@section S4_8_opencl_tuner OpenCL Tuner
+
+When an OpenCL kernel is dispatched to the GPU, it takes two arguments:
+- The Global Workgroup Size (GWS): the total number of times the OpenCL kernel is run in order to process all the elements.
+- The Local Workgroup Size (LWS): the number of elements that run in parallel on a GPU core at a given point in time.
+
+The LWS can be required by an algorithm (for example, if it contains memory barriers or uses local memory), but it can also be used to tune the performance of a kernel: the overall execution time of the kernel might vary significantly depending on how the GWS is broken down.
+
+However, there is no universal rule regarding which LWS is best for a given kernel, so instead we created the @ref CLTuner.
+
+When the @ref CLTuner is enabled (Target = 2 for the graph examples), the first time an OpenCL kernel is executed the Compute Library will try to run it with a variety of LWS values and will remember which one performed best for subsequent runs. At the end of the run the @ref graph::Graph will try to save these tuning parameters to a file.
+
+However, this process takes a significant amount of time, which is why it cannot be enabled all the time.
+
+When the @ref CLTuner is disabled (Target = 1 for the graph examples), the @ref graph::Graph will try to reload the file containing the tuning parameters; then, for each executed kernel, the Compute Library will use the fine-tuned LWS if it is present in the file, or a default LWS value if it is not.
+
 */
 } // namespace arm_compute
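
The tuning behaviour documented in this patch (try several LWS values on first execution, remember the fastest, fall back to a default when nothing was recorded) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Compute Library's CLTuner implementation: `LwsTunerSketch`, `kDefaultLws`, `tune()` and `lookup()` are hypothetical names, the LWS is reduced to a single integer, the timing callback stands in for actually running a kernel on the GPU, and an in-memory map stands in for the tuning file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical default used when no tuned value is known (assumption for this sketch).
constexpr std::size_t kDefaultLws = 1;

// Hypothetical stand-in for a tuner; NOT the Compute Library's CLTuner API.
class LwsTunerSketch
{
public:
    // Callback that returns how long a kernel took with the given LWS (arbitrary units).
    using Timer = std::function<double(std::size_t)>;

    // Tuner enabled: the first time a kernel is seen, time every candidate LWS and
    // remember the fastest one; later calls reuse the remembered value without re-timing.
    std::size_t tune(const std::string &kernel,
                     const std::vector<std::size_t> &candidates,
                     const Timer &time_kernel)
    {
        assert(static_cast<bool>(time_kernel)); // a callable timer is required

        const auto it = _best.find(kernel);
        if(it != _best.end())
        {
            return it->second;
        }

        std::size_t best_lws  = kDefaultLws;
        double      best_time = std::numeric_limits<double>::max();
        for(const std::size_t lws : candidates)
        {
            const double t = time_kernel(lws);
            if(t < best_time)
            {
                best_time = t;
                best_lws  = lws;
            }
        }
        _best[kernel] = best_lws;
        return best_lws;
    }

    // Tuner disabled: use the recorded LWS if there is one, otherwise the default.
    std::size_t lookup(const std::string &kernel) const
    {
        const auto it = _best.find(kernel);
        if(it != _best.end())
        {
            return it->second;
        }
        return kDefaultLws;
    }

private:
    std::map<std::string, std::size_t> _best; // kernel name -> best LWS found so far
};
```

In this sketch the cost of tuning is paid once per kernel name, inside the first `tune()` call, which mirrors why the documentation describes tuning as too slow to leave enabled permanently while lookups of stored values stay cheap.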