From 8abbabd6ad946441c8ef1a03896fa98f7801af1f Mon Sep 17 00:00:00 2001
From: SiCong Li
Date: Fri, 3 Apr 2020 12:39:41 +0100
Subject: COMPMID-3233 Extend gemm tuner in the doxygen documentation

* Add location to gemm tuner scripts.
* Expanded on the description of the gemm shape and gemm config files.
* Reformat the document.

Change-Id: Idd23d98b02377b0619cb9f616aa6099321f492bc
Signed-off-by: SiCong Li
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/2988
Tested-by: Arm Jenkins
Comments-Addressed: Arm Jenkins
Reviewed-by: Georgios Pinitas
---
 examples/gemm_tuner/README.md                  | 286 +++++++++++++++++++------
 examples/gemm_tuner/benchmark_gemm_examples.sh |  14 +-
 2 files changed, 228 insertions(+), 72 deletions(-)

(limited to 'examples')

diff --git a/examples/gemm_tuner/README.md b/examples/gemm_tuner/README.md
index 3238a9dbda..a4cde10403 100644
--- a/examples/gemm_tuner/README.md
+++ b/examples/gemm_tuner/README.md
@@ -2,88 +2,232 @@
 ## Introduction
-This is a set of 2 script tools for tuning the performance of OpenCL GEMM
-kernels (limited to Convolution layer functions only for now). Specifically, we
-tune 3 GEMM kernels, each has a different implementation strategy of the GEMM
-operation: native, reshaped, reshaped only rhs. The details of these strategies
-can be found in the documentations of the corresponding kernels:
-CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel and
-CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
+This is a set of 2 script tools for tuning the performance of OpenCL GEMM kernels (limited to Convolution layer
+functions only for now). Specifically, we tune 3 GEMM kernels, each has a different implementation **strategy** of the
+GEMM operation: **native**, **reshaped**, **reshaped only rhs**. The details of these strategies can be found in the
+documentations of the corresponding kernels: **CLGEMMMatrixMultiplyNativeKernel**,
+**CLGEMMMatrixMultiplyReshapedKernel** and **CLGEMMMatrixMultiplyReshapedOnlyRHSKernel**.
-The outputs of the tuning process are 1 optimal configuration (called GEMM
-Configuration or GEMMConfig) for each of the 3 strategies.
+The outputs of the tuning process are 1 optimal configuration (called **GEMM Configuration** or **GEMMConfig**, for
+more details see Approach section) for each of the 3 strategies.
-## Approach
+## Location
+The 2 scripts **benchmark_gemm_examples.sh** and **GemmTuner.py** can be found under $ACL_ROOT/examples/gemm_tuner.
-This section gives a brief description and rationale of the approach adopted by
-the current version of GEMM Tuner.
+## Pre-requisite
+* A target device to be tuned, plus the following on the device:
+ * Android or Linux OS
+ * Bash shell
+ * Built ACL with benchmark examples binaries
+ * benchmark_gemm_examples.sh script
+ * gemm shape file
-As explained in the Introduction section, the outputs of the tuner are 1 optimal
-GEMMConfig for each strategy. This is because we can only integrate 1 GEMMConfig
-for each strategy in ACL at compile time. In theory, however, the optimal
-GEMMConfig also depends on different parameters of GEMM (called GEMM Parameter
-or GEMMParam, e.g.: the shape of the operation); thus ideally, for each
-strategy, the optimal configurations should be a mapping from GEMMParam to
-GEMMConfig instead of a single GEMMConfig.
+ A csv file containing the **GEMMParam search list**. This is the list of GEMMParams/gemm shapes that we're
+ interested in (For more details see Approach section). The default list is prepared by ACL developers in advance
+ and can be provided on request.
-To address this issue, we ensure the one single optimal GEMMConfig can
-generalise well to all potential GEMMParams (or at least the ones that we care
-about). The approach we adopt involves a preliminary stage where a collection of
-common GEMMParams (GEMM shapes from popular networks) are compiled. Then, to
-reduce the final tuning time, rather contradictorily, we spend a lot of time
-searching for near-optimal GEMMConfigs for each GEMMParam first, and then
-discard redundant GEMMParams which share similar optimal GEMMConfigs with
-others. The resultant list of GEMMParams is called a __GEMMParam archetype
-list__, as in these GEMMParams are typical enough to capture the space of
-GEMMParams that we care about.
-
-During this preliminary stage we also produce a list of good GEMMConfigs that
-can be used to search for the optimal one in the actual tuning stage. This,
-again, is to reduce the tuning time, and the resultant list is called a
-__GEMMConfig search list__.
+ The format is described as:
-The GEMMParam archetype list and the GEMMConfig search list are investigated and
-prepared by the developers; the users of GEMM tuner need not worry about
-producing them, but they need to obtain them prior to running the tuner.
+ A headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each
+ field).
-Once these two lists (2 for each strategy, so 6 in total) are obtained, they can
-be fed to the tuner, to produce the optimal GEMMConfig(s).
+ Note also comments and extraneous empty lines are not permitted.
-## Pre-requisite
-* A target device (Android phones, Linux boards, e.t.c.), on which to tune the
- GEMM kernels, plus these on the device:
- * (Preferably) Bash shell
- * Built ACL with benchmark examples
- * GEMMParam archetype list
- * GEMMConfig search list
+ A gemm shape is a list of 4 positive integers \ describing the shapes of the two matrices (LHS and
+ RHS) with:
+
+ M - Number of lhs matrix rows
+ N - Number of rhs matrix columns
+ K - Number of lhs matrix columns/rhs matrix rows
+ B - Batch size
+
+ An example gemm shape file looks like:
+ ```
+ 100,100,30,1
+ 100,100,30,3
+ ...
+ ```
+ * gemm config file
+ A csv file containing the **GEMMConfig search list**. This is the list of candidate GEMMConfigs among which we
+ search for the optimal one. **Note that we have a different list for each strategy.**
+ The default lists are prepared by ACL developers in advance and can be provided on request.
+
+ The format of the file for each strategy is the same:
+
+ A headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each
+ field). Note also comments and extraneous empty lines are not permitted.
+
+ However the fields of GEMMConfig differ for each strategy:
+ * Strategy **native**:
+ A gemm config is a list of 3 positive integers \, with:
+
+ m0 - Number of rows processed by the matrix multiplication
+ n0 - Number of columns processed by the matrix multiplication
+ k0 - Number of partial accumulations performed by the matrix multiplication
+
+ Only the following configurations of M0, N0 and K0 are currently supported:
+
+ M0 = 1, 2, 3, 4, 5, 6, 7, 8
+ N0 = 2, 3, 4, 8, 16
+ K0 = 2, 3, 4, 8, 16
+
+ An example gemm config file looks like:
+ ```
+ 1,4,4
+ 2,3,8
+ ...
+ ```
+ * Strategy **reshaped_rhs_only**:
+
+ A gemm config is a list of 4 positive integers \ and 2 boolean values interleave_rhs and
+ transpose_rhs, with:
+
+ m0 - Number of rows processed by the matrix multiplication
+ n0 - Number of columns processed by the matrix multiplication
+ k0 - Number of partial accumulations performed by the matrix multiplication
+ h0 - Number of horizontal blocks of size (k0xn0) stored on the same output row
+ interleave_rhs - Interleave rhs matrix (1) / Do not interleave rhs matrix (0)
+ transpose_rhs - Transpose rhs matrix (1) / Do not transpose rhs matrix (0)
+
+ Only the following configurations of M0, N0 and K0 are currently supported:
+
+ M0 = 1, 2, 3, 4, 5, 6, 7, 8
+ N0 = 2, 3, 4, 8, 16
+ K0 = 2, 3, 4, 8, 16
+ H0 >= 1
+
+ An example gemm config file looks like:
+ ```
+ 4,4,4,1,1,1
+ 4,4,4,3,1,0
+ ...
+ ```
+ * Strategy **reshaped**:
+
+ A gemm config is a list of 5 positive integers \ and 3 boolean values interleave_lhs,
+ interleave_rhs and transpose_rhs, with:
+
+ m0 - Number of rows processed by the matrix multiplication
+ n0 - Number of columns processed by the matrix multiplication
+ k0 - Number of partial accumulations performed by the matrix multiplication
+ v0 - Number of vertical blocks of size (m0xk0) stored on the same output row
+ h0 - Number of horizontal blocks of size (k0xn0) stored on the same output row
+ interleave_lhs - Interleave lhs matrix (1) / Do not interleave lhs matrix (0)
+ interleave_rhs - Interleave rhs matrix (1) / Do not interleave rhs matrix (0)
+ transpose_rhs - Transpose rhs matrix but not lhs matrix (1) / Do not transpose rhs matrix but do transpose
+ lhs matrix (0)
+
+ * If rhs matrix is transposed only the following configurations are currently supported:
+
+ M0 = 2, 3, 4, 5, 6, 7, 8
+ N0 = 2, 3, 4, 8, 16
+ K0 = 2, 3, 4, 8, 16
+ V0 >= 1
+ H0 >= 1
+
+ * If lhs matrix is transposed only the following configurations are currently supported:
+
+ M0 = 2, 3, 4, 8
+ N0 = 2, 3, 4, 8, 16
+ K0 = 2, 3, 4, 8, 16
+ V0 >= 1
+ H0 >= 1
+
+ An example gemm config file looks like:
+ ```
+ 4,4,4,1,3,1,1,1
+ 4,4,4,3,3,1,1,0
+ ...
+ ```
 * A host machine, plus these on the machine:
 * python >= 3.6
+ * GemmTuner.py script
 ## Usage
-
 The tuning stage consists of 2 steps:
-1. Run benchmarks: Run the runner shell script (benchmark_gemm_examples.sh) on
-your target device. Note that all the built benchmark examples have to be
-present on your target device prior to running. The script will run the selected
-strategy, over all configs defined in GEMMConfig search list, on all GEMMParams
-inside the GEMMParam archetype list, and then save the benchmark results to json
-files in an output directory.
-```
-[$SHELL] ./benchmark_gemm_examples.sh -s \ -e \
--g \ -c \ [-o \]
-```
-2. Run analyser: Run the python script (GemmTuner.py) on your host machine.
-You'll need to transfer all the benchmark result json files generated from the
-previous step to your host machine beforehand. Note that this requires python >=
-3.6. The script will output the best configuration, along with some analysis
-statistics for each strategy, and optionally save the parsed benchmark results
-into csv files (one for each strategy) for further analysis.
-An optional tolerance in milliseconds in OpenCl timer is provided to determine
-how far apart in performance two GEMMConfigs have to be, to be considered
-different. A default value of 0.01 ms is used, and it's recommended this value
-should be < 0.1 ms.
-```
-python GemmTuner.py -b \ [-t \]
-[-o \]
-```
+1. Run benchmarks:
+
+ Run the shell script (**benchmark_gemm_examples.sh**) on your **target device**. Note that all the built benchmark
+ examples have to be present on your target device prior to running. The benchmark results will be saved to json
+ files in an output directory.
+ ```
+ Usage: benchmark_gemm_examples.sh [-h] -s \ -e \ -g \
+ -c \ [-o \]
+
+ Options:
+ -h
+ Print help messages. If a strategy is specified with -s \, then only display messages relevant
+ to that strategy. Otherwise if no strategy is specified, display messages for all available strategies.
+
+ -s \
+ Strategy option.
+ Options: native reshaped_rhs_only reshaped.
+
+ -e \
+ Path to directory that holds all example binaries
+
+ -g \
+ Path to gemm shape csv file
+
+ -c \
+ Path to gemm config csv file
+
+ -o \
+ Path to output directory that holds output json files
+ Default: out
+ ```
+2. Run analyser:
+
+ Run the python script (**GemmTuner.py**) on your **host machine**.
+ You'll need to transfer all the benchmark result json files generated from the previous step to your host machine
+ beforehand. The script will output the best configuration, along with some analysis statistics for each strategy, and
+ optionally save the parsed benchmark results into csv files (one for each strategy) for further analysis.
+ ```
+ Usage: GemmTuner.py [-h] -b PATH [-o PATH] [-t TOLERANCE] [-D]
+
+ CL GEMM Tuner
+ optional arguments:
+ -h, --help show this help message and exit
+ -b PATH, --benchmark_results PATH
+ Path to benchmark result directory, where benchmark
+ result json files have a file extension of
+ 'gemmtuner_benchmark'
+ -o PATH, --output_dir PATH
+ Path to directory that holds output csv files. One per
+ strategy
+ -t TOLERANCE, --tolerance TOLERANCE
+ For testing if two GEMMConfigs are equivalent in terms
+ of performance. The tolerance is OpenCL timer in
+ milliseconds. Recommended value: <= 0.1 ms
+ -D, --debug Enable script debugging output
+
+ ```
+
+## Approach
+
+This section gives a brief description and rationale of the approach adopted by the current version of GEMM Tuner.
+
+As explained in the Introduction section, the outputs of the tuner are 1 optimal GEMMConfig for each strategy.
+This is because we can only integrate 1 GEMMConfig for each strategy in ACL at compile time. In theory, however, the
+optimal GEMMConfig also depends on different parameters of GEMM (called GEMM Parameter or GEMMParam, e.g.: the shape
+of the operation); thus ideally, for each strategy, the optimal configurations should be a mapping from GEMMParam to
+GEMMConfig instead of a single GEMMConfig.
+
+To address this issue, we ensure the one single optimal GEMMConfig can generalise well to all potential GEMMParams
+(or at least the ones that we care about). The approach we adopt involves a preliminary stage where a collection of
+common GEMMParams (GEMM shapes from popular networks) are compiled. Then, to reduce the final tuning time, rather
+contradictorily, we spend a lot of time searching for near-optimal GEMMConfigs for each GEMMParam first, and then
+discard redundant GEMMParams which share similar optimal GEMMConfigs with others. The resultant list of GEMMParams is
+called a __GEMMParam search list__, as in these GEMMParams are typical enough to capture the space of GEMMParams that
+we care about.
+
+During this preliminary stage we also produce a list of good GEMMConfigs that can be used to search for the optimal one
+in the actual tuning stage. This, again, is to reduce the tuning time, and the resultant list is called a
+__GEMMConfig search list__.
+
+The GEMMParam search list and the GEMMConfig search list are investigated and prepared by the developers; the users of
+GEMM tuner need not worry about producing them, but they need to obtain them prior to running the tuner.
+
+Once these two lists (2 for each strategy, so 6 in total) are obtained, they can be fed to the tuner, to produce the
+optimal GEMMConfig(s).
\ No newline at end of file
diff --git a/examples/gemm_tuner/benchmark_gemm_examples.sh b/examples/gemm_tuner/benchmark_gemm_examples.sh
index 95bb3677f3..d6f41cc22a 100755
--- a/examples/gemm_tuner/benchmark_gemm_examples.sh
+++ b/examples/gemm_tuner/benchmark_gemm_examples.sh
@@ -58,6 +58,9 @@ function help_gemm_shape_file() {
 Gemm shape file:
 Gemm shape file is a headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each field).
+
+ Note also comments and extraneous empty lines are not permitted.
+
 A gemm shape is a list of 4 positive integers describing the shapes of the two matrices (LHS and RHS) with:
 M - Number of lhs matrix rows
@@ -87,7 +90,10 @@ function help_gemm_config_file_native() {
 Gemm config file (Strategy native):
 Gemm config file is a headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each field).
- A gemm config is a list of 4 positive integers and 2 boolean values interleave_rhs and transpose_rhs, with:
+
+ Note also comments and extraneous empty lines are not permitted.
+
+ A gemm config is a list of 3 positive integers , with:
 m0 - Number of rows processed by the matrix multiplication
 n0 - Number of columns processed by the matrix multiplication
 k0 - Number of partial accumulations performed by the matrix multiplication
@@ -119,6 +125,9 @@ function help_gemm_config_file_reshaped_rhs_only() {
 Gemm config file (Strategy reshaped_rhs_only):
 Gemm config file is a headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each field).
+
+ Note also comments and extraneous empty lines are not permitted.
+
 A gemm config is a list of 4 positive integers and 2 boolean values interleave_rhs and transpose_rhs, with:
 m0 - Number of rows processed by the matrix multiplication
 n0 - Number of columns processed by the matrix multiplication
@@ -155,6 +164,9 @@ function help_gemm_config_file_reshaped() {
 Gemm config file (Strategy reshaped):
 Gemm config file is a headerless csv file with fields separated by commas and commas only (there cannot be whitespaces around each field).
+
+ Note also comments and extraneous empty lines are not permitted.
+
 A gemm config is a list of 5 positive integers and 3 boolean values interleave_lhs, interleave_rhs and transpose_rhs, with:
 m0 - Number of rows processed by the matrix multiplication
 n0 - Number of columns processed by the matrix multiplication
--
cgit v1.2.1
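
The file-format rules this patch adds to the README and the help text (headerless csv, commas only, no whitespace around fields, no comments, no empty lines) can be checked on the host before a benchmark run. The following is a minimal Python sketch of such a check, not code from GemmTuner.py or benchmark_gemm_examples.sh: the names `parse_positive_int_rows` and `is_supported_native_config` are hypothetical, and the supported M0/N0/K0 sets are copied from the README text for the native strategy only. It also assumes that a field with leading zeros is acceptable, which the patch does not specify.

```python
import re
from typing import List, Tuple

# Supported values for the native strategy, as listed in the README text above.
NATIVE_M0 = set(range(1, 9))       # 1, 2, ..., 8
NATIVE_N0 = {2, 3, 4, 8, 16}
NATIVE_K0 = {2, 3, 4, 8, 16}

# ASCII digits only: rejects "", " 4", "4 ", "-1" and "# comment".
_DIGITS = re.compile(r"[0-9]+")


def parse_positive_int_rows(text: str, num_fields: int) -> List[Tuple[int, ...]]:
    """Parse headerless csv text whose rows each hold num_fields positive integers.

    Raises ValueError on whitespace around fields, comments, empty lines,
    a wrong field count, or a value that is not a positive integer.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split(",")
        if len(fields) != num_fields:
            raise ValueError(
                f"line {line_no}: expected {num_fields} fields, got {len(fields)}")
        for field in fields:
            if _DIGITS.fullmatch(field) is None or int(field) <= 0:
                raise ValueError(f"line {line_no}: invalid field {field!r}")
        rows.append(tuple(int(field) for field in fields))
    return rows


def is_supported_native_config(m0: int, n0: int, k0: int) -> bool:
    """Check a native-strategy gemm config against the supported value lists."""
    return m0 in NATIVE_M0 and n0 in NATIVE_N0 and k0 in NATIVE_K0
```

Calling `parse_positive_int_rows(text, 4)` on the contents of a gemm shape file returns one `(M, N, K, B)` tuple per row, and `parse_positive_int_rows(text, 3)` does the same for a native gemm config file, whose rows can then be passed to `is_supported_native_config`. The reshaped and reshaped_rhs_only config files mix positive integers with 0/1 flags, so they would need a separate per-column check that this sketch does not attempt.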