author     SiCong Li <sicong.li@arm.com>  2020-07-15 12:09:58 +0100
committer  SiCong Li <sicong.li@arm.com>  2020-07-21 09:41:49 +0000
commit     406a13f0b414d5c0375a46beec8dd9363a1cca56 (patch)
tree       5f6fb7cfa1c7683d44de32840ffb541f450c8961 /src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
parent     f6f7876e9ee8b58a8a6b335b032d554412fa3983 (diff)
COMPMID-3331 Remove y load padding from CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLGEMMMatrixMultiplyNativeKernel
Resolves: COMPMID-3333, COMPMID-3334

* Implement an "overlap load, but don't overlap store" strategy:
  - Change STORE_BLOCK_BOUNDARY_AWARE so that the partial block in the y
    dimension is placed at the beginning instead of at the end.
  - Implement 3 auxiliary functions to calculate the lhs, bias and dst
    addresses, taking into account the potential partial block in the y
    dimension.
* Remove y load padding from the Lhs and Bias tensors in
  CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and
  CLGEMMMatrixMultiplyNativeKernel.
* Modify the config tests to assert zero-padding in the new dimensions.

Change-Id: I8f8585c7c0f543d720c2c91b885417c7dad35af4
Signed-off-by: SiCong Li <sicong.li@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3576
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
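For orientation, here is a minimal, self-contained C++ sketch of the "partial block first" indexing that the message describes. This is not the library's code: compute_row_start and partial_m0 are hypothetical stand-ins for the kernel-side address helpers the message mentions. The idea is that the first y block is shifted back so that it still loads a full m0 rows (overlapping the next block's rows), while the store path writes only its partial rows; that way no row ever needs y load padding.

    // Sketch only: assumed helpers, not ComputeLibrary's actual API.
    #include <algorithm>
    #include <cstdio>

    // Rows handled by the (possibly empty) partial block at the start:
    // 0 when m divides evenly by the block height m0.
    static int partial_m0(int m, int m0) { return m % m0; }

    // First row *loaded* by block gid_y. Block 0 is clamped to row 0, so
    // it loads a full m0 rows that overlap block 1's rows, but it only
    // stores its partial_m0 rows: overlap load, but don't overlap store.
    static int compute_row_start(int gid_y, int m0, int partial)
    {
        return std::max(0, gid_y * m0 - ((m0 - partial) % m0));
    }

    int main()
    {
        const int m = 10, m0 = 4; // example shape: blocks of 2 + 4 + 4 rows
        for (int gid_y = 0; gid_y < (m + m0 - 1) / m0; ++gid_y)
        {
            const int start = compute_row_start(gid_y, m0, partial_m0(m, m0));
            std::printf("block %d loads rows [%d, %d)\n", gid_y, start, start + m0);
        }
        return 0;
    }

With m = 10 and m0 = 4 this prints load ranges [0, 4), [2, 6), [6, 10): block 0 reads two rows that belong to block 1, so every load is in bounds without any bottom padding on the tensor.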
Diffstat (limited to 'src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp')
-rw-r--r--  src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp | 15
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp b/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
index 7d76ffd86c..27520c6072 100644
--- a/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
+++ b/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
@@ -155,19 +155,14 @@ std::pair<Status, Window> validate_and_configure_window(ITensorInfo *input0, ITe
     num_elems_processed_per_iteration_x = rhs_info.n0;
     num_elems_processed_per_iteration_y = lhs_info.m0;
 
-    // Note: bottom paddings are calculated manually as the output can be reinterpreted as 3D tensor
-    //       The only way to set properly the paddings, it is to set those explicitly through the AccessWindowStatic
-    const unsigned int m          = reinterpret_output_as_3d ? gemm_info.m : output->dimension(1);
-    const unsigned int bottom_pad = (num_elems_processed_per_iteration_y - (m % num_elems_processed_per_iteration_y)) % num_elems_processed_per_iteration_y;
-
     win     = calculate_max_window(tmp_info, Steps(num_elems_processed_per_iteration_x, num_elems_processed_per_iteration_y));
     win_out = calculate_max_window(*output, Steps(num_elems_processed_per_iteration_x, num_elems_processed_per_iteration_y));
 
     AccessWindowStatic input0_access(input0, 0, 0,
                                      input0->dimension(0),
-                                     input0->dimension(1) + bottom_pad);
+                                     input0->dimension(1));
     AccessWindowStatic input1_access(input1, 0, 0,
-                                     input1->dimension(0),
+                                     ceil_to_multiple(input1->dimension(0), num_elems_processed_per_iteration_x),
                                      input1->dimension(1));
     AccessWindowStatic output_access(output, 0, 0,
                                      output->dimension(0),
@@ -175,11 +170,11 @@ std::pair<Status, Window> validate_and_configure_window(ITensorInfo *input0, ITe
 
     if(input2 != nullptr)
     {
-        const int bias_processed_per_iteration_x = num_elems_processed_per_iteration_x;
-        const int bias_processed_per_iteration_y = gemm_info.broadcast_bias ? 1 : num_elems_processed_per_iteration_y;
+        const int bias_processed_per_iteration_x = num_elems_processed_per_iteration_x;
+
         AccessWindowStatic input2_access(input2, 0, 0,
                                          ceil_to_multiple(input2->dimension(0), bias_processed_per_iteration_x),
-                                         ceil_to_multiple(input2->dimension(1), bias_processed_per_iteration_y));
+                                         input2->dimension(1));
 
         window_changed = update_window_and_padding(win, input0_access, input1_access, input2_access) || // window used by the execute_window_loop
                          update_window_and_padding(win_out, output_access);                             // window used to update the padding requirements of output tensor
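To make the effect of the two hunks concrete, a small standalone sketch follows. The shapes and block sizes are assumed example values, and ceil_to_multiple is re-implemented locally with the same round-up semantics as the library helper used in the diff. The removed code grew the Lhs access window by bottom_pad rows in y; the new code leaves y untouched and only rounds the Rhs and bias x extents up to a multiple of the x block size.

    // Sketch only: assumed shapes, local reimplementation of the helper.
    #include <cstdio>

    // Round value up to the next multiple of divisor, as the diff's
    // ceil_to_multiple calls do for the x access-window extents.
    static unsigned int ceil_to_multiple(unsigned int value, unsigned int divisor)
    {
        return ((value + divisor - 1) / divisor) * divisor;
    }

    int main()
    {
        const unsigned int m = 33, n = 23; // example GEMM output shape (assumed)
        const unsigned int m0 = 4, n0 = 8; // example block sizes (assumed)

        // Old scheme: pad Lhs/dst in y so every block of m0 rows is full.
        const unsigned int bottom_pad = (m0 - (m % m0)) % m0; // 3 extra rows here

        // New scheme: no y padding at all; only x is rounded up for Rhs/bias.
        const unsigned int padded_n = ceil_to_multiple(n, n0); // 24 here

        std::printf("old y padding: %u rows, new y padding: 0 rows\n", bottom_pad);
        std::printf("x extent rounded up from %u to %u\n", n, padded_n);
        return 0;
    }

The design consequence is that the Lhs and bias tensors no longer need rows allocated beyond their logical height m, which is exactly what the modified config tests assert by checking for zero padding in those dimensions.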