author | Michael Tyler <michael.tyler@arm.com> | 2023-07-07 12:01:32 +0100 |
---|---|---|
committer | michael.tyler <michael.tyler@arm.com> | 2023-07-13 12:21:29 +0000 |
commit | 4c30de056afe8680b42723b26a2241811715b989 (patch) | |
tree | 4f522a816a5ea1b58b51226eb685c786096f30e3 /src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp | |
parent | c8e1617807ef1985a39d8f8f5f69c113b758494d (diff) | |
download | ComputeLibrary-4c30de056afe8680b42723b26a2241811715b989.tar.gz |
Enable premultiplication for depthwise convolution
with fp16 and quantized types
Resolves: COMPMID-6337
Change-Id: I81542e51c9c0329f202ac8452f173b138e51a0f6
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9883
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Diffstat (limited to 'src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp')
-rw-r--r-- | src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp | 16 |
1 files changed, 15 insertions, 1 deletions
diff --git a/src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp b/src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp
index ad4c821cfb..8a49c775d3 100644
--- a/src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp
+++ b/src/core/NEON/kernels/arm_conv/depthwise/premultiply.cpp
@@ -45,7 +45,9 @@ void do_premultiply_float_6(const float *in_ptr,
         {
             const float *ip = ip2;
             float *op = op2;
-            for(unsigned int c = 0; c < input_channels; c += BLOCK_SIZE)
+
+            unsigned int num_blocks = input_channels / BLOCK_SIZE;
+            for(unsigned int c = 0; c < num_blocks; c++)
             {
                 float vals[BLOCK_SIZE];
                 for(unsigned int v = 0; v < BLOCK_SIZE; v++)
@@ -63,6 +65,18 @@ void do_premultiply_float_6(const float *in_ptr,
                     op += CHANNEL_MULTIPLIER;
                 }
             }
+
+            unsigned int rem = input_channels - num_blocks * BLOCK_SIZE;
+            for(unsigned int c = 0; c < rem; c++)
+            {
+                float val = ip[c];
+                for(unsigned int r = 0; r < CHANNEL_MULTIPLIER; r++)
+                {
+                    op[r] = val;
+                }
+                op += CHANNEL_MULTIPLIER;
+            }
+
             ip2 += ld_col;
             op2 += out_ld_col;
         }
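The change splits the per-pixel channel loop into a full-block pass and a scalar remainder pass, so channel counts that are not a multiple of `BLOCK_SIZE` are handled one channel at a time. The following is a minimal standalone sketch of that pattern, not the library's code. The function name `premultiply_pixel`, the use of `std::vector`, and the values `BLOCK_SIZE = 4` and `CHANNEL_MULTIPLIER = 6` are assumptions made for illustration. The part of the block loop that falls between the two hunks is not shown in the diff, so it is filled in here with a plausible scalar version.

```cpp
#include <cstddef>
#include <vector>

// Assumed values for illustration only; the diff does not show the real definitions.
constexpr unsigned int BLOCK_SIZE         = 4;
constexpr unsigned int CHANNEL_MULTIPLIER = 6;

// Hypothetical helper: replicates every input channel of one pixel
// CHANNEL_MULTIPLIER times, so the output holds
// in.size() * CHANNEL_MULTIPLIER values.
std::vector<float> premultiply_pixel(const std::vector<float> &in)
{
    const unsigned int input_channels = static_cast<unsigned int>(in.size());
    std::vector<float> out(static_cast<std::size_t>(input_channels) * CHANNEL_MULTIPLIER);

    const float *ip = in.data();
    float       *op = out.data();

    // Pass 1: whole blocks of BLOCK_SIZE channels.
    const unsigned int num_blocks = input_channels / BLOCK_SIZE;
    for(unsigned int c = 0; c < num_blocks; c++)
    {
        float vals[BLOCK_SIZE];
        for(unsigned int v = 0; v < BLOCK_SIZE; v++)
        {
            vals[v] = ip[v];
        }
        ip += BLOCK_SIZE;

        // Assumed body: the diff elides these lines of the original loop.
        for(unsigned int v = 0; v < BLOCK_SIZE; v++)
        {
            for(unsigned int r = 0; r < CHANNEL_MULTIPLIER; r++)
            {
                op[r] = vals[v];
            }
            op += CHANNEL_MULTIPLIER;
        }
    }

    // Pass 2: the channels left over after the last whole block.
    // ip already points past the processed blocks, so ip[c] is the c-th leftover channel.
    const unsigned int rem = input_channels - num_blocks * BLOCK_SIZE;
    for(unsigned int c = 0; c < rem; c++)
    {
        const float val = ip[c];
        for(unsigned int r = 0; r < CHANNEL_MULTIPLIER; r++)
        {
            op[r] = val;
        }
        op += CHANNEL_MULTIPLIER;
    }

    return out;
}
```

With these assumed constants, a 6-channel pixel takes one full block (channels 0 to 3) and a remainder of two channels. Judging only by the removed line, the previous loop stepped `c` by `BLOCK_SIZE` up to `input_channels`, so a partial final block would have been processed as if it were full.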