Age | Commit message | Author |
|
-Adds NHWC support for FP16
Change-Id: I61addf8efecf511ac8cd5f8aa9afc3e09c476aaf
|
|
Changed random distribution to [1, 2] as values close to
zero generate mismatches.
Change-Id: I4a00fc4f445b123dea624dd8459efce945f06126
|
|
Change-Id: I376d29aa6ec1b52d978c4d49de63c6713d6036e3
|
|
FP mixed precision support added to GEMM kernel used for fp16 winograd conv on Midgard GPUs
Change-Id: I1619beb025fc484a1ac9d3e528d785edabbc7ee6
|
|
Change-Id: I770b044b67d93510ef65e556905135b34be7ea0a
|
|
Change-Id: I5f99e448e208be6ca819bf4ab2c7b367c874d3f5
|
|
Change-Id: I4c84a3156114d973fcff22a6b86a3c0044502fc8
|
|
Change-Id: I6e7dee8bd615a5eff01c523f208a218574ee5eab
|
|
Change-Id: Ieb6f0638174ea1deb8b457d8df81511651a246a9
|
|
kernels
Change-Id: I98183f95814442b6f3dbb67a1bdae99df05b9b01
|
|
Change-Id: I88c4a3b55943ed3bcf63cad63b0c7570770c2056
|
|
Change-Id: I54a3f34aa3657396e931f18c81af2941d3b56903
|
|
Change-Id: I5d2ed5dcc342abff8124762f7bdee587cdf20032
|
|
- Fixing a bug where the boxes were not scaled before being transformed
- Adding the correct_transform_coords option to BoundingBoxTransformInfo
Change-Id: I40281254bcf87e7c8583c119e99562414fe59822
|
|
BoxWithNMSLimitKernel
COMPMID-1792: Accuracy issue in CLGenerateProposals
This patch does the following:
- Some fixes for GenerateProposals function and tests
- Adapting BoxWithNMSLimitKernel to only accept U32 tensors as keeps_size
- Update 3rdparty
- Adds a small tolerance for a GenerateProposals test
Change-Id: Ia8ec1cdfe941fe05003645e86deb9ea6a6044d74
|
|
Introduced F32 accumulation for F16 winograd gemm and output transform
WinogradConvolution will be available for F16 only if fast math flag is enabled
Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
|
|
With this patch we are able to dispatch a single GPU job even in the case of
batched flatten
Change-Id: I755e7af29d44b24f67fa04bad3c9b7646e8deefc
|
|
Increase tolerance for FP16
Change-Id: I88f95da5471bbceb7449f453e2e33cf0bc4da23e
|
|
Change-Id: I2c2250669829e399fdc2363f729dc5e68d8aac17
|
|
Change-Id: I8a9b1e16d90b9d99a6ff2a442347748432723b14
|
|
Change-Id: I99e1c3939cfea4b9cb0ddfa313706f31b213ca89
|
|
Change-Id: Ibc8d903c8d3c97b51dc8a3344197b56ad9d6c00e
|
|
Change-Id: Ia8d4e46ce5d9bb366af15726bc208dc14583c6ae
|
|
Change-Id: Icf813a0a87d4a07e180eafdb5fa916b2ea4028d2
|
|
num_elems_processed was passed as a scale instead of a step
Change-Id: I8c6d58fe4432f9f6beb31c0a1e02204c96775d98
|
|
AccessWindowRectangle::update_window_if_needed()
Change-Id: I56426cc9c9688a0aa0acdd439d5887c7ef208cd2
Note: The code to shrink the window hasn't been fixed yet.
|
|
in the install_dir
Change-Id: I5ba348d36325bcffb33b1e68435d5fe27cec8402
|
|
Change-Id: I69e995973597ba3927d29e4f6ed5438560e53d77
|
|
With the CIFG optimisation, the scratch buffer should have a size of
[batch_size, num_units * 3]; otherwise it should be [batch_size, num_units * 4].
Change-Id: I43e46f7b52e791472f1196f36e9142240ba76c5c
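
The sizing rule in the commit above can be sketched as a small helper. This is only an illustration: `lstm_scratch_shape` and its parameters are hypothetical names, not the library's API, and the shape is returned in the same [batch_size, columns] order the commit message uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the library): returns the 2D scratch-buffer
// shape as {batch_size, columns}, following the rule in the commit message.
std::array<std::size_t, 2> lstm_scratch_shape(std::size_t batch_size, std::size_t num_units, bool use_cifg)
{
    assert(batch_size > 0 && num_units > 0);
    // With CIFG the input gate is coupled to the forget gate, so three
    // gate-sized slices are enough; without it, four are needed.
    const std::size_t num_gates = use_cifg ? 3 : 4;
    return { batch_size, num_units * num_gates };
}
```

For example, with `batch_size = 2` and `num_units = 8` the helper yields 24 columns when CIFG is enabled and 32 otherwise.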
|
|
Added test cases to exercise the code path where the reshaping of B is performed on the fly.
Change-Id: Ifa4348e1054dc0019be3927f482adf64b18fd554
|
|
Change-Id: Ib0798cc17496b7817f5b5769b25d98913a33a69d
|
|
Change-Id: Id94fb9c88a498d7b938f4f707e2e7b9b6df94880
|
|
Change-Id: I5bf5d751ec7c02d96c26a769f49d03ea23a248b7
|
|
Change-Id: Ie13a9eb6d417388b5de533bffa895796d9d2cf62
|
|
Change-Id: Ibab049f09413258c99335b7da6b151530a1bd136
|
|
and 8 tensors (Part 1)
Creating special cases for concatenating 2 and 4 tensors.
Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
|
|
NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
Change-Id: I1d5bc4d24059917f9ddef0873dd3043b1f2320a8
|
|
inside the namespace
Change-Id: I477f52a9adf06ba3730f94d411399977fce0f98a
|
|
-Use raw string literals in regexp in CPUUtils.cpp
-Avoid implicit cast bool->int
Change-Id: I45a403ab8d0be02bb8dec267fe59545ad1074292
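
As an illustration of the two clean-ups named above, the sketch below uses a raw string literal for a regular expression and makes a bool-to-int conversion explicit. The pattern and the function names are hypothetical and are not taken from CPUUtils.cpp.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical pattern: matches names such as "cpu0" or "cpu12".
// The raw string literal R"(^cpu\d+$)" avoids the doubled backslash
// that the ordinary literal "^cpu\\d+$" would need.
bool is_cpu_dir(const std::string &name)
{
    static const std::regex pattern(R"(^cpu\d+$)");
    return std::regex_match(name, pattern);
}

// Counts matching names without relying on an implicit bool -> int conversion.
int count_cpu_dirs(const std::vector<std::string> &names)
{
    int count = 0;
    for(const auto &name : names)
    {
        count += is_cpu_dir(name) ? 1 : 0;
    }
    return count;
}
```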
|
|
Change-Id: I93b14106cda8a1f640cf5acf120d31e2ebdaf495
|
|
the test.
This is needed in order to calculate the offset between OpenCL timestamps and Wall Clock timestamps as they're using different clocks
Change-Id: I874b2a475bf98fd664a1e3e15045c80f0181af47
|
|
Some systems don't have enough memory to run the VGG networks. For example,
on systems with only 2GB of memory the VGG example fails, throwing a bad_alloc exception.
This patch introduces the concept of a global memory policy in ACL. The policy
is a mechanism that the library's functions can use to try to reduce
memory consumption on systems with limited memory.
In this specific case the VGG examples set the policy to MINIMIZE. The GEMM
function checks whether the policy is MINIMIZE and, if so, does not use the
pretransposed weights path, as this requires considerably more memory.
Change-Id: I53abc3c9c64d045d8306793ffc9d24b28e228b7b
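
A minimal sketch of how such a global policy could gate the GEMM path is shown below. Every name here (`MemoryPolicy`, `MemoryPolicyHolder`, `should_pretranspose_weights`) is an assumption made for illustration; only the MINIMIZE value and the behaviour it triggers come from the commit message.

```cpp
// Hypothetical policy values; only MINIMIZE is named in the commit message.
enum class MemoryPolicy
{
    NORMAL,
    MINIMIZE
};

// Hypothetical process-wide holder for the current policy.
class MemoryPolicyHolder
{
public:
    static void set(MemoryPolicy p)
    {
        policy() = p;
    }
    static MemoryPolicy get()
    {
        return policy();
    }

private:
    static MemoryPolicy &policy()
    {
        static MemoryPolicy current = MemoryPolicy::NORMAL;
        return current;
    }
};

// Skips the memory-hungry pretransposed-weights path when the policy is MINIMIZE.
bool should_pretranspose_weights(bool kernel_supports_pretranspose)
{
    return kernel_supports_pretranspose && MemoryPolicyHolder::get() != MemoryPolicy::MINIMIZE;
}
```

In this sketch an example program would call `MemoryPolicyHolder::set(MemoryPolicy::MINIMIZE)` once at start-up, and the GEMM configuration code would consult `should_pretranspose_weights` before allocating the extra buffer.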
|
|
Adds 0.5f after scaling AVG pooling to be able to round to nearest as
vcvtq_u32_f32 rounds towards zero.
Change-Id: I22ce78f9e628cf4184a317edabce47211ab09456
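
The rounding fix above can be illustrated with scalar code. The sketch assumes non-negative inputs and uses `static_cast`, which truncates toward zero in the same way the commit describes for vcvtq_u32_f32; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truncating conversion: the fractional part is discarded (round toward zero).
std::uint32_t to_u32_truncate(float v)
{
    assert(v >= 0.0f);
    return static_cast<std::uint32_t>(v);
}

// Adding 0.5f before the truncating conversion rounds non-negative values
// to the nearest integer.
std::uint32_t to_u32_nearest(float v)
{
    assert(v >= 0.0f);
    return static_cast<std::uint32_t>(v + 0.5f);
}
```

For a scaled average of 2.75f, the truncating version gives 2 while the version with the 0.5f offset gives 3.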
|
|
Removed gemmlowp_mm_bifrost_transposed_dot8 kernel as not used
Change-Id: I43cf463a3a4c0cdb2808621c534ffd5c9fd47ca1
|
|
Increases the steps for calculating invsqrt used in L2 pool by 1 to increase accuracy.
Change-Id: Ib938a963809b07c30d47ec0675abae75bc086986
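
To show why an extra step improves accuracy, here is a scalar sketch of Newton-Raphson refinement for the inverse square root. It is not the kernel's code: the function name, the caller-supplied initial estimate and the step count are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical scalar sketch: refines an estimate y of 1/sqrt(x).
// Each step applies the Newton-Raphson update y <- y * (3 - x * y * y) / 2,
// which roughly squares the relative error when the estimate is already close.
float invsqrt_newton(float x, float initial_estimate, int steps)
{
    assert(x > 0.0f && steps >= 0);
    float y = initial_estimate;
    for(int i = 0; i < steps; ++i)
    {
        y = y * (3.0f - x * y * y) * 0.5f;
    }
    return y;
}
```

Starting from a rough estimate of 0.4 for x = 4 (exact answer 0.5), two steps give about 0.4977 and a third step gives about 0.49998, so each additional step noticeably tightens the result.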
|
|
Change-Id: I57bbdbef85d1f6e8cf1d13324f9cc38a3e3f0cc3
|
|
Change-Id: If5be77602e37b14aea63d7ec6d8adab324628f04
|
|
Removes:
-sve_interleave_8way_block2_16bit
-sve_interleave_8way_block4_16bit
-sve_sgemm_3VLx8
Change-Id: I0aa35fe974d8e122937dfe8923ecf63ff5a52001
|
|
-Uses output quantization information for the activation layer.
-Updates checks for BoundedRelu at CL side.
Change-Id: I0447860e90f1c89b67b9ace3c8daad713f6c64e0
|
|
Change-Id: I953f3b63aa4910650a1a3f6faea31beb4f6f376a
|