Age | Commit message | Author |
|
-Fixes NEWinogradConvolution memory manager usage
-Moves allocations to the prepare stage for GEMMDispatchWrapper.
Change-Id: Ic1c709ee473eb4968f5a081f2bc26960f882f8db
|
|
Change-Id: I9d4ba7d00d50a84f650f0449faa8a25226068fed
|
|
Change-Id: I29e35024e29781a6b943b568abec9c73649215e6
|
|
Change-Id: I6ee2c0b670727fc808fa636c53ddfaec3a0036c9
|
|
Change-Id: I49f1d865f5e7562f1d80db849353a89ef77e6a9e
|
|
Output of Priorbox should be independent of the input
data layout and should always be in NCHW format
Change-Id: Ie80cd4e51c78945b158c0db1af1923bdf8d7ea7b
|
|
Change-Id: I95fdf5bd85becfe081f6ae587284f3b294681308
|
|
Change-Id: I62d937533967b29505d3ac8a51b513f0c6de8cd0
|
|
Fixes for:
- ReduceMean, reduction on the X axis for FP16 with 8 elements was
performed only up to a certain point. The fix now takes into account the
number of elements of the vector and does as many reductions as
necessary.
- YOLOLayer, activation for FP16 has to be performed on 32 bits until
the FP16 approximation is fixed.
Change-Id: I75373f4edd37de476e6fe1a56de3ef386b65c619
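The ReduceMean point above, that the number of reduction steps has to follow the number of elements in the vector, can be illustrated with a small sketch. This is plain Python rather than the library's NEON code, and the function name `pairwise_reduce_sum` is hypothetical; it only shows that a pairwise reduction over 8 lanes needs one more step than one over 4 lanes.

```python
def pairwise_reduce_sum(lanes):
    """Sum a vector by repeated pairwise adds.

    Illustrative only: models a SIMD horizontal reduction, where each
    step halves the number of live lanes. Returns (total, steps).
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a non-zero power of two")
    values = list(lanes)
    steps = 0
    # Keep halving until a single value remains: log2(n) steps in total.
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        steps += 1
    return values[0], steps
```

With 4 lanes the loop runs twice; with 8 lanes it runs three times, so a step count fixed for the narrower vector would leave the wider one only partially reduced.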
|
|
NHWC reduction on axis 0 requires a lot of memory. Testing only
axes 1 and 2 for now.
Change-Id: I82e16a27b6dfc6b426e6294cde63c3d88cb41a09
|
|
-Simplifies import memory interface
-Replaces the use of void** handles with appropriate interfaces.
Change-Id: I5918c855c11f46352058864623336b352162a4b7
|
|
-Adds NHWC support for FP16
Change-Id: I61addf8efecf511ac8cd5f8aa9afc3e09c476aaf
|
|
Changed random distribution to [1, 2] as values close to
zero generate mismatches.
Change-Id: I4a00fc4f445b123dea624dd8459efce945f06126
|
|
Change-Id: I376d29aa6ec1b52d978c4d49de63c6713d6036e3
|
|
FP mixed precision support added to GEMM kernel used for fp16 winograd conv on Midgard GPUs
Change-Id: I1619beb025fc484a1ac9d3e528d785edabbc7ee6
|
|
Change-Id: I770b044b67d93510ef65e556905135b34be7ea0a
|
|
Change-Id: I5f99e448e208be6ca819bf4ab2c7b367c874d3f5
|
|
Change-Id: I4c84a3156114d973fcff22a6b86a3c0044502fc8
|
|
Change-Id: I6e7dee8bd615a5eff01c523f208a218574ee5eab
|
|
Change-Id: Ieb6f0638174ea1deb8b457d8df81511651a246a9
|
|
kernels
Change-Id: I98183f95814442b6f3dbb67a1bdae99df05b9b01
|
|
Change-Id: I88c4a3b55943ed3bcf63cad63b0c7570770c2056
|
|
Change-Id: I54a3f34aa3657396e931f18c81af2941d3b56903
|
|
Change-Id: I5d2ed5dcc342abff8124762f7bdee587cdf20032
|
|
- Fixing a bug where the boxes were not scaled before being transformed
- Adding the correct_transform_coords option to BoundingBoxTransformInfo
Change-Id: I40281254bcf87e7c8583c119e99562414fe59822
|
|
BoxWithNMSLimitKernel
COMPMID-1792: Accuracy issue in CLGenerateProposals
This patch does the following:
- Some fixes for GenerateProposals function and tests
- Adapting BoxWithNMSLimitKernel to only accept U32 tensors as keeps_size
- Update 3rdparty
- Adds a small tolerance for a GenerateProposals test
Change-Id: Ia8ec1cdfe941fe05003645e86deb9ea6a6044d74
|
|
Introduced F32 accumulation for F16 winograd gemm and output transform
WinogradConvolution will be available for F16 only if fast math flag is enabled
Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
|
|
With this patch we are able to dispatch a single GPU job even in the case of
batched-flatten
Change-Id: I755e7af29d44b24f67fa04bad3c9b7646e8deefc
|
|
Increase tolerance for FP16
Change-Id: I88f95da5471bbceb7449f453e2e33cf0bc4da23e
|
|
Change-Id: I2c2250669829e399fdc2363f729dc5e68d8aac17
|
|
Change-Id: I8a9b1e16d90b9d99a6ff2a442347748432723b14
|
|
Change-Id: I99e1c3939cfea4b9cb0ddfa313706f31b213ca89
|
|
Change-Id: Ibc8d903c8d3c97b51dc8a3344197b56ad9d6c00e
|
|
Change-Id: Ia8d4e46ce5d9bb366af15726bc208dc14583c6ae
|
|
Change-Id: Icf813a0a87d4a07e180eafdb5fa916b2ea4028d2
|
|
num_elems_processed was passed as a scale instead of a step
Change-Id: I8c6d58fe4432f9f6beb31c0a1e02204c96775d98
|
|
AccessWindowRectangle::update_window_if_needed()
Change-Id: I56426cc9c9688a0aa0acdd439d5887c7ef208cd2
Note: The code to shrink the window hasn't been fixed yet.
|
|
in the install_dir
Change-Id: I5ba348d36325bcffb33b1e68435d5fe27cec8402
|
|
Change-Id: I69e995973597ba3927d29e4f6ed5438560e53d77
|
|
With the CIFG optimisation the scratch buffer should have a size of
[batch_size, num_units * 3], otherwise [batch_size, num_units * 4].
Change-Id: I43e46f7b52e791472f1196f36e9142240ba76c5c
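The sizing rule stated in this commit message can be written out as a short sketch. The helper name `lstm_scratch_buffer_shape` and its parameters are hypothetical and not part of the library's API; the sketch only encodes the two shapes given above.

```python
def lstm_scratch_buffer_shape(batch_size, num_units, use_cifg):
    """Return the scratch buffer shape described in the commit message.

    Assumption: use_cifg is True when the CIFG optimisation is enabled.
    With CIFG the buffer is [batch_size, num_units * 3],
    otherwise [batch_size, num_units * 4].
    """
    multiplier = 3 if use_cifg else 4
    return (batch_size, num_units * multiplier)
```

For example, with `batch_size=2` and `num_units=16` the shape is `(2, 48)` with CIFG and `(2, 64)` without it.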
|
|
Added test cases to exercise the code path where the reshaping of B is performed on the fly.
Change-Id: Ifa4348e1054dc0019be3927f482adf64b18fd554
|
|
Change-Id: Ib0798cc17496b7817f5b5769b25d98913a33a69d
|
|
Change-Id: Id94fb9c88a498d7b938f4f707e2e7b9b6df94880
|
|
Change-Id: I5bf5d751ec7c02d96c26a769f49d03ea23a248b7
|
|
Change-Id: Ie13a9eb6d417388b5de533bffa895796d9d2cf62
|
|
Change-Id: Ibab049f09413258c99335b7da6b151530a1bd136
|
|
and 8 tensors (Part 1)
Creating special cases for concatenating 2 and 4 tensors.
Change-Id: I6a739a494ae45011acb65369e353f9ef96970b90
|
|
NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
Change-Id: I1d5bc4d24059917f9ddef0873dd3043b1f2320a8
|
|
inside the namespace
Change-Id: I477f52a9adf06ba3730f94d411399977fce0f98a
|
|
-Use raw string literals in regexp in CPUUtils.cpp
-Avoid implicit cast bool->int
Change-Id: I45a403ab8d0be02bb8dec267fe59545ad1074292
|