Age | Commit message | Author |
|
Enables LUT for LeakyRelu with int8/uint8 even if the input scale
is different from the output scale.
Fusing LUT with a previous operator for this situation
requires further work.
Change-Id: I9eddfe36f457e763d44eb3e05fbe240eac7cfec9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
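Because a LUT maps every possible int8 input to an output value, a different input and output scale is simply absorbed into the table contents. The following is a minimal float reference sketch, assuming plain dequantise/requantise with Python rounding; `leaky_relu_lut_int8` is a hypothetical helper, not Vela's fixed-point LUT generator.

```python
def leaky_relu_lut_int8(alpha, ifm_scale, ifm_zp, ofm_scale, ofm_zp):
    """Build a 256-entry int8 LeakyRelu LUT (float reference, hypothetical helper)."""
    lut = []
    for q_in in range(-128, 128):
        x = (q_in - ifm_zp) * ifm_scale          # dequantise with the input scale
        y = x if x >= 0 else alpha * x           # LeakyRelu in float
        q_out = round(y / ofm_scale) + ofm_zp    # requantise with the output scale
        lut.append(max(-128, min(127, q_out)))   # saturate to the int8 range
    return lut
```

Entry `i` of the table holds the output for input `i - 128`, so the scale change costs nothing at run time.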
|
|
- When processing reshapes at the end of NPU subgraphs, the NHCWB16
tensor format was selected before handing over to the CPU. This commit
detects the end of a subgraph during the reshape-consumers compatibility
check and chooses the NHWC format instead.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ieefdbecdba1a6183d79d3ac4d2505503dbf321cb
|
|
Allows the int64 data type to be used as long as all values can be packed
into an int40 value.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I0e25ec482e3ea765a5fd00bcf7e212a9e65a1461
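The packing condition amounts to a range check. A minimal sketch, assuming int40 means a signed two's-complement 40-bit integer; `fits_in_int40` is a hypothetical name, not Vela's function.

```python
def fits_in_int40(values):
    """Return True if every value fits in a signed 40-bit integer (hypothetical helper)."""
    lo, hi = -(1 << 39), (1 << 39) - 1  # assumed signed two's-complement range
    return all(lo <= int(v) <= hi for v in values)
```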
|
|
Fixed serialisation of scalar ifm tensors with values larger than
a byte.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I2714398db91b83f24e5271c1d5de1c0e8211f9ab
|
|
Added checks to avoid using NHCWB16 for int32 reduce_sum, which makes
int8/uint8 softmax work.
Also enabled the softmax graph rewrite by default and fixed a saturation
problem.
Change-Id: Ic01bd9ece7e5c3edb2900b7915cc747efe9e5760
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I287c24725126c169afec779b921e43c3ab26f739
|
|
- Set up ifm/ifm2 based on the primary op's inputs
Change-Id: I727eab473165d7cc876b70fa8873fbc0c1480fb5
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Updated the kernel size check, in which width and height were swapped,
and added a weight sum check.
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
Change-Id: Idb18cf258ac19b3a0d71134dab5a117bcd778b59
|
|
- Reshapes that merely add or remove dimensions, rather than re-layout the
data, need not fall back to NHWC. This commit allows reshapes between
NPU operators to use NHCWB16.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ieb7745e586bf324e92e741a04b74caf7285f4b8b
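One plausible way to recognise a reshape that only adds or removes dimensions is to compare the shapes with their size-1 dimensions dropped and require the innermost (channel) dimension to stay the same. This is a sketch under that assumption; `is_dimension_only_reshape` is hypothetical and not necessarily the test Vela applies.

```python
def is_dimension_only_reshape(in_shape, out_shape):
    """Heuristic: True if the reshape only adds/removes size-1 dimensions and
    keeps the innermost (channel) dimension unchanged (hypothetical helper)."""
    if not in_shape or not out_shape:
        return False

    def squeeze(shape):
        return [d for d in shape if d != 1]

    return squeeze(in_shape) == squeeze(out_shape) and in_shape[-1] == out_shape[-1]
```

Requiring the same channel depth matters here because NHCWB16 lays out data in bricks along C, so any reshape that changes C would change the memory layout.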
|
|
Signed-off-by: Stefan Nannesson <stefan.nannesson@arm.com>
Change-Id: I7ad0b8e5b2431b46b53f51d809ca2642039a0012
|
|
For int16, using LeakyRelu (with bug fix) gives exactly
the same results as Mul+Max if input/output scales are the same.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I4f4db464d77b0aaf0d25ddfca534f91d08db548d
|
|
Added --weight-estimation-scaling, which enables
additional scaling of the weight compression scale estimate.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Idcda41257f44901d3a3f345341e07fb1ae8585a9
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I2cb3f6639e4bb8a984fa3647ee7b4678ed6f5890
|
|
LUT-related updates specific to 16K SHRAM:
- prevent LUT DMA transfer from overwriting accumulator SHRAM of an ongoing operation
- do not use the last 2K of SHRAM as accumulator during LUT operations
Change-Id: I17066e0410c6f07b125ed245002d7b19269a7a8a
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit fixes a bug wherein Split operators
were erroneously placed on the CPU due to
a 0-dimensional input that disqualified them from
NPU placement, a restriction introduced in a
recent commit.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I83c047ddf071d662343087c69bdb2a014dd209c3
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ida307afc33cd7963bdeb505df400732a3efcc846
|
|
Replaces LeakyRelu operations with a LUT activation function when possible,
otherwise with a combination of multiplication and maximum.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I3d2eb2dba7145997c3cc711d0ef18ab355fbb416
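The Mul+Max fallback relies on the identity LeakyRelu(x) = max(x, alpha * x), which holds when 0 <= alpha <= 1: for x >= 0, alpha * x <= x, and for x < 0, alpha * x >= x. A minimal float sketch with hypothetical function names:

```python
def leaky_relu(x, alpha):
    """Reference LeakyRelu."""
    return x if x >= 0 else alpha * x


def leaky_relu_mul_max(x, alpha):
    """LeakyRelu as a multiplication followed by a maximum.
    Only valid for 0 <= alpha <= 1; for alpha > 1 a minimum would be needed."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("Mul+Max decomposition requires 0 <= alpha <= 1")
    return max(x, alpha * x)
```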
|
|
- Minor cleanup of register command stream generator too
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0514622402ee9b0557769dd7c7decfddecc87ffa
|
|
- Fixed bug with the supported operator check rejecting operators based
upon an incorrect comparison of the tensor quantisations
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibd0eb50077465d2c515c6ee10394d9b43cdf730c
|
|
Includes a number of changes:
* Handle non-existing optional inputs
* Handle disabled optional inputs (-1 indexed)
* Added unit tests for parsing operators
* Add a bias tensor to the Convolution and FullyConnected operators
if it is missing.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Ib88d2b610314b1c886fc0aef4f9da87430ce6ae5
|
|
Implemented LUT generation for softmax uint8/int8 to match the
reference.
Change-Id: Ib9acaa295ee1066591e800023d75f364520b44c1
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Very small quantization scales, below around 2^-31, would return
negative shift values.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I4ca368284c097820f83e5ae53412a08c34516c7f
|
|
- Made it clear that the --permanent-storage option is only valid
for Ethos-U55.
- Removed Shram from the allowed values
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ice6cacd509713e33bcb380c16dcd3c3b34a82a33
|
|
The SRAM estimates in the scheduler now account for NHCWB16
in the intermediate buffers used for ifm streaming.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Icda5e05dd3663935f528f1a06d36d9e1de123cc8
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ia83ab5ba28d193215e3f8fbc52552b0356111723
|
|
There may be cases where, after optimisation, there are no operators
contained within the subgraph. Vela would crash on such a corner case
when serialising and writing out the optimised tflite file. This fix
lets it write out the empty tflite file instead of crashing.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ia879d1ffdbab21706b15e99aa107fb2d8d4dd3de
|
|
This commit adds an entry in the tflite_mapping.py
for the ROUND operator, which was previously missing.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I22d6c60969eea6a785366c6741893718ba3cb8ae
|
|
- Removed some of the clutter
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I9a12f681247befd44dbbc9d7fbd135f0603d2fbd
|
|
- Fixed. It only affected operators with striding greater than 1x1
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I129e46586aa16079ddbce3898569676ba9891372
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I04f299e2d3319113fedf2fa401b88bae64fea66d
|
|
This commit adds missing entries and options in the
tflite_mapping which should in theory allow every
existing TensorFlow Lite operator to be passed through Vela
without crashing.
Previously, some entries were missing and Vela crashed
with a custom error whenever they were encountered.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Ia69b7a84164bb57c52ceaf7380160794b7f0d9ee
|
|
Vela often fails when encountering operators that have
inputs or outputs with shape == []. This is only supported
for elementwise ops where the shape is broadcast from IFM2
to IFM1.
This commit adds a restriction which places ops with
shape [] tensors on the CPU, except in the special case
of broadcasting for elementwise ops.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I5b0855233e3b83870209f4da00fb2dbd0184fee0
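The placement rule can be summarised as a small predicate. This is a sketch with a hypothetical signature (`None` stands for "no IFM2"), not Vela's supported-operators code.

```python
def npu_allows_scalar_shapes(is_elementwise, ifm_shape, ifm2_shape, ofm_shape):
    """Hypothetical predicate: ops with shape [] tensors go to the CPU, except an
    elementwise op whose IFM2 is a scalar broadcast onto IFM1."""
    if ifm_shape == [] or ofm_shape == []:
        return False                  # scalar IFM/OFM: place on the CPU
    if ifm2_shape == []:
        return is_elementwise         # only the elementwise broadcast case is allowed
    return True                       # no scalar tensors involved
```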
|
|
DMA transfer of weights is prevented when the weight
double buffer is assumed not to fit in SRAM.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I9809dca1d4b335436e1a0b81093640361ada255e
|
|
NHCWB16 is avoided for the input tensor for SplitSliceRead,
when any of the consumers has a start offset in the C-dimension
that is not a multiple of 16.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I333e2acfbeb02b9c34ee5ea28074baff12ea7b24
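The condition reduces to checking each consumer's channel start offset against the 16-channel brick depth of NHCWB16. A minimal sketch; `input_can_use_nhcwb16` and its arguments are hypothetical.

```python
def input_can_use_nhcwb16(consumer_c_offsets, brick_depth=16):
    """Hypothetical check: keep NHCWB16 for the SplitSliceRead input only if
    every consumer starts reading at a multiple of 16 in the C-dimension.
    Assumed reason: NHCWB16 stores channels in bricks of 16."""
    return all(offset % brick_depth == 0 for offset in consumer_c_offsets)
```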
|
|
Added graph rewrite of Softmax for uint8/int8.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iecdd5d2cd3156a601b3313debba4a3562e6be5d7
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: If22fd21f9953a62305620a4e804e5caacb342c89
|
|
This commit fixes a bug where CPU ops were getting
passed on as NPU ops in weight_compressor.py due to
Operation.find_npu_op() incorrectly returning any
op with an 'npu_block_type' attribute (which every
op has) as an NPU op.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I7a758f8d1b1237907816bc1be7b77aff765ae688
|
|
4 dimensions were assumed in the check for whether NHCWB16 should be avoided.
Changed the check so that NHCWB16 is avoided if the axis corresponds
to the C-dimension.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I7784a7a813a3c3438d6142523bf0a3ba81742aca
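Checking for `axis == 3` only works for 4D tensors. A rank-independent test normalises the (possibly negative) axis and compares it with the last dimension, which holds C in NHWC. A sketch with a hypothetical helper name:

```python
def axis_is_channel(axis, rank):
    """True if `axis` (possibly negative) is the last, C, dimension of a tensor
    with `rank` dimensions, without assuming rank 4 (hypothetical helper)."""
    if rank < 1:
        return False
    return axis % rank == rank - 1  # Python's % maps e.g. -1 to rank - 1
```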
|
|
- This commit removes unnecessary dependency checks and implements
on-demand calculation of the NPU/DMA dependencies.
Signed-off-by: <tim.hall@arm.com>
Change-Id: I85e681d1ab133bd88f64296dc00500f3c188e777
|
|
Added complex64 datatype to allow pass through without crashing.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I8beeceafb32182d4877a9880d21d51ba21033030
|
|
- Support for more than one 256-byte LUT in SHRAM
- No DMA is performed for a LUT that is already located in SHRAM
- Added MemArea.Shram, used for LUT, to avoid false address collision
asserts during SRAM tensor allocation
- Added read access to LUT in memory access calculation
Change-Id: If4d1eded5ed029d253f4f5efb2d80495fc3eac99
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Avoid usage of NHCWB16 when Stack/Pack/Concat is performed in axis 3,
and the "concat start" of each slice to be combined is not a multiple
of 16.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: If3f7b4a3424be3c86fc2dc48e8649ce4c4f49485
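For a concatenation along axis 3 (C in NHWC), each slice's "concat start" is the sum of the depths of the slices before it. A minimal sketch, assuming the input channel depths are known; `concat_can_use_nhcwb16` is hypothetical.

```python
from itertools import accumulate


def concat_can_use_nhcwb16(input_depths, brick_depth=16):
    """Hypothetical check for Stack/Pack/Concat along C: every slice's concat
    start (running sum of the preceding depths) must be a multiple of 16."""
    starts = [0] + list(accumulate(input_depths))[:-1]
    return all(start % brick_depth == 0 for start in starts)
```

For example, depths of 16, 8 and 24 give concat starts 0, 16 and 24; the last one is not a multiple of 16, so NHCWB16 would be avoided.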
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Id762ee2c03cd8f162cd0c450511ee5b2e0624586
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I5b8db6430e79ec7a5836d8dd00a03413647de8ba
|
|
* The decorator causes the verification tests to fail when using TF
2.1, but not with TF 2.2, hence it is removed for now.
Change-Id: I07357c0fef383d9a65278fe99ad8e4d3f7dc6d9b
Signed-off-by: Manupa Karunaratne <manupa.karunaratne@arm.com>
|
|
This commit adds a missing entry for TensorPurpose.Unknown,
mapping to MemType.Unknown in the tensor_storage_mem_type
dictionary in the ArchitectureFeatures class in
architecture_features.py
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I6c3d942e8c6f1c71c6496bdd621ca8d46ea76147
|
|
This commit amends a mistake where the resample_mode
attribute of a tensor would be accessed without checking
if the tensor in question was actually there first.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Id2ceb1d6e38133611fcecfc2ac97150c927ceee2
|
|
Avoid a concat op as the predecessor in ifm streaming
when Sram spilling is to be applied.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I2ba6283a7561a12d54a06552a15e122bb082b7a1
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I566abd5a1ffc367c6b9b8f37d5a26b61d27e840b
|
|
Fixed an issue with Fully Connected weights' shape used for compression
scale calculations causing incorrect performance estimates.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Id3a5c187ad3e942b8e3d4c690b3dbba3c6fda922
|