- A reshape operator was bypassed, which changed the ofm shape
for the mean operator.
- This caused a faulty decomposition for the mean operator with
an output diff as a result.
- The fix is to insert a memcpy operator to preserve the correct
shape. This memcpy will in most cases end up as a nop, so performance
is unaffected.
Change-Id: Ibf6781d090e9eadbb4c874181a7a8c63bb557351
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- Concat is implemented by several avgpool ops, all of them
writing to the same ofm but with a slice offset. If a compiled
network contains CPU fallbacks, the avgpool ops might end up
running in different custom ops. This works fine as long as the
runtime provides the same scratch area. If not, the output from
the concat might be corrupted.
- This fix adds an extra step to the pass packing so that all
avgpool ops for a concat are grouped together and run within the
same custom op, in order to prevent possible corruption.
Change-Id: I343e08d7b4046f969b3d9ec3479db6490cbe4170
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
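A sketch of the grouping idea behind the extra pass-packing step;
the list-of-tuples representation and function name are illustrative,
not Vela's actual pass classes:

    # Group all avgpool passes that write into the same concat ofm so
    # that they end up in the same custom op.
    from collections import defaultdict

    def group_concat_passes(passes):
        # passes: (name, ofm_id) tuples; ofm_id identifies the shared
        # concat output an avgpool writes into (None for other passes).
        groups = defaultdict(list)
        ordered = []
        for name, ofm_id in passes:
            if ofm_id is None:
                ordered.append([name])              # unrelated pass
            else:
                if not groups[ofm_id]:
                    ordered.append(groups[ofm_id])  # first writer reserves a slot
                groups[ofm_id].append(name)         # later writers join it
        return ordered  # each inner list becomes one custom op

    # Two concat writers separated by a CPU fallback end up together:
    print(group_concat_passes([("avgpool0", "c"), ("cpu_op", None), ("avgpool1", "c")]))
    # [['avgpool0', 'avgpool1'], ['cpu_op']]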
|
|
- Fix regression caused by too strict constraints on
SplitSliceRead, causing an output diff for LSTM.
- As long as the SplitSliceRead shape fits within the
consumer ifm shape, it is OK to move the read.
Change-Id: Ia6f508f99638c3aedccc7fd9f31405527bb64f87
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
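The relaxed constraint boils down to a bounds check; a minimal
sketch with hypothetical names:

    # The read can be moved to the consumer as long as the sliced
    # region fits inside the consumer's ifm shape.
    def read_fits_in_ifm(read_offset, read_shape, ifm_shape):
        return all(o + s <= d for o, s, d in zip(read_offset, read_shape, ifm_shape))

    assert read_fits_in_ifm((0, 0, 0, 0), (1, 1, 1, 64), (1, 1, 1, 128))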
|
|
- When possible, a read slice from a split or stride is moved to
the following op. The problem in this case was that the following
op was a Maxpool op (from Softmax). The Maxpool op uses a
different input shape than the original Softmax op, and this
input shape was changed when the slice read was applied
to the Maxpool op.
- The result is a faulty Maxpool op with an output diff.
- The fix is to prevent moving the slice read when the consumer
input shape differs from the Split/Stride ofm shape.
Change-Id: I649d89c38645fa51c20c3602954e2b8af9372076
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Add function descriptions and type annotations to the optimization
functions that were missing them.
Fix a type annotation issue caused by re-assigning a variable
to a value of a different type.
Change-Id: I1ee442ff7a29cc07708fdd013430131eff599dd5
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
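A tiny illustration of the re-assignment issue (hypothetical
example, not the actual Vela code): strict type checking rejects
re-binding a name to a different type, so the converted value gets
its own annotated name:

    # Before: 'stride' is inferred as int, then re-assigned to a
    # tuple, which fails strict type checking.
    stride = 2
    # stride = (2, 2)  # error: incompatible types in assignment

    # After: a distinct, annotated name for the converted value.
    stride_hw: tuple[int, int] = (2, 2)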
|
|
* Fix a bug that caused filter padding not to be added proportionally
to the hardware padding added to the IFM.
* Update the needed_total_padding function that calculates hardware
padding to also account for cases in which the IFM width is not
divisible by the stride width.
* Update supported ops constraint on strides for conv2d to mark ops with
stride width > 3 and IFM width that is not divisible by the
optimization resize factor as not supported.
* Update unit tests that verify correct functionality when checking
whether ops are supported or not.
Change-Id: I62f14cca890b779ca787a9603fa37c873ad522f8
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
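For reference, this sketch shows SAME-padding arithmetic of the kind
needed_total_padding performs (not the exact Vela code) and why an
IFM width that is not divisible by the stride changes the result:

    def needed_total_padding(input_size: int, stride: int, filter_size: int) -> int:
        # Output size under SAME padding, then the input the kernel needs.
        out_size = (input_size + stride - 1) // stride
        needed_input = (out_size - 1) * stride + filter_size
        return max(0, needed_input - input_size)

    print(needed_total_padding(224, 2, 3))  # 1
    print(needed_total_padding(225, 2, 3))  # 2 (width not divisible by stride)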
|
|
- Added int8 and int16 Exp support, implemented as LUT.
- Added generic 8-bit and 16-bit LUT table functions following
the implementation in the latest reference. If new ops are added
by the reference, they can easily be implemented in Vela using
the generic functions.
- Moved convert_to_lut to lut.py to keep all LUT-related code in
one file.
- Updated SUPPORTED_OPS.md
Change-Id: I388e76ea4b39162313599a5341cfb9bad71a782c
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
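The generic 8-bit LUT approach amounts to evaluating the op for
every possible quantised input value; a minimal sketch, with
assumed quantisation parameters:

    import math
    import numpy as np

    def create_lut_int8(fn, ifm_scale, ifm_zp, ofm_scale, ofm_zp):
        # Dequantise each of the 256 int8 inputs, apply the op,
        # requantise and clamp to the int8 range.
        lut = []
        for q_in in range(-128, 128):
            real = ifm_scale * (q_in - ifm_zp)
            q_out = round(fn(real) / ofm_scale) + ofm_zp
            lut.append(int(np.clip(q_out, -128, 127)))
        return np.array(lut, dtype=np.int8)

    # Example parameters (illustrative only):
    exp_lut = create_lut_int8(math.exp, 0.05, 0, 0.02, -128)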
|
|
Added int8 and int16 UNIDIRECTIONAL_SEQUENCE_LSTM support.
The implementation does not include support for:
* CIFG
* Peephole
* Projection
* Normalisation
This change also:
* Removed unused Op.BlockLSTM operation type.
* Removed the only-one-consumer limitation on putting the SplitSliceRead
on the tensor consumer(s), if all consumers fulfil the requirements
* Added Op.VariableTensorWrite as an Operation.memory_function to make
sure writes to variable tensors:
* Always use linear mode
* Are not moved to fast scratch
* Are not fused with other elementwise operation tensor ranges
Change-Id: Ief831738924ac3d1f2ba6d41f10bd6dc969911f3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Remove the op_index constraint and force linear format for all Conv2D
ops whose strides can be optimised.
Change-Id: Idef3508ab074ea9abeacac030eaaa15a00ad1211
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
|
|
- The logic for bypassing memory-only ops is
complicated, and it still does not fix all corner cases.
- This patch simplifies the logic by always bypassing
the op by replacing its IFM with the OFM. If that is not
possible, the memory-only op is changed to a memcpy op.
- The bypassing was previously done in two steps but
is now reduced to one.
Change-Id: I545dd65e0ec77c70be479a5ada2d277cac3a027c
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
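Schematically, the one-step rule looks like this (toy dict-based
graph and a hypothetical safety predicate; Vela's real check is
more involved):

    def bypass_memory_only_op(op, ops):
        # Assumed predicate: safe when ifm and ofm hold the same number
        # of elements and the ifm has no other consumers.
        safe = op["ifm"]["elems"] == op["ofm"]["elems"] and op["ifm"]["uses"] == 1
        if safe:
            # Replace ifm with ofm: the producer writes straight into
            # op's ofm, and the memory-only op disappears.
            for other in ops:
                if other.get("ofm") is op["ifm"]:
                    other["ofm"] = op["ofm"]
            ops.remove(op)
        else:
            # Otherwise keep the op but turn it into an explicit copy.
            op["type"] = "Memcpy"

    producer = {"type": "Conv2D", "ofm": None}
    t_in = {"elems": 64, "uses": 1}
    t_out = {"elems": 64}
    producer["ofm"] = t_in
    reshape = {"type": "Reshape", "ifm": t_in, "ofm": t_out}
    ops = [producer, reshape]
    bypass_memory_only_op(reshape, ops)
    assert producer["ofm"] is t_out and reshape not in ops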
|
|
- Reshape ops can be bypassed and there is no need for the NPU to
process them. There are use cases where the IFM must be preserved,
so a memcpy is needed. This is implemented by an AvgPool.
- In order to reduce the cost of the AvgPool, the IFM can be copied
by DMA. This is faster, and it can also be turned into a real NOP in
cases where the IFM and the OFM can use the same memory space.
- Added a new memcpy op. Only NHWC format is supported, since DMA
cannot change the format on the fly.
- Allow ofm to reuse ifm for memcpy op
- Make sure the DMA copy size is 16 byte aligned
Change-Id: I3605a48d47646ff60d2bb3644dd3a23f872235a7
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
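Rounding the copy size up to the 16-byte requirement is a
one-liner (illustrative helper, not Vela's actual function):

    def align_to_16(size_bytes: int) -> int:
        # Round up to the next multiple of 16 for the DMA copy.
        return (size_bytes + 15) & ~15

    assert align_to_16(33) == 48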
|
|
- Additional overflow checks are performed when running under
Microsoft Windows compared to Linux. These checks happen when
converting from Python int to NumPy int/uint
- The problem is that the lut activation values are int32 type,
however they are defined as Python ints. If these are converted to
numpy.int32 it could result in an overflow error
- The fix is to convert these values to uint32 but keep the
operator's IFM tensor type the same (as this will allow them to be
interpreted correctly)
- Fixing this highlighted another problem where convert_to_lut
always calls create_lut_tensor() with an int8 datatype, whereas it
should be using the IFM datatype
Change-Id: I781a9d850f654267aa4a67754438607c4bb95685
Signed-off-by: Tim Hall <tim.hall@arm.com>
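A small reproduction of the conversion hazard (values illustrative;
the exact behaviour depends on platform and NumPy version):

    import numpy as np

    lut_value = 0xFFFFFF00        # 32-bit pattern held as a Python int
    # np.int32(lut_value)        # may raise OverflowError (e.g. on Windows)
    safe = np.uint32(lut_value)   # fix: store as uint32; the op's IFM
                                  # tensor keeps its original type so the
                                  # bit pattern is interpreted correctly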
|
|
- The issue is due to undefined behaviour when casting a NumPy float
to a NumPy unsigned integer which occurs in create_const_tensor()
- The fix is to make sure that the values are first cast to a Python
float
- In addition, the values datatype argument has been removed from
create_const_tensor() to stop the tensor and values datatypes getting
out of sync
Change-Id: I134b9be8c941b361929a5ae7db8cb35f2e9728f2
Signed-off-by: Tim Hall <tim.hall@arm.com>
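A sketch of the hazard; the deterministic alternative shown is
illustrative, not necessarily the exact patch:

    import numpy as np

    v = np.float32(-1.0)
    # Undefined: casting a negative NumPy float straight to an unsigned
    # type. x86 builds commonly yield 255 here, Arm builds commonly 0.
    u = np.array([v]).astype(np.uint8)

    # Deterministic: go via Python scalars before creating the array.
    w = np.array([int(float(v)) % 256], dtype=np.uint8)  # always 255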
|
|
Fixed an assert that was caused by a model that has a reshape operator
followed by another reshape operator. This structure had not been
anticipated. However, since there is no need for the first reshape,
it is simply removed from the path while traversing the graph.
Change-Id: I2a939df37502028ffc07115ac87e85375484efee
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
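A sketch of collapsing the redundant first reshape while walking the
graph (toy dict representation, not Vela's classes):

    def skip_double_reshape(op, producer_of):
        # If a reshape's input is itself produced by a reshape, the first
        # reshape carries no information: re-point the input past it.
        prev = producer_of.get(op["ifm"])
        if op["type"] == "Reshape" and prev and prev["type"] == "Reshape":
            op["ifm"] = prev["ifm"]
        return op

    first = {"type": "Reshape", "ifm": "t0"}
    second = {"type": "Reshape", "ifm": "t1"}
    skip_double_reshape(second, {"t1": first})
    assert second["ifm"] == "t0"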
|
|
- Adds missing operators and type conversion recording to DebugDB
Change-Id: If76b0b430bbe73ae1469024c3160ecf0eea26abe
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Update copyright notices to use SPDX format and add OSS mail as contact.
- Update years on files where this had been missed.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
|
|
- The previous patch, which always replaced ifm with ofm,
introduced unnecessary avg pool ops in some cases.
That patch has been reverted and this is a new solution.
- Replace ifm with ofm in the following cases:
a) Ops that depend on the original ifm tensor shape not
being changed by the bypass memory op function.
b) When the memory op has different IFM and OFM rank.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I16a023e169ae64c5db46f6f88516a5e1ca7ed7ef
|
|
This reverts commit 5060ff53f5ac2382e04a68d7772bd71a36f63845.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8dd7e9ed8325fd2e8c17509fd9757292706f5ee7
|
|
- In order to solve output diffs, the Reshape op was pushed
to the CPU. The problem was that the Mean op ifm shape
was replaced by the Reshape op ifm shape.
- This limitation is now removed. Changed the implementation of
how memory-only ops are bypassed: always replace the memory-only
op's ifm tensor with its ofm tensor. By doing this, the ifm tensor
of the operator that follows the memory-only op is never changed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibcdebf33fd9b7a37f90984a129500b5dac52e5ea
|
|
- Added support for Resize Bilinear with half pixel centers for int8 and
uint8.
- Utilizes the new "TILE" padding mode.
- Utilizes ofm stride multipliers and modified tile base offsets to
write OFMs interleaved.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I37fa77c022a368f05fda0ead75d8696c9205f833
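Numerically, half-pixel centers follow the standard TFLite resize
sampling formula; the TILE padding mode and ofm stride multipliers
realise it on the NPU. A reference sketch (hypothetical helper name):

    def src_coord(dst_index: int, in_size: int, out_size: int) -> float:
        # Half-pixel-centers mapping from an output index to input space.
        scale = in_size / out_size
        return (dst_index + 0.5) * scale - 0.5

    # 2x upscale: output pixel 0 samples the input at -0.25.
    print(src_coord(0, 8, 16))  # -0.25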
|
|
- Ethos-U65-512 requires the input to REDUCE_SUM to use NHWC format
- Updated the graph optimiser format check to cover this condition
- Added an exception check to the backend of the compiler to verify
that this condition has not been violated by the external API or
Vela internals
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2f1fabcbd264daf77d5822349d855a3a32b12c64
|
|
- The issue was due to a previous patch to fix MLBEDSW-4350
- Manually reverted that fix (5fabfcaa2b636b02899b4d6e0ccf95d853986475)
- Made a new fix for MLBEDSW-4350 that calculates the padding and
skirt by taking into account the split read offsets and shapes
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I96010c1b977011aecbc411a3c91ab3e61af22db4
|
|
* fix indices for the TFLite mapping of the EXP operator
* fix indices for the TFLite mapping of the TRANSPOSE operator
* ensure read offset after slice is aligned to 16 bytes for NHCWB16 or force linear format
* add unit test to ensure mapping of indices is consistent across TFLite, TOSA and NNG
Signed-off-by: James Ward <james.ward@arm.com>
Change-Id: I17b6e44bc06853325d5eea62a558418ee1ebefe8
|
|
Added support for Identity operation.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: If00b30528932f7531807ce3914d6c1875ab72fa4
|
|
Added support to map the TABLE operator to LUT.
Limitations:
- Only supported for int8
- TABLE input must be constant
This also adds support for TFLite legalisation of
Tanh/Sigmoid (int8/uint8).
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I1a95f61fb02fdd42c4a690494418cc0765c8b275
|
|
Memory only operators such as Reshape, Squeeze and ExpandDims are
removed in the graph optimiser step.
- Added semantic check that memory only operators have same
quantisation parameters on ifm/ofm.
- Added support for the ExpandDims operator.
- Addition and cleanup of related unit tests.
- Removed TOSA from the generated SUPPORTED_OPS.md documentation.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: If848d8afc58c18806e10997ed94e4dae83f30879
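The semantic check amounts to comparing ifm and ofm quantisation;
schematically (toy types, not Vela's):

    from dataclasses import dataclass

    @dataclass
    class Quant:
        scale: float
        zero_point: int

    def memory_only_quant_ok(ifm_q: Quant, ofm_q: Quant) -> bool:
        # A memory-only op must not change the value mapping, so the
        # quantisation parameters have to match exactly.
        return ifm_q == ofm_q

    assert memory_only_quant_ok(Quant(0.1, 0), Quant(0.1, 0))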
|
|
Added support for data layout ops
RESHAPE, SLICE and CONCAT.
- No support for bool_t
- Support limited to rank <= 4 and N = 1
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I487ac494b6506a2a6ba947ee758aa193194dd796
|
|
This is mainly to add support for depthwise conv2d
with depth_multiplier = 1.
(But there are no suitable testcases; all I have sourced
have depth_multiplier set to 2, which is not supported.)
- Added support for depthwise conv2d.
- Added support for removing Transpose of constant data
- Added support for removing reshape
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I143e6246becfa78fd9f7510af0bf0d6b3fbbf2c7
|
|
Added support for:
- AVGPOOL and CONV2D with TFLite correspondence
- MAXPOOL
- additional support for replacing RESCALE ops with avgpool.
No support for breaking down tensors over the
size supported by the NPU.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I1d2aa50ac30a26283b3e6f1fe88cba1544b7c189
|
|
Fix inception_v1/v3 output diffs by removing the Squeeze operator
in the graph optimisation step.
The squeeze operator removes dimensions of size 1 from tensor shape.
The memory layout is preserved.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I4ceffcbb141af5ed50b0d1a9d1d67622e638c2a1
|
|
Added basic TOSA support, enabling Vela to
read and compile a .tosa file corresponding to
CONV2D + Rescale + Clamp, and to write it to an
optimized .tflite file.
The optimized .tflite file will, in this case, hold
a command stream where the Rescale and Clamp have been
fused into the CONV2D.
The optimized .tflite file is not output from Vela.
- Added support to read a .tosa file into Vela's
internal structure.
- Added tosa_reader.py, tosa_mapper.py and
helper files stored under tosa/
- Support for this is limited to ~10 ops
- Added reader_util.py for functions common
to TOSA and TFLite
- Added tosa_graph_optimiser.py
- Added support to fuse Rescale into convolution
- Modified handling of padding
- Added support to fuse Clamp into the previous op
- Added graph_optimiser_util.py
- Moved functions common to TOSA/TFLite graph
optimization to this file
- Renamed graph_optimiser.py to tflite_graph_optimiser.py
- Added separate tosa_supported_operators.py
- Added supported_operator_util.py
for functions common to TOSA/TFLite
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic3c540504ec8c5eb4771397fdc6882050ecf33ab
|