- Also changed to use Ethos-U where appropriate
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ie45ba2bb3935b305abe897b78b498681296cb7c1
|
|
Vela only supports per-channel scaling for
convolution ops. This commit adds a check that
puts non-convolution ops with per-channel scaling
on the CPU.
A caveat worth mentioning is that neither
TensorFlow Lite nor TensorFlow Lite Micro
supports per-channel scaling for the CPU-placed
op; however, the problem is moved out of Vela.
This commit also changes a small utility function
in supported_operators.py that is used for
docstring formatting.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I9ed090592f1d05dd4566d3e54dba1ef405299383
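A minimal sketch of the kind of check this commit describes, assuming a simplified view where weights carry one scale per output channel; the names are illustrative, not Vela's actual API:

```python
CONV_OPS = {"Conv2D", "DepthwiseConv2D", "TransposeConv"}

def supports_weight_scaling(op_type, weight_scales):
    # More than one weight scale means per-channel scaling, which the
    # NPU only handles for convolution ops; everything else falls back
    # to the CPU.
    per_channel = len(weight_scales) > 1
    return op_type in CONV_OPS or not per_channel

# Example: a per-channel FullyConnected op is rejected (CPU fallback)
assert not supports_weight_scaling("FullyConnected", [0.1, 0.2, 0.3])
```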
|
|
- Improved tensor and scaling query functions
- Fixed bug in convert_batched_fc_to_conv
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibc3d14036540f27cf5e993beb2163d3e0f5e5933
|
|
Change-Id: If63acbc3bcb986db6b81afa4078d5abed05d8afa
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
- Improve the conv estimation when the block size is very small
- Estimate cycles on bias/scale channel
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I275770b7f013b0812fc1ffe91f42ad07727c9dc7
|
|
Added a version to the external API
- Added CLI option --api_version
- Added an API function to get the API version
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I0143b50adf884a2b05145912a1c7bef8cecc5f02
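As a hedged illustration of how such versioning is commonly exposed; the names below are hypothetical, not the actual Vela API:

```python
API_VERSION_MAJOR = 1  # bumped on incompatible changes (assumption)
API_VERSION_MINOR = 0  # bumped on backwards-compatible additions

def get_api_version():
    # Hypothetical getter mirroring the commit's "API function to get
    # the API version"; returns a printable version string, which the
    # --api_version CLI option would print.
    return f"{API_VERSION_MAJOR}.{API_VERSION_MINOR}"
```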
|
|
Fixed DepthwiseConv2D failing when the bias tensor's quant_values are None.
Also fixed DepthwiseConv2D failing with an implicit depth multiplier.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I799a565eefa498ccf7ac626fcd472b8cbd908931
|
|
Fixed the Reshape operator failing with a TypeError during
deserialization in some cases.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ib34142f64295de4524e52a7a28eb36e503047bc0
|
|
EXPAND_DIMS is not yet supported by Vela, and so should not be in the
list of supported ops.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I5eca13eb52eb9b40ecc6592cda978614c71db99d
|
|
Updated SRAM size calculation for scale tensors.
Change-Id: Idaecc3bf0c83d58ea70163bfd194c594295b66db
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Fix for setting the rounding mode to TFL for fused Quantize ops.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic203f95f8916e330bcbf5792b52661b6f3e99bfc
|
|
A new CLI option has been added that allows the generation of a report
containing a summary table of all TFLite ops that can be placed on the
NPU, along with the constraints that must be met for an operator to be
successfully scheduled on the NPU.
This option generates a new file, SUPPORTED_OPS.md, containing this
information, in the current working directory.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I6a7e2a49f251b76b2ea1168fff78e00da1910b25
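A rough sketch of how such a report can be generated from constraint docstrings; the data layout is an assumption, not Vela's internal structure:

```python
def write_supported_ops_report(constraints_by_op, path="SUPPORTED_OPS.md"):
    # constraints_by_op maps an op name to a list of constraint
    # functions whose docstrings describe each constraint (assumption).
    with open(path, "w") as f:
        f.write("# Supported Ops\n\n")
        f.write("| Operator | Constraints |\n|---|---|\n")
        for op_name in sorted(constraints_by_op):
            docs = "; ".join(
                (c.__doc__ or "").strip() for c in constraints_by_op[op_name]
            )
            f.write(f"| {op_name} | {docs} |\n")
```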
|
|
Usage of shape[-2] could cause an index-out-of-range error.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I1b64b117f8236ce9ba321ca03bdb25e5a03a6589
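For illustration, the usual shape of such a fix is a rank check before the negative index; the helper below is hypothetical:

```python
def second_to_last_dim(shape, default=1):
    # shape[-2] raises IndexError for shapes of rank < 2, so guard on
    # the rank first and fall back to a default.
    return shape[-2] if len(shape) >= 2 else default
```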
|
|
None inputs and unsupported tensor shapes caused asserts when
marking tensor purpose/format.
Change-Id: I4498b61576f529c1a594341cfbb6ba278c6e7ec5
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Do not convert batched fully connected operators, to avoid moving
weights from flash to SRAM.
Change-Id: I873c9ce05377de3f16e4cee9a0863f29d9ec3ad4
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Bug fix for a regression: Vela could crash for operators placed on the CPU.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I99dcfdb4d3029ad86ffd2c8b3fd2547554794b79
|
|
Put Softmax on the CPU if beta < 0
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I4ec866dd44d14e2737c4cd96474e54bb770bfb3e
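A one-line sketch of the placement rule (illustrative, not the actual constraint function):

```python
def softmax_supported_on_npu(beta):
    # Softmax scales the logits by beta; a negative beta is left to
    # the CPU reference kernel.
    return beta >= 0
```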
|
|
When encountering a sparse string buffer, Vela fails
both because it is missing a mapping for a NumPy
string type and because it cannot read sparse buffers.
The failing line attempts to reshape a [100]
buffer into a [3, 5] tensor, which does not work
because Vela treats the buffer as non-sparse.
The solution here is simply not to do the reshape
for string buffers (which all appear to be sparse),
since this is not something that will be supported
in the future anyway.
The related operator can then be pushed to the CPU
as expected.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iea0af6cd60a691f975209014b6aa098dde8d6a4b
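A sketch of the fix under the assumption that buffers arrive as raw bytes plus a NumPy dtype; the function name is illustrative:

```python
import numpy as np

def buffer_to_values(raw_bytes, dtype, shape):
    values = np.frombuffer(raw_bytes, dtype=dtype)
    if np.dtype(dtype).kind == "S":
        # String buffers are typically sparse: the element count does
        # not match the dense shape, so skip the reshape entirely.
        return values
    return values.reshape(shape)
```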
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit restores a control flow path where
already modified StridedSlice operators are
left untouched.
Without it, Vela would recurse infinitely and crash.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iaf3ae916325bedd3dd1edd3395fb4a9ecf832590
|
|
mlw_codec is part of the codebase and has build flags.
The README has been updated to include these.
Also added -Werror to the list: the codec must build without any
warnings, so warnings are treated as errors.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I10114bb013fad1ec1685fafc2e41c18ff12d9f9d
|
|
- Added a mechanism to track input-to-output graph transforms for
debugging the resultant command stream.
- Provides the base implementation for MLBEDSW-2661
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2dfe8a409fbde7ad0282bfab5acb11ba1c8b82d8
|
|
Change-Id: I9e00afe0eef0e13fe990e021bcbe3dd0eda4c471
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Change-Id: I8f139381d0e01e8ac70d89c4a312ee3000fb5fa1
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- DMA op cycle estimation for the first pass (see the sketch below)
- Fix a bug in the ifm_blk_depth calculation
- Fix a bug in the SRAM bandwidth calculation
- Merge DPU and elementwise cycles into NPU cycles
- Use str.format() in the performance print
Change-Id: I78895416f47fc3c652743c5da13fc45630322371
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
(cherry picked from commit 5245e97a62c2fe54250f99b06e778f3e0c6dc376)
(cherry picked from commit 16e415677403fc04a90b1a7ec554761d38315640)
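As referenced in the first bullet, a minimal first-pass transfer model; the bandwidth figure is an assumption, not a hardware datasheet value:

```python
def estimate_dma_cycles(num_bytes, bytes_per_cycle=16.0):
    # First-pass estimate: cycles spent moving num_bytes over a link
    # that sustains bytes_per_cycle each cycle.
    return num_bytes / bytes_per_cycle
```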
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: Ic6ae795a1626d1cdf63a69d2ff86f7cd898f3134
|
|
For IFM-streamed cascades, bias tensors are read several times.
This moves these tensors to fast storage and adds DMA commands.
Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
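A sketch of the idea with illustrative attribute names; a real pass would consult the memory areas Vela models rather than plain strings:

```python
def move_biases_to_fast_storage(cascade_passes):
    for ps in cascade_passes:  # passes of one IFM-streamed cascade
        for tens in ps.inputs:
            if tens.purpose == "bias":
                tens.mem_area = "SRAM"        # fast storage
                ps.dma_commands.append(tens)  # copy in from flash
```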
|
|
This commit removes the constraint that all tensor
shapes must match the OFM shape.
The motivation is that this constraint essentially
only checks that the fixup function has run, which
makes it impossible to run the fixup function
after the supported operator check; effectively,
any StridedSlice operator that would be placed on
the CPU was still modified by the fixup function.
Because the fixup function is now moved to after
the supported operators check, some unreachable
cases are removed from the fixup function.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I7a82126b7de73bd67873b4e6daf53a6767e33d16
|
|
Added an option to control whether a Tensor clone should be
treated as unique or not.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ie51c1a5e84b535380d498b105aa18ccba1c8b27c
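A sketch of the option, assuming an equivalence_id that allocation uses to treat tensors as the same storage (the attribute name is taken from the later scale+bias commit; the function is hypothetical):

```python
import copy
import uuid

def clone_tensor(tensor, force_unique=False):
    new = copy.copy(tensor)  # shallow clone for illustration
    if force_unique:
        # A unique clone gets its own equivalence_id, so it is never
        # treated as interchangeable with the original.
        new.equivalence_id = uuid.uuid4()
    return new
```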
|
|
Improve Ethos-U65 softmax performance by selecting more feature map
tensors as SRAM candidates.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I239c9dbebbf2a929004eb01bb0f3efe77f5b97aa
|
|
Previously, the internal operator type was printed when running the
supported operator checks. This now converts it back to the external
type name.
Additionally, removed dead code and changed the message for CPU-only ops.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ib2b0cbcb49fdf63edb835828e266b079e63bae37
|
|
Removed the CLI option ifm-ofm-overlap
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
|
|
All existing constraints have now been refactored using the new
framework.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ic9ba0d7040cb9f114b959a949bfdf777f86752c7
|
|
Added a supported_operators check for Relu activation functions. If the
scaling value overflows to infinity, the op will be placed on the CPU.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I66b7bec062599609aadcbb7531caebbc45a7451f
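A sketch of the check, assuming the combined scale is ifm_scale / ofm_scale computed in float32; names are illustrative:

```python
import numpy as np

def relu_scaling_is_valid(ifm_scale, ofm_scale):
    # If the combined rescale overflows float32 to infinity, the op
    # cannot be quantised for the NPU and is placed on the CPU.
    combined = np.float32(ifm_scale) / np.float32(ofm_scale)
    return bool(np.isfinite(combined))
```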
|
|
Set the actual sizes of the Scratch and Fast Scratch buffers and remove
both Scratch buffers from the subgraph inputs.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I9e4213f48289d9136cdd4cd43c668d37c6af8530
|
|
Separate scale+bias tensors by different equivalence_id.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I674341950bc001ac6e4015206995f048a0dfee75
|
|
- Copy the bandwidth compression rate when a weight tensor is cloned
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I41c4c1f7001e8dc12af35695f5f5d02815e28351
|
|
Enable overlap of elementwise input/output
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I6e6f11953319c843c8203bf038f96778df194332
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I91a3b277cda91dca3bad38908d4ed11a4f5d7d5f
|
|
- Fixed typo in Tensor.is_quantized()
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I36156a6aa5aaff01c4f271a6a8325636173225f3
|
|
- Normalise kernel availability by requiring that all operators offer a
kernel describing how much data they consume from the source, per OFM
element, regardless of whether kernels are relevant to the operation.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idbcff64879fc2eccf292b6208a7d2038eb388017
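A sketch of the normalisation: operators with no natural kernel report a unit kernel, so downstream code can query it unconditionally. The class shape is illustrative, not Vela's actual Kernel:

```python
class Kernel:
    def __init__(self, w=1, h=1, stride_w=1, stride_h=1):
        self.width, self.height = w, h
        self.stride_w, self.stride_h = stride_w, stride_h

# An elementwise op consumes exactly one source element per OFM
# element, which a default 1x1 kernel with unit strides expresses.
unit_kernel = Kernel()
```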
|
|
- Fixed and documented both tensor and quant params scaling checks
- Added quant params validity check and tensor quantisation check
- Added valid tensor checks to some graph optimisation functions
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I8d6e8f03a603d28886dde511672c8399c85b794c
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I9f3671041c2b1497519cf42b5f52e3cd01d9c10a
(cherry picked from commit e8c989f5236cce12d07a6644329935dbbf0ee8e6)
|
|
- Refactored mark_tensor_purpose
- Initial weight compression is now always done in insert_dma
- Removed mark_tensor_format
Change-Id: Ic719b9bcd1d27e1390d7b9ce8cd21795139ec814
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
Change-Id: Ie404a0c13e7c7de0eff649f77e0147a0f3d73acd
|
|
Using a new system to report constraints, replaced the existing
functionality for checking conv-like ops.
This new system allows reporting of all constraints regardless of
the input network.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: If81177deca2a3b57c9dd9a3a08868cbc9cef0c23
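For flavor, the kind of self-describing constraint such a system typically uses; this example is hypothetical:

```python
def constraint_batch_size(op):
    "IFM batch size must be 1"
    # The docstring doubles as the reported constraint text, so the
    # check and its human-readable description live together.
    return op.ifm_shape[0] == 1
```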
|
|
This commit fixes a bug where a rewritten Unpack
operator placed on the CPU crashes Vela during
serialisation, because the op type has changed
and there is no mapping for the modified type.
The solution is to move the fixup_unpack_output
function to graph optimisation pass B, allowing
the supported op check to run before it.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Ic6bd4c70a478fd61adf377cb487f5b9253130314
|
|
Suppress info print that Const/Placeholder/SubgraphInput are not supported
on the NPU.
Change-Id: I6f323b64185b01b619b584c1473ae61d010ab3a4
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This reverts commit 04986c0016e59993563490fe67052371fc0e1ad2.
Reason for revert: Merged by mistake
Change-Id: I150ad9ba7074ad1e80f21180aeba56a454d9f748
|
|
Suppress info print that Const/Placeholder/SubgraphInput are not supported
on the NPU.
Change-Id: I689d25481df0cd10487484c9f639e4253df081ee
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|