Age | Commit message | Author |
|
Minor fix in SPLITV tensor indexing for supported operators check.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: If8fa702bfbb25a4a7e5bdb136a19ef72eec7e1c2
|
|
- Changed to --cache-bias-scale-tensor
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I285fe253f03ba98eff36dbe996ad3a57e2ee3d99
|
|
Default arch instances are cached as they are expensive to create,
and they are created often when using the external APIs.
Change-Id: I16802fa767e6750da4227c6266d7c4453c047001
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
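
The caching described above can be sketched with `functools.lru_cache`. This is a minimal illustration, not the code from the commit: the `DefaultArch` class, the `get_default_arch` function and the accelerator string are hypothetical stand-ins for an object that is expensive to build.

```python
from functools import lru_cache


class DefaultArch:
    """Hypothetical stand-in for an expensive-to-build architecture object."""

    instances_created = 0

    def __init__(self, accelerator: str):
        DefaultArch.instances_created += 1
        self.accelerator = accelerator


@lru_cache(maxsize=None)
def get_default_arch(accelerator: str) -> DefaultArch:
    # Built once per accelerator name; later calls return the cached object.
    return DefaultArch(accelerator)


# The accelerator name is only an illustrative key.
first = get_default_arch("ethos-u55-256")
second = get_default_arch("ethos-u55-256")
```

Repeated calls through an external API then share one instance per configuration instead of rebuilding it each time.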
|
|
Blockdep calculation can now handle different sized IFM/OFM.
Change-Id: I898a3c1c3a6778916802f3dbfa658328e5093096
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Do not use DMA for weights of a FullyConnected op that has
been converted to a Conv2D.
Change-Id: Ibf6710c0a1723c8b48c563ca204f274af5ca88ce
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit adds a constraint to FullyConnected
ops in supported_operators.py that puts any
such op on the CPU if tensor dimensions of the
output(s) are not 2D.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I8c898a780b40fc4a1383c09213f0696ea6699b7d
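
A minimal sketch of such a constraint, assuming output shapes are available as plain lists of dimensions. The function name is hypothetical and is not taken from supported_operators.py.

```python
def fully_connected_runs_on_npu(output_shapes):
    """Return True only if every output tensor shape is 2D."""
    # Any output that is not 2D makes the op fall back to the CPU.
    return all(len(shape) == 2 for shape in output_shapes)


on_npu = fully_connected_runs_on_npu([[1, 10]])
on_cpu = not fully_connected_runs_on_npu([[1, 1, 1, 10]])
```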
|
|
Added public API function npu_find_block_configs.
Change-Id: Ib0925a62d7c5d19a9b9fbd8d808943c2ea2df02f
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fix for end_coord for upsampling
- Remove restriction for ifm streaming
- Added restriction for cascading on ResizeBilinear
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I384abf12cfe8ac9ce7b76066b709600ea901b248
|
|
- Added API.md that describes the external APIs.
- Renamed npu_get_api_version
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I6e6e6103a889da656b4e00c3cce3eee60dfa844a
|
|
- Improve conv estimation by adding delay cycles
- Estimate minimal block cmd cycles
Change-Id: Ibea818e8e820731fc7d05c948d5d1abd22e17089
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
IFM streaming is no longer allowed for TransposeConv and
ResizeBilinear.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I85da279fae6202830c46e4a5500fb1b0dd6ef542
|
|
When printing a set in the docstrings for the SUPPORTED_OPS.md file, the
order is random.
Reuse existing sorted string repr for the operator list and apply to
other printed sets (data types)
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I2ac12ea91c2637219e5c24f9a863aa0fc2086e77
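
The ordering problem can be illustrated as follows. This is a sketch of the general technique (sort before printing), not the helper used in the commit; `sorted_repr` and the example set are hypothetical.

```python
def sorted_repr(items):
    """Return a comma-separated string with a stable, sorted order."""
    # Sorting the string forms gives the same output on every run,
    # unlike iterating over the set directly.
    return ", ".join(sorted(str(item) for item in items))


data_types = {"uint8", "int8", "int16", "int32"}
line = sorted_repr(data_types)
```

Because the sort is lexicographic, `int16` and `int32` come before `int8`. The order is stable across runs, which is what a generated snapshot file needs.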
|
|
mlplatform uses gitiles, which in turn renders markdown differently:
"There must be at least three hyphens in each column of the header row"
Updated the generation code and the snapshot file to respect this,
as well as changed the link from commonmark (which does not support
tables)
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: If31860ce8e38ebe7d68bfec61faff805fc00345b
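
A sketch of generation code that respects the three-hyphen rule quoted above. The function name and the example row are hypothetical; only the minimum of three hyphens per header column comes from the commit message.

```python
def markdown_table(headers, rows):
    """Build a markdown table whose delimiter cells have at least 3 hyphens."""
    header_line = "| " + " | ".join(headers) + " |"
    # Pad short columns up to three hyphens so the table still renders.
    delimiter = "| " + " | ".join("-" * max(3, len(h)) for h in headers) + " |"
    body = ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join([header_line, delimiter, *body])


# Placeholder cell values, for illustration only.
table = markdown_table(["Op", "Constraints"], [["EXAMPLE_OP", "example constraint"]])
```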
|
|
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I2e8384a044ee5458bc8c92562153b6383de5f17a
|
|
Added external API to add driver actions to a command stream.
Change-Id: Ie4779c1c745defc5769fa694358470cd6aea191c
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
All external APIs are now exposed by api.py.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I33f480e424692ac30e9c7d791f583199f31164a7
|
|
After the weight compressor has run, the weights have their correct
sizes. Moving the scale tensors after the weight compressor gives a more
accurate estimate of the SRAM available for scale tensors.
Change-Id: I4571780180778ef43e943c4e98048e17d6f33580
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
This reverts commit 15a8e803844b286fe9533e1cf703c76a77b090a8.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I64169443f473c9ba42551281ad6ac4b45856f420
|
|
Change-Id: Ifbd6c053ac618bedce0f56fe5c4c647a71d9cc46
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
- Updated and aligned the --help and setup.py descriptions
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I78c11b1b3dd51284b34d57a6caca45cd222b4678
|
|
- Fixed bug due to typo in Op.type refactor
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55916d90bf792648f496a45c358b7e897c6730ba
|
|
- Removed unused --show-minimum-possible-allocation
- Change --allocation-alignment to --cpu-tensor-alignment
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I00e367c3190aeea08a3f136332711e9accc85ba3
|
|
- Added sample vela.ini config file
- Changed vela config format, split into system config and memory mode
- Removed unused CPU cycle performance estimation
- Added new CLI options for --memory-mode and --verbose-config
- Changed CLI option --config to take multiple files
- Removed CLI option --global-memory-clock-scales
- Changed error helper functions to raise a VelaError exception
- Refactored to create a new is_spilling_enabled function
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d
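
The change to the error helpers can be pictured roughly as below. Only the exception name `VelaError` comes from the commit message; its definition here, the `error` helper and the `load_option` caller are assumptions made for illustration.

```python
class VelaError(Exception):
    """Hypothetical definition; only the name appears in the commit message."""


def error(msg):
    # Raise instead of printing and exiting, so callers can catch the failure.
    raise VelaError(msg)


def load_option(config, key):
    # Hypothetical caller: report a missing option through the helper.
    if key not in config:
        error(f"missing option '{key}'")
    return config[key]
```

Raising an exception lets code that uses the tool as a library decide how to handle the failure.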
|
|
- Also changed to use Ethos-U where appropriate
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ie45ba2bb3935b305abe897b78b498681296cb7c1
|
|
Vela only supports per-channel scaling for
convolution ops. This commit adds a check that
puts any other op with per-channel scaling on the CPU.
A caveat worth mentioning is that neither
TensorFlow Lite nor TensorFlow Lite Micro supports
per-channel scaling for the CPU-placed op;
however, the problem is moved away from Vela.
This commit also changes a small utility function
in supported_operators.py used for docstring
formatting.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I9ed090592f1d05dd4566d3e54dba1ef405299383
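
A rough sketch of the check, reading it as applying to ops other than convolutions. The op-type set, the function name and the use of "more than one scale value" as the test for per-channel scaling are all assumptions, not code from supported_operators.py.

```python
# Hypothetical set of op types treated as convolutions.
CONVOLUTION_OPS = frozenset({"Conv2D", "DepthwiseConv2D"})


def placed_on_npu(op_type, scales):
    """Return False for non-convolution ops that use per-channel scaling."""
    # Assumption: more than one scale value means per-channel scaling.
    per_channel = len(scales) > 1
    return (not per_channel) or op_type in CONVOLUTION_OPS


conv_ok = placed_on_npu("Conv2D", [0.1, 0.2, 0.3])
add_rejected = not placed_on_npu("Add", [0.1, 0.2, 0.3])
```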
|
|
- Improved tensor and scaling query functions
- Fixed bug in convert_batched_fc_to_conv
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibc3d14036540f27cf5e993beb2163d3e0f5e5933
|
|
Change-Id: If63acbc3bcb986db6b81afa4078d5abed05d8afa
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
- Improve the conv estimation when the block size is very small
- Estimate cycles on bias/scale channel
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I275770b7f013b0812fc1ffe91f42ad07727c9dc7
|
|
Added version to the external API
- Added CLI-option --api_version
- Added API function to get the API version
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I0143b50adf884a2b05145912a1c7bef8cecc5f02
|
|
Fixed DepthwiseConv2D failing when bias tensor quant_values are None.
Also fixed DepthwiseConv2D failing with an implicit depth multiplier.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I799a565eefa498ccf7ac626fcd472b8cbd908931
|
|
Fixed Reshape operator failing with a TypeError during deserialization
in some cases.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ib34142f64295de4524e52a7a28eb36e503047bc0
|
|
EXPAND_DIMS is not yet supported by vela, and so should not be in the
list of supported ops.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I5eca13eb52eb9b40ecc6592cda978614c71db99d
|
|
Updated SRAM size calculation for scale tensors.
Change-Id: Idaecc3bf0c83d58ea70163bfd194c594295b66db
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Fix for setting rounding to TFL for fused Quantized
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic203f95f8916e330bcbf5792b52661b6f3e99bfc
|
|
A new CLI option has been added that generates a report
containing a summary table of all TFLite ops that can be placed on the
NPU, and the constraints each operator must meet to be successfully
scheduled on the NPU.
This option generates a new file, SUPPORTED_OPS.md, containing this
information, in the current working directory.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I6a7e2a49f251b76b2ea1168fff78e00da1910b25
|
|
Usage of shape[-2] could cause index out of range.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I1b64b117f8236ce9ba321ca03bdb25e5a03a6589
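
The failure mode is easy to reproduce: `shape[-2]` raises `IndexError` for any shape with fewer than two dimensions. A guarded accessor might look like the sketch below; the function name and the fallback value of 1 are assumptions, not the fix applied in the commit.

```python
def second_last_dim(shape, default=1):
    """Return shape[-2], or a default when the shape has fewer than 2 dims."""
    # Check the rank first; indexing [-2] on a 0D or 1D shape would raise.
    return shape[-2] if len(shape) >= 2 else default


width = second_last_dim([1, 8, 16, 3])
fallback = second_last_dim([5])
```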
|
|
None inputs and unsupported tensor shapes caused asserts when
marking tensor purpose/format.
Change-Id: I4498b61576f529c1a594341cfbb6ba278c6e7ec5
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Do not convert batched fully connected operators to avoid moving
weights from flash to SRAM.
Change-Id: I873c9ce05377de3f16e4cee9a0863f29d9ec3ad4
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Bug fix for a regression: Vela could crash for operators placed on CPU.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I99dcfdb4d3029ad86ffd2c8b3fd2547554794b79
|
|
Put softmax on CPU if beta < 0
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I4ec866dd44d14e2737c4cd96474e54bb770bfb3e
|
|
When encountering a sparse string buffer, Vela fails
both because it lacks a mapping for a NumPy string type
and because it cannot read sparse buffers.
The failing line is attempting to reshape a [100]
buffer into a [3, 5] tensor which does not work due
to Vela treating the buffer as non-sparse.
The solution here is to simply not do the reshape
for string buffers (which all appear to be sparse)
since it is not something that will be supported in
the future anyway.
The related operator can then be pushed to the CPU
as expected.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iea0af6cd60a691f975209014b6aa098dde8d6a4b
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit restores a control flow path in which
already modified StridedSlice operators are
left untouched.
Without it, Vela would recurse infinitely and crash.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iaf3ae916325bedd3dd1edd3395fb4a9ecf832590
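
The guard can be sketched as follows. This illustrates the general idea (skip ops that were already rewritten so a rewrite pass terminates), not the commit's code. It uses a loop rather than recursion, and the `Op` class, the `rewritten` flag and both function names are hypothetical.

```python
class Op:
    """Hypothetical minimal operator with a marker for completed rewrites."""

    def __init__(self, op_type):
        self.type = op_type
        self.rewritten = False


def fixup_strided_slice(op):
    """Rewrite the op once; return True only if this call changed it."""
    if op.type != "StridedSlice" or op.rewritten:
        return False  # already modified ops are left untouched
    op.rewritten = True
    return True


def rewrite_until_stable(op, max_passes=100):
    """Re-run the fixup until nothing changes; return the number of changes."""
    passes = 0
    while fixup_strided_slice(op):
        passes += 1
        if passes >= max_passes:
            raise RuntimeError("rewrite did not converge")
    return passes


op = Op("StridedSlice")
changes = rewrite_until_stable(op)
```

Without the early return for already rewritten ops, every pass would report a change and the rewrite would never settle.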
|
|
mlw_codec is part of the codebase and has build flags.
README has been updated to include these.
Also, added -Werror to the list, as we must build without any warnings,
so treat warnings as errors.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I10114bb013fad1ec1685fafc2e41c18ff12d9f9d
|
|
- Added mechanism to track input to output graph transforms for
debugging the resultant command stream.
- Provides base implementation for MLBEDSW-2661
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2dfe8a409fbde7ad0282bfab5acb11ba1c8b82d8
|
|
Change-Id: I9e00afe0eef0e13fe990e021bcbe3dd0eda4c471
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Change-Id: I8f139381d0e01e8ac70d89c4a312ee3000fb5fa1
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- DMA ops cycle estimation for the first pass
- fix a bug in ifm_blk_depth calculation
- fix a bug in sram bandwidth calculation
- merge dpu and elementwise cycles into npu cycles
- use str.format() in performance print
Change-Id: I78895416f47fc3c652743c5da13fc45630322371
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
(cherry picked from commit 5245e97a62c2fe54250f99b06e778f3e0c6dc376)
(cherry picked from commit 16e415677403fc04a90b1a7ec554761d38315640)
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: Ic6ae795a1626d1cdf63a69d2ff86f7cd898f3134
|
|
For IFM streamed cascades, bias tensors are read several times.
Move these tensors to fast storage and add DMA commands.
Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|