Age | Commit message | Author |
|
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
|
|
One-level-deep relative paths (i.e. ./vela.ini) were treated as if the
folder name in config_files was ".". They are now treated as relative paths.
The warning message when using an absolute path has also been moved into
the error message instead, for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
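
A minimal sketch of the lookup rule described above, under stated assumptions. The helper name `resolve_config_path` and the constant `CONFIG_FILES_DIR` are hypothetical and only illustrate the idea; this is not Vela's actual implementation.

```python
import os

# Assumed name of the directory that holds the bundled config files.
CONFIG_FILES_DIR = "config_files"


def resolve_config_path(name: str) -> str:
    """Hypothetical helper: decide where a --config argument points."""
    if os.path.isabs(name):
        # Absolute paths are used as given.
        return name
    first_component = name.replace("\\", "/").split("/")[0]
    if first_component in (".", ".."):
        # Explicitly relative paths (e.g. ./vela.ini) stay relative to the
        # current directory instead of being looked up in config_files.
        return os.path.normpath(name)
    # Anything else is looked up under the bundled config directory.
    return os.path.join(CONFIG_FILES_DIR, name)


local_path = resolve_config_path("./vela.ini")
bundled_path = resolve_config_path("Arm/vela.ini")
```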
|
|
The argument to the lstrip function is a list of all characters that
should be stripped from the beginning of the string, in any order. To
remove the actual prefix, check if the string starts with the string
instead and then remove that number of characters. The function
"removeprefix" was added in Python 3.9 and does exactly this, but
it is not yet available to Vela since Vela supports Python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
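
The difference between lstrip and real prefix removal can be sketched as follows. The helper name `remove_prefix` and the example string are assumptions chosen for illustration, not taken from the Vela sources.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Remove prefix from the start of text if present (works on Python 3.7)."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


# lstrip treats its argument as a set of characters, so it keeps stripping
# any leading 'm', 'o', 'd', 'e', 'l' or '/' and removes too much here.
stripped = "model/dense/kernel".lstrip("model/")

# The prefix-aware helper only removes the exact leading string.
prefixed = remove_prefix("model/dense/kernel", "model/")
```

On Python 3.9 and later, `str.removeprefix` gives the same result as the helper.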
|
|
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed which is the version that
Vela is tested against
- The fix is to limit the numpy version to those that support Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
|
|
- For allocations that have a hard memory limit the Hill Climb allocator
should be given more attempts to find a solution that would fit
- The fix is to use a memory limit when there is a hard constraint, and
a minimum iteration count, reset on every improvement, when there is a soft
constraint
- Added a CLI option for the maximum number of iterations
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
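
The stopping rules described above could look roughly like the sketch below. It is a generic hill climb on a toy problem, not the Vela allocator; the function name, parameters and default values are all assumptions made for illustration.

```python
import random


def hill_climb(cost, neighbour, start, hard_limit=None,
               min_iterations=50, max_iterations=10000, seed=0):
    """Toy hill climb showing the two stopping rules (hypothetical API)."""
    rng = random.Random(seed)
    best, best_cost = start, cost(start)
    since_improvement = 0
    for _ in range(max_iterations):  # overall cap, e.g. set from a CLI option
        if hard_limit is not None:
            # Hard constraint: keep trying until the solution fits.
            if best_cost <= hard_limit:
                break
        elif since_improvement >= min_iterations:
            # Soft constraint: stop after a run of non-improving attempts.
            break
        candidate = neighbour(best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
            since_improvement = 0  # reset on every improvement
        else:
            since_improvement += 1
    return best, best_cost


def step(x, rng):
    # Toy neighbour function: move one step left or right.
    return x + rng.choice((-1, 1))


_, hard_cost = hill_climb(abs, step, start=5, hard_limit=0)
_, soft_cost = hill_climb(abs, step, start=5)
```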
|
|
- Problem is due to a divide by zero
- Fix is simply to detect and assign zero. This could also affect
improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
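
A guard of the kind described above might look like this. The function name and the exact formula are assumptions for illustration; only the "detect and assign zero" rule comes from the commit message.

```python
def relative_improvement(before: float, after: float) -> float:
    """Hypothetical helper: fractional improvement, 0.0 if there is no baseline."""
    if before == 0:
        # Nothing was used before, so there is nothing to improve on.
        return 0.0
    return (before - after) / before


improvement_with_baseline = relative_improvement(100, 75)
improvement_without_baseline = relative_improvement(0, 0)
```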
|
|
- The print_performance function that is called when using the
--verbose-performance option crashed with KeyError when no SRAM was
used.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
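
One common way to avoid this kind of KeyError is to look the value up with a default. The dictionary layout and the key name "Sram" below are assumptions for illustration and may not match the data structure used by print_performance.

```python
def sram_usage(memory_used: dict) -> int:
    """Hypothetical helper: SRAM bytes used, or 0 when SRAM is absent."""
    # dict.get returns the default instead of raising KeyError.
    return memory_used.get("Sram", 0)


usage_with_sram = sram_usage({"Sram": 4096, "Dram": 1024})
usage_without_sram = sram_usage({"Dram": 1024})
```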
|
|
- The NPU cycles are not correctly calculated when only
one weight buffer is used, since weights cannot
be fetched in parallel.
- Added new calculation in the single buffer case.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
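
The reasoning above can be illustrated with a simplified cycle model. This is an assumption-laden sketch, not the calculation used in Vela: it assumes that with two buffers the fetch of the next weight block overlaps with the current computation, and that with one buffer every fetch must finish before its computation starts.

```python
from typing import Sequence


def estimate_cycles(compute: Sequence[int], fetch: Sequence[int],
                    double_buffered: bool) -> int:
    """Toy model: total cycles for per-block compute and weight fetch costs."""
    if len(compute) != len(fetch):
        raise ValueError("compute and fetch must have the same length")
    if not compute:
        return 0
    if not double_buffered:
        # Single buffer: fetch and compute are serialised for every block.
        return sum(compute) + sum(fetch)
    # Double buffer: only the first fetch is exposed; each later fetch
    # runs in parallel with the computation of the previous block.
    total = fetch[0]
    for i, compute_cycles in enumerate(compute):
        next_fetch = fetch[i + 1] if i + 1 < len(fetch) else 0
        total += max(compute_cycles, next_fetch)
    return total


double_cycles = estimate_cycles([10, 10, 10], [4, 4, 4], double_buffered=True)
single_cycles = estimate_cycles([10, 10, 10], [4, 4, 4], double_buffered=False)
```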
|
|
Update to the "Vela splitting network into two ethos operators" patch
allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
|
|
- Due to how the graph is traversed, the final pass list contained unnecessarily
many Ethos-U operators. This is not a functional problem, but it adds extra
context switching between the CPU and the NPU.
- By applying sorting rules to the pass list, it is possible to create a more
optimal pass list that reduces the number of Ethos-U operators.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
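
The effect of such sorting can be shown with a small greedy reordering. This is only a sketch under assumptions: the function name, the data layout and the single rule used here (prefer a ready pass on the same processor as the previous one) are hypothetical and are not the sorting rules implemented in Vela. The dependency graph is assumed to be acyclic.

```python
def group_passes(passes, deps, placement):
    """Reorder passes so consecutive passes share a processor where possible.

    passes:    pass names in their original order
    deps:      name -> set of names that must run first (assumed acyclic)
    placement: name -> "npu" or "cpu"
    """
    remaining = list(passes)
    done = set()
    ordered = []
    current = None
    while remaining:
        # Passes whose prerequisites have all been scheduled.
        ready = [p for p in remaining if deps.get(p, set()) <= done]
        # Prefer staying on the current processor to avoid a context switch.
        same = [p for p in ready if placement[p] == current]
        chosen = (same or ready)[0]
        ordered.append(chosen)
        done.add(chosen)
        remaining.remove(chosen)
        current = placement[chosen]
    return ordered


placement = {"a": "npu", "b": "cpu", "c": "npu", "d": "cpu"}
deps = {"c": {"a"}, "d": {"b"}}
ordered = group_passes(["a", "b", "c", "d"], deps, placement)
```

In this toy graph the original order alternates between NPU and CPU, while the reordered list runs both NPU passes back to back.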
|
|
Add directory structure to support third party config files. Config
files should now be placed in an appropriately named directory under
the config_files directory, but can also be accessed by providing their
absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
|
|
- Added support to print per operator sram usage and performance
information
- Added new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
|
|
Allow the schedule to be used when the calculations show zero total
improvement but do show a DRAM improvement. When testing on a real
target, total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
- Because bigger weight buffer sizes are being used, there are use cases
where feature maps are evicted from SRAM, causing the total performance to drop.
- A way to improve this is to limit the memory for those weight buffer ops,
to get the feature maps back into SRAM, and see if total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
|
|
Removing constraint for negative alpha value in ReLu
for int8 and uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
|
|
This commit downgrades the required Python version
to 3.7 from 3.8.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I07057908b97bcd94663f001474d877ba41411ae1
|
|
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
|
|
This reverts commit d2b5510697e7789f5a416f9d80d3cb640eecc092.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
|
|
- Cascading a slice operator with read offsets is not
supported by the rolling buffer mechanism causing the
address to get out of range.
- The fix is to prevent ops from being cascaded if they have
read offsets.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
|
|
*Added generic function which checks if underlying shape of
FullyConnected operation is 2D and performs shape reduction
*FullyConnected operations with more than 2 dimensions now run on the NPU if the
above case is satisfied
*constraint_fc_output_2d and rewrite_fully_connected_input refactored
*Added unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
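
A check of this kind could be sketched as below. The rule used here (every dimension before the last two must be 1) and the function name are assumptions for illustration; the actual criterion in Vela may differ.

```python
from typing import List, Optional


def reduce_to_2d(shape: List[int]) -> Optional[List[int]]:
    """Hypothetical helper: return the underlying 2D shape, or None."""
    if len(shape) < 2:
        return None
    leading, last_two = shape[:-2], shape[-2:]
    if all(dim == 1 for dim in leading):
        # Only unit dimensions are dropped, so no data is rearranged.
        return list(last_two)
    return None


reducible = reduce_to_2d([1, 1, 4, 8])
not_reducible = reduce_to_2d([2, 3, 4, 8])
```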
|
|
- This is due to calling range() on a non-integer value which in turn is due
to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of the round() to be an integer and
thereby stop whole number floating point values propagating into the kernel
dimensions which later feed into the range().
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
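
The failure mode and the fix can be reproduced with the standard library alone. The helper name `to_int_dim` is an assumption; a plain float stands in here for the whole-number floating point value mentioned above.

```python
def to_int_dim(value) -> int:
    """Round to the nearest whole number and force a plain Python int."""
    return int(round(value))


# range() rejects floats, even whole-number ones such as 3.0.
try:
    list(range(3.0))
    float_range_ok = True
except TypeError:
    float_range_ok = False

# Forcing an int before the value reaches range() avoids the problem.
rows = list(range(to_int_dim(3.0)))
```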
|
|
- Modify the operator clone function to also clone resampling mode
attribute.
A previous patch changed the ifm resampling mode to be an attribute of
an operator rather than a tensor but did not modify the operator clone
function to clone the new attribute.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
|
|
Corrected the calculation for the used buffering depth. Before this change there
were scenarios where it was set to smaller sizes than needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I162859ade78487e848510c6a605685e4568c7068
|
|
Removed numpy version limit.
Change-Id: I01e4d27754fe037be227d7329c4e1a8f1cea6315
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- Changed comments to docstring on QuantizationParams
- Simplified op type to op name conversion
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2fdf5922cc17944c9bd37917a85fdfe50a1e651d
|
|
- Added optional name attributes to operators and tensors
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3b5d881a7b1043a6ba4b58fff5d7532b271ba536
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added a mechanism that reduces the risk of getting stuck
if the current best allocation cannot be improved by only
swapping 2 indices.
Change-Id: Ife379757752f0c1ed54af7bd826e0a9390d54267
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added checks in the cascade builder to ensure that scheduled operations
are in the correct order.
Change-Id: Ic1765a6a1cb8335ff222bfe3b2d2e642980967d7
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed a bug due to ResizeBilinear modifying the attributes of a
shared IFM
- The ifm_resampling_mode is now an attribute of an operator rather
than a tensor
- Changed all calls to try_block_config() to use the attribute rather
than recalculating it in multiple places
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4641e9cd6b049bd4186776d98e3e751c5e5bcc06
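
Why a per-consumer setting belongs on the operator rather than on a shared tensor can be shown with two toy classes. The class layout and the mode strings are assumptions for illustration and do not mirror Vela's own Tensor and Operation classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tensor:
    name: str


@dataclass
class Op:
    name: str
    inputs: List[Tensor]
    # Stored per operator, so each consumer of a shared tensor keeps its own.
    ifm_resampling_mode: str = "NONE"


shared_ifm = Tensor("ifm")
resize = Op("resize", [shared_ifm], ifm_resampling_mode="NEAREST")
conv = Op("conv", [shared_ifm])
```

Both operators read the same tensor, but setting the mode on `resize` leaves `conv` untouched, which would not hold if the mode were an attribute of `shared_ifm`.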
|
|
Add mypy to pre-commit and clean up all reported errors.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
|
|
- The number of accumulators is doubled in an Ethos-U configuration with
2 cores
- Likewise, for elementwise, depthwise and pooling operations
the IFM buffer depth capacity is doubled
- FindBlock: step the search space depth in multiples of ublock * ncores
Change-Id: I923cc347a2f252876d405ed93095d39181103f81
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added check that horizontal padding is unaffected when applying
graph optimization "optimise_strided_conv".
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I7032a44163e300cdf62cf615b4b10a1417e38eaa
|
|
Fast storage allocator did not always return an optimal
allocation.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: Ic758b6c4a82dc2633c4752b0c204a27ed36f651b
|
|
Fix bug when storing the encoded NPU weight UUID in the
NPU performance estimation.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I92127b0020f12352d923c0c9aa2b6f47e6110764
|
|
- Extend ifm/ofm dimensions explicitly in mean op
This fixes a bug when the ifm/ofm shapes have different dimensions,
e.g. IFM=1x19x18x25 axis=2 OFM=1x19x25,
the ofm_shape should be 1x19x1x25, not 1x1x19x25
- Fix wrong weight shape
Change-Id: I269eb71ea56c09deee2aa6c6433d9b2baa98a113
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
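
The shape example above can be worked through in a few lines. Both helper names are hypothetical; the sketch only contrasts padding the OFM shape at the front with keeping the reduced axis as a dimension of size 1.

```python
from typing import List


def pad_front_to_4d(shape: List[int]) -> List[int]:
    """Naive extension: prepend 1s until the shape has four dimensions."""
    return [1] * (4 - len(shape)) + list(shape)


def keep_reduced_axis(ifm_shape: List[int], axis: int) -> List[int]:
    """Explicit extension: the reduced axis stays in place with size 1."""
    ofm_shape = list(ifm_shape)
    ofm_shape[axis] = 1
    return ofm_shape


naive_ofm = pad_front_to_4d([1, 19, 25])
explicit_ofm = keep_reduced_axis([1, 19, 18, 25], axis=2)
```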
|
|
- Corrected rounding error
- Number of elements depends on ofm format
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I568d660b7571b6e0ffb131211b3a89c8be4b9295
|
|
Update the version of flake8 used in pre-commit to facilitate
adding mypy to pre-commit.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I457dec87b77487ca6f14ff4a679c4cc927b272b0
|
|
- Bump minor release version and add release notes
- Update README and SUPPORTED_OPS versions
Change-Id: Ic14d028483c12d281e69515b25f66346d9a3afeb
Signed-off-by: James Peet <james.peet@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
|
|
- Updated the Memory Modes section in OPTIONS.md
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibfd3d2d6e1bf4a070d2af705878a5cc49381ce29
|
|
- The bug is that TransposeConv does not support explicit padding
which is needed in order to combine it with a preceding Pad op
- The fix is to exclude such combinations
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ide03d034dc32b5fc9bcaaf291ab713482223a042
|
|
*Corrected calculation where use of the
_estimate_memory_transfer_efficiency function when calculating the
scaled bandwidth for LUT transfers resulted in a divide by zero error.
Change-Id: I2356e924d9ca2f315ca1988f465f58b13a8fa4c9
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
*Original weights and encoded NPU weights now report the correct size instead
of zero when running vela with the --verbose-weights flag
(Code to update the aforementioned attributes was missing)
*Removed print references to unencoded NPU weight size
Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
Reduce memory footprint when using optimization strategy Size
for elementwise operations.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I30380aed587c31adbf7615f74179b4c5da686773
|
|
Signed-off-by: James Peet <james.peet@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4c9acb04a9df2181829e3a98aab840f32ae6458e
|