|
One-level-deep relative paths (i.e. ./vela.ini) were treated as the name
of a folder in config_files, since the dirname of such a path is ".".
They are now treated as relative paths.
The warning message shown when using an absolute path has also been
moved to the error message for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
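The root cause can be illustrated with the standard library; assuming the
config lookup derived a folder name from the path (an illustrative sketch,
not Vela's actual code):

```python
import os

# for a one-level-deep relative path, the dirname is ".", which was then
# mistaken for a folder named "." under config_files
assert os.path.dirname("./vela.ini") == "."
assert os.path.dirname("vela/vela.ini") == "vela"

# treating it as a relative path instead resolves it against the current
# working directory (hypothetical handling)
resolved = os.path.abspath("./vela.ini")
assert resolved.endswith("vela.ini")
```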
|
|
The argument to the lstrip function is a set of characters that should
be stripped from the beginning of the string, in any order. To remove
an actual prefix, instead check whether the string starts with the
prefix and then remove that number of characters. The function
"removeprefix", added in Python 3.9, does exactly this, but it is not
yet available to Vela since Vela supports Python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
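The pitfall can be demonstrated directly; `remove_prefix` below is a
hypothetical stand-in for the fix (Python 3.9's str.removeprefix behaves
the same way):

```python
# lstrip() treats its argument as a SET of characters, not a prefix:
# the leading "f" and "i" of "files" are also in the set and get eaten
assert "config_files".lstrip("config_") == "les"

# prefix-safe removal for Python < 3.9 (hypothetical helper name):
def remove_prefix(s: str, prefix: str) -> str:
    return s[len(prefix):] if s.startswith(prefix) else s

assert remove_prefix("config_files", "config_") == "files"
```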
|
|
- For allocations that have a hard memory limit, the Hill Climb allocator
should be given more attempts to find a solution that fits
- The fix is to use a memory limit when there is a hard constraint, and
a minimum iteration count, reset on every improvement, when there is a soft
constraint
- Added a maximum number of iterations CLI option
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
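The soft-constraint behaviour can be sketched as follows; the names and
structure are illustrative, not Vela's actual HillClimbAllocator:

```python
def hill_climb(start, neighbour, cost, min_iterations=100):
    """Minimal sketch: keep searching until min_iterations have passed
    without any improvement; the counter resets on every improvement."""
    best, best_cost = start, cost(start)
    since_improvement = 0
    while since_improvement < min_iterations:
        candidate = neighbour(best)
        candidate_cost = cost(candidate)
        since_improvement += 1
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
            since_improvement = 0  # reset on every improvement
    return best

# toy search: walk towards 10 one step at a time
result = hill_climb(0, lambda x: x + 1, lambda x: abs(x - 10),
                    min_iterations=20)
assert result == 10
```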
|
|
- The problem is due to a divide by zero
- The fix is simply to detect it and assign zero. This could also affect
improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
|
|
- The print_performance function that is called when using the
--verbose-performance option crashed with a KeyError when no SRAM was
used.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
|
|
- The NPU cycles were not correctly calculated when only
one weight buffer is used, since weights cannot
be fetched in parallel.
- Added a new calculation for the single-buffer case.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
|
|
Update to the "Vela splitting network into two ethos operators" patch
allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
|
|
- Due to how the graph is traversed, the final pass list contained multiple
unnecessary Ethos-U operators. Functionality-wise this is not a problem, but
it adds extra context switching between the CPU and the NPU.
- By applying sorting rules to the pass list, it is possible to create a more
optimal pass list that reduces the number of Ethos-U operators.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
|
|
Add a directory structure to support third-party config files. Config
files should now be placed in an appropriately named directory under
the config_files directory, but a config file can also be accessed by
providing its absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
|
|
- Added support for printing per-operator SRAM usage and performance
information
- Added a new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
|
|
Allow a schedule to be used when the calculations show zero total
improvement but do show a DRAM improvement. When testing on a real
target, total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
- Because bigger weight buffer sizes are being used, there are use cases
where feature maps are evicted from SRAM, causing total performance to drop.
- A way to improve this is to limit the memory for those weight buffer ops,
bringing the feature maps back to SRAM, and then check whether total
performance improves.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
|
|
Removed the constraint for negative alpha values in ReLU
for int8 and uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
|
|
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
|
|
This reverts commit d2b5510697e7789f5a416f9d80d3cb640eecc092.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
|
|
- Cascading a slice operator with read offsets is not
supported by the rolling buffer mechanism, causing the
address to go out of range.
- The fix is to prevent ops from being cascaded if they have
read offsets.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
|
|
*Added a generic function which checks whether the underlying shape of a
FullyConnected operation is 2D and performs shape reduction
*FullyConnected operations with more than 2 dimensions now run on the
NPU if the above condition is satisfied
*Refactored constraint_fc_output_2d and rewrite_fully_connected_input
*Added a unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
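The shape reduction can be sketched as below; `reduce_to_2d` is a
hypothetical helper, not the function added by the patch:

```python
import math

def reduce_to_2d(shape):
    # a FullyConnected input [d0, ..., dn-1, dn] behaves as a 2D matrix
    # [d0 * ... * dn-1, dn]: the batch-like leading axes fold into the
    # first dimension
    return [math.prod(shape[:-1]), shape[-1]]

assert reduce_to_2d([1, 1, 4, 16]) == [4, 16]
assert reduce_to_2d([2, 3, 8]) == [6, 8]
```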
|
|
- This is due to calling range() on a non-integer value, which in turn is due
to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of round() to be an integer, and
thereby stop whole-number floating point values from propagating into the
kernel dimensions which later feed into range().
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
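The failure mode is the generic one of feeding a non-integer into
range(); illustrated here with a plain Python float, since the same
TypeError arises (the round() behaviour on numpy.float64 depends on the
NumPy version):

```python
kernel_w = 8 * 1.0 / 2   # float arithmetic leaves 4.0, a whole-number float
try:
    range(kernel_w)      # float cannot be interpreted as an integer
    raised = False
except TypeError:
    raised = True
assert raised

kernel_w = int(round(8 * 1.0 / 2))   # the fix: force an integer
assert list(range(kernel_w)) == [0, 1, 2, 3]
```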
|
|
- Modify the operator clone function to also clone the resampling mode
attribute.
A previous patch changed the ifm resampling mode to be an attribute of
an operator rather than a tensor, but did not update the operator clone
function to clone the new attribute.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
|
|
Corrected the calculation of the used buffering depth. Before this
change there were scenarios where it was set to a smaller size than
needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I162859ade78487e848510c6a605685e4568c7068
|
|
- Changed comments to docstring on QuantizationParams
- Simplified op type to op name conversion
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2fdf5922cc17944c9bd37917a85fdfe50a1e651d
|
|
- Added optional name attributes to operators and tensors
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3b5d881a7b1043a6ba4b58fff5d7532b271ba536
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added a mechanism that reduces the risk for getting stuck
if the current best allocation cannot be improved by only
swapping 2 indices.
Change-Id: Ife379757752f0c1ed54af7bd826e0a9390d54267
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added checks in the cascade builder to ensure that scheduled operations
are in the correct order.
Change-Id: Ic1765a6a1cb8335ff222bfe3b2d2e642980967d7
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed a bug due to ResizeBilinear modifying the attributes of a
shared IFM
- The ifm_resampling_mode is now an attribute of an operator rather
than a tensor
- Changed all calls to try_block_config() to use the attribute rather
than recalculating it in multiple places
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4641e9cd6b049bd4186776d98e3e751c5e5bcc06
|
|
Add mypy to pre-commit and clean up all reported errors.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
|
|
- The number of accumulators is doubled in an Ethos-U configuration with
2 cores
- Likewise, for elementwise, depthwise and pooling operations
the IFM buffer depth capacity is doubled
- FindBlock: step the search space depth in multiples of ublock * ncores
Change-Id: I923cc347a2f252876d405ed93095d39181103f81
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added check that horizontal padding is unaffected when applying
graph optimization "optimise_strided_conv".
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I7032a44163e300cdf62cf615b4b10a1417e38eaa
|
|
Fast storage allocator did not always return an optimal
allocation.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: Ic758b6c4a82dc2633c4752b0c204a27ed36f651b
|
|
Fix bug when storing the encoded NPU weight UUID in the
NPU performance estimation.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I92127b0020f12352d923c0c9aa2b6f47e6110764
|
|
- Extend ifm/ofm dimensions explicitly in the mean op
This fixes a bug when the ifm/ofm shapes have different numbers of
dimensions, e.g. IFM=1x19x18x25, axis=2, OFM=1x19x25:
the ofm_shape should be 1x19x1x25, not 1x1x19x25
- Fix wrong weight shape
Change-Id: I269eb71ea56c09deee2aa6c6433d9b2baa98a113
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
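The expected extension keeps the reduced axis as a size-1 dimension,
mirroring NumPy's keepdims; `extended_ofm_shape` is a hypothetical
helper for illustration:

```python
def extended_ofm_shape(ifm_shape, axis):
    # keep the reduced axis as a 1 rather than prepending a leading 1
    shape = list(ifm_shape)
    shape[axis] = 1
    return shape

# the example from the commit: IFM=1x19x18x25, axis=2
assert extended_ofm_shape([1, 19, 18, 25], 2) == [1, 19, 1, 25]
```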
|
|
- Corrected rounding error
- Number of elements depends on ofm format
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I568d660b7571b6e0ffb131211b3a89c8be4b9295
|
|
Update the version of flake8 used in pre-commit to facilitate
adding mypy to pre-commit.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I457dec87b77487ca6f14ff4a679c4cc927b272b0
|
|
- The bug is that TransposeConv does not support explicit padding,
which is needed in order to combine it with a preceding Pad op
- The fix is to exclude such combinations
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ide03d034dc32b5fc9bcaaf291ab713482223a042
|
|
*Corrected a calculation where use of the
_estimate_memory_transfer_efficiency function when calculating the
scaled bandwidth for LUT transfers resulted in a divide-by-zero error.
Change-Id: I2356e924d9ca2f315ca1988f465f58b13a8fa4c9
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
*Original weights and encoded NPU weights now report the correct size
instead of zero when running Vela with the --verbose-weights flag
(code to update the aforementioned attributes was missing)
*Removed print references to the unencoded NPU weight size
Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
Reduce memory footprint when using optimization strategy Size
for elementwise operations.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I30380aed587c31adbf7615f74179b4c5da686773
|
|
- Combine two MEAN operator checks for single axis averages into one
- Only apply that check if the single axis is the height dimension
(previously checks were also applied to width averages)
- Rephrase some MEAN operator constraint descriptions
Signed-off-by: James Peet <james.peet@arm.com>
Change-Id: Ie0577f2b99aba1f3d6a4c39f8934eafe3813b736
|
|
Make sure the output from the subgraph is write protected and
not overwritten by an elementwise op.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ie26979913843c62794c5346a315b7089206850e0
|
|
Fixed a problem when the ofm is produced by different NPU nodes by
making sure that the output is always in NHWC format.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I00e55c989d5860499fbaf4f4318661b17b4bda7e
|
|
Ported the improved spilling behaviour from Regor
into Vela. This replaces use_fast_storage_for_feature_maps
with allocate_feature_maps and introduces the class called
FastStorageComponentAllocator.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I34785840c905a79750a62863773015b00fb43387
|
|
This change will allow the subgraph's input tensor
to be reused/overwritten by the output from an elementwise op
if there is only one consumer attached to the input tensor.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I317188af11a5470614770e18dc8973462fd5f21c
|
|
The root cause of this diff is precision errors caused by rounding
several times when performing a resize bilinear upscaling to more than
twice the initial size. This is solved by rewriting the algorithm to
perform nearest neighbour upscaling to the correct size and then
applying one larger average pool instead of several 2x2 pools. Avgpool
with padding is limited to a kernel size of 8x8, which constrains the
largest possible bilinear upscaling to 8 times the input size.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I846232f309ba26aab6c385e593cbe25b646c6668
|
|
- The issue was due to a previous patch that fixed MLBEDSW-5582
- Revert the fix for MLBEDSW-5582,
commit 849ff81f82c10a68898e5101930b92372bec5565
- Made a new fix for MLBEDSW-5582 that enforces the
output tensors of NPU graphs to be in NHWC format.
This information is otherwise lost when
parts of a concatenation are placed in different custom operators,
resulting in a mismatch between NHWC and NHCWB16.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iab3ba29d348353c854f357836e6aa7c338ae1572
|