Remove resize ops from cascading entirely, since there are corner
cases that are not currently handled.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9923f8e119af7bdc0e93b0e69b521b399e0629af
|
|
Output diffs were found to be caused by odd input stripe heights on
operators that perform upscaling.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ia3791d815250364cfe7a38c3ed0e30768d64ca08
|
|
- When compiling for shared SRAM, the old scheduler has an option that
  lets it produce a schedule using less SRAM than the new scheduler
  manages to produce, because the old scheduler was able to create
  more/longer cascades. In order to improve the new scheduler, the
  following has been implemented:
  - Take persistent IFMs into account when creating the min schedule.
  - Choose longer cascades when doing so reduces the total SRAM usage
    compared to using shorter cascades.
- Updated the calculation of estimated SRAM usage for elementwise ops.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I209bbf2d94425e4f6aacb1d151b3b2aa65c0870b
|
|
- The compiler asserts when compiling a faulty concat op. In the
  reported use case, there were 3 inputs with shape 1x1x2, but the
  output shape was 1x1x2 (expected to be 1x1x6).
- The solution is to add constraints to the concat operator.
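Such a constraint can be sketched as a small shape check (hypothetical helper name, not the actual Vela code): along the concat axis the output size must equal the sum of the input sizes, and every other dimension must match.

```python
def valid_concat_shapes(input_shapes, output_shape, axis=-1):
    # Hypothetical constraint sketch: along the concat axis, the output
    # dimension must equal the sum of the input dimensions; every other
    # dimension must match the output exactly.
    concat_dim = axis % len(output_shape)
    for dim, out_size in enumerate(output_shape):
        sizes = [shape[dim] for shape in input_shapes]
        if dim == concat_dim:
            if sum(sizes) != out_size:
                return False
        elif any(size != out_size for size in sizes):
            return False
    return True
```

The reported use case fails this check: three 1x1x2 inputs require an output depth of 6, not 2.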
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I94a505c51a9fd54d1aa92531a0415031db52378a
|
|
There is an issue with using NumPy 1.21.4 or above in setup.py with
Python 3.7. The restriction can most likely be removed when upgrading
to Python 3.8.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I9f826201d68bb5ab61f5bf76c7796442d34447b9
|
|
Limit relative cost to 1 for elementwise operations since increasing
block size when the full ofm already fits gives no additional benefits.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6128f6346834fd916efa59adbe07a069dbda0ae
|
|
With the errors caused by the previous TensorFlow 2.9 update
being fixed, we can proceed with the upgrade.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ie1f025e8d984efaebc68b8d051126d49bee6b2b8
|
|
- Changed ResizeBilinear to also support ResizeNearestNeighbor for
  1x1 IFM, IFM equal to OFM, and non-align corners
- Added support for ResizeNearestNeighbor with align corners by
converting to a DepthwiseConv
- Updated supported operator unit tests
- Added is_resize() helper function and some associated refactoring
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Id5bdf2a25e8aa6a4f28b7236250abf768141ce37
|
|
- Fixed align corners support when converting into upscale and average
  pool. The problem was due to the wrong IFM to OFM size ratio, causing
  a scaling factor that was not 2x/4x/8x. Works for uint8, int8 and
  int16.
- Fixed checking of align corners in supported operators check
- Added additional supported operators check for the size tensor
- Updated and added more supported operators unit tests
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idb78fa9e76ede2c37e8ac6cb1c322154bd156898
|
|
- Minor rework at the register command stream level
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I58495e40efa3a95bdf6febde530f9f73fa8be30b
|
|
If an elementwise op is part of a cascade, the IFM cannot be
overwritten by the OFM.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I1e5f7ee501be17e76684b33c6e86ab8af0f3e61f
|
|
TensorFlow 2.9 contains a bug for int16x8 without biases.
Revert "MLBEDSW-6635: Update to TensorFlow 2.9"
This reverts commit 93f492bae9c4dd16a1f64b851b237263695ee03e.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I366d201ce4134a877d333be2aade546dfcb5d6d7
|
|
Added the SHAPE operator to the supported operators report.
Updated the constraints for the QUANTIZE and SHAPE operators.
Also fixed RESHAPE consuming a statically optimised shape.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I1d964d602d3f361a0f16dae8133197280dd84c48
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.9
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I6bf506ffb85da2d4a57a32198b471513deeaca73
|
|
Added a check to see if additional stripe data is needed from the
producer op when cascading, to make sure the stripes are not
overwriting data still being used. Also changed the scheduler to make
sure ResizeBilinear always runs with an even stripe height.
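The even stripe height requirement could be enforced with a one-line helper (hypothetical name, a sketch rather than the scheduler's actual code):

```python
def even_stripe_height(height: int) -> int:
    # Hypothetical sketch: round an odd stripe height up to the next
    # even value so upscaling producers and consumers stay aligned.
    return height if height % 2 == 0 else height + 1
```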
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
|
|
Fixed static optimisation of the Quantize operator by running
unsupported formats on the CPU. Also added support for int16 and
corrected the calculation.
Change-Id: I861c712aa6258dba53fcf4d5dae45d1d416e6141
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
The Hardswish activation function gets converted to a LUT in the graph
optimizer. The case for it was removed, as it was never called.
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com>
Change-Id: I376e8d7b81489c06b66d4e49f59b207600c0ccce
|
|
Enabled elementwise cascading for binary/single variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
|
|
*Quantise op becomes constant if input is known at compile time
*Quantised values calculated if input of op is const and float
*Const inputs to quant op that are int are requantized
Change-Id: Ic94a72a392af709fe6a640d7dacbb5dc2334f16f
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
*The Shape OP value is available at compile time, hence it can be
optimised
*Disconnected the Shape OP from its parent tensor at compile time
*Transformed the Shape OP tensor into a constant
Change-Id: I0a024269e2b592c6146dd72e62d7a41951fb727a
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
- The fast storage allocator is supposed to add all feature maps that
  do not fit in SRAM to an evicted list. However, when conflicting
  tensors were handled, the list was not updated.
- This patch makes sure to update the list correctly.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
|
|
- The fast storage allocator only looked at tensor size, giving
  priority to larger tensors. The problem with this method is that it
  does not consider the actual read/write access of the tensor, so a
  smaller tensor can cause more memory transactions than a bigger one.
- The solution is to calculate the read/write access of the tensor and
  add that score to the decision when choosing where to place tensors.
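A minimal sketch of such an access-based score (hypothetical helper and parameters, not the allocator's real interface): traffic is what matters, so the score scales with both the tensor size and how often it is read and written.

```python
def storage_priority(tensor_size, reads, writes):
    # Hypothetical scoring sketch: estimate SRAM placement benefit by
    # the total traffic a tensor generates, not by its size alone.
    return (reads + writes) * tensor_size
```

A 1000-byte tensor accessed 20 times then outranks a 5000-byte tensor accessed twice.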
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
|
|
Improved block size selection by favouring larger
block sizes for elementwise operations.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I5b30b358d84fcd672935b863c2154bd8f4ccd928
|
|
Vela was not able to parse config file paths entered with forward
slashes. This patch makes it possible to use both forward slashes and
backslashes when specifying paths.
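One portable way to accept both separators is to go through pathlib's PureWindowsPath, which parses both, and re-render with forward slashes; a sketch with a hypothetical helper name, not necessarily how Vela does it:

```python
from pathlib import PureWindowsPath

def normalise_config_path(path: str) -> str:
    # PureWindowsPath accepts both "/" and "\" as separators;
    # as_posix() renders the result with forward slashes only.
    return PureWindowsPath(path).as_posix()
```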
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I0f4cfc16bde5738c73059af6216d2bdc3821c68b
|
|
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
|
|
One-level-deep relative paths (i.e. ./vela.ini) were treated as if the
name of a folder in config_files was ".". They are now treated as
relative paths. The warning message when using an absolute path has
also been moved to the error message for a better user experience.
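The failure mode can be illustrated with plain string splitting versus pathlib (hypothetical helper, for illustration only): naive splitting makes "./vela.ini" look like a file inside a folder named ".", while pathlib normalises the leading "./" away.

```python
from pathlib import PurePosixPath

def config_dir_name(path: str):
    # Naive split: the first component of "./vela.ini" is ".", which
    # was mistaken for a folder name under config_files.
    naive_folder = path.split("/")[0]
    # pathlib normalises away the leading "./", leaving a relative path.
    parts = PurePosixPath(path).parts
    return naive_folder, parts
```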
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
|
|
The argument to the lstrip function is a set of all characters that
should be stripped from the beginning of the string, in any order. To
remove an actual prefix, check whether the string starts with the
prefix instead, and then remove that number of characters. The
function "removeprefix", added in Python 3.9, does exactly this, but
it is not yet available to Vela since Vela supports Python 3.7.
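The pitfall and the fix can be shown in a few lines (the backport helper name is hypothetical):

```python
def remove_prefix(text: str, prefix: str) -> str:
    # Backport sketch of str.removeprefix (Python 3.9+) for Python 3.7:
    # only strip when the string actually starts with the prefix.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# lstrip treats its argument as a set of characters, so it can eat
# into the remainder of the string:
#   "config_files/isle.ini".lstrip("config_files/")          -> ".ini"
#   remove_prefix("config_files/isle.ini", "config_files/")  -> "isle.ini"
```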
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
|
|
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed, which is the
  version that Vela is tested against
- The fix is to limit the numpy version to those that support
  Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
|
|
- For allocations that have a hard memory limit, the Hill Climb
  allocator should be given more attempts to find a solution that fits
- The fix is to use a memory limit when there is a hard constraint,
  and a minimum iteration count, reset on every improvement, when
  there is a soft constraint
- Added a CLI option for the maximum number of iterations
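The soft-constraint behaviour can be sketched as a generic hill-climb loop (hypothetical function, not the actual allocator): the search only stops after a minimum number of iterations without improvement, and the counter resets whenever a better solution is found.

```python
import random

def hill_climb(evaluate, neighbour, start, min_iterations=500):
    # Hypothetical sketch: keep searching until min_iterations attempts
    # in a row have failed to improve on the best solution so far.
    best, best_cost = start, evaluate(start)
    since_improvement = 0
    while since_improvement < min_iterations:
        candidate = neighbour(best)
        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
            since_improvement = 0  # reset on every improvement
        else:
            since_improvement += 1
    return best
```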
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
|
|
- The problem is due to a divide by zero
- The fix is simply to detect it and assign zero. This could also
  affect improvement_sram
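A minimal sketch of such a guard (hypothetical function name, mirroring the described fix):

```python
def improvement_ratio(saved_cycles, total_cycles):
    # Hypothetical sketch of the fix: detect a zero denominator and
    # assign zero instead of raising ZeroDivisionError.
    if total_cycles == 0:
        return 0.0
    return saved_cycles / total_cycles
```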
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
|
|
- The print_performance function that is called when using the
  --verbose-performance option crashed with a KeyError when no SRAM
  was used.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
|
|
- The NPU cycles are not correctly calculated when only one weight
  buffer is used, since weights cannot be fetched in parallel.
- Added a new calculation for the single buffer case.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
|
|
Update to the "Vela splitting network into two ethos operators" patch
allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
|
|
- Due to how the graph is traversed, the final pass list contained
  unnecessarily many Ethos-U operators. Functionality-wise this is not
  a problem, but it adds extra context switching between CPU and NPU.
- By applying sorting rules to the pass list, it is possible to create
  a more optimal pass list that reduces the number of Ethos-U
  operators.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
|
|
Add directory structure to support third party config files. Config
files should now be placed in an appropriately named directory under
the config_files directory, but can also be accessed by providing its
absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
|
|
- Added support for printing per-operator SRAM usage and performance
  information
- Added new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
|
|
Allow a schedule to be used when the calculations show zero total
improvement but do show a DRAM improvement. When testing on a real
target, total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
- Because bigger weight buffer sizes are being used, there are use
  cases where feature maps are evicted from SRAM, causing the total
  performance to drop.
- A way to improve this is to limit the memory for those weight buffer
  ops, bringing the feature maps back into SRAM, and see if total
  performance improves.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
|
|
Removing the constraint on negative alpha values in ReLu for int8 and
uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
|
|
This commit downgrades the required Python version
to 3.7 from 3.8.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I07057908b97bcd94663f001474d877ba41411ae1
|
|
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
|
|
This reverts commit d2b5510697e7789f5a416f9d80d3cb640eecc092.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
|
|
- Cascading a slice operator with read offsets is not supported by the
  rolling buffer mechanism, causing the address to go out of range.
- The fix is to prevent ops from being cascaded if they have read
  offsets.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
|
|
*Added a generic function which checks if the underlying shape of a
FullyConnected operation is 2D and performs shape reduction
*FullyConnected operations with more than 2 dimensions now run on the
NPU if the above case is satisfied
*constraint_fc_output_2d and rewrite_fully_connected_input refactored
*Added a unit test to confirm this functionality
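The shape reduction can be sketched as folding all leading dimensions into the batch (hypothetical helper, not the actual Vela function):

```python
def reduce_to_2d(shape):
    # Hypothetical sketch: a FullyConnected input of rank > 2 reduces
    # to 2D by folding all leading dimensions into the batch dimension.
    if len(shape) <= 2:
        return list(shape)
    batch = 1
    for dim in shape[:-1]:
        batch *= dim
    return [batch, shape[-1]]
```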
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
|
|
- This is due to calling range() on a non-integer value, which in turn
  is due to a change in the behaviour of round() on numpy.float64
  values
- The fix is to always force the output of round() to be an integer
  and thereby stop whole-number floating point values propagating into
  the kernel dimensions which later feed into range()
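The fix pattern can be illustrated without numpy (hypothetical helper; in the real code the value comes from round() on a numpy.float64):

```python
def kernel_extent(scale_factor):
    # Hypothetical sketch of the fix: round() on some float-like types
    # can return a whole-number float, which range() rejects, so force
    # the result to an int before it reaches range().
    size = int(round(scale_factor))
    return list(range(size))
```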
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
|
|
- Modify the operator clone function to also clone the resampling mode
  attribute.
  A previous patch changed the IFM resampling mode to be an attribute
  of an operator rather than a tensor, but did not modify the operator
  clone function to clone the new attribute.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
|