Age | Commit message (Collapse) | Author |
|
- Fixed align corners support when converting in to upscale and average
pool. The problem was due to the wrong ratio ifm to ofm size, causing an
scaling factor that was not 2x/4x/8x. Works for uint8, int8 and int16.
- Fixed checking of align corners in supported operators check
- Added additional supported operators check for the size tensor
- Updated and added more supported operators unit tests
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idb78fa9e76ede2c37e8ac6cb1c322154bd156898
|
|
- Minor rework at the register command stream level
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I58495e40efa3a95bdf6febde530f9f73fa8be30b
|
|
If an elemenwise op is part of a cascade, the ifm can not
be overwritten by the ofm.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I1e5f7ee501be17e76684b33c6e86ab8af0f3e61f
|
|
Tensorflow 2.9 contains a bug for int16x8 without biases.
Revert "MLBEDSW-6635: Update to TensorFlow 2.9"
This reverts commit 93f492bae9c4dd16a1f64b851b237263695ee03e.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I366d201ce4134a877d333be2aade546dfcb5d6d7
|
|
Added SHAPE operator to the supported operators report.
Updated the constraints for QUANTIZE and SHAPE operator.
Also fixed RESHAPE consuming statically optimised shape.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I1d964d602d3f361a0f16dae8133197280dd84c48
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.9
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I6bf506ffb85da2d4a57a32198b471513deeaca73
|
|
Added check to see if additional stripe data is needed from producer op
when cascading to make sure the stripes are not overwriting data still
being used. Also changed scheduler to make sure ResizeBilinear always
runs with even stripe height.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
|
|
Fixed static optimisation of Quantize operator by running unsupported
formats on CPU. Also added support for int16 and corrected the
calculation.
Change-Id: I861c712aa6258dba53fcf4d5dae45d1d416e6141
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Hardswish activation function gets converted to LUT in graph optimizer. The case for it was removed, as it was never called.
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com>
Change-Id: I376e8d7b81489c06b66d4e49f59b207600c0ccce
|
|
Enabled elementwise cascading for binary/single variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
|
|
*Quantise op becomes constant if input is known at compile time
*Quantised values calculated if input of op is const and float
*Const inputs to quant op that are int are requantized
Change-Id: Ic94a72a392af709fe6a640d7dacbb5dc2334f16f
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
*Shape OP value is available at compile time hence
it can be optimised
*Disconnected shape OP at compile time from parent
tensor
*Transformed shape OP tensor into constant
Change-Id: I0a024269e2b592c6146dd72e62d7a41951fb727a
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
- The fast storage allocator is supposed to add all feature maps
that does not fit in SRAM to an evicted list. However, in the
case when conflicting tensors were handled the list was not updated.
-This patch makes sure to update the list correctly.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
|
|
- The fast storage allocator only looked at tensor size, giving priority
to larger tensors. The problem with this method is that it does not
consider the actual read/write access of the tensor. So, a smaller
tensor size can cause higher memory transactions than a bigger one.
- The solution is to calculate the read/write access of the tensor and
add that score to the decision when deciding where to place the tensors.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
|
|
Improved block size selection by favouring larger
block sizes for elementwise operations.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I5b30b358d84fcd672935b863c2154bd8f4ccd928
|
|
Vela was not able to parse config file paths entered with forward
slashes. This patch will make it possible to use both forward and
backslashes when specifying paths.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I0f4cfc16bde5738c73059af6216d2bdc3821c68b
|
|
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
|
|
One level deep relative paths (ie ./vela.ini) were treated as the name of a
folder in config_files was ".". They are now treated as relative paths.
The warning message when using an absolute path has also been moved to
to the error message instead for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
|
|
The argument to the lstrip function is a list of all characters that
should be stripped from the beginning of the string, in any order. To
remove the actual prefix, check if the string starts with the string
instead and then remove that amount of characters. The function
"removeprefix" was added in python3.9 which does exactly this, but
that is not yet available to vela since it supports python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
|
|
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed which is the version that
Vela is tested against
- The fix is to limit the numpy version to those that support Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
|
|
- For allocations that have a hard memory limit the Hill Climb allocator
should be given more attempts to find a solution that would fit
- The fix is to use a memory limit when there is a hard constraint, and
a minimum iteration count, reset on every improvement, when there is a soft
constraint
- Added maximum number iterations CLI option
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
|
|
- Problem is due to a divide by zero
- Fix is simply to detect and assign zero. This could also affect
improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
|
|
- The print_performance function that is called when using the
--verbose-performance option crashed with KeyError when no SRAM was
used.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
|
|
- The npu cycles are not correct calculated when only
one weight buffer is used, since weights can not
be fetched in parallel.
- Added new calculation in the single buffer case.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
|
|
Update to the "Vela splitting network into two ethos operators" patch
allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
|
|
- Due to how the graph is traversed, the final pass list contained unnecessary
multiple Ethos-U operators. Functionality wise not a problem but it adds extra
context switching between CPU and NPU.
- By applying sorting rules to the pass list, it is possible to create a more
optimal pass list that reduces the numbers of Ethos-U operator.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
|
|
Add directory structure to support third party config files. Config
files should now be placed in an appropriately named directory under
the config_files directory, but can also be accessed by providing its
absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
|
|
- Added support to print per operator sram usage and performance
information
- Added new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
|
|
Allow schedule do be used when calculations says zero total improvement
but calculations on the other hand shows there are dram improvement.
When testing on real target, total performance is improvement.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
- Due to that bigger weight buffer sizes are being used, there are use cases
when feature maps are evicted from SRAM, causing the total performance to drop.
- A way to improve this is to limit the memory for those weight buffer ops,
to get the feature maps back to SRAM, and see if total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
|
|
Removing constraint for negative alpha value in ReLu
for int8 and uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
|
|
This commit downgrades the required Python version
to 3.7 from 3.8.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I07057908b97bcd94663f001474d877ba41411ae1
|
|
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
|
|
This reverts commit d2b5510697e7789f5a416f9d80d3cb640eecc092.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
|
|
- Cascading a slice operator with read offsets is not
supported by the rolling buffer mechanism causing the
address to get out of range.
- The fix is to prevent ops to be cascaded if they have
read offsets.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
|
|
*Added generic function which checks if underlying shape of
FullyConnected operation is 2D and performs shape reduction
*Fully connected operation >2 dimensions now run on NPU if the above
case is satisfied
*constraint_fc_output_2d and rewrite_fully_connected_input refactored
*Added unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
|
|
- This is due to calling range() on a non-integer value which in turn is due
to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of the round() to be an integer and
thereby stop whole number floating point values propagating into the kernel
dimensions which later feed into the range().
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
|
|
- Modify the operator clone function to also clone resampling mode
attribute.
A previous patch changed the ifm resampling mode to be an attribute of
an operator rather than a tensor but did not modify the operator clone
function to clone the new attribute.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
|
|
Corrected calculation for used bufferering depth. Before change there
were scenarios when it was set to smaller sizes than needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I162859ade78487e848510c6a605685e4568c7068
|
|
Removed numpy version limit.
Change-Id: I01e4d27754fe037be227d7329c4e1a8f1cea6315
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- Changed comments to docstring on QuantizationParams
- Simplified op type to op name conversion
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2fdf5922cc17944c9bd37917a85fdfe50a1e651d
|
|
- Added optional name attributes to operators and tensors
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3b5d881a7b1043a6ba4b58fff5d7532b271ba536
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added a mechanism that reduces the risk for getting stuck
if the current best allocation cannot be improved by only
swapping 2 indices.
Change-Id: Ife379757752f0c1ed54af7bd826e0a9390d54267
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added checks in the cascade builder to ensure that scheduled operations
are in the correct order.
Change-Id: Ic1765a6a1cb8335ff222bfe3b2d2e642980967d7
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|