Age | Commit message | Author |
|
Added LeakyRelu to supported activation ops.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Icca27730946d02ec16159f988782567be716b594
|
|
Setting bias tensor dtype to DataType.int32 solves rounding issues for
RB HPC int16.
Removing the input data type check also solves the issue of resize
nearest neighbor int16 ops incorrectly getting placed on the CPU.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iee352bcb78e581c0cde3c203dfbe866f1f6fae18
|
|
- Added support for Resize Bilinear with half pixel centers for int8 and
uint8.
- Utilizes the new "TILE" padding mode.
- Utilizes ofm stride multipliers and modified tile base offsets to
write OFMs interleaved.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I37fa77c022a368f05fda0ead75d8696c9205f833
|
|
The issue was that the AveragePool in these test cases was
translated to DepthwiseConv2DBias, and int16 convolutions
always run with reduced scale. Fixed so that reduced scale
is not used in this case.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ice956eabbb37c8aa1991464870006971c6ecec43
|
|
Fixed PReLU optimisation to LeakyReLU with negative alpha.
Added optimisation of LeakyReLU to ReLU when alpha is zero.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I5e66f79b29908fffd95b6115799021138ebb401a
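The equivalences these optimisations rely on can be sketched in NumPy (an illustration of the maths, not Vela code):

```python
import numpy as np

def leaky_relu(x, alpha):
    # LeakyReLU: pass positives through, scale negatives by alpha.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU with a scalar (per-tensor) alpha is the same function as
    # LeakyReLU, which is what makes the rewrite possible.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -1.0, 0.0, 3.0])
# With alpha == 0, LeakyReLU collapses to plain ReLU.
assert np.array_equal(leaky_relu(x, 0.0), np.maximum(x, 0.0))
```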
|
|
Allow sparse writing of OFM by multiplying H/W/C of the OFM with the
values of ofm_stride_multiplier
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I65d742ad36ad3154e9914cdd22e2da928ad1f095
|
|
Fixed LeakyReLU regressions for int16 due to scaling introduced
for handling negative alpha.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I84a494fedf54bd4b47c4632645ded7d6cda445f8
|
|
Removed duplicate code and moved constraint to
the correct file.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2da3c5b88e1af351751c481217b8183b5948f0f8
|
|
Remove Pipfile support due to lack of testing and maintenance.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I93786cdbf22bfa2130601291d23cead177bd8f81
|
|
Added support for int16 LeakyRelu for negative alpha and alpha
greater than one.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I7f522ebfe014786d0a1d96172e75c7d9bdd76921
|
|
Implement new padding mode which pads two edges of the IFM with the
current values of those edges
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I8523e0cabdac80b48710703859003e33050cc150
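The behaviour of such an edge-replicating pad can be illustrated with NumPy's "edge" mode (a sketch of the concept, not the Vela implementation):

```python
import numpy as np

ifm = np.array([[1, 2],
                [3, 4]])
# Pad one row at the bottom and one column on the right by replicating
# the current edge values of the IFM.
padded = np.pad(ifm, pad_width=((0, 1), (0, 1)), mode="edge")
# padded is:
# [[1 2 2]
#  [3 4 4]
#  [3 4 4]]
```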
|
|
Changed acc type from int16 to int32. This will solve
saturation problems and the constraint added in
commit "MLBEDSW-5029: Output diff for Mean op"
can be removed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I05ec8835b43313b1a264d61a2b147fa62da123fe
|
|
- Ethos-U65-512 requires the input to REDUCE_SUM to use NHWC format
- Updated the graph optimiser format check to cover this condition
- Added an exception check to the backend of the compiler to verify that
this condition has not been violated by the external API or Vela internals
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2f1fabcbd264daf77d5822349d855a3a32b12c64
|
|
Added optimisations for PReLU when the alpha values allow it.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iff9124e691663ee495379f89900e7c35dbc5f948
|
|
Fixed three test cases causing output diff compared to
the reference kernel for the Mean operator.
- If there is a possibility that the accumulator could saturate,
the Mean op must run on the CPU
- Use correct rounding for the bias term
- If a Reshape op is followed by a Mean op, push the Reshape op
to the CPU since this cannot be handled by the NPU
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I734465730372105821a5e2f73a6a125b9eb7d7f4
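The kind of saturation check described can be sketched as follows (an illustrative rule with assumed uint8 inputs and a signed 32-bit accumulator, not Vela's actual condition):

```python
def mean_could_saturate(h: int, w: int) -> bool:
    # Illustrative check: summing h*w uint8 values (max 255 each) can
    # exceed the signed 32-bit accumulator range, in which case the
    # Mean op has to fall back to the CPU.
    max_sum = h * w * 255
    return max_sum >= 2 ** 31
```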
|
|
Dump the current per-layer performance estimation information
that appears on the terminal to a CSV file.
Change-Id: I00e94168704be8c3c674c8779fb807ed28607ccd
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
Added PReLU support in graph optimiser.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I3a188675e3edcdf0b4a4bfcdd134fda0bf8a560f
|
|
- The optimisation of the SHAPE operator resulted in a divide by zero
when printing the percentage of NPU/CPU operators in the final output
summary
- The fix is to detect when there are no operators in the output TFLite
file and then avoid the division
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I5bd2342335e9468a8b7028e6e2291a03960e2e55
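The guard described can be as simple as the following (hypothetical names, not the actual Vela code):

```python
def npu_percentage(npu_ops: int, cpu_ops: int) -> float:
    # Hypothetical helper: avoid the divide by zero when the output
    # network contains no operators at all.
    total = npu_ops + cpu_ops
    if total == 0:
        return 0.0
    return 100.0 * npu_ops / total
```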
|
|
- Updated SUPPORT_OPERATORS.md with Resize operators
- Updated release notes with the main changes and bug fixes
- Updated version numbers
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: If25b5fab708098bc3e7eb243924b55a50f148c3a
|
|
Mypy and pylint were previously not included in TESTING.md.
Also, installation of pre-commit, pytest and pytest-cov outside
of a virtual environment was not detailed.
CONTRIBUTIONS.md had an old Python version listed in the coding standard section.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Idff9454083e41d719e6d75e90cb2be2861500eb9
|
|
Remove resize ops completely from being cascaded since there
are corner cases which are not currently handled.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9923f8e119af7bdc0e93b0e69b521b399e0629af
|
|
Output diffs were found to be caused by odd input stripe heights
when the input was produced by an upscaling operator.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ia3791d815250364cfe7a38c3ed0e30768d64ca08
|
|
- When compiling for shared SRAM, the old scheduler has an option that
lets it use less SRAM than what the new scheduler manages to
produce. The old scheduler was able to create more/longer cascades.
In order to improve the new scheduler, the following has been
implemented:
- Take persistent IFMs into account when creating the min schedule.
- Choose longer cascades when it is possible to reduce the total
SRAM usage compared to using shorter cascades.
- Updated calculation for estimated SRAM usage for elementwise ops.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I209bbf2d94425e4f6aacb1d151b3b2aa65c0870b
|
|
- The compiler will assert when compiling a faulty concat op.
In the reported use case, there were 3 inputs with shape 1x1x2
but the output shape was 1x1x2 (expected to be 1x1x6)
- The solution is to add constraints to the concat operator.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I94a505c51a9fd54d1aa92531a0415031db52378a
|
|
There is an issue with using NumPy 1.21.4 or above in setup.py with
Python 3.7. The restriction can most likely be removed when upgrading to
Python 3.8.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I9f826201d68bb5ab61f5bf76c7796442d34447b9
|
|
Limit relative cost to 1 for elementwise operations since increasing
block size when the full ofm already fits gives no additional benefits.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6128f6346834fd916efa59adbe07a069dbda0ae
|
|
With the errors caused by the previous TensorFlow 2.9 update
being fixed, we can proceed with the upgrade.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ie1f025e8d984efaebc68b8d051126d49bee6b2b8
|
|
- Changed ResizeBilinear to support ResizeNearestNeighbor as well for
1x1 IFM, IFM equal to OFM, and non-align corners
- Added support for ResizeNearestNeighbor with align corners by
converting to a DepthwiseConv
- Updated supported operator unit tests
- Added is_resize() helper function and some associated refactoring
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Id5bdf2a25e8aa6a4f28b7236250abf768141ce37
|
|
- Fixed align corners support when converting into upscale and average
pool. The problem was due to the wrong IFM to OFM size ratio, causing a
scaling factor that was not 2x/4x/8x. Works for uint8, int8 and int16.
- Fixed checking of align corners in supported operators check
- Added additional supported operators check for the size tensor
- Updated and added more supported operators unit tests
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idb78fa9e76ede2c37e8ac6cb1c322154bd156898
|
|
- Minor rework at the register command stream level
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I58495e40efa3a95bdf6febde530f9f73fa8be30b
|
|
If an elementwise op is part of a cascade, the ifm can not
be overwritten by the ofm.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I1e5f7ee501be17e76684b33c6e86ab8af0f3e61f
|
|
TensorFlow 2.9 contains a bug for int16x8 without biases.
Revert "MLBEDSW-6635: Update to TensorFlow 2.9"
This reverts commit 93f492bae9c4dd16a1f64b851b237263695ee03e.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I366d201ce4134a877d333be2aade546dfcb5d6d7
|
|
Added SHAPE operator to the supported operators report.
Updated the constraints for QUANTIZE and SHAPE operator.
Also fixed RESHAPE consuming statically optimised shape.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I1d964d602d3f361a0f16dae8133197280dd84c48
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.9
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I6bf506ffb85da2d4a57a32198b471513deeaca73
|
|
Added check to see if additional stripe data is needed from producer op
when cascading to make sure the stripes are not overwriting data still
being used. Also changed scheduler to make sure ResizeBilinear always
runs with even stripe height.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
|
|
Fixed static optimisation of Quantize operator by running unsupported
formats on CPU. Also added support for int16 and corrected the
calculation.
Change-Id: I861c712aa6258dba53fcf4d5dae45d1d416e6141
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Hardswish activation function gets converted to LUT in graph optimizer. The case for it was removed, as it was never called.
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com>
Change-Id: I376e8d7b81489c06b66d4e49f59b207600c0ccce
|
|
Enabled elementwise cascading for binary/single variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
|
|
*Quantise op becomes constant if input is known at compile time
*Quantised values calculated if input of op is const and float
*Const inputs to quant op that are int are requantized
Change-Id: Ic94a72a392af709fe6a640d7dacbb5dc2334f16f
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
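The affine quantisation arithmetic implied here follows the standard TFLite scheme, q = round(x / scale) + zero_point; a sketch of both cases (not the actual Vela code):

```python
import numpy as np

def quantise(values, scale, zero_point, dtype=np.int8):
    # Quantise float values: q = round(x / scale) + zero_point,
    # clamped to the range of the target integer type.
    info = np.iinfo(dtype)
    q = np.round(np.asarray(values) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def requantise(q, in_scale, in_zp, out_scale, out_zp, dtype=np.int8):
    # Requantise already-integer values into a new scale/zero point,
    # as done for constant integer inputs to a Quantize op.
    x = (np.asarray(q, dtype=np.int64) - in_zp) * in_scale
    return quantise(x, out_scale, out_zp, dtype)
```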
|
|
*Shape OP value is available at compile time hence
it can be optimised
*Disconnected shape OP at compile time from parent
tensor
*Transformed shape OP tensor into constant
Change-Id: I0a024269e2b592c6146dd72e62d7a41951fb727a
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
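Conceptually, folding a SHAPE op amounts to replacing it with a constant built from the statically known input shape (an illustrative sketch, not Vela code):

```python
import numpy as np

ifm = np.zeros((1, 8, 8, 3), dtype=np.int8)
# The input shape is known at compile time, so the SHAPE op's output
# can be materialised as a constant int32 tensor and the op removed
# from the graph.
shape_const = np.array(ifm.shape, dtype=np.int32)
```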
|
|
- The fast storage allocator is supposed to add all feature maps
that do not fit in SRAM to an evicted list. However, in the
case when conflicting tensors were handled, the list was not updated.
- This patch makes sure to update the list correctly.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
|
|
- The fast storage allocator only looked at tensor size, giving priority
to larger tensors. The problem with this method is that it does not
consider the actual read/write access of the tensor. So, a smaller
tensor can cause more memory transactions than a bigger one.
- The solution is to calculate the read/write access of the tensor and
add that score to the decision when deciding where to place the tensors.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
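A hypothetical sketch of the kind of scoring change described (the name and weighting are illustrative, not Vela's actual heuristic):

```python
def placement_score(tensor_size: int, reads: int, writes: int) -> int:
    # Score a tensor for fast-storage placement by its total memory
    # traffic rather than by size alone: every read and write moves
    # the whole tensor, so a small but frequently accessed tensor can
    # outrank a larger, rarely accessed one.
    return tensor_size * (reads + writes)
```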
|
|
Improved block size selection by favouring larger
block sizes for elementwise operations.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I5b30b358d84fcd672935b863c2154bd8f4ccd928
|
|
Vela was not able to parse config file paths entered with forward
slashes. This patch will make it possible to use both forward and
backslashes when specifying paths.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I0f4cfc16bde5738c73059af6216d2bdc3821c68b
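One way this kind of handling can be sketched is to map backslashes to forward slashes before any further path processing (the helper name is hypothetical, not Vela's actual code):

```python
import posixpath

def normalise_config_path(path: str) -> str:
    # Hypothetical helper: accept both separators by mapping
    # backslashes to forward slashes, then normalising the result.
    return posixpath.normpath(path.replace("\\", "/"))
```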
|
|
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
|
|
One level deep relative paths (i.e. ./vela.ini) were treated as if the
name of a folder in config_files was ".". They are now treated as
relative paths. The warning message shown when using an absolute path
has also been changed to an error message for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
|
|
The argument to the lstrip function is a set of all characters that
should be stripped from the beginning of the string, in any order,
not a prefix. To remove the actual prefix, check whether the string
starts with it and then remove that number of characters. The
function "removeprefix", added in Python 3.9, does exactly this, but
it is not yet available to Vela since Vela still supports Python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
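The difference is easy to demonstrate (a minimal sketch; the backport helper name is hypothetical):

```python
def remove_prefix(text: str, prefix: str) -> str:
    # Backport of str.removeprefix (Python 3.9+) for Python 3.7:
    # only strip the prefix if the string actually starts with it.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# lstrip treats its argument as a set of characters, not a prefix,
# so it keeps stripping matching characters past the intended prefix:
assert "config_files/cfg.ini".lstrip("config_files/") == ".ini"
assert remove_prefix("config_files/cfg.ini", "config_files/") == "cfg.ini"
```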
|
|
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed, which is the version
that Vela is tested against
- The fix is to limit the numpy version to those that support Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
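Such a pin is typically expressed with an environment marker in setup.py; a sketch (the version bounds here are illustrative, not the ones Vela actually uses):

```python
# setup.py fragment (illustrative version bounds): keep NumPy at a
# release line that still supports Python 3.7, while letting newer
# Python versions take the latest release.
install_requires = [
    "numpy<1.22 ; python_version<'3.8'",
    "numpy ; python_version>='3.8'",
]
```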
|
|
- For allocations that have a hard memory limit the Hill Climb allocator
should be given more attempts to find a solution that would fit
- The fix is to use a memory limit when there is a hard constraint, and
a minimum iteration count, reset on every improvement, when there is a soft
constraint
- Added a maximum number of iterations CLI option
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
|
|
- The problem is due to a divide by zero
- The fix is simply to detect it and assign zero. This could also affect
improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
|