Age | Commit message | Author |
|
Added check to see if additional stripe data is needed from producer op
when cascading to make sure the stripes are not overwriting data still
being used. Also changed scheduler to make sure ResizeBilinear always
runs with even stripe height.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
|
|
Enabled elementwise cascading for binary/single variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
The output diff is caused by not including the kernel dilation when
calculating the bottom padding to be used on the last h_stripe. This
only shows up when using dedicated_sram since shared_sram does not split
into multiple h_stripes and thus uses the padding specified by the skirt
instead.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f643748b153004d65be2124c0ac6c9d21cd803f
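To illustrate the effect described in this message, here is a minimal sketch, not Vela's actual code. The function name `bottom_padding` and all of its parameters are hypothetical, and it assumes the usual dilated kernel height of (kernel_h - 1) * dilation + 1. It computes how many rows of padding the last OFM row needs below the IFM, with and without the dilation taken into account.

```python
def bottom_padding(ifm_height, ofm_height, stride, kernel_h, dilation, pad_top):
    """Rows of padding needed below the IFM to produce the last OFM row.

    Hypothetical helper for illustration only.
    """
    # Effective kernel height once dilation is taken into account
    dilated_kernel_h = (kernel_h - 1) * dilation + 1
    # First and last IFM rows read when producing the last OFM row
    first_row = (ofm_height - 1) * stride - pad_top
    last_row = first_row + dilated_kernel_h - 1
    # Anything beyond the last real IFM row must come from padding
    return max(0, last_row - (ifm_height - 1))


# 3-row kernel, dilation 2, stride 1, 8 rows in and out, 2 rows of top padding
with_dilation = bottom_padding(8, 8, 1, 3, 2, 2)
# Same layer, but (incorrectly) treating the kernel as undilated
without_dilation = bottom_padding(8, 8, 1, 3, 1, 2)
print(with_dilation, without_dilation)  # prints "2 0"
```

In this example, ignoring the dilation yields no bottom padding at all, so the last stripe would read rows that do not exist.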
|
|
This commit moves a piece of code back into a loop
but with a flag to make sure that the code is only
executed once per loop rather than potentially every
iteration. This solves the issue of an output diff
because of LUT DMAs occurring before weight DMAs.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I3e597f0a955154af3d87febacea1b3920d53b7c2
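The pattern described in this message can be sketched as follows. This is an illustrative sketch only: `generate_commands`, the command tuples, and the `needs_lut` parameter are hypothetical and do not come from the Vela source. The LUT DMA stays inside the loop so that it follows the first weight DMA, and a flag keeps it from being issued again on later iterations.

```python
def generate_commands(depth_slices, needs_lut):
    """Build a command list with the LUT DMA issued once, after the first weight DMA."""
    commands = []
    lut_dma_issued = False  # flag: the LUT DMA must run only once per loop
    for depth_slice in depth_slices:
        commands.append(("weight_dma", depth_slice))
        if needs_lut and not lut_dma_issued:
            # Inside the loop so it is ordered after the weight DMA,
            # but guarded so it is not repeated for every slice
            commands.append(("lut_dma", None))
            lut_dma_issued = True
        commands.append(("npu_op", depth_slice))
    return commands


commands = generate_commands([0, 1], needs_lut=True)
for command in commands:
    print(command)
```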
|
|
Added basic TOSA support, enabling Vela to
read and compile a .tosa file corresponding to
CONV2D + Rescale + Clamp, and writing it to an
optimized .tflite file.
The optimized .tflite file will, in this case, hold
a command stream where the Rescale and Clamp have been
fused into the CONV2D.
The optimized .tflite file is not output from Vela.
- Added support to read .tosa file into Vela
  internal structure.
  - Added tosa_reader.py, tosa_mapper.py and
    helper files stored under tosa/
  - Support for this is limited to ~10 ops
- Added reader_util.py for functions common
  to TOSA and TFLite
- Added tosa_graph_optimiser.py
  - Added support to fuse Rescale into convolution
  - Modified handling for padding
  - Added support to fuse Clamp to previous op
- Added graph_optimiser_util.py
  - Moved functions common to TOSA/TFLite graph
    optimization to this file.
- Renamed graph_optimiser.py to tflite_graph_optimiser.py
- Added separate tosa_supported_operators.py
- Added supported_operator_util.py
  - For functions common to TOSA/TFLite
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic3c540504ec8c5eb4771397fdc6882050ecf33ab
|
|
Fixed a bug where a DMA command for the activation LUT would be issued
for every depth-slice of an operator. This caused multiple
unnecessary DMA commands.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I9c291692d8002f05656bb88214836ab389a56cdb
|
|
- Deepspeech reuses identical weights and biases throughout
the network. Since biases are now interleaved with weights
there is a scaling issue when the ifm scales differ between
operations using the same weight and scale tensor.
- This commit uses interleaved weights/scales on their first use
but separates scales to source memory on subsequent use (if
the ifm scale is different).
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I7aae163438160a919cae04e235966e75355a6148
|
|
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
|
|
Signed-off-by: Henrik G Olsson <henrik.olsson@arm.com>
Change-Id: I0e6bb46b7b91ed10f5bda34fba66d8b714560f47
|
|
IFM box calculation was wrong because 2 variables were
referencing/updating the same list.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: Ibed4e94c474682e14a6dd898029f14af11c9479a
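The class of bug described in this message is easy to reproduce in Python. The sketch below is a generic illustration, not the Vela code: the function names and the box layout are hypothetical. Assigning one list to a second name does not copy it, so updating the "end" coordinates also changes the "start" coordinates unless an explicit copy is made.

```python
def box_aliased(start, size):
    """Buggy variant: 'end' is just another name for the 'start' list."""
    end = start  # no copy, both names refer to the same list
    for i, extent in enumerate(size):
        end[i] += extent
    return start, end


def box_copied(start, size):
    """Fixed variant: 'end' is an independent copy of 'start'."""
    end = list(start)
    for i, extent in enumerate(size):
        end[i] += extent
    return start, end


buggy_start, buggy_end = box_aliased([0, 0, 0, 0], [1, 8, 8, 16])
good_start, good_end = box_copied([0, 0, 0, 0], [1, 8, 8, 16])
print(buggy_start, buggy_end)  # prints "[1, 8, 8, 16] [1, 8, 8, 16]"
print(good_start, good_end)    # prints "[0, 0, 0, 0] [1, 8, 8, 16]"
```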
|
|
- Added full support for PAD operator
- Hardware padding is still used whenever possible
- Bug fix for Pad followed by max pool when the IFM contains negative values
Change-Id: Ifc64d1943737d94466f5e2821009dab12a49a965
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Fixed avoidance of cascading for spilling.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: If86189bd1566eaa14387dfc2c02e3324ea6c184e
|
|
Removed SplitSliceRead from subgraph during
graph optimisation.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I9315d4c2a6767828dd2b4e66823d73b10ebee99c
|
|
- Removed ConcatSliceWrite from the optimised graph.
  It is always executed as avgpool, which is equivalent to
  the behaviour before the patch.
- Added copy op to enable removal of more reshapes.
  Subgraph inputs/outputs need to remain. When a Reshape's input and
  output are both subgraph inputs/outputs, a copy op must be
  inserted in order to remove the reshape.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Id7be9966673ae34499e8518a5544104493fe326b
|
|
Fixed two issues:
- Cmd stream can be out of order in IFM streaming
- In H32, LUT could be corrupted if blockdep is not 0
Change-Id: I2edd84429b93d83b2794f14937ce3fd279fd4a24
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
-Removed reshapes in the original graph
-Removed the addition of reshapes to the
optimized graph
-Reshapes with different ifm/ofm quantisation will remain
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I94862be53dac0d7434815e2aee5ca678228495f8
|
|
This reverts commit df0a5905177f3a1b836076bc3f9f39b2e86f1794.
Reason for revert: <INSERT REASONING HERE>
Change-Id: I891c66fb29db9d25e942947e8d1c29a10610de51
|
|
This reverts commit bf31d647dc5df47410ee577b12427ddf076d816b.
Reason for revert: <INSERT REASONING HERE>
Change-Id: I7b6c585b7658f94dbaa916c2b6bfe9fb463b8d37
|
|
Add 4D shape class for op IFM/OFM shapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic0a98da9d2f9d085605e39a9ab5a26bad6e702a3
|
|
Add ifm/ofm shapes to op
Changed to rely on these shapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I571535a1dcadc2bdb04a3c727a8e1c49703b174d
|
|
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I2e8384a044ee5458bc8c92562153b6383de5f17a
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
For IFM streamed cascades, bias tensors are read several times.
Moves these tensors to fast storage and adds DMA commands.
Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Removed the CLI opt ifm-ofm-overlap
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
|
|
- Incorrect length check in high level command stream generator
- Improved tensor names related to LUT based operations
Change-Id: Ib8844a35a986e2dbef095df23f143f4633b255f9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
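As a rough illustration of why an enum is preferable to a string for the operator type, consider the sketch below. It is an assumption-laden example, not the actual Vela definitions: the `Op` members, the `CONVOLUTION_OPS` set, and `is_convolution` are hypothetical names chosen for this example.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical operator type enum, for illustration only."""
    Conv2D = auto()
    DepthwiseConv2D = auto()
    MaxPool = auto()


# Operator sets become sets of enum members instead of sets of strings
CONVOLUTION_OPS = {Op.Conv2D, Op.DepthwiseConv2D}


def is_convolution(op_type):
    """Return True if op_type is one of the convolution operators."""
    return op_type in CONVOLUTION_OPS


print(is_convolution(Op.Conv2D))   # prints "True"
print(is_convolution(Op.MaxPool))  # prints "False"
```

With strings, a misspelled operator name simply compares unequal and goes unnoticed. With an enum, a misspelled member such as `Op.Conv2d` raises an `AttributeError` as soon as the line runs.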
|
|
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I3c3ed73a6db39615ddf5987dc5696b6b09682be0
|
|
Split mapping to tensor
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic143f3b4d37f6904edd8f119eff1d108f70b5026
|
|
- Support for more than one 256-byte LUT in SHRAM
- No DMA is performed for a LUT that is already located in SHRAM
- Added MemArea.Shram, used for LUT, to avoid false address collision
asserts during SRAM tensor allocation
- Added read access to LUT in memory access calculation
Change-Id: If4d1eded5ed029d253f4f5efb2d80495fc3eac99
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I566abd5a1ffc367c6b9b8f37d5a26b61d27e840b
|
|
Added graph rewrite of Softmax for int16.
Change-Id: Id7885af6056a23e8b8362fb61ae94283251eb398
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I39cff126dda89d71426ab731427ca1d64d02590d
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I53d9d56acee57cff208dccb4822c1f1a461c416d
|
|
- No functional change
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I5ab1198b9d092cd041fa9b85b2dee9900d299bfc
|
|
Fixed a coordinate issue which caused the compiler to crash when
cascading upscaling operators such as ResizeBilinear.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I982863573b0e5829e6d0c255dbbc308cb332a37a
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ib8d66f8b3c0467966165c1b53aeb7da7c8764c89
|
|
If the same weight tensor was used with different block configs,
errors would occur.
Fixed by always cloning weight tensors, using a global weight
compression cache and modifying the linear allocator to
detect multiple uses of the same weight compression.
Change-Id: I91ca59176e1c59c66e0ac7a4227f2b5f0b47053f
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Write the constant scalars into flash. In case it's Dram
or OffChipFlash, DMA the scalars from flash to sram.
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I42300a05dfe968d623b8aec8549644549e0f54b5
|
|
Also updated README.md
Change-Id: I118309c61f4d00e8508d6b888c606995490fba39
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
Use pre-commit framework [2] to run black and flake8 before the commit.
black and flake8 are managed by the pre-commit framework and they can be
run manually by the user using the `pre-commit run` command.
Fix the code base with the help of black and flake8.
Fix import statements according to PEP8 guidelines [1].
Both tools have the following settings (specified in the pre-commit
configuration file):
* line length: 120 characters
* directories to exclude: ethosu/vela/tflite/ and ethosu/vela/ethos_u55_regs
Updated README.md on how to install pre-commit and how to run sanity checks.
Pipenv files have been updated including new dependencies for pre-commit.
[1]: https://www.python.org/dev/peps/pep-0008/#imports
[2]: https://github.com/pre-commit/pre-commit
Change-Id: I304d9fffdf019d390ffa396a529c8a7c2437f63d
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
- Added modules ethosu.vela and ethosu.mlw_codec.
- Added README and various configuration files.
Change-Id: I3690f8c8f5966306ecddaeb2793c30ca9c6e2eee
|