Age | Commit message | Author |
|
Add --verbose-progress CLI option used to enable printing progress
information in the compiler driver and scheduler.
Change-Id: I99ac8c6a654e60391d5c11e28b89250405daa53a
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
|
|
- The array allocated in get_temporal_memory_usage is too small,
which causes two errors. First, not all LiveRange elements are
added to the temporal memory usage. Second,
use_fast_storage_for_feature_maps correctly tries to update the
temporal memory usage array but hits an out-of-bounds assert.
The array is too small because the LiveRangeClass reports the
wrong end time, due to inconsistencies in how usage is marked
for subgraph tensors.
- The fix is to mark the tensors with the current_time value.
Tensors are now also marked consistently in both extract
functions. This means that the end time value to use in
get_temporal_memory_usage is current_time + 1.
- Also made a small update to avoid updating current_time twice
when handling subgraphs.
Change-Id: Ib7e3681e370e097e433acb235740dfd69fa3ce8b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
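
The sizing rule described above can be illustrated with a minimal sketch. It assumes time stamps run from 0 to current_time inclusive and that a live range's end time is inclusive. The LiveRange tuple and the temporal_memory_usage function here are hypothetical stand-ins, not Vela's actual classes or signatures.

```python
from typing import List, NamedTuple


class LiveRange(NamedTuple):
    # Hypothetical stand-in for illustration, not Vela's LiveRange class.
    start_time: int
    end_time: int  # inclusive (assumption)
    size: int


def temporal_memory_usage(ranges: List[LiveRange], current_time: int) -> List[int]:
    # Time stamps run from 0 to current_time inclusive, so the array
    # needs current_time + 1 slots; one slot fewer would overflow.
    usage = [0] * (current_time + 1)
    for lr in ranges:
        assert lr.end_time <= current_time, "live range ends after the last time stamp"
        for t in range(lr.start_time, lr.end_time + 1):
            usage[t] += lr.size
    return usage


ranges = [LiveRange(0, 1, 16), LiveRange(1, 3, 32), LiveRange(3, 3, 8)]
usage = temporal_memory_usage(ranges, current_time=3)
print(usage)
```

If a tensor were marked with a time later than current_time, the assert in this sketch would fire, which mirrors the out-of-bounds symptom in the commit message.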
|
|
Added int8 and int16 UNIDIRECTIONAL_SEQUENCE_LSTM support.
The implementation does not include support for:
* CIFG
* Peephole
* Projection
* Normalisation
This change also:
* Removed unused Op.BlockLSTM operation type.
* Removed the single-consumer limitation on putting the SplitSliceRead
on the tensor consumer(s), if all consumers fulfil the requirements
* Added Op.VariableTensorWrite as an Operation.memory_function to make
sure writes to variable tensors:
* Always use linear mode
* Are not moved to fast scratch
* Are not fused with other elementwise operation tensor ranges
Change-Id: Ief831738924ac3d1f2ba6d41f10bd6dc969911f3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- Reshape ops can be bypassed and there is no need to process them by the NPU.
There are use cases when the IFM must be preserved so a memcpy is needed.
This is implemented by an AvgPool.
- In order to reduce the cost of the AvgPool the IFM can be copied by DMA.
This is faster and also it can be turned into a real NOP in cases where
the IFM and the OFM can use the same memory space.
- Added new memcpy op. Only NHWC format supported since DMA can not change
the format on the fly.
- Allow ofm to reuse ifm for memcpy op
- Make sure the DMA copy size is 16 byte aligned
Change-Id: I3605a48d47646ff60d2bb3644dd3a23f872235a7
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
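
The 16-byte alignment requirement on the DMA copy size can be met by rounding the size up to the next multiple of 16. The helper below is only a sketch of that arithmetic. The names DMA_ALIGNMENT and round_up are illustrative, and the commit message does not say how Vela itself computes or pads the size.

```python
DMA_ALIGNMENT = 16  # bytes, taken from the commit message


def round_up(value: int, alignment: int = DMA_ALIGNMENT) -> int:
    """Round value up to the next multiple of alignment (illustrative helper)."""
    return ((value + alignment - 1) // alignment) * alignment


# A 1x7x7x3 int8 feature map occupies 147 bytes; the aligned size is larger.
copy_size = round_up(1 * 7 * 7 * 3)
print(copy_size)
```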
|
|
- There is a problem with large networks containing many NPU
subgraphs. The scheduling takes too long because the snapshot
memory calculation always does a complete update for the
full graph.
- A complete run is needed at the end to calculate all the
time indexes correctly. However, when scheduling an NPU subgraph
it is enough to extract live ranges for the current schedule
and its operators.
Change-Id: Iccb7d6728119c1428ad0b45a2ac34e92158c15bd
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- Networks with resource variables had not been considered.
The main problem was the graph traversal, where these ops
were not visited, resulting in an empty subgraph that caused
the crash.
- Fixed the problem by attaching virtual tensors to the ops simulating
subgraph output. These tensors are only used to get the graph
traversal to work.
- Fixed serializing of attribute container and shared_name
- Fixed subgraph index for operator CallOnce
- All resource variable ops are pushed to the CPU
Change-Id: I815f9c81baf7a3fbb686e895980b462f58208b6e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Added support for Variable Tensor, including offline planning.
Change-Id: I39f33fee207f1f1a4574a0f53f7377eec8709e15
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- Update copyright notices to use SPDX format and add OSS mail as contact.
- Update years on files where it had been missed.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
|
|
Fixed by adding an operation to copy the statically optimised
data to the subgraph output.
Change-Id: Ica757e37d5460237973444ffd39c7d2850f319e3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Change code in cascade builder to instead
use common functionality in live range.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I7bbd7ea3d1e7e085813e9d93256a54e6bab2267b
|
|
- Vela failed to compile networks with multiple subgraphs because
only cascaded passes in the root subgraph were used when
extracting the live ranges. The fix is to extract the subgraph
live ranges on Ops that have connected subgraphs.
- The tf_writer did not handle multiple subgraphs correctly,
resulting in corrupt buffer data in the optimized tflite file. The buffer
index must be unique for every tensor.
- Added support for multiple subgraphs in the OfflineMemoryAllocation
metadata. The change does not alter behavior for single graphs.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2328dfc1f07e2e4faf43a75423ea95423096ffa3
|
|
If an elementwise op is part of a cascade, the ifm cannot
be overwritten by the ofm.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I1e5f7ee501be17e76684b33c6e86ab8af0f3e61f
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Ported the improved spilling behaviour from Regor
into Vela. This replaces use_fast_storage_for_feature_maps
with allocate_feature_maps and introduces the class called
FastStorageComponentAllocator.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I34785840c905a79750a62863773015b00fb43387
|
|
This change will allow the subgraph's input tensor
to be reused/overwritten by the output from an elementwise op
if there is only one consumer attached to the input tensor.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I317188af11a5470614770e18dc8973462fd5f21c
|
|
Added checks to avoid merging elementwise op live ranges for subgraph
inputs and outputs, which sometimes caused problems when parts of the
network run on CPU.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Id07ab277a205b8550d19a276559f8904b9a4b4be
|
|
- Fixed index error in memory_snapshot
- When removing a cascade, the references are also removed
Change-Id: I2b35dc52671d8ce115eb32bfdd93584391d1fc6d
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed typo with not using ifm.mem_type
- Fixed bug with using ifm1 properties when only ifm2 is a potential match
- Removed restriction on not considering SHL and SHR for overlap
- Removed some dead reshape code
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Id9bcc3c2b3ee9ac7b6276187d3e2f513b4acd4b5
|
|
Reinstated allowing the IFM and OFM tensor to overlap for Elementwise
operations.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Ide6db7781f3ca7a36c8ff9e3efdc7943a7bf6d7f
|
|
- Deepspeech reuses identical weights and biases throughout
the network. Since biases are now interleaved with weights
there is a scaling issue when the ifm scales differ between
operations using the same weight and scale tensor.
- This commit uses interleaved weights/scales on their first use
but separates scales to source memory on subsequent use (if
the ifm scale is different).
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I7aae163438160a919cae04e235966e75355a6148
|
|
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
|
|
- Tensor allocation verification was O(N^2), is now closer to O(N)
- Removed a sort in HillClimb allocator
Change-Id: I286a269881490c485cc2b0eeab3b1ecffa8f3df0
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Add ifm/ofm shapes to op
Changed to rely on these shapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I571535a1dcadc2bdb04a3c727a8e1c49703b174d
|
|
Pylint W0102:
Raised when a mutable value such as a list or dictionary is used
as the default value for an argument.
Replaced detected instances with None; the function then checks for
None and sets the default accordingly.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I4eb73d07d01d4cdefa586eb71b9c76746eee3b11
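
The pattern behind W0102 and its fix can be shown with a short, generic example. The function names are illustrative and not taken from the Vela code base.

```python
def append_shared(item, items=[]):  # W0102: the default list is created once
    # Every call without an explicit list mutates the same default object.
    items.append(item)
    return items


def append_fresh(item, items=None):
    # Using None as the sentinel gives each call its own new list.
    if items is None:
        items = []
    items.append(item)
    return items
```

With append_shared, state leaks between calls that rely on the default. With append_fresh, each such call starts from an empty list.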
|
|
- Removed unused --show-minimum-possible-allocation
- Change --allocation-alignment to --cpu-tensor-alignment
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I00e367c3190aeea08a3f136332711e9accc85ba3
|
|
Removed the CLI opt ifm-ofm-overlap
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
|
|
Enable overlap of elementwise input/output
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I6e6f11953319c843c8203bf038f96778df194332
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added a static class TensorAddressMap that stores all Tensor addresses
based on their equivalence_id. Made the "address" field into a property
whose getter and setter look up/set the tensor's address in
TensorAddressMap.
This makes the references to cpu_tensor/npu_tensor obsolete and they
have been removed.
Addition to scheduler: avoid SRAM spilling if an op has consumers in
other subgraphs.
Minor rework in LUTState; it will now assign a unique equivalence_id to
the SHRAM lut tensor to avoid issues with addressing. The equivalence
checks in LUTState now compare the values of the LUT instead of the
equivalence_id.
Updated LUT unit tests accordingly.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I41de5a8a4e5f07b77d6544d8d4034b754993e503
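
The idea of an address property backed by a shared map can be sketched as follows. This is a simplified illustration under stated assumptions: the method names get and set, the dictionary layout, and the Tensor constructor are hypothetical, and the real TensorAddressMap may key on more than the equivalence_id.

```python
class TensorAddressMap:
    # Class-level (static) storage shared by all tensors; illustrative only.
    _addresses = {}

    @classmethod
    def get(cls, equivalence_id):
        return cls._addresses.get(equivalence_id)

    @classmethod
    def set(cls, equivalence_id, address):
        cls._addresses[equivalence_id] = address


class Tensor:
    def __init__(self, equivalence_id):
        self.equivalence_id = equivalence_id

    @property
    def address(self):
        # Look the address up through the shared map.
        return TensorAddressMap.get(self.equivalence_id)

    @address.setter
    def address(self, value):
        TensorAddressMap.set(self.equivalence_id, value)


a = Tensor(equivalence_id=7)
b = Tensor(equivalence_id=7)  # equivalent to a
a.address = 0x100
```

Because both tensors share an equivalence_id, setting the address on one makes it visible on the other, which is why separate cpu_tensor/npu_tensor references are no longer needed.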
|
|
Added the CLI option. Only applies to CPU tensors. Added an
AllocationError which is raised when Allocation fails.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I89164dea3ac7b7add7bc40aec2ce8fe50600105d
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I53d9d56acee57cff208dccb4822c1f1a461c416d
|
|
Additional supported memory configurations:
- Permanent_storage = DRAM
- Tensor arena either in DRAM or SRAM
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I20beb7151e306bfdba540e7c0b2a7b478b4d94e1
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ia7127148d00280bf9c3759dd6dcbe500a4cfcc78
|
|
Also updated README.md
Change-Id: I118309c61f4d00e8508d6b888c606995490fba39
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
Use pre-commit framework [2] to run black and flake8 before the commit.
black and flake8 are managed by the pre-commit framework and they can be
run manually by the user using the `pre-commit run` command.
Fix the code base with the help of black and flake8.
Fix import statements according to PEP8 guidelines [1].
Both tools have the following settings (specified in the pre-commit
configuration file):
* line length: 120 characters
* directory to exclude: ethosu/vela/tflite/ and ethosu/vela/ethos_u55_regs
Updated README.md on how to install pre-commit and how to run sanity checks.
Pipenv files have been updated including new dependencies for pre-commit.
[1]: https://www.python.org/dev/peps/pep-0008/#imports
[2]: https://github.com/pre-commit/pre-commit
Change-Id: I304d9fffdf019d390ffa396a529c8a7c2437f63d
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
- Added modules ethosu.vela and ethosu.mlw_codec.
- Added README and various configuration files.
Change-Id: I3690f8c8f5966306ecddaeb2793c30ca9c6e2eee
|