Age | Commit message (Collapse) | Author |
|
- Deepspeech reuses identical weights and biases throughout
the network. Since biases are now interleaved with weights
there is a scaling issue when the ifm scales differ between
operations using the same weight and scale tensor.
- This commit uses interleaved weights/scales on their first use
but separates scales to source memory on subsequent use (if
the ifm scale is different).
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I7aae163438160a919cae04e235966e75355a6148
|
|
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
|
|
- Moved new tensor allocation info under --verbose-allocation flag
- Tidied up and added histogram to --verbose--allocation print
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I76fb5187319aedf86f599f57b766220cafc17326
|
|
Improved weight information showed in summary if --verbose-weights
option is used.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iac142f2a813bf1c05aa9da3f8a384466e2914d06
|
|
- Tensor allocation verification was O(N^2), is now closer to O(N)
- Removed a sort in HillClimb allocator
Change-Id: I286a269881490c485cc2b0eeab3b1ecffa8f3df0
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added the theoretically minimum max memory usage and
the allocator overhead to the Vela summary.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: If373dfeaac50d6f8b56554d435bf22af2c3acda3
|
|
Made HillClimb allocation results reproducible between runs.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I0535947e9cd9c6e0cf896e81b127d93cab54ebc8
|
|
- Straight port of the C++ implementation to python.
- Renamed the allocator from "Search" to "HillClimb"
Change-Id: I50797d541f326d0264daf79bf7866aef32350a60
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Change-Id: I06feeb98fb48badf06097f377a9504e6f4eeae91
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- Also removed the original bit_per_element
Change-Id: I51bfbd28e14f316aae2d542bb610a3ed57b8b53b
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Minor refactoring to use fstrings.
Improve Error classes to correctly inherit the base class.
Use existing exception classes instead of plain exceptions where it
makes sense.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I0941c04e91010da1db77299517a8e2d896371e77
|
|
Added a new tensor allocator that is based on searching,
implemented in C++ (C++11 compatible).
Change-Id: Ie96e9fcfc8e6c58d1fa53911f37de290eeba88cf
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Removed unused --show-minimum-possible-allocation
- Change --allocation-alignment to --cpu-tensor-alignment
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I00e367c3190aeea08a3f136332711e9accc85ba3
|
|
Removed the CLI opt ifm-ofm-overlap
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
|
|
Enable overlap of elementwise input/output
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I6e6f11953319c843c8203bf038f96778df194332
|
|
Attempts to use fast storage for feature maps used in between
cascaded passes.
This is only relevant for system configurations where feature maps
are by default not placed in SRAM, but there is SRAM for fast storage.
Change-Id: I207b7cf32cfcb5bea3e6b93c2da1161c4af5221d
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Allocate live ranges with longer life time first.
On average this gives better memory usage.
Change-Id: Id89e9e36a944169a2f10ce7f6e869397ef0abaf0
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added the CLI option. Only applies to CPU tensors. Added an
AllocationError which is raised when Allocation fails.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I89164dea3ac7b7add7bc40aec2ce8fe50600105d
|
|
- Support for more than one 256-byte LUT in SHRAM
- No DMA is performed for a LUT that is already located in SHRAM
- Added MemArea.Shram, used for LUT, to avoid false address collision
asserts during SRAM tensor allocation
- Added read access to LUT in memory access calculation
Change-Id: If4d1eded5ed029d253f4f5efb2d80495fc3eac99
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Additional supported memory configurations:
-Permanent_storage = DRAM
-Tensor arena either in DRAM or SRAM
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I20beb7151e306bfdba540e7c0b2a7b478b4d94e1
|
|
If same weight tensor was used with different block configs,
errors would occur.
Fixed by always cloning weight tensors, using a global weight
compression cache and modifying the linear allocator to
detect multiple usage of same weight compression.
Change-Id: I91ca59176e1c59c66e0ac7a4227f2b5f0b47053f
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Also updated README.md
Change-Id: I118309c61f4d00e8508d6b888c606995490fba39
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
Use pre-commit framework [1] to run black and flake8 before the commit.
black and flake8 are managed by the pre-commit framework and they can be
run manually by the user using `pre-commit run` command.
Fix the code base with the help of black and flake8.
Fix import statements according to PEP8 guidelines [1]
Both tools have the following settings (specified in the pre-commit
configuration file):
* line length: 120 characters
* directory to exclude: ethosu/vela/tflite/ and ethosu/vela/ethos_u55_regs
Updated README.md on how to install pre-commit and how to run sanity checks.
Pipenv files have been updated including new dependencies for pre-commit.
[1]: https://www.python.org/dev/peps/pep-0008/#imports
[2]: https://github.com/pre-commit/pre-commit
Change-Id: I304d9fffdf019d390ffa396a529c8a7c2437f63d
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
- Added modules ethosu.vela and ethosu.mlw_codec.
- Added README and various configuration files.
Change-Id: I3690f8c8f5966306ecddaeb2793c30ca9c6e2eee
|