- The print_performance function that is called when using the
--verbose-performance option crashed with KeyError when no SRAM was
used.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
|
|
- The NPU cycles are not calculated correctly when only
one weight buffer is used, since weights cannot
be fetched in parallel.
- Added a new calculation for the single-buffer case.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
|
|
- Added support for printing per-operator SRAM usage and performance
information
- Added new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
This patch is a clone of a previously reverted patch, but with some
additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
|
|
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Add mypy to pre-commit and clean up all reported errors.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
|
|
Fix bug when storing the encoded NPU weight UUID in the
NPU performance estimation.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I92127b0020f12352d923c0c9aa2b6f47e6110764
|
|
- Corrected rounding error
- Number of elements depends on ofm format
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I568d660b7571b6e0ffb131211b3a89c8be4b9295
|
|
* Corrected the calculation of the scaled bandwidth for LUT transfers,
where the use of the _estimate_memory_transfer_efficiency function
resulted in a divide-by-zero error.
Change-Id: I2356e924d9ca2f315ca1988f465f58b13a8fa4c9
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
* Original weights and encoded NPU weights now report the correct size
instead of zero when running Vela with the --verbose-weights flag
(the code to update these attributes was missing)
* Removed print references to the unencoded NPU weight size
Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
- Update block config selection to take into account partial
IFM fetches at edge of non-whole OFM block data.
- Change to scheduler depth slicing for networks in MLBEDSW-4637
for improved buffering. This helps general performance by buffering
larger depth slices.
- Bug fix for opt_max_schedule always being fitted to SRAM which
prevented the optimisation step running in some cases.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I97642c5adec3bb684b1daabf2b81574c27d4eef2
|
|
Putting back the estimates related to unbuffered
weight transfer.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I2072066bc1e01814fe3b0b87a912f69646da861c
|
|
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
|
|
Refactored the check for whether a non-linear tensor format
can be used.
- Flag avoid_NHCWB16 replaced with needs_linear_format
- Checking of restrictions moved into one function in the graph optimiser.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Iec5c7996a1a6039cad052197f1ae56f7c0290440
|
|
This commit adds support for the MEAN operator,
with some caveats.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I165cb26cb5aefd68e70d2cfc68291ccf7b778921
|
|
All files which have been updated in 2021 and contain a copyright header have had their headers updated.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ia682111a719d16e690433398ccfb69c7e93c1cd1
|
|
- Removed requirement for cloning shapes when unique values required
by forcing top-level immutability. This alleviates issues with Shapes
being unintentionally shared and then mutated as if value-types.
- Shape4D fields can no longer be assigned without replication.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic0dbfa349eb0215eabefb4f4e2cf99f12d83699c
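The immutability described in this entry can be illustrated with a minimal sketch. It assumes a namedtuple-style value type; `Shape4DSketch` and `with_depth` are hypothetical names chosen for illustration and are not the Shape4D class from this commit.

```python
from typing import NamedTuple


class Shape4DSketch(NamedTuple):
    """Hypothetical immutable 4D shape (illustration only, not Vela's Shape4D)."""

    batch: int = 1
    height: int = 1
    width: int = 1
    depth: int = 1

    def with_depth(self, new_depth: int) -> "Shape4DSketch":
        # "Replication": return a new value instead of mutating this one.
        return self._replace(depth=new_depth)


ifm_shape = Shape4DSketch(1, 8, 8, 16)
ofm_shape = ifm_shape.with_depth(32)  # ifm_shape itself is unchanged

try:
    ifm_shape.depth = 32  # direct field assignment is rejected by the tuple
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
```

Because every change produces a new object, a shape held by two operators cannot be altered for one of them behind the other's back, which is the kind of unintended sharing the entry refers to.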
|
|
- Removed reshapes in the original graph
- Removed the addition of reshapes to the
optimized graph
- Reshapes with different ifm/ofm quantisation will remain
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I94862be53dac0d7434815e2aee5ca678228495f8
|
|
- Reshape/rearrange IFM and weight tensor for better HW utilization
- Update estimator to cover this case
Change-Id: I4be70a69fa600a1951bf1c247f9973e6cc9b03f4
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
This reverts commit df0a5905177f3a1b836076bc3f9f39b2e86f1794.
Reason for revert: <INSERT REASONING HERE>
Change-Id: I891c66fb29db9d25e942947e8d1c29a10610de51
|
|
This reverts commit bf31d647dc5df47410ee577b12427ddf076d816b.
Reason for revert: <INSERT REASONING HERE>
Change-Id: I7b6c585b7658f94dbaa916c2b6bfe9fb463b8d37
|
|
Add 4D shape class for op ifm/ofm shapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ic0a98da9d2f9d085605e39a9ab5a26bad6e702a3
|
|
Add ifm/ofm shapes to op
Changed to rely on these shapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I571535a1dcadc2bdb04a3c727a8e1c49703b174d
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I4a5c53d0c5957595fc639b174b2b227ea043d409
|
|
- Combine conv and vector_product calculation
- Remove internal bandwidth
- Remove blocks and hw_macs from report
- Use scaled_bws for cycle estimation
Related to: MLBEDSW-3598
Change-Id: I1927a8311ec563f68115e0f2ed077806b86fd717
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Replace conditional checks against sets with tuples.
If not requiring uniqueness, or complex set operations, it is quicker to
use tuples instead.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ie8732c8d46067244963936c53f0ec81adda50372
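A minimal sketch of the kind of change this entry describes. The operator names and the `is_pooling` helper are hypothetical, and the speed claim is the commit message's own; it is not measured here.

```python
# Hypothetical operator names, used only for illustration.
POOLING_OPS_AS_SET = {"AvgPool", "MaxPool"}
POOLING_OPS = ("AvgPool", "MaxPool")


def is_pooling(op_type: str) -> bool:
    # A membership test against a small constant tuple gives the same
    # answer as the set version when no set operations are needed.
    return op_type in POOLING_OPS


results = [is_pooling(name) for name in ("AvgPool", "Conv2D", "MaxPool")]
same_as_set = all(
    is_pooling(name) == (name in POOLING_OPS_AS_SET)
    for name in ("AvgPool", "Conv2D", "MaxPool")
)
```

The tuple form only suits plain membership checks; code that relies on uniqueness, unions, or intersections still needs a set.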
|
|
Pylint W0102:
Raised when a mutable value such as a list or dictionary is used as
the default value for an argument.
Replace detected instances with None and, upon checking for None, set
the default accordingly.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I4eb73d07d01d4cdefa586eb71b9c76746eee3b11
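The pattern behind W0102 can be shown with a short sketch. The function names are hypothetical and are not taken from the Vela code base.

```python
from typing import List, Optional


def append_shared(item: int, bucket: List[int] = []) -> List[int]:  # triggers W0102
    # The default list is created once, at definition time, and is then
    # shared by every call that omits the argument.
    bucket.append(item)
    return bucket


def append_fresh(item: int, bucket: Optional[List[int]] = None) -> List[int]:
    # None as the sentinel: a new list is created on each call that needs one.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared(1)
shared_second = append_shared(2)  # the same list object as shared_first
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)  # an independent list
```

Here the second call to `append_shared` still sees the item added by the first call, whereas each call to `append_fresh` starts with an empty list.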
|
|
- Improve conv estimation by adding delay cycles
- Estimate minimal block cmd cycles
Change-Id: Ibea818e8e820731fc7d05c948d5d1abd22e17089
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- Added sample vela.ini config file
- Changed vela config format, split into system config and memory mode
- Removed unused CPU cycle performance estimation
- Added new CLI options for --memory-mode and --verbose-config
- Changed CLI option --config to take multiple files
- Removed CLI option --global-memory-clock-scales
- Changed error helper functions to raise a VelaError exception
- Refactored to create a new is_spilling_enabled function
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d
|
|
- Improve the conv estimation when the block size is very small
- Estimate cycles on bias/scale channel
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I275770b7f013b0812fc1ffe91f42ad07727c9dc7
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Change-Id: I9e00afe0eef0e13fe990e021bcbe3dd0eda4c471
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Change-Id: I8f139381d0e01e8ac70d89c4a312ee3000fb5fa1
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- DMA ops cycle estimation for the first pass
- fix a bug in ifm_blk_depth calculation
- fix a bug in sram bandwidth calculation
- merge dpu and elementwise cycles into npu cycles
- use str.format() in performance print
Change-Id: I78895416f47fc3c652743c5da13fc45630322371
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
(cherry picked from commit 5245e97a62c2fe54250f99b06e778f3e0c6dc376)
(cherry picked from commit 16e415677403fc04a90b1a7ec554761d38315640)
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: Ic6ae795a1626d1cdf63a69d2ff86f7cd898f3134
|
|
- Normalise kernel availability by requiring that all operators offer a
kernel describing how much data they consume from the source, per OFM
element, regardless of whether kernels are relevant to the operation.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idbcff64879fc2eccf292b6208a7d2038eb388017
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I9f3671041c2b1497519cf42b5f52e3cd01d9c10a
(cherry picked from commit e8c989f5236cce12d07a6644329935dbbf0ee8e6)
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed. It only affected operators with striding greater than 1x1
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I129e46586aa16079ddbce3898569676ba9891372
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ief50c934b9e9b0bd3024d3ed0bbaa7b655971952
|
|
- No functional change
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I5ab1198b9d092cd041fa9b85b2dee9900d299bfc
|
|
- Removed --inter-pass-cycle-delay
- Removed --dram-bandwidth
- Removed --batch-size
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ib613f47a9e911c652e522b5aa9ec58ae5391b0fd
|
|
Kernel height was not correctly calculated for pooling
operations in rolling_buffer_dims_from_passes.
Change-Id: I48763b4b3276538c111e6699f66636327e569705
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Extend IFM to full dimension for the performance
metrics calculation.
Change-Id: Iae923e37280ab0f22b7a272f28970973a5142534
Signed-off-by: Charles Xu <charles.xu@arm.com>
|
|
Also updated README.md
Change-Id: I118309c61f4d00e8508d6b888c606995490fba39
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
Use the pre-commit framework [2] to run black and flake8 before the commit.
black and flake8 are managed by the pre-commit framework and can be
run manually using the `pre-commit run` command.
Fix the code base with the help of black and flake8.
Fix import statements according to PEP8 guidelines [1].
Both tools have the following settings (specified in the pre-commit
configuration file):
* line length: 120 characters
* directory to exclude: ethosu/vela/tflite/ and ethosu/vela/ethos_u55_regs
Updated README.md on how to install pre-commit and how to run sanity checks.
Pipenv files have been updated including new dependencies for pre-commit.
[1]: https://www.python.org/dev/peps/pep-0008/#imports
[2]: https://github.com/pre-commit/pre-commit
Change-Id: I304d9fffdf019d390ffa396a529c8a7c2437f63d
Signed-off-by: Diego Russo <diego.russo@arm.com>
|
|
- Added modules ethosu.vela and ethosu.mlw_codec.
- Added README and various configuration files.
Change-Id: I3690f8c8f5966306ecddaeb2793c30ca9c6e2eee
|