aboutsummaryrefslogtreecommitdiff
path: root/ethosu/vela/npu_performance.py
AgeCommit message (Collapse)Author
2024-03-12MLBEDSW-8725: Remove scales & biases from --verbose-weightsAlexander Bengtsson
Remove scales and biases from encoded weights size. This aligns better with original weights size (which only represents the weight tensor) Change-Id: I5aabf61385d8fdf150764c45e04ba4388c6a63f0 Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>
2023-08-09MLBEDSW-7754: Performance estimator is not using write/read shapes3.9.0.rc1William Isaksson
- npu_performance now uses write/read shapes instead of using ifm/ofms for memory cycle estimations. - also fixes a would be bug in the tflite_graph_optimiser, where one read shape is not Shape4D. Change-Id: I2067069a713d2cf9e65a5cc227e803de79940fff Signed-off-by: William Isaksson <william.isaksson@arm.com>
2023-07-31MLBEDSW-7397: Wrong mem_area used in schedulerwilisa01
Performance estimation now uses the parent_tensor mem_area instead of the scheduler_op mem_area, because the mem_area is only set on the parent_tensor by the scheduler. Signed-off-by: wilisa01 <william.isaksson@arm.com> Change-Id: I11f73686bfbd6958a8920c5e264a5f95cc3f23d1
2023-07-10MLBEDSW-7752: setting query shapes to Shape4D(0)William Isaksson
Changes query initialization shapes to Shape4D(0,0,0,0) = [0,0,0,0] instead of Shape4D(0) = [0,1,1,1]. The [0,1,1,1] tensors would affect performance estimates and are not real. Change-Id: Ic83b6f6a70c0c904b500f62756e1e125c99856c6 Signed-off-by: William Isaksson <william.isaksson@arm.com>
2023-05-03MLBEDSW-7416: per_layer_csv has extra rowwilisa01
removed redundant row Change-Id: I8b90df3b45ed863c93572b33f695b06094103015 Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-03-14MLBEDSW-6260: Add support for using DMA to copy feature mapsJohan Alfven
- Reshape ops can be bypassed and there is no need to process them by the NPU. There are use cases when the IFM must be preserved so a memcpy is needed. This is implemented by an AvgPool. - In order to reduce the cost of the AvgPool the IFM can be copied by DMA. This is faster and also it can be turned into a real NOP in cases where the IFM and the OFM can use the same memory space. - Added new memcpy op. Only NHWC format supported since DMA can not change the format on the fly. - Allow ofm to reuse ifm for memcpy op - Make sure the DMA copy size is 16 byte aligned Change-Id: I3605a48d47646ff60d2bb3644dd3a23f872235a7 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-12-15MLBEDSW-7173: MLCE: NPU performance for reversed operandsJohan Alfvén
- When introducing the support for reversed operands the npu performance was not updated. The result is larger numbers (degrade) from the performance estimater compared to the previous release. In reality there is no degrade and the real performance is the same. - Updated npu performance to reflect the behavior implemented by the reversed operands attribute. Change-Id: I1b37a07f25def8f7a8adbdaadcf931bfe49165cb Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-11-16MLBEDSW-6620: Update copyright notice and yearsRickard Bolin
- Update copyright notices to use SPDX format and add OSS mail as contact. - Update years on files where it had been missed. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
2022-11-14MLBEDSW-7040: verbose-performance doesn't report the original opTim Hall
- Fixed the reporting of the input network operator to correctly report the original operator type rather than the current one - Fixed a divide by zero bug when calculating percentages - Refactored the verbose-performance code so that console and csv outputs use a single definition of the header and data Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ibd3fa99b65f0602dcdcff696f2d565ac13453306
2022-09-01MLBEDSW-6755: Add per-layer performance to CSV filewilisa01
Dump the current per-layer performance estimation information that appears on the terminal to a CSV file. Change-Id: I00e94168704be8c3c674c8779fb807ed28607ccd Signed-off-by: wilisa01 <william.isaksson@arm.com>
2022-05-19MLBEDSW-6271: Key error when using --verbose-performance optionRickard Bolin
- The print_performance function that is called when using the --verbose-performance option crashed with KeyError when no SRAM was used. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
2022-05-19MLBEDSW-6384: Updated weight buffering cycle calculationJohan Alfvén
- The npu cycles are not correct calculated when only one weight buffer is used, since weights can not be fetched in parallel. - Added new calculation in the single buffer case. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
2022-05-17MLBEDSW-6271: MLCE: Layer wise Utilization info from VelaTim Hall
- Added support to print per operator sram usage and performance information - Added new CLI option --verbose-performance to control this feature Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
2022-05-16MLBEDSW-6263: Use separate tensors for double bufferingRickard Bolin
Uses separate tensors for the individual weight buffers in case of weight double buffering. Each weight buffer tensor gets its own individual live range. This patch is a clone of a previously reverted patch, but with some additional bug fixes applied. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
2022-05-04Revert "MLBEDSW-6263: Use separate tensors for double buffering"Tim Hall
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
2022-03-30Update version of Black to 22.3.0Jonas Ohlsson
Update version of Black to 22.3.0 due to updated dependencies. Updates to fix reported issues due to new version. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
2022-03-30MLBEDSW-6263: Use separate tensors for double bufferingLouis Verhaard
Uses separate tensors for the individual weight buffers in case of weight double buffering. Each weight buffer tensor gets its own individual live range. Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13 Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
2022-03-21MLBEDSW-3367 Add mypy to pre-commitJonas Ohlsson
Add mypy to pre-commit and clean up all reported errors. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
2022-03-14Fix bug storing encoded NPU weight UUIDsJonas Ohlsson
Fix bug when storing the encoded NPU weight UUID in the NPU performance estimation. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I92127b0020f12352d923c0c9aa2b6f47e6110764
2022-03-08Updated elementwise cycle calculationJohan Alfvén
- Corrected rounding error - Number of elements depends on ofm format Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I568d660b7571b6e0ffb131211b3a89c8be4b9295
2022-02-22MLBEDSW-5873 Fixed divide by zero warning in memory transfer efficiencyAyaan Masood
*Corrected calculation where use of the _estimate_memory_transfer_efficiency function when calculating the scaled bandwidth for LUT transfers resulted in a divide by zero error. Change-Id: I2356e924d9ca2f315ca1988f465f58b13a8fa4c9 Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
2022-02-22MLBEDSW-5880 Fixed Vela verbose weight flagAyaan Masood
*Original weights and encoded NPU weight now report correct size instead of zero when running vela with --verbose-weights flag (Code to update the aforementioned attributes was missing) *Removed print references to unencoded NPU weight size Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
2021-06-17vela: Improve block configuration and weight buffering algorithmTim Hall
- Update block config selection to take into account partial IFM fetches at edge of non-whole OFM block data. - Change to scheduler depth slicing for networks in MLBEDSW-4637 for improved buffering. This helps general performance by buffering larger depth slices. - Bug fix for opt_max_schedule always being fitted to SRAM which prevented the optimisation step running in some cases. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I97642c5adec3bb684b1daabf2b81574c27d4eef2
2021-06-03MLBEDSW-4688: Fix performance estimatesPatrik Gustavsson
Putting back the estimates related to unbuffered weight transfer. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I2072066bc1e01814fe3b0b87a912f69646da861c
2021-05-27MLBEDSW-4034: New Scheduler Size or Performance OptimisationTim Hall
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
2021-04-08MLBEDSW-4334 Non-linear format decision in graph opt.Patrik Gustavsson
Check if non linear tensor format can be used is refactored. -Flag avoid_NHCWB16 replaced with needs_linear_format -Checking restrictions located to one function in graph optimiser. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Iec5c7996a1a6039cad052197f1ae56f7c0290440
2021-02-25MLBEDSW-1499: Add MEAN operatorDwight Lidman
This commit adds support for the MEAN operator, with some caveats. Signed-off-by: Dwight Lidman <dwight.lidman@arm.com> Change-Id: I165cb26cb5aefd68e70d2cfc68291ccf7b778921
2021-02-25MLBEDSW-4064: Update copyright headerserik.andersson@arm.com
All files which have been updated in 2021 and contain a copyright header have had their headers updated. Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: Ia682111a719d16e690433398ccfb69c7e93c1cd1
2021-02-05vela: Change Shape4D mutability usageTim Hall
- Removed requirement for cloning shapes when unique values required by forcing top-level immutability. This alleviates issues with Shapes being unintentionally shared and then mutated as if value-types. - Shape4D fields can no longer be assigned without replication. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ic0dbfa349eb0215eabefb4f4e2cf99f12d83699c
2021-01-28MLBEDSW-3772 Reshape removalPatrik Gustavsson
-Removed reshapes in the original graph -Removed the addition of reshapes to the optimized graph -Reshapes with different ifm/ofm quantisation will remain Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I94862be53dac0d7434815e2aee5ca678228495f8
2020-12-22MLBEDSW-1493: Optimise strided convDiqing Zhong
- Reshape/rearrange IFM and weight tensor for better HW utilization - Update estimator to cover this case Change-Id: I4be70a69fa600a1951bf1c247f9973e6cc9b03f4 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-12-21Revert "Revert "MLBEDSW-3645 4D class for op ifm/ofm shapes""patrik.gustavsson
This reverts commit df0a5905177f3a1b836076bc3f9f39b2e86f1794. Reason for revert: <INSERT REASONING HERE> Change-Id: I891c66fb29db9d25e942947e8d1c29a10610de51
2020-12-21Revert "MLBEDSW-3645 4D class for op ifm/ofm shapes"patrik.gustavsson
This reverts commit bf31d647dc5df47410ee577b12427ddf076d816b. Reason for revert: <INSERT REASONING HERE> Change-Id: I7b6c585b7658f94dbaa916c2b6bfe9fb463b8d37
2020-12-21MLBEDSW-3645 4D class for op ifm/ofm shapesPatrik Gustavsson
Add 4D shape class for op Ifm/ofm shapes Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Ic0a98da9d2f9d085605e39a9ab5a26bad6e702a3
2020-12-18MLBEDSW-3654 Add/use op ifm/ofm shapesPatrik Gustavsson
Add ifm/ofm shapes to op Changed to rely on these shapes Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I571535a1dcadc2bdb04a3c727a8e1c49703b174d
2020-12-16MLBEDSW-3465: Add memory settings into sys configDiqing Zhong
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com> Change-Id: I4a5c53d0c5957595fc639b174b2b227ea043d409
2020-12-09Vela: bandwidth calculation improvementsDiqing Zhong
- Combine conv and vector_product calculation - Remove internal bandwidth - Remove blocks and hw_macs from report - Use scaled_bws for cycle estimation Related to: MLBEDSW-3598 Change-Id: I1927a8311ec563f68115e0f2ed077806b86fd717 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-12-08MLBEDSW-2836 Change sets to tuplesMichael McGeagh
Replace conditional checks against sets with tuples. If not requiring uniqueness, or complex set operations, it is quicker to use tuples instead. Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com> Change-Id: Ie8732c8d46067244963936c53f0ec81adda50372
2020-12-07MLBEDSW-3685 Fix dangerous default value usageMichael McGeagh
Pylint W0102: When a mutable value as list or dictionary is detected in a default value for an argument. Replace detected instances with None, and upon checking for None, sets the default accordingly Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com> Change-Id: I4eb73d07d01d4cdefa586eb71b9c76746eee3b11
2020-11-25MLBEDSW-3530: Fix performance issueDiqing Zhong
- Improve conv estimation by adding delay cycles - Estimate minimal block cmd cycles Change-Id: Ibea818e8e820731fc7d05c948d5d1abd22e17089 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-11-20MLBEDSW-3249: Vela config file examplesTim Hall
- Added sample vela.ini config file - Changed vela config format, split into system config and memory mode - Removed unused CPU cycle performance estimation - Added new CLI options for --memory-mode and --verbose-config - Changed CLI option --config to take multiple files - Removed CLI option --global-memory-clock-scales - Changed error helper functions to raise a VelaError exception - Refactored to create a new is_spilling_enabled function Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d
2020-11-19MLBEDSW-3476: Fix performance regressionDiqing Zhong
- Improve the conv estimation when the block size is very small - Estimate cycles on bias/scale channel Signed-off-by: Diqing Zhong <diqing.zhong@arm.com> Change-Id: I275770b7f013b0812fc1ffe91f42ad07727c9dc7
2020-11-13MLBEDSW-839: Code generation using external API2.0.0.rc1Louis Verhaard
Added external API to generate register command streams. Existing code generation has been refactored to make use of this API. Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
2020-11-11Vela: estimate memory transfer efficiencyDiqing Zhong
Change-Id: I9e00afe0eef0e13fe990e021bcbe3dd0eda4c471 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-11-11Vela: Fix perf estimation for conv 1D reshapeDiqing Zhong
Change-Id: I8f139381d0e01e8ac70d89c4a312ee3000fb5fa1 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-11-11MLBEDSW-3146: memory transfers cycle estimationDiqing Zhong
- DMA ops cycle estimation for the first pass - fix a bug in ifm_blk_depth calculation - fix a bug in sram bandwidth calculation - merge dpu and elementwise cycles into npu cycles - use str.format() in performance print Change-Id: I78895416f47fc3c652743c5da13fc45630322371 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com> (cherry picked from commit 5245e97a62c2fe54250f99b06e778f3e0c6dc376) (cherry picked from commit 16e415677403fc04a90b1a7ec554761d38315640)
2020-11-11MLBEDSW-3146: Cycle estimation for conv/pooling opsDiqing Zhong
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com> Change-Id: Ic6ae795a1626d1cdf63a69d2ff86f7cd898f3134
2020-10-21vela: Refactor operators to use Kernel objectsTim Hall
- Normalise kernel availability by requiring all operators offer a kernel describing how much data they consume from the source, per OFM element, regardless of whether kernels are relevant to the operation. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Idbcff64879fc2eccf292b6208a7d2038eb388017
2020-10-21MLBEDSW-603: Improve cycle estimation in elementwise opsDiqing Zhong
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com> Change-Id: I9f3671041c2b1497519cf42b5f52e3cd01d9c10a (cherry picked from commit e8c989f5236cce12d07a6644329935dbbf0ee8e6)
2020-10-08MLBEDSW-3148: Refactor OperationLouis Verhaard
- op.type is now an enum instead of a string - Removed unused operator codes - Refactored some attributes like npu_block_type, fused_activation_function - Refactored operator index calculation - Refactored a number of operator sets Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8 Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>