path: root/ethosu/vela/scheduler.py
Age | Commit message | Author
2022-05-12 | MLBEDSW-6296: Regression caused by bigger weight buffering size (tag: 3.4.0.rc1) | Johan Alfvén
- Because bigger weight buffer sizes are now used, there are cases where feature maps are evicted from SRAM, causing total performance to drop.
- One way to improve this is to limit the memory for those weight buffer ops, so that the feature maps move back to SRAM, and check whether total performance improves.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
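A minimal sketch of the idea behind this change, with invented helper names (not Vela's actual scheduler API): cap the per-op weight buffers and keep whichever variant the cycle estimate prefers.

```python
# Hypothetical illustration only; names and data layout are not Vela's actual API.
def try_limited_weight_buffering(weight_buffer_sizes, buffer_limit, estimate_cycles):
    """weight_buffer_sizes: {op_name: bytes}; estimate_cycles: callable returning a cost."""
    limited = {op: min(size, buffer_limit) for op, size in weight_buffer_sizes.items()}
    # Keep the limited variant only if the overall estimate (which accounts for
    # feature maps fitting back into SRAM) actually improves.
    return limited if estimate_cycles(limited) < estimate_cycles(weight_buffer_sizes) else weight_buffer_sizes
```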
2022-05-04 | Revert "MLBEDSW-6263: Use separate tensors for double buffering" | Tim Hall
This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
2022-04-08 | MLBEDSW-6339 Performance drop on wav2letter | Johan Alfvén
Corrected the calculation of the buffering depth used. Before this change there were scenarios where it was set to a smaller size than needed. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I162859ade78487e848510c6a605685e4568c7068
2022-03-30 | Update version of Black to 22.3.0 | Jonas Ohlsson
Updated the version of Black to 22.3.0 due to updated dependencies, and fixed the issues reported by the new version. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
2022-03-30 | MLBEDSW-6263: Use separate tensors for double buffering | Louis Verhaard
Uses separate tensors for the individual weight buffers in case of weight double buffering. Each weight buffer tensor gets its own individual live range. Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13 Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
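As a rough illustration (the types below are invented stand-ins, not Vela's Tensor class), modelling each weight buffer as its own tensor lets the live-range allocator treat the two buffers independently:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class WeightBufferTensor:
    name: str
    size: int
    live_range: Optional[Tuple[int, int]] = None  # assigned later by the allocator

def make_double_buffer_tensors(chunk_size: int) -> list:
    # Two individual tensors instead of one tensor of size 2 * chunk_size:
    # each gets its own live range, so its memory can be reused while only
    # one of the buffers is actually live.
    return [WeightBufferTensor(f"weights_buffer_{i}", chunk_size) for i in range(2)]
```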
2022-03-21 | MLBEDSW-6298: MLCE: Unable to find a valid block config | Tim Hall
- Fixed a bug due to ResizeBilinear modifying the attributes of a shared IFM
- The ifm_resampling_mode is now an attribute of an operator rather than a tensor
- Changed all calls to try_block_config() to use the attribute rather than recalculating it in multiple places
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I4641e9cd6b049bd4186776d98e3e751c5e5bcc06
2022-03-21 | MLBEDSW-3367 Add mypy to pre-commit | Jonas Ohlsson
Add mypy to pre-commit and clean up all reported errors. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
2022-03-14 | MLBEDSW-6245: Bug fix fast storage allocator | Louis Verhaard
Fast storage allocator did not always return an optimal allocation. Signed-off-by: Louis Verhaard <louis.verhaard@arm.com> Change-Id: Ic758b6c4a82dc2633c4752b0c204a27ed36f651b
2022-03-04 | MLBEDSW-3367 Update pre-commit flake8 version | Jonas Ohlsson
Update the version of flake8 used in pre-commit to facilitate adding mypy to pre-commit. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I457dec87b77487ca6f14ff4a679c4cc927b272b0
2022-02-22 | MLBEDSW-5880 Fixed Vela verbose weight flag | Ayaan Masood
* Original weights and encoded NPU weights now report the correct size instead of zero when running Vela with the --verbose-weights flag (the code to update these attributes was missing).
* Removed print references to the unencoded NPU weight size.
Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
2022-02-21 | MLBEDSW-6148: Reduce SRAM usage for elementwise op | Johan Alfvén
Reduce memory footprint when using optimization strategy Size for elementwise operations. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I30380aed587c31adbf7615f74179b4c5da686773
2022-02-08 | MLBEDSW-5839: Port of improved spilling behaviour | erik.andersson@arm.com
Ported the improved spilling behaviour from Regor into Vela. This replaces use_fast_storage_for_feature_maps with allocate_feature_maps and introduces the class called FastStorageComponentAllocator. Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I34785840c905a79750a62863773015b00fb43387
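A hedged sketch of what a component-wise fast-storage allocator can look like (the function, inputs and exhaustive search are illustrative only; the ported FastStorageComponentAllocator is more sophisticated): for each group of candidate feature maps, choose the subset that fits the budget and saves the most traffic.

```python
from itertools import combinations

def allocate_component(candidates, budget):
    """candidates: list of (size_bytes, bandwidth_saving) tuples for one group of
    feature maps with overlapping live ranges; returns the subset to keep in
    fast storage. Exhaustive search for clarity, not efficiency."""
    best, best_saving = [], 0
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            size = sum(s for s, _ in subset)
            saving = sum(b for _, b in subset)
            if size <= budget and saving > best_saving:
                best, best_saving = list(subset), saving
    return best
```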
2021-10-27 | MLBEDSW-5450 MLCE: Vela to handle skip Tensor | Fredrik Svedberg
Added checks to avoid merging elementwise op live ranges for subgraph inputs and outputs, which sometimes caused problems when parts of the network run on CPU. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: Id07ab277a205b8550d19a276559f8904b9a4b4be
2021-10-01 | MLBEDSW-5286 - MLCE: IndexError for ADD + TANH network | Dwight Lidman
Resolves a bug where an IndexError would occur if the same tensor was assigned to both IFM and IFM2 of a binary elementwise operator due to duplicates being allowed in operator inputs but not in pass inputs. Signed-off-by: Dwight Lidman <dwight.lidman@arm.com> Change-Id: I39a6206a6252f6a848be9f9d4c5a8dc749c71699
2021-10-01 | MLBEDSW-5013 Output diff for u55-bring-up tests, int16 | Fredrik Svedberg
Fixed output diff for some architectures due to incorrect IFM buffer size calculation when using NearestNeighbour upscaling. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I0d6d1efc606603cdd6188ae282e7f6babfd7e24e
2021-09-07 | MLBEDSW-5160 Fix constant data move to fast storage | Patrik Gustavsson
Additional check added for when constant data can be moved to fast storage. Do not move constant data for concat. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Ib8b5fd1483ee9fabe48e9874a5723af9b7c5231a
2021-09-06 | MLBEDSW-4975 Fix semodepth asserts | Jacob Bohlin
This commit fixes one assert regarding rolling buffers for 3D tensors. It also addresses another issue where the incorrect weight buffering was proposed for cascaded operators. Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I2501f35e5668b3085d917751cfc8002d250973d8
2021-08-23 | MLBEDSW-4976: index errors in scheduler | Louis Verhaard
- Fixed an index error in memory_snapshot
- When removing a cascade, references to it are also removed
Change-Id: I2b35dc52671d8ce115eb32bfdd93584391d1fc6d Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
2021-08-19 | MLBEDSW-4641 Bugfix Inception v3 hang | Jacob Bohlin
Fixed a bug that caused the constant and buffered weights to expect different encoding. Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I77acee29d104bc7c8e132907e61a72b581ace0e5
2021-06-17 | vela: Improve block configuration and weight buffering algorithm | Tim Hall
- Updated block config selection to take into account partial IFM fetches at the edge of non-whole OFM block data.
- Changed the scheduler depth slicing for the networks in MLBEDSW-4637 for improved buffering. This helps general performance by buffering larger depth slices.
- Bug fix for opt_max_schedule always being fitted to SRAM, which prevented the optimisation step from running in some cases.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I97642c5adec3bb684b1daabf2b81574c27d4eef2
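For intuition only, a minimal sketch of depth slicing under double buffering (the parameters, rounding granule and search order are assumptions, not the scheduler's actual policy): prefer the largest depth slice whose double-buffered weights still fit the budget.

```python
def choose_depth_slices(ofm_depth, weight_bytes_per_channel, sram_budget, granule=16):
    """Return OFM depth slices, preferring the largest slice that still allows
    double buffering of its weights within sram_budget. Illustrative only."""
    for num_slices in range(1, ofm_depth + 1):
        depth = -(-ofm_depth // num_slices)            # ceil(ofm_depth / num_slices)
        depth = -(-depth // granule) * granule         # round up to the granule
        if 2 * depth * weight_bytes_per_channel <= sram_budget:
            return [min(depth, ofm_depth - i * depth)
                    for i in range(num_slices) if i * depth < ofm_depth]
    return [ofm_depth]  # nothing fits; fall back to a single slice
```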
2021-06-16 | MLBEDSW-4635: yolo_v3 output diff | Jacob Bohlin
Fixed an issue where the scheduler would set the incorrect tensor layout. Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I28abdf3f3c523d7da0cf8840316ece37dad364ab
2021-06-08 | MLBEDSW-4602: Fix Deepspeech scale & bias reuse issue. | Tim Hall
- Deepspeech reuses identical weights and biases throughout the network. Since biases are now interleaved with weights there is a scaling issue when the ifm scales differ between operations using the same weight and scale tensor.
- This commit uses interleaved weights/scales on their first use but separates scales to source memory on subsequent use (if the ifm scale is different).
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I7aae163438160a919cae04e235966e75355a6148
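The reuse rule can be pictured with a small, hypothetical helper (the cache and return values are invented for illustration, not the commit's code):

```python
def select_weight_source(encoded_ifm_scales, weight_id, ifm_scale):
    """First user gets the interleaved weights+scales; later users may reuse them
    only if their IFM scale matches, otherwise scales come from source memory."""
    first_scale = encoded_ifm_scales.setdefault(weight_id, ifm_scale)
    if first_scale == ifm_scale:
        return "interleaved_weights_and_scales"   # safe to reuse as-is
    return "separate_scales_from_source_memory"   # IFM scale differs
```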
2021-05-27 | MLBEDSW-4034: New Scheduler Size or Performance Optimisation | Tim Hall
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
2021-05-21 | MLBEDSW-4219: Add tensor allocation info to summary | Tim Hall
- Moved the new tensor allocation info under the --verbose-allocation flag
- Tidied up and added a histogram to the --verbose-allocation print
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I76fb5187319aedf86f599f57b766220cafc17326
2021-04-19 | [MLBEDSW-4421] Fix verbose-all crash | Fredrik Svedberg
Fixed exception when using the CLI option --verbose-all. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I203fe31ad6914936730343958009e2370045c67c
2021-04-08 | MLBEDSW-4334 Non-linear format decision in graph opt. | Patrik Gustavsson
Refactored the check for whether a non-linear tensor format can be used.
- The flag avoid_NHCWB16 is replaced with needs_linear_format
- The restriction checks are now located in one function in the graph optimiser
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Iec5c7996a1a6039cad052197f1ae56f7c0290440
2021-02-12 | MLBEDSW-3600: Cascading support for ResizeBilinear | Jacob Bohlin
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I7899263ff5bb3d0de00681ee8351a02eecff1553
2021-02-11 | MLBEDSW-3774 Fix avoid cascading for spilling | Patrik Gustavsson
Fix avoid cascading for spilling. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: If86189bd1566eaa14387dfc2c02e3324ea6c184e
2021-02-04 | MLBEDSW-3937 Fix moving FM to fast storage | Patrik Gustavsson
Feature maps were never moved to fast storage when the tensor was set not to use NHCWB16. This patch enables feature maps to be evaluated for a move to fast storage also when the tensor uses NHWC. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I6367c975e7af8739c774cb7c34b43fb9a6776c8c
2020-12-18 | MLBEDSW-3654 Add/use op ifm/ofm shapes | Patrik Gustavsson
Added ifm/ofm shapes to the op and changed the code to rely on these shapes. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I571535a1dcadc2bdb04a3c727a8e1c49703b174d
2020-12-09 | Vela: bandwidth calculation improvements | Diqing Zhong
- Combine conv and vector_product calculation
- Remove internal bandwidth
- Remove blocks and hw_macs from report
- Use scaled_bws for cycle estimation
Related to: MLBEDSW-3598 Change-Id: I1927a8311ec563f68115e0f2ed077806b86fd717 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-11-27 | vela: Rename --keep-scale-placement CLI | Tim Hall
- Changed to --cache-bias-scale-tensor Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I285fe253f03ba98eff36dbe996ad3a57e2ee3d99
2020-11-25 | MLBEDSW-3352 Fix ifm end_coord for upsampling | Patrik Gustavsson
- Fixed the end_coord for upsampling
- Removed a restriction on ifm streaming
- Added a restriction on cascading for ResizeBilinear
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I384abf12cfe8ac9ce7b76066b709600ea901b248
2020-11-25 | MLBEDSW-3352 Avoid ifm streaming for some cases | Patrik Gustavsson
Changed so that ifm streaming is no longer allowed for TransposeConv and ResizeBilinear. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I85da279fae6202830c46e4a5500fb1b0dd6ef542
2020-11-23 | MLBEDSW-3468: Move of scale tensors to SRAM after weight compressor | Andreas Nevalainen
After the weight compressor has run, the weights have their correct sizes. Placing the move of scale tensors after the weight compressor therefore gives a more accurate estimate of the SRAM available for scale tensors. Change-Id: I4571780180778ef43e943c4e98048e17d6f33580 Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
2020-11-20 | MLBEDSW-3249: Vela config file examples | Tim Hall
- Added sample vela.ini config file
- Changed vela config format, split into system config and memory mode
- Removed unused CPU cycle performance estimation
- Added new CLI options for --memory-mode and --verbose-config
- Changed CLI option --config to take multiple files
- Removed CLI option --global-memory-clock-scales
- Changed error helper functions to raise a VelaError exception
- Refactored to create a new is_spilling_enabled function
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d
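To make the system-config/memory-mode split concrete, here is a hedged sketch of how such a file could be read with configparser; the "System_Config."/"Memory_Mode." section prefixes follow the sample vela.ini layout, while the function and the example names passed in are assumptions for illustration.

```python
import configparser

def load_vela_config(paths, system_config, memory_mode):
    """paths: one or more .ini files (--config accepts multiple files)."""
    cfg = configparser.ConfigParser()
    cfg.read(paths)
    # The configuration is split into two independent parts:
    sys_options = dict(cfg[f"System_Config.{system_config}"])
    mem_options = dict(cfg[f"Memory_Mode.{memory_mode}"])
    return sys_options, mem_options

# Hypothetical usage (section names are placeholders, not a prescribed set):
# sys_cfg, mem_cfg = load_vela_config(["vela.ini"], "My_System", "My_Memory_Mode")
```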
2020-11-18 | MLMBED-3468: Update scale tensors SRAM size calculation | Andreas Nevalainen
Updated SRAM size calculation for scale tensors. Change-Id: Idaecc3bf0c83d58ea70163bfd194c594295b66db Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
2020-11-11 | MLBEDSW-3222: Bias tensors in fast storage | Andreas Nevalainen
For IFM-streamed cascades, bias tensors are read several times. This moves these tensors to fast storage and adds DMA commands. Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128 Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
2020-11-10 | [MLBEDSW-3227] Improve u65 softmax performance | Fredrik Svedberg
Improve u65 softmax performance by selecting more feature map tensors as SRAM candidates. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I239c9dbebbf2a929004eb01bb0f3efe77f5b97aa
2020-11-06 | MLBEDSW-3212 Remove CLI opt ifm-ofm-overlap | Patrik Gustavsson
Removed the CLI opt ifm-ofm-overlap Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
2020-10-28 | MLBEDSW-3212 Enable overlap of elementwise input/output | Patrik Gustavsson
Enable overlap of elementwise input/output Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: I6e6f11953319c843c8203bf038f96778df194332
2020-10-08 | MLBEDSW-3148: Refactor Operation | Louis Verhaard
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8 Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
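A small illustration of what the string-to-enum change buys (the member names and helper below are invented, not the actual Op enum): typos fail immediately and operators can be grouped into sets.

```python
from enum import Enum, auto

class Op(Enum):           # illustrative members only
    Conv2D = auto()
    DepthwiseConv2D = auto()
    FullyConnected = auto()
    Relu = auto()

CONVOLUTION_OPS = {Op.Conv2D, Op.DepthwiseConv2D}

def is_convolution(op_type: Op) -> bool:
    # Comparing enum members instead of free-form strings: a misspelled member
    # raises an AttributeError immediately instead of silently never matching.
    return op_type in CONVOLUTION_OPS
```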
2020-09-25 | MLBEDSW-2337: Intermediate feature maps in fast storage | Louis Verhaard
Attempts to use fast storage for feature maps used in between cascaded passes. This is only relevant for system configurations where feature maps are by default not placed in SRAM, but there is SRAM for fast storage. Change-Id: I207b7cf32cfcb5bea3e6b93c2da1161c4af5221d Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
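As a rough sketch of the idea (greedy and with invented inputs; the real pass works on live ranges and memory areas): move inter-pass feature maps into fast storage while they fit.

```python
def move_intermediates_to_fast_storage(intermediate_sizes, fast_storage_free):
    """intermediate_sizes: {tensor_name: bytes} for feature maps passed between
    cascaded passes; returns the names moved to fast storage. Greedy, smallest
    first, purely for illustration."""
    moved = []
    for name, size in sorted(intermediate_sizes.items(), key=lambda kv: kv[1]):
        if size <= fast_storage_free:
            moved.append(name)
            fast_storage_free -= size
    return moved
```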
2020-09-21 | MLBEDSW-2816: Fix assert in scheduler | Diqing Zhong
- Use non-local memory as the base sram usage for a subgraph
- Make avoid_for_spilling more generic for all mem configs
Change-Id: I99cd30fe6a8ba075d5a70dc2138aa0635afaadb3 Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
2020-09-17 | MLBEDSW-2809: Redo the Tensor addressing | Jacob Bohlin
Added a static class TensorAddressMap that stores all Tensor addresses based on their equivalence_id. Made the "address" field into a property whose getter and setter look up/set the tensor's address in TensorAddressMap. This makes the references to cpu_tensor/npu_tensor obsolete and they have been removed.
Addition to scheduler: avoid SRAM spilling if an op has consumers in other subgraphs.
Minor rework in LUTState; it will now assign a unique equivalence_id to the SHRAM lut tensor to avoid issues with addressing. The equivalence checks in LUTState now compare the values of the LUT instead of the equivalence_id. Updated the LUT unit tests accordingly.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I41de5a8a4e5f07b77d6544d8d4034b754993e503
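A simplified sketch of the scheme described above (condensed; keying the map additionally on a memory type is an assumption here, and the real classes carry more state):

```python
from collections import defaultdict

class TensorAddressMap:
    # equivalence_id -> {mem_type: address}; shared by all Tensor instances.
    address_map = defaultdict(dict)

    @classmethod
    def get_address_for_tens(cls, eq_id, mem_type):
        return cls.address_map[eq_id].get(mem_type)

    @classmethod
    def set_address_for_tens(cls, eq_id, mem_type, address):
        cls.address_map[eq_id][mem_type] = address

class Tensor:
    def __init__(self, equivalence_id, mem_type):
        self.equivalence_id = equivalence_id
        self.mem_type = mem_type

    @property
    def address(self):
        # Tensors sharing an equivalence_id automatically share an address.
        return TensorAddressMap.get_address_for_tens(self.equivalence_id, self.mem_type)

    @address.setter
    def address(self, value):
        TensorAddressMap.set_address_for_tens(self.equivalence_id, self.mem_type, value)
```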
2020-08-28 | MLBEDSW-2889: NHCWB16 format issue at end of subgraph | Tim Hall
- Processing reshapes at the end of NPU subgraphs selected NHCWB16 tensor format before handing over to the CPU. This commit detects end-of-subgraph during the reshape-consumers compatibility check and chooses NHWC format instead. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ieefdbecdba1a6183d79d3ac4d2505503dbf321cb
2020-08-27 | [MLBEDSW-2846] Do not use NHCWB16 for reduce_sum int32 | Fredrik Svedberg
Added checks for not using NHCWB16 for reduce_sum int32 which makes int8/uint8 softmax work. Also enabled softmax graph rewrite by default and fixed a saturation problem. Change-Id: Ic01bd9ece7e5c3edb2900b7915cc747efe9e5760 Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2020-08-26 | MLBEDSW-2686: Use NPU tensor format for noop reshapes. (tag: 1.2.0.rc2) | Tim Hall
- Reshapes that merely add/remove dimensions, rather than re-layout the data, need not fall back to NHWC. This commit allows reshapes between NPU operators to use NHCWB16.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ieb7745e586bf324e92e741a04b74caf7285f4b8b
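One way to picture the "no-op reshape" criterion (a hedged simplification, not the commit's actual compatibility check): if stripping size-1 axes from both shapes gives the same sequence, the data layout is unchanged and the NPU-friendly format can be kept.

```python
def reshape_is_noop(ifm_shape, ofm_shape):
    """True when the reshape only adds/removes unit dimensions."""
    def squeeze(shape):
        return [dim for dim in shape if dim != 1]
    return squeeze(ifm_shape) == squeeze(ofm_shape)

assert reshape_is_noop([1, 8, 8, 16], [8, 8, 16])         # only drops a dimension
assert not reshape_is_noop([1, 8, 8, 16], [1, 8, 16, 8])  # actually re-lays out data
```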
2020-08-26 | MLBED-2822 Added CLI-opt for weight size est. | Patrik Gustavsson
Added --weight-estimation-scaling, which enables additional scaling of weight compression scale estimate. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Idcda41257f44901d3a3f345341e07fb1ae8585a9
2020-08-14 | MLBEDSW-2570 Avoid usage of NHCWB16 for some cases | Patrik Gustavsson
Avoid usage of NHCWB16 when Stack/Pack/Concat is performed in axis 3, and the "concat start" of each slice to be combined is not a multiple of 16. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: If3f7b4a3424be3c86fc2dc48e8649ce4c4f49485
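The alignment rule can be sketched like this (a hedged illustration; the argument names are invented and the real check lives in the graph optimiser):

```python
def concat_slices_can_use_nhcwb16(axis, input_depths, brick_depth=16):
    """For a Concat/Pack/Stack along axis 3, every slice after the first must
    start at a multiple of the 16-channel brick for NHCWB16 to line up."""
    if axis != 3:
        return True
    offset = 0
    for depth in input_depths[:-1]:
        offset += depth
        if offset % brick_depth != 0:   # this slice's "concat start" is misaligned
            return False
    return True

assert concat_slices_can_use_nhcwb16(3, [16, 32, 16])
assert not concat_slices_can_use_nhcwb16(3, [8, 24])
```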