ethos-u/ethos-u-vela.git

Age	Commit message	Author
2023-08-09	MLBEDSW-7754: Performance estimator is not using write/read shapes [3.9.0.rc1]	William Isaksson
	- npu_performance now uses write/read shapes instead of using ifm/ofms for memory cycle estimations. - Also fixes a would-be bug in the tflite_graph_optimiser, where one read shape is not Shape4D. Change-Id: I2067069a713d2cf9e65a5cc227e803de79940fff Signed-off-by: William Isaksson <william.isaksson@arm.com>
2023-07-31	MLBEDSW-7397: Wrong mem_area used in scheduler	wilisa01
	Performance estimation now uses the parent_tensor mem_area instead of the scheduler_op mem_area, because the mem_area is only set on the parent_tensor by the scheduler. Signed-off-by: wilisa01 <william.isaksson@arm.com> Change-Id: I11f73686bfbd6958a8920c5e264a5f95cc3f23d1
2023-06-14	MLBEDSW-7147: Enable weight buffering when opt for Size	Johan Alfven
	- When optimizing for Size, the scheduler does not try to add weight buffering to the schedule, since this would add extra SRAM usage to the peak usage. However, for all other ops that use less SRAM than the peak, there is memory available that could be used for weight buffering and hence improve the performance. - Removed the limitation to only run optimize schedule when optimizing for Performance. Regardless of optimizing for Performance or Size, the scheduler flow is the same except that the limit for max SRAM usage is different. Change-Id: I6880b35655e37b4916a9c15150f0b8e5126a1cd8 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-05-31	MLBEDSW-7600: MLCE: Enable cascading for resize ops	Johan Alfven
	- Added a fix when building the minimum schedule that forces the stripe to be even for is_nearest ops. This is required in order to allow cascading for resize ops. - Removed the limitation in the cascade builder that prevented resize ops from being cascaded. Change-Id: I05150102b91531ecba786936494f1817a4472f42 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-05-15	MLBEDSW-7390: Add verbose progress option	Raul Farkas
	Add --verbose-progress CLI option used to enable printing progress information in the compiler driver and scheduler. Change-Id: I99ac8c6a654e60391d5c11e28b89250405daa53a Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-04-17	MLBEDSW-7196 Add LSTM support	Fredrik Svedberg
	Added int8 and int16 UNIDIRECTIONAL_SEQUENCE_LSTM support. The implementation does not include support for CIFG, Peephole, Projection, or Normalisation. This change also: removed the unused Op.BlockLSTM operation type; removed the single-consumer limitation on putting the SplitSliceRead on the tensor consumer(s), if all consumers fulfil the requirements; and added Op.VariableTensorWrite as an Operation.memory_function to make sure that writes to variable tensors always use linear mode, are not moved to fast scratch, and are not fused with other elementwise operation tensor ranges. Change-Id: Ief831738924ac3d1f2ba6d41f10bd6dc969911f3 Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-03-27	MLBEDSW-6343: Remove op_index constraint	Raul Farkas
	Remove op_index constraint and force linear format for all Conv2D that have strides that can be optimised. Change-Id: Idef3508ab074ea9abeacac030eaaa15a00ad1211 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-03-21	MLBEDSW-7430: Remove non local mem usage from cascade info	Johan Alfven
	- There is a latent bug when calculating the mem usage parallel to the sub schedule. The error is in the calculation done when optimizing the sub schedules, where the cascade size is subtracted from the snapshot usage to determine the non local memory usage. The problem is that the cascade mem usage also includes non local memory, so the end result will be zero. This is normally not a problem, but it will be when starting to optimize sub schedules when optimizing for Size. - The solution is to not include the non local usage in the cascade info; the scheduler already has this information. - Corrected usage of persistent initial IFM. This size should not be included for Dedicated SRAM since only intermediate buffers are in SRAM. - Added some comments to clarify the code in the cascade builder. Change-Id: I473b36e0d69550ab6565f4ef028195636b362997 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-16	MLBEDSW-7352: Refactoring move_constant_data	Johan Alfven
	Refactoring move_constant_data in the scheduler. The use case currently only works for LUT tensors, so the logic is simplified. In order to make it work for other tensors, one would also have to take memory usage into consideration when building cascades, and use_fast_storage_for_feature_maps would also be affected. Change-Id: Ic8de53b65a2c17d34515002d7f184d0ab1830222 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-13	MLBEDSW-7393: MLCE: Optimize compile time for large networks	Johan Alfven
	- There is a problem with large networks containing many NPU subgraphs. The scheduling takes too long since the snapshot memory calculation always does a complete update for the full graph. - A complete run is needed at the end to calculate all the time indexes correctly. However, when scheduling an NPU subgraph it is enough to extract live ranges for the current schedule and its operators. Change-Id: Iccb7d6728119c1428ad0b45a2ac34e92158c15bd Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-01-16	MLBEDSW-7091: MLCE: Reduce SRAM, compiling for Performance	Johan Alfven
	- Previously a feature was added in order to reduce SRAM usage when optimizing for Size. An investigation has now been done that shows that this feature is also beneficial when optimizing for Performance and hence this patch removes the Size only limitation. Change-Id: I5b130db43cbda47e09d4196ab1daa5a21e35ae00 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-12-21	MLBEDSW-7062: Clean up and add comments to scheduler	Rickard Bolin
	Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I026facce572ddce4249e05529f2bb1d285552ab9
2022-12-15	MLBEDSW-7179: Fix assert for non local memory calculation	Johan Alfvén
	IFMs in persistent memory should not be included in the memory op SRAM calculation. Change-Id: Iaac4d2ad8b206c5fb727e5815477cb3611a13e0e Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-12-08	MLBEDSW-6716: Updates to estimate op SRAM usage	Johan Alfvén
	- The cascade builder estimates how much SRAM usage an operator takes when calculating the cascades. If an elementwise operator is included in a cascade the IFM2 will always be a constant/scalar and the IFM2 will be in permanent memory and the size of the IFM2 should not be included in the SRAM estimate. - The scheduler did not take into account that IFM can be reused for the OFM when calculating the op memory usage resulting in a negative number for non-local memory usage. Corrected the calculation and added assert to detect future problems. Change-Id: Id7ec8fe1ec5560290f34579a7b9203a75067aba2 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
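The second point above can be illustrated with a minimal sketch. The function names and the byte counts are hypothetical and are not taken from the Vela sources. The sketch only shows why counting a reused IFM buffer twice can push the non-local figure below zero, and how an assert catches it.

```python
def op_sram_usage(ifm_size: int, ofm_size: int, ifm_reused_for_ofm: bool) -> int:
    """Estimate the SRAM an op needs for its feature maps (hypothetical helper)."""
    if ifm_reused_for_ofm:
        # The OFM overwrites the IFM buffer, so only one buffer is live.
        return max(ifm_size, ofm_size)
    return ifm_size + ofm_size


def non_local_usage(snapshot_usage: int, op_usage: int) -> int:
    """Memory in the snapshot that does not belong to the op itself."""
    non_local = snapshot_usage - op_usage
    # A negative value means the op usage was over-estimated.
    assert non_local >= 0, "op usage exceeds the memory snapshot"
    return non_local


# Snapshot: one 1000-byte buffer shared by IFM and OFM, plus 500 bytes of other tensors.
usage = op_sram_usage(1000, 1000, ifm_reused_for_ofm=True)
print(non_local_usage(1500, usage))
```

Ignoring the reuse would give an op usage of 2000 bytes against a 1500-byte snapshot, which is the negative case the assert is meant to flag.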
2022-11-16	MLBEDSW-6620: Update copyright notice and years	Rickard Bolin
	- Update copyright notices to use SPDX format and add OSS mail as contact. - Update years on files where it had been missed. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
2022-10-26	MLBEDSW-7063: Fix output diff for networks with split ops	Johan Alfvén
	- Due to a SPLIT op, the following ADD op got an IFM shape that is bigger than its original shape, but that is handled by read_offset and read_shapes. The problem was that the IFM was considered not to be primary and an erroneous swap was done. - Make it even clearer when the swap is allowed. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I0aefa04234f66c935f269267ae8ed1d77da64c81
2022-10-26	MLBEDSW-6984: Optimize fast storage for feature maps	Johan Alfvén
	- Remove very long live ranges that stand out compared to their neighbors. This can be seen on large networks with complex structure. If they are chosen instead of shorter live ranges, it will be difficult for the HillClimb Allocator to find a perfect fit in the final allocation. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I6cf23adfdc06c1e93e12e9cf816453d940ff31f7
2022-10-25	MLBEDSW-7028: Fix compiler assert for elementwise op	Johan Alfvén
	- Refactored an erroneous if statement that allowed illegal swapping between ifm1 and ifm2 for elementwise operators. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Iec571f710824432edac9104d960f199f33a1b241
2022-10-21	MLBEDSW-6840: New stripe algo for optimize sub schedule	Johan Alfvén
	- The algorithm for trying out different stripes in order to optimize a sub schedule/cascade has a problem: it can split the initial cascade into several smaller cascades. This increases IFM/OFM DRAM bandwidth, and performance drops. - Changed the stripe algorithm to prefer long cascades. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I4f38b381597b7094819e9dd463aa1876e4e6bc62
2022-10-20	MLBEDSW-7019: Update to elementwise cascading	Johan Alfvén
	- The cascade builder is using the ifm_ifm2_correct_order function in order to decide if the operator is cascadable or not. The problem is that this function expects a full shape or no shape and the cascade builder did not provide that, so the operator was reported to be non cascadable. - The fix is to provide a full 4D shape, also refactoring ifm_ifm2_correct_order to use 4D shape to avoid confusion in the future. - Refactoring code so that the scheduler can perform a correct ifm and ifm2 swap. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I9a86c4690612f332afa428456a07e67698852495
2022-10-12	MLBEDSW-6971 Fix output diff when cascading elementwise operators	Fredrik Svedberg
	Fixed output diff when cascading elementwise operators with reversed operand order. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: Iac2e28cfb53037b929459af213f4fa7715b3e6de
2022-08-17	MLBEDSW-6769: Fix odd stripe heights for upscaling	erik.andersson@arm.com
	Output diffs were found to be caused by odd input stripe heights, despite the input being an upscaling operator. Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: Ia3791d815250364cfe7a38c3ed0e30768d64ca08
2022-08-17	MLBEDSW-6645: MLCE: Optimize SRAM usage	Johan Alfvén
	- When compiling for shared SRAM, the old scheduler has an option that makes it use less SRAM than the new scheduler manages to. The old scheduler was able to create more/longer cascades. In order to improve the new scheduler, the following has been implemented: - Take persistent IFMs into account when creating the min schedule. - Choose longer cascades when it is possible to reduce the total SRAM usage compared to using shorter cascades. - Updated calculation for estimated SRAM usage for elementwise ops. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I209bbf2d94425e4f6aacb1d151b3b2aa65c0870b
2022-07-13	MLBEDSW-6496 mlperf_deeplabv3_mnv2_ade20k_int8 fails at verify_output for u65	Fredrik Svedberg
	Added check to see if additional stripe data is needed from producer op when cascading to make sure the stripes are not overwriting data still being used. Also changed scheduler to make sure ResizeBilinear always runs with even stripe height. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
2022-07-11	MLBEDSW-6261: Elementwise cascading	erik.andersson@arm.com
	Enabled elementwise cascading for binary/single variable IFM operators. Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
2022-06-27	MLBEDSW-6639: Bug fix for evicted FMS in the fast storage allocator	Johan Alfvén
	- The fast storage allocator is supposed to add all feature maps that do not fit in SRAM to an evicted list. However, when conflicting tensors were handled, the list was not updated. - This patch makes sure to update the list correctly. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
2022-06-20	MLBEDSW-6347: Improved fast storage allocator	Johan Alfvén
	- The fast storage allocator only looked at tensor size, giving priority to larger tensors. The problem with this method is that it does not consider the actual read/write access of the tensor. So, a smaller tensor size can cause higher memory transactions than a bigger one. - The solution is to calculate the read/write access of the tensor and add that score to the decision when deciding where to place the tensors. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
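The idea described above can be sketched as follows. This is an illustration under stated assumptions, not the allocator from the Vela sources: the FeatureMap class, the access_bytes field and the greedy placement are all hypothetical, and live ranges are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class FeatureMap:
    name: str
    size: int          # bytes of storage the tensor occupies
    access_bytes: int  # estimated total bytes read and written (assumed metric)


def pick_for_fast_storage(fms: list[FeatureMap], capacity: int) -> list[str]:
    """Greedily place the most heavily accessed tensors into fast storage."""
    chosen = []
    free = capacity
    # Rank by access volume rather than by size alone.
    for fm in sorted(fms, key=lambda f: f.access_bytes, reverse=True):
        if fm.size <= free:
            chosen.append(fm.name)
            free -= fm.size
    return chosen


fms = [
    FeatureMap("big", size=800, access_bytes=1000),
    FeatureMap("small", size=300, access_bytes=5000),
    FeatureMap("mid", size=500, access_bytes=2000),
]
print(pick_for_fast_storage(fms, capacity=1000))
```

With these made-up numbers, ranking by size would place "big" first and leave no room for the two tensors that generate most of the memory traffic.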
2022-05-19	MLBEDSW-6563: networks failing with memory area exceeded in vela3.4.0.rc2	Tim Hall
	- For allocations that have a hard memory limit, the Hill Climb allocator should be given more attempts to find a solution that would fit. - The fix is to use a memory limit when there is a hard constraint, and a minimum iteration count, reset on every improvement, when there is a soft constraint. - Added a CLI option for the maximum number of iterations. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
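The stopping rules described above can be sketched as a generic search loop. All names here (search, propose, min_iterations, max_iterations, hard_limit) are hypothetical and do not reflect the actual Hill Climb allocator code. The sketch only shows the two termination conditions and the overall iteration cap.

```python
from typing import Callable, Optional


def search(
    initial_cost: int,
    propose: Callable[[int], int],
    min_iterations: int,
    max_iterations: int,
    hard_limit: Optional[int] = None,
) -> int:
    """Iteratively improve a cost until a stopping rule is met."""
    best = initial_cost
    since_improvement = 0
    for _ in range(max_iterations):
        if hard_limit is not None:
            # Hard constraint: keep trying until the result fits the limit.
            if best <= hard_limit:
                break
        elif since_improvement >= min_iterations:
            # Soft constraint: stop after enough attempts without improvement.
            break
        candidate = propose(best)
        if candidate < best:
            best = candidate
            since_improvement = 0  # reset the counter on every improvement
        else:
            since_improvement += 1
    return best


def step_down(cost: int) -> int:
    """Toy proposal: improves by one until the cost reaches 5."""
    return cost - 1 if cost > 5 else cost


print(search(10, step_down, min_iterations=3, max_iterations=100))
print(search(10, step_down, min_iterations=3, max_iterations=100, hard_limit=7))
```

In the soft-constraint call the loop stops only after three consecutive attempts fail to improve, while the hard-constraint call stops as soon as the cost fits the limit.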
2022-05-19	MLBEDSW-6296: improvement_dram can become NaN	Tim Hall
	- Problem is due to a divide by zero - Fix is simply to detect and assign zero. This could also affect improvement_sram Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
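A guard of the kind described above might look like the following sketch. The function name and the formula for the relative improvement are assumptions made for illustration. Only the "detect and assign zero" step comes from the commit message.

```python
def relative_improvement(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before` (hypothetical formula)."""
    if before == 0:
        # Nothing to improve on: assign zero instead of dividing by zero.
        return 0.0
    return (before - after) / before


print(relative_improvement(100.0, 75.0))
print(relative_improvement(0.0, 0.0))
```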
2022-05-17	MLBEDSW-6271: MLCE: Layer wise Utilization info from Vela	Tim Hall
	- Added support to print per operator sram usage and performance information - Added new CLI option --verbose-performance to control this feature Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
2022-05-17	MLBEDSW-6296: Updated condition for the opt size weight buffering schedule	Johan Alfvén
	Allow the schedule to be used when the calculations show zero total improvement but do show a DRAM improvement. When tested on a real target, total performance improves. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
2022-05-16	MLBEDSW-6263: Use separate tensors for double buffering	Rickard Bolin
	Uses separate tensors for the individual weight buffers in case of weight double buffering. Each weight buffer tensor gets its own individual live range. This patch is a clone of a previously reverted patch, but with some additional bug fixes applied. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
2022-05-12	MLBEDSW-6296: Regression caused by bigger weight buffering size [3.4.0.rc1]	Johan Alfvén
	- Because bigger weight buffer sizes are being used, there are use cases where feature maps are evicted from SRAM, causing the total performance to drop. - A way to improve this is to limit the memory for those weight buffer ops, to get the feature maps back into SRAM, and see if total performance is improved. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
2022-05-04	Revert "MLBEDSW-6263: Use separate tensors for double buffering"	Tim Hall
	This reverts commit cc5f4de1c35ba44fca7ff6295c6ae846f8242344. Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
2022-04-08	MLBEDSW-6339 Performance drop on wav2letter	Johan Alfvén
	Corrected the calculation of the used buffering depth. Before this change there were scenarios where it was set to smaller sizes than needed. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I162859ade78487e848510c6a605685e4568c7068
2022-03-30	Update version of Black to 22.3.0	Jonas Ohlsson
	Update version of Black to 22.3.0 due to updated dependencies. Updates to fix reported issues due to new version. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
2022-03-30	MLBEDSW-6263: Use separate tensors for double buffering	Louis Verhaard
	Uses separate tensors for the individual weight buffers in case of weight double buffering. Each weight buffer tensor gets its own individual live range. Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13 Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
2022-03-21	MLBEDSW-6298: MLCE: Unable to find a valid block config	Tim Hall
	- Fixed a bug due to ResizeBilinear modifying the attributes of a shared IFM - The ifm_resampling_mode is now an attribute of an operator rather than a tensor - Changed all calls to try_block_config() to use the attribute rather than recalculating it in multiple places Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I4641e9cd6b049bd4186776d98e3e751c5e5bcc06
2022-03-21	MLBEDSW-3367 Add mypy to pre-commit	Jonas Ohlsson
	Add mypy to pre-commit and clean up all reported errors. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
2022-03-14	MLBEDSW-6245: Bug fix fast storage allocator	Louis Verhaard
	Fast storage allocator did not always return an optimal allocation. Signed-off-by: Louis Verhaard <louis.verhaard@arm.com> Change-Id: Ic758b6c4a82dc2633c4752b0c204a27ed36f651b
2022-03-04	MLBEDSW-3367 Update pre-commit flake8 version	Jonas Ohlsson
	Update the version of flake8 used in pre-commit to facilitate adding mypy to pre-commit. Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com> Change-Id: I457dec87b77487ca6f14ff4a679c4cc927b272b0
2022-02-22	MLBEDSW-5880 Fixed Vela verbose weight flag	Ayaan Masood
	Original weights and encoded NPU weights now report the correct size instead of zero when running vela with the --verbose-weights flag (code to update the aforementioned attributes was missing). Removed print references to unencoded NPU weight size. Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
2022-02-21	MLBEDSW-6148: Reduce SRAM usage for elementwise op	Johan Alfvén
	Reduce memory footprint when using optimization strategy Size for elementwise operations. Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I30380aed587c31adbf7615f74179b4c5da686773
2022-02-08	MLBEDSW-5839: Port of improved spilling behaviour	erik.andersson@arm.com
	Ported the improved spilling behaviour from Regor into Vela. This replaces use_fast_storage_for_feature_maps with allocate_feature_maps and introduces the class called FastStorageComponentAllocator. Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I34785840c905a79750a62863773015b00fb43387
2021-10-27	MLBEDSW-5450 MLCE: Vela to handle skip Tensor	Fredrik Svedberg
	Added checks to avoid merging elementwise op live ranges for subgraph inputs and outputs, which sometimes caused problems when parts of the network run on CPU. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: Id07ab277a205b8550d19a276559f8904b9a4b4be
2021-10-01	MLBEDSW-5286 - MLCE: IndexError for ADD + TANH network	Dwight Lidman
	Resolves a bug where an IndexError would occur if the same tensor was assigned to both IFM and IFM2 of a binary elementwise operator due to duplicates being allowed in operator inputs but not in pass inputs. Signed-off-by: Dwight Lidman <dwight.lidman@arm.com> Change-Id: I39a6206a6252f6a848be9f9d4c5a8dc749c71699
2021-10-01	MLBEDSW-5013 Output diff for u55-bring-up tests, int16	Fredrik Svedberg
	Fixed output diff for some architectures due to incorrect IFM buffer size calculation when using NearestNeighbour upscaling. Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I0d6d1efc606603cdd6188ae282e7f6babfd7e24e
2021-09-07	MLBEDSW-5160 Fix constant data move to fast storage	Patrik Gustavsson
	Additional check added for when constant data can be moved to fast storage. Do not move constant data for concat. Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com> Change-Id: Ib8b5fd1483ee9fabe48e9874a5723af9b7c5231a
2021-09-06	MLBEDSW-4975 Fix semodepth asserts	Jacob Bohlin
	This commit fixes one assert regarding rolling buffers for 3D tensors. It also addresses another issue where the incorrect weight buffering was proposed for cascaded operators. Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com> Change-Id: I2501f35e5668b3085d917751cfc8002d250973d8
2021-08-23	MLBEDSW-4976: index errors in scheduler	Louis Verhaard
	- Fixed index error in memory_snapshot. - When removing a cascade, references are also removed. Change-Id: I2b35dc52671d8ce115eb32bfdd93584391d1fc6d Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>