2023-04-28  MLBEDSW-7503: Avoid changing buffer index for models with only CPU ops (Johan Alfven)
- When compiling a model that contains only CPU ops, Vela unnecessarily adds an empty buffer.
- This extra buffer is added because the fast scratch tensor always occupies buffer index 1.
- Since scratch and fast_scratch do not hold any constant data, they can both use buffer 0.
Change-Id: I25e1fb124deed7069641bde1f571b522c5bf763a
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-27  MLBEDSW-7530: Enable int16 input precision for mean operator (Rickard Bolin)
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iaeb8f2cea0d3b576a6b138e64a882c701ac88ccb
2023-04-27  MLBEDSW-7527: Mean operator output diff (Rickard Bolin)
Mean operators with a height larger than 64 are reshaped, but the IFM shape was then reset to its original value, causing an output diff.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I3a89d4efac53173cbd6fe0a5c0542e028bed42ad
2023-04-25  MLBEDSW-6954: Update to TensorFlow 2.11 (Rickard Bolin)
Updated FlatBuffers autogenerated files to TensorFlow 2.11
Change-Id: Ia39d30b06e9a37c9ab119d501ebf442f32167afe
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
2023-04-24  MLBEDSW-7501: Vela unnecessarily adds reshaped weights tensors (Johan Alfven)
- Weights are internally cloned and reshaped/transposed when running on the NPU; this already happens in the reader. If the op is passed through to the CPU, there is code that writes these clones back, but with another round of reshape/transpose. This adds extra tensors to the optimized file compared to the original file if the original tensors are subgraph inputs.
- If the op is passed through to the CPU, the clones should not be written to the file. Solved this by setting the src_tensor when making the clone.
Change-Id: I9f55d542c099882882920bffe8e15b43b2ca2c8d
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-24  MLBEDSW-7458: Fused activation not passed through correctly (Johan Alfven)
- Fixed a problem where the fused activation got lost when the op was passed through to the CPU
- The fix is to always make sure the attribute is not removed
Change-Id: I612cfa8f6f0a0465459080762094fe61e7ddc1c3
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-21  MLBEDSW-7373: Vela sometimes writes empty buffers in an incorrect format (Tim Hall)
- Fixed an issue whereby a zero-length buffer was written out instead of an empty buffer
- Added a warning message to highlight when this type of semantically incorrect empty buffer is read from an input network
Change-Id: Iac3bc71a2dbfda53737bbeb6e7f895552f0f13d0
Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-04-21  MLBEDSW-7408: MLCE: Crash when serialising model LSTM (Tim Hall)
- Added checking and reporting of missing operator attributes when reading and writing a TFLite file
- Added a TFLite semantic check to ensure that all required attribute fields of builtin operators are read
- Added some sanity checks for RESHAPE operators that run on the Ethos-U
- Stopped CPU operators from having their attributes modified
Change-Id: I05700681acdb09554f5945819717c08a9457295c
Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-04-19  MLBEDSW-7487: Updated implementation for the Mean op (Johan Alfven)
- The latest reference has changed the implementation of the Mean op and now contains only one variant.
- Updated the Vela implementation to match the reference: the full sum is first calculated and then divided by the number of elements, as sketched below.
- Removed the avg pool variant and test case.
- Updated SUPPORTED_OPS.md
Change-Id: I4275e36e3697fa837f119f2cefd7c0ff94231605
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
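A worked NumPy example of the sum-then-divide approach (the shapes and axis choice are illustrative, not taken from Vela's kernel):

```python
import numpy as np

# The single remaining Mean variant: accumulate the full sum first,
# then divide by the element count, matching the reference kernel.
ifm = np.arange(12, dtype=np.int32).reshape(1, 3, 4)
axis = (1, 2)

full_sum = ifm.sum(axis=axis, dtype=np.int64)  # full sum, no partial averaging
num_elements = ifm.shape[1] * ifm.shape[2]     # 12
mean = full_sum / num_elements                 # 66 / 12 = 5.5

assert np.allclose(mean, ifm.mean(axis=axis))
```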
2023-04-17  MLBEDSW-7196: Add LSTM support (Fredrik Svedberg)
Added int8 and int16 UNIDIRECTIONAL_SEQUENCE_LSTM support. The implementation does not include support for:
* CIFG
* Peephole
* Projection
* Normalisation
This change also:
* Removed the unused Op.BlockLSTM operation type.
* Removed the only-one-consumer limitation on putting the SplitSliceRead on the tensor consumer(s), if all consumers fulfil the requirements.
* Added Op.VariableTensorWrite as an Operation.memory_function to make sure writes to variable tensors:
  * Always use linear mode
  * Are not moved to fast scratch
  * Are not fused with other elementwise operation tensor ranges
Change-Id: Ief831738924ac3d1f2ba6d41f10bd6dc969911f3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-04-12  MLBEDSW-7437: Add 64-bit output support for ArgMax (Johan Alfven)
- Added 64-bit support for ArgMax
- Updated constraints for ArgMax and regenerated SUPPORTED_OPS.md
Change-Id: I4ef7d2e6fccab0088b87757f6afe40a006c77bbd
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-04  MLBEDSW-7442: Removed ofm quantization for ArgMax (Johan Alfven)
- Quantization for the OFM was added to the ArgMax operator as a workaround to avoid a crash in the weight compressor. This quantization is now removed.
- The weight compressor expects all tensors to have a quantization. Updated the code to use scale = 1.0 and zero point = 0 for tensors without quantization; see the sketch below.
Change-Id: I6816dce2db55f7d795d19f88d7fbe7ee419347fc
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
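A minimal sketch of the fallback, assuming a quantization object with scale and zero-point fields (the function and attribute names here are illustrative, not Vela's actual API):

```python
def effective_quantization(quant):
    """Return (scale, zero_point), defaulting to the identity mapping
    for tensors that carry no quantization parameters."""
    if quant is None:
        return 1.0, 0  # identity quantization, per the commit text
    return quant.scale_f32, quant.zero_point  # illustrative field names

assert effective_quantization(None) == (1.0, 0)
```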
2023-03-31  MLBEDSW-7439: Add support for input dims < 4 for ArgMax (Johan Alfven)
- Updated ARG_MAX to support IFM rank less than 4
- Regenerated SUPPORTED_OPS.md
Change-Id: Icd8e72733279413cbea49021325e1ab06fdc6011
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-27  MLBEDSW-6343: Remove op_index constraint (Raul Farkas)
Remove the op_index constraint and force linear format for all Conv2D ops whose strides can be optimised.
Change-Id: Idef3508ab074ea9abeacac030eaaa15a00ad1211
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-03-24  MLBEDSW-7429: Add dev dependencies (Raul Farkas)
Add dev dependencies to pyproject.toml. They can be installed by using:
`pip install ethos-u-vela[dev]`
Change-Id: I212ed7c39c9c7e93896a1e6a25cff7c7102d2c7f
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-03-24  MLBEDSW-4178: Add automatic tag handling (Raul Farkas)
* Add automatic tag handling when building the source distribution.
Change-Id: Ia20df463ae3eddf78de7e0b710c9c2279ddf61cd
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-03-22  MLBEDSW-6435: Implement support for ArgMax along depth dimension (Rickard Bolin)
- Add support for ArgMax along the depth dimension with a depth limit of 127.
- Only supports 8-bit input and 32-bit output
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I5f6f0503135bebabbb1ca637f9729587b7c60740
2023-03-21  MLBEDSW-7430: Remove non local mem usage from cascade info (Johan Alfven)
- There is a latent bug when calculating the memory usage in parallel with a sub-schedule. The error is in the calculation done when optimizing the sub-schedules: the cascade size is subtracted from the snapshot usage to determine the non-local memory usage. The problem is that the cascade memory usage actually also includes non-local memory, so the end result will be zero. This is normally not a problem, but it becomes one when starting to optimize sub-schedules while optimizing for Size.
- The solution is to not include the non-local usage in the cascade info; the scheduler already has this information.
- Corrected the usage of the persistent initial IFM. This size should not be included for Dedicated SRAM, since only intermediate buffers are in SRAM.
- Added some comments to clarify the code in the cascade builder.
Change-Id: I473b36e0d69550ab6565f4ef028195636b362997
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-16  MLBEDSW-7352: Refactoring move_constant_data (Johan Alfven)
Refactored move_constant_data in the scheduler. The use case currently only works for LUT tensors, so the logic was simplified. In order to make it work for other tensors, one would also have to take memory usage into consideration when building cascades, and use_fast_storage_for_feature_maps would also be affected.
Change-Id: Ic8de53b65a2c17d34515002d7f184d0ab1830222
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-16  MLBEDSW-7312: Refactoring bypass_memory_only_ops (Johan Alfven)
- The logic for bypassing memory-only ops is complicated, and it still does not fix all corner cases.
- This patch simplifies the logic by always bypassing the op, replacing the IFM with the OFM. If that is not possible, the memory-only op is changed to a memcpy op (see the sketch below).
- The bypassing was previously done in two steps but is now reduced to one.
Change-Id: I545dd65e0ec77c70be479a5ada2d277cac3a027c
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
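A toy sketch of the single-step rule; the Op type and the replaceability flag stand in for Vela's IR and its checks, they are not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:            # toy stand-in for a memory-only op (Reshape, Squeeze, ...)
    type: str
    ifm: str
    ofm: str

def bypass_memory_only_op(op: Op, ifm_replaceable: bool) -> Optional[Op]:
    if ifm_replaceable:
        # Preferred single-step path: the IFM is replaced by the OFM,
        # so the op disappears from the graph entirely.
        return None
    # Otherwise the data movement must remain: demote the op to a memcpy.
    op.type = "Memcpy"
    return op

assert bypass_memory_only_op(Op("Reshape", "t0", "t1"), True) is None
assert bypass_memory_only_op(Op("Reshape", "t0", "t1"), False).type == "Memcpy"
```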
2023-03-14  MLBEDSW-6260: Add support for using DMA to copy feature maps (Johan Alfven)
- Reshape ops can be bypassed and do not need to be processed by the NPU. There are use cases where the IFM must be preserved, so a memcpy is needed. This is implemented by an AvgPool.
- To reduce the cost of the AvgPool, the IFM can be copied by DMA. This is faster, and it can also be turned into a real NOP in cases where the IFM and the OFM can use the same memory space.
- Added a new memcpy op. Only NHWC format is supported, since DMA cannot change the format on the fly.
- Allow the OFM to reuse the IFM for the memcpy op
- Make sure the DMA copy size is 16-byte aligned (see below)
Change-Id: I3605a48d47646ff60d2bb3644dd3a23f872235a7
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
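The 16-byte alignment of the DMA copy size amounts to rounding up to the next boundary; a small sketch (the helper name is illustrative, the constant is from the commit text):

```python
DMA_COPY_ALIGNMENT = 16  # bytes

def aligned_dma_copy_size(num_bytes: int) -> int:
    """Round a copy size up to the next 16-byte boundary."""
    return (num_bytes + DMA_COPY_ALIGNMENT - 1) & ~(DMA_COPY_ALIGNMENT - 1)

assert aligned_dma_copy_size(100) == 112
assert aligned_dma_copy_size(128) == 128
```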
2023-03-13  MLBEDSW-7427: Fix scale calculations for FullyConnected (Fredrik Svedberg)
Fixed the scale calculations for FullyConnected to match the reference. Also removed the unused low_precision_scaling.
Change-Id: I4b766febff4a0010acd3de708bb49be458d22bf3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-03-13  MLBEDSW-7393: MLCE: Optimize compile time for large networks (Johan Alfven)
- There is a problem with large networks containing many NPU subgraphs. The scheduling takes too long, since the snapshot memory calculation always does a complete update for the full graph.
- A complete run is still needed at the end to calculate all the time indexes correctly. However, when scheduling an NPU subgraph it is enough to extract live ranges for the current schedule and its operators.
Change-Id: Iccb7d6728119c1428ad0b45a2ac34e92158c15bd
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-10  MLBEDSW-7386: Fix assert in pass packing (Johan Alfven)
- The assert was caused by a faulty optimization in the pass packing when trying to group CPU passes. The code did not take into account that a CPU op could have many outputs.
- The fix is to make sure that the pass that follows the CPU pass is not dependent on any of the outputs from the CPU pass. If there is a dependency, the CPU pass cannot be moved. A sketch of the check follows.
Change-Id: Ia0c90bae1ed97d503a97e7bc353f834a0fa75130
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
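A sketch of the corrected dependency check, using plain tensor-name lists instead of Vela's actual pass representation:

```python
def pass_can_be_grouped(cpu_pass_outputs, next_pass_inputs) -> bool:
    """The pass after a CPU pass may only be moved if it consumes none
    of the CPU pass's outputs - checking all outputs, not just one."""
    return not set(cpu_pass_outputs) & set(next_pass_inputs)

assert pass_can_be_grouped(["t0", "t1"], ["t2"])
assert not pass_can_be_grouped(["t0", "t1"], ["t1"])
```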
2023-03-06  MLBEDSW-7396: MLCE: Add num elements constraint on reshape (Johan Alfven)
Added a constraint for faulty RESHAPE operators: the number of elements of the IFM and the OFM must be the same, as sketched below.
Change-Id: I2e31e9d1e39b5aa3a0c595032a66e14374a0719e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
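The constraint amounts to an element-count check; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def reshape_shapes_are_valid(ifm_shape, ofm_shape) -> bool:
    """A RESHAPE is only valid if IFM and OFM describe the same
    number of elements."""
    return np.prod(ifm_shape, dtype=np.int64) == np.prod(ofm_shape, dtype=np.int64)

assert reshape_shapes_are_valid((1, 4, 4, 3), (1, 48))
assert not reshape_shapes_are_valid((1, 4, 4, 3), (1, 47))
```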
2023-02-16  MLBEDSW-7094: Update release notes (tags: 3.7.0.rc2, 3.7.0) (Tim Hall)
- Updated release notes for 3.7.0
- Updated tag in SUPPORTED_OPS and setup.py
- Tidied up README
Change-Id: Ib33a3d85383ce297b10acd74f8a2455d738276be
Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-02-15  MLBEDSW-7347: MLCE: Split followed by elementwise op will assert (Johan Alfven)
- The problem was that when the split slice read was moved to the tensor consumer, in this case an elementwise operator, this was not taken into account when the NPU op for the elementwise operator was created. The NPU op was created with the wrong ifm_width, and ifm and ifm2 ended up with different sizes. As a result, broadcasting was expected, but that was not the case, so the assert was triggered.
- The fix is to use the ifm box to set the correct ifm_width for the NPU operator.
Change-Id: I3291d34e7f8e7add9caf2296cca600c60e96bf7e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-02-15  MLBEDSW-7343: MLCE: Unsupported STRIDED_SLICE with negative index and shrinking an axis (Tim Hall)
- The problem was that the end values of STRIDED_SLICE operators were not taking the shrink_axis_mask into account
- The fix is simply to ignore the end value set on the operator and calculate one based upon shrinking the axis (see the sketch below)
Change-Id: I2e5f2d3c9b08035dfd9b1629c775408f2356d1cf
Signed-off-by: Tim Hall <tim.hall@arm.com>
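A sketch of the recomputed end value, assuming begin has already been canonicalised to non-negative indices (names are illustrative):

```python
def effective_end(begin, end, axis, shrink_axis_mask):
    """When shrink_axis_mask selects an axis, ignore the stored end and
    derive it from begin: shrinking always reads exactly one element."""
    if shrink_axis_mask & (1 << axis):
        return begin[axis] + 1
    return end[axis]

# With begin=[2], end=[-1] and axis 0 shrunk, the stored end is ignored:
assert effective_end([2], [-1], 0, 0b1) == 3
```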
2023-02-15  MLBEDSW-7211: Convert fixup_asymmetric_weights to supported ops check (wilisa01)
Changed the default behaviour to place int8 ops with asymmetric quantization on the CPU, and added an option to force symmetric quantization
Change-Id: Ib9b717aaf61eae78833254ca3dfa745f4f253dc6
Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-02-15  MLBEDSW-7342: Regression: Output diff for mlperf_deeplabv3_mnv2_ade20k_int8 (wilisa01)
Swapped the order of the IFMs to Add in the TFLite graph optimiser. The output diff was caused by the second input tensor being placed in SRAM, despite there being no DMA request to move it there.
Change-Id: I2e83b669ba226c7e96a0bb0d46ba811434cf7bb6
Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-02-14  MLBEDSW-7316: Fix crash for networks with resource variables (Johan Alfven)
- The problem was that networks with resource variables had not been considered. The major problem was the graph traversal, where these ops were not visited, resulting in an empty subgraph that caused the crash.
- Fixed the problem by attaching virtual tensors to the ops, simulating subgraph output. These tensors are only used to make the graph traversal work.
- Fixed serializing of the attribute container and shared_name
- Fixed the subgraph index for the CallOnce operator
- All resource variable ops are pushed to the CPU
Change-Id: I815f9c81baf7a3fbb686e895980b462f58208b6e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-02-13  MLBEDSW-7274: Add support for Variable Tensors (Fredrik Svedberg)
Added support for Variable Tensors, including offline planning.
Change-Id: I39f33fee207f1f1a4574a0f53f7377eec8709e15
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-02-13  MLBEDSW-7250: pytest RuntimeWarning: overflow encountered in float_scalars (wilisa01)
Since the test works by creating an overflow, NumPy is set to ignore overflow for this test case
Change-Id: I74d03e8d73455295168352542dcb844283d54d33
Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-02-10  MLBEDSW-7100: Update mlw_codec copyright years (Rickard Bolin)
Some copyright years of files in the mlw_codec had not been updated during changes in late 2022.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iebab154127e5868202a805aff0125154ac1d3beb
2023-02-10  MLBEDSW-4960: convert_resizebilinear_1x1_to_add creates constant output tensor (wilisa01)
Set the second input tensor of the resize op to be constant and refactored the function.
Signed-off-by: wilisa01 <william.isaksson@arm.com>
Change-Id: I496764f18b4c1ae0fa1a828dd7a90e937a42d41b
2023-02-09  MLBEDSW-6982: Move to setup.cfg and pyproject.toml (tag: 3.7.0.rc1) (Raul Farkas)
- Move all static information from setup.py to the newly added pyproject.toml
- Add setup.cfg, used for static information that cannot be added to pyproject.toml due to it still being in beta.
- Modify mlw_codec to throw a real Python exception when importing NumPy arrays instead of just printing them to stdout.
- Surround the mlw_codec import with a try/except statement to catch NumPy C API mismatch errors and re-raise them with a more detailed message, as sketched below.
- Update README.md with documentation about the known issue with changing the used NumPy version after installing ethos-u-vela.
Change-Id: I1eeee5536be7c1744e30d6088f7069fbb1403e06
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
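The guarded import can be sketched like this; the exact message text is an assumption based on the commit description:

```python
try:
    from ethosu import mlw_codec  # C extension compiled against a fixed NumPy ABI
except ImportError as e:
    raise ImportError(
        "mlw_codec failed to import. This is commonly caused by changing "
        "the installed NumPy version after installing ethos-u-vela; "
        "reinstall the package against the active NumPy version."
    ) from e
```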
2023-02-09  MLBEDSW-7331: Reinstate max stride height constraint of 3 for Conv2D (Raul Farkas)
Reinstate constraint for stride height to (1,3) instead of (1,4) for Conv2D and update unit tests.
Change-Id: I17389ee040eeff0cea08279cab1c038e951569ea
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-02-09  MLBEDSW-7281: create_const_tensor OverflowError on Microsoft Windows (Tim Hall)
- Additional overflow checks are performed when running under Microsoft Windows compared to Linux. These checks happen when converting from Python int to NumPy int/uint
- The problem is that the LUT activation values are of int32 type, however they are defined as Python ints. If these are converted to numpy.int32 it could result in an overflow error
- The fix is to convert these values to uint32 but keep the operator's IFM tensor type the same (as this will allow them to be interpreted correctly); see the reproduction below
- Fixing this highlighted another problem where convert_to_lut always calls create_lut_tensor() with an int8 datatype, whereas it should be using the IFM datatype
Change-Id: I781a9d850f654267aa4a67754438607c4bb95685
Signed-off-by: Tim Hall <tim.hall@arm.com>
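The overflow can be reproduced in isolation; whether the direct conversion raises depends on platform and NumPy version, which is exactly the Windows/Linux difference described above:

```python
import numpy as np

lut_value = 0xFFFFFFFF  # a 32-bit LUT bit pattern held as a Python int

try:
    np.int32(lut_value)  # may raise OverflowError (value > 2**31 - 1)
except OverflowError as e:
    print("int32 conversion rejected:", e)

# Storing the same bits as uint32 always succeeds; keeping the IFM
# tensor's datatype unchanged lets consumers reinterpret the bits.
stored = np.uint32(lut_value)
print(stored, stored.astype(np.int32))  # 4294967295 -1
```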
2023-02-07  MLBEDSW-7237: CONV_2D stride 4 optimisation (Raul Farkas)
* Extend stride range from (1,3) to (1,4)
* Add stride 4 support when optimising CONV_2D
* Add some tests for various strides
Change-Id: Iddaeb42c4a6e02695ecdd3740bc8b9dd59a7eb3c
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-02-06  MLBEDSW-7284: MLCE: Fix assert for faulty Split op (Johan Alfven)
- An assert in Vela is triggered when the number of splits does not evenly divide the input.shape[axis] value and the split offsets are calculated incorrectly.
- The fix is to add the same constraints as in the reference kernel and only run the Split op on the NPU when the criteria are fulfilled.
- Modified the test to reflect the new constraints
- Updated SUPPORTED_OPS.md
Change-Id: I4103ff4a3fdf9a813f5fcb7f51081b859e611100
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-02-03  Revert "MLBEDSW-6954: Update to TensorFlow 2.11" (Rickard Bolin)
This reverts commit 9d254b6f9e76ccf266a0f72a0171e73bc8d435c9.
Reason for revert: Due to 0-size constants being treated differently (MLTOOLS-2043)
Change-Id: Ie1150fb2dd9092050a7fd44708a893d52ffe59f8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
2023-01-20  MLBEDSW-6954: Update to TensorFlow 2.11 (wilisa01)
Updated FlatBuffers autogenerated files to TensorFlow 2.11
Change-Id: Ied60f9fbacdcf91ec8d289cafbde0d88169bb349
Signed-off-by: wilisa01 <william.isaksson@arm.com>
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-01-20  MLBEDSW-7151: MLCE: Difference in model output between x86 & aarch64 (Tim Hall)
- The issue is due to undefined behaviour when casting a NumPy float to a NumPy unsigned integer, which occurs in create_const_tensor()
- The fix is to make sure that the values are first cast to a Python float (see the reproduction below)
- In addition, the values datatype argument has been removed from create_const_tensor() to stop the tensor and values datatypes getting out of sync
Change-Id: I134b9be8c941b361929a5ae7db8cb35f2e9728f2
Signed-off-by: Tim Hall <tim.hall@arm.com>
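A minimal reproduction of the undefined behaviour; the final expression is a simplified illustration of the idea, not the exact create_const_tensor() code:

```python
import numpy as np

v = np.float64(-1.0)

# float -> unsigned conversion is undefined behaviour in C, so NumPy's
# direct cast can legitimately differ between architectures:
print(v.astype(np.uint8))  # typically 255 on x86, 0 on aarch64

# Going via a Python float/int first uses Python's well-defined
# semantics before any NumPy cast takes place:
print(np.uint8(int(float(v)) % 256))  # 255 on every platform
```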
2023-01-16  MLBEDSW-7091: MLCE: Reduce SRAM, compiling for Performance (Johan Alfven)
- A feature was previously added to reduce SRAM usage when optimizing for Size. An investigation has now shown that this feature is also beneficial when optimizing for Performance, so this patch removes the Size-only limitation.
Change-Id: I5b130db43cbda47e09d4196ab1daa5a21e35ae00
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-01-13  MLBEDSW-7231: MLCE: Fixed assert caused by multiple reshape ops (Johan Alfvén)
Fixed an assert that was caused by a model that has a reshape operator followed by another reshape operator. This structure had never been considered. However, since there is no need for the first reshape, it is simply removed from the path while traversing the graph.
Change-Id: I2a939df37502028ffc07115ac87e85375484efee
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-01-12  MLBEDSW-7106: Add inclusive language statement to README (Rickard Bolin)
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ibab6e94e6c02890ed03d50730bee7f23ac89b1fc
2023-01-10  MLBEDSW-7220: Updated uncascaded memory calculation (Johan Alfvén)
- The uncascaded SRAM usage for an op in the cascade builder did not take into account that the OFM will reuse the IFM for elementwise ops, which resulted in wrong values for the uncascaded memory.
- Changed the code to use _estimate_sram_usage, since this function does the calculation correctly.
Change-Id: I681bcf6e45ee869bbfb92306869b18ee4a838325
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-12-22  MLBEDSW-7203: Data type alias deprecations (Rickard Bolin)
Deprecation of some data type aliases in NumPy version 1.24.0 caused Vela to crash when using Python version 3.8 or above. Replaced the deprecated aliases (see the example below).
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ide167ee864a340194ec5e69537c8718192c78ace
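For reference, the kind of replacement involved (the variable names are illustrative):

```python
import numpy as np

# NumPy 1.24 removed aliases such as np.int, np.float and np.bool.
# Old:  np.array(data, dtype=np.int)   -> AttributeError on 1.24+
# New:  use the builtin, or an explicit width:
indices = np.array([1, 2, 3], dtype=int)          # builtin
scales = np.array([0.5, 1.5], dtype=np.float64)   # explicit width
```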
2022-12-21  MLBEDSW-7206: Fixed weight buffering problem in cascading (Johan Alfvén)
- Fixed a problem where buffered weights were only used in the first stripe that was produced. The following stripes read the weights from permanent storage.
Change-Id: I176909fa0e2edbecf80e8ec8ac136f42d5d3bcd4
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2022-12-21  MLBEDSW-7111: Changed shape calculation for the rolling buffer (Johan Alfvén)
- When operators are cascaded, rolling buffers are used between the producer and the consumer operator. Depending on the attributes, such as strides, there was a use case where the allocated intermediate buffer was too small, resulting in a buffer overflow. The problem was that the producer OFM stripe width was greater than the consumer IFM stripe width.
- Changed the allocation to use the max of the producer width and the consumer width, as sketched below.
Change-Id: I5aa20795eac5591d254b2163deec329cf9325a1b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
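The corrected sizing rule in isolation (a toy sketch; Vela derives these widths from the cascade's stripe shapes):

```python
def rolling_buffer_width(producer_ofm_stripe_w: int, consumer_ifm_stripe_w: int) -> int:
    """Size the rolling buffer for the wider of the two stripes; using
    only the consumer width under-allocates when the producer is wider."""
    return max(producer_ofm_stripe_w, consumer_ifm_stripe_w)

assert rolling_buffer_width(48, 32) == 48
```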