ethos-u/ethos-u-vela.git

Age	Commit message	Author
2024-07-02	Merge branch 'upstream/main' into upstream/dev/ethos_u85dev/ethos_u85	Tim Hall
	Change-Id: I18bbb12d9a61ca8e6f0974712e1c35a99813e45b
2024-07-02	MLBEDSW-9247: Fix issue with less than 4 dimensions for strided slice	Rickard Bolin
	A bug was introduced when stride support was added to strided slice operators. Fix is to convert stride tensor to four dimensions before extracting height and width strides. Change-Id: Ief7159276980613331fa8b5b1df20158ba254f28 Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
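The fix described above can be sketched as follows. This is a minimal illustration rather than Vela's actual code: the helper names `strides_to_4d` and `height_width_strides` are hypothetical, and the NHWC axis order is an assumption.

```python
def strides_to_4d(strides):
    """Left-pad a stride list with 1s so it always has four entries.

    Assumes NHWC ordering; this helper is hypothetical, not part of Vela.
    """
    strides = list(strides)
    if len(strides) > 4:
        raise ValueError("expected at most 4 dimensions")
    return [1] * (4 - len(strides)) + strides


def height_width_strides(strides):
    """Return (stride_h, stride_w) after normalising to four dimensions."""
    _, stride_h, stride_w, _ = strides_to_4d(strides)
    return stride_h, stride_w


# A rank-2 stride tensor no longer indexes past its own length.
print(height_width_strides([2, 1]))
print(height_width_strides([1, 2, 3, 1]))
```

Normalising the rank first means the height and width strides are always read from the same positions, whatever the rank of the original tensor.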
2024-06-27	Merge remote-tracking branch 'origin/upstream/main' into dev/ethos_u85	Alexander Bengtsson
	Change-Id: Ibb1fb718fa35f2224351bef4eb48535f43f244fb Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>
2024-06-18	MLBEDSW-9198: Additional overflow errors when using NumPy 2.0	Rickard Bolin
	Additional testing uncovered two more pieces of code with overflow errors after updating to the latest version of NumPy. Solution is to force larger type in the same way as for the other issues. Change-Id: I40a04e115509055710a70a5767358a4bb8a9cd33 Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
2024-05-16	MLBEDSW-8561: Striding support in H/W for StridedSlice (tag: 3.12.0.rc1)	Rickard Bolin
	Change-Id: Ie6f39d9c4125f7c16d27621de47cd76143c2e636 Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
2024-05-14	Add support for Ethos-U85	Tim Hall
	- Added Ethos-U85 support via the --accelerator-config CLI option Change-Id: Ia3c77dbf61c1b7fa9cb03f8f51d336de2f115a3a Signed-off-by: Tim Hall <tim.hall@arm.com>
2024-04-24	MLBEDSW-8969: Enable weight buffering for fully connected with batch shape	Johan Alfven
	- Fully connected with batch shape will use the weights more than once. Models with this type of fully connected op will benefit from weight buffering. - If a fully connected op with this shape is detected it is changed to a conv2d and the normal weight buffering flow will be used. Change-Id: I272741a32390e036d5e04bd5af41d4538162e86e Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2024-04-03	MLBEDSW-8875: MLCE: Update criteria when to move SplitSpliceRead to consumer	Johan Alfven
	- When possible, a read slice from a split or stride is moved to the following op. The problem in this case was that the following op was a Maxpool op (from Softmax). The Maxpool op is using a different input shape compared to the original Softmax op, and this input shape was then changed when the read slice was applied to the Maxpool op. - The result is a faulty Maxpool op with an output diff. - The fix is to prevent moving the slice read when the consumer input shape differs from the Split/Stride ofm shape Change-Id: I649d89c38645fa51c20c3602954e2b8af9372076 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2024-03-06	MLBEDSW-8749: MLCE: Output diff on strided slice	Johan Alfven
	- When possible, a read slice from a split or stride is moved to the following op. The problem in this case was that the following op was an elementwise op where the ifm needed to be broadcasted and that is not supported. - The result is a faulty elementwise op with an output diff. - The fix is to prevent moving the slice read to the elementwise op if broadcasting is needed. Change-Id: I89928c217510a822f91f051fd1ad6e34040c19de Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2024-01-30	MLBEDSW-8491: Add support for Mirror pad	Rickard Bolin
	Change-Id: I3c13118e14195a5fb8e522a38b205b75fb07b74b Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
2023-11-09	MLBEDSW-8290: MLCE: Add TRANSPOSE support (tag: 3.10.0.rc1)	Johan Alfven
	- Added graph optimiser function to convert TRANSPOSE op into an AvgPool op with swapped stride for height and width - Added TRANSPOSE supported op check - Added unit tests for TRANSPOSE supported op check - Updated SUPPORTED_OPS.md - Fixed problem in pass packing when optimizing the pass list. Old problem, but now seen when moving TRANSPOSE from cpu. Change-Id: I0a0ef420b0fb8241090c2e2434622881105cde15 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-10-31	MLBEDSW-8219: Activation can not be fused with dma operation	Johan Alfven
	- A reshape followed by an activation function was converted to a Memcpy with fused activation. The problem is that Memcpy does not support activation so no activation was executed. - Added logic to prevent activation functions from being fused with the Memcpy. Change-Id: Ibc7d985e5037146dd1f6cb2601407d0f8b865ac6 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-10-31	MLBEDSW-8201: [MLCE] Extended stride support for CONV_2D	Johan Alfven
	- Added support for stride_h > 3 when ofm height is 1 - Added support for stride_w > 3 when ofm width is 1 - Updated constraints - Updated tests - Updated SUPPORTED_OPS.md Change-Id: I8f89909b05a0f052df5f03702966cee50da61cfc Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-09-18	MLBEDSW-8042: MLCE: Add SQUARED_DIFFERENCE support	Johan Alfven
	- Added SQUARED_DIFFERENCE support - Updated SUPPORTED_OPS.md Change-Id: Id83d9d92129e645390c7979759dfdeff7a14c2ee Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-09-14	MLBEDSW-8010: Refine fixup_pool_strides to also check stride	Johan Gunnarsson
	Only set stride to (1, 1) if kernel, stride and IFM shape all are equal. And also set padding to VALID to handle ops with SAME padding. Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com> Change-Id: Id3cc34686f09667ea21541fac432351555344e3d
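The rule stated above can be sketched as a small predicate. This is an illustration under assumptions, not the real `fixup_pool_strides`: the signature, the (H, W) tuples and the string padding values are all hypothetical.

```python
def fixup_pool_strides_sketch(kernel_hw, stride_hw, ifm_hw, padding):
    """Return (stride_hw, padding), rewritten only when all three (H, W) pairs match.

    Hypothetical signature; Vela operates on its own operation objects.
    """
    if kernel_hw == stride_hw == ifm_hw:
        # The window covers the whole IFM and never slides, so a 1x1 stride
        # produces the same single output element; VALID avoids extra padding.
        return (1, 1), "VALID"
    return stride_hw, padding


print(fixup_pool_strides_sketch((7, 7), (7, 7), (7, 7), "SAME"))
print(fixup_pool_strides_sketch((2, 2), (2, 2), (8, 8), "SAME"))
```

Only the first call is rewritten; in the second the window does slide across the IFM, so the stride is left alone.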
2023-09-14	MLBEDSW-8003: Limit fixup_pool_strides to AvgPool and MaxPool	Johan Gunnarsson
	This fixup is not relevant for Resize ops. Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com> Change-Id: I81b9d3c8a6dd820b1e5d747d754100282b93c641
2023-09-12	MLBEDSW-7997: [MLCE] Extended stride support for TRANSPOSE CONV	Johan Alfven
	- Support for stride WxH 1x1 - Support for stride WxH 2x1 when IFM and KERNEL is 1D shape with height 1 - Added test to supported operators - Updated SUPPORTED_OPS.md Change-Id: Ic1abead8399a5e14a78d962f8aded0d3b3dbfcc4 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-09-05	MLBEDSW-7968: Add fixup for strides when kernel size equals IFM shape	Johan Gunnarsson
	There are networks out there with Pool ops where filter (W, H) equals IFM (W, H) equals stride (W, H). The stride is technically too large for the NPU, but we can actually run these ops on the NPU since the filter is large enough that the window doesn't slide. To support these ops we need to fix the stride so later checks don't put this op on CPU. Change-Id: I8f0a46b26fb94ee76c33748589536cc5ba07ea59 Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
2023-08-29	MLBEDSW-7881: Convert Quantize op to Avgpool op in graph optimiser	Johan Gunnarsson
	This convert is already done in the pass packing stage, but doing it in the graph optimiser stage is better. Change-Id: Ib9baa98d115cf88491ce39936972a93467a378ce Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
2023-08-09	MLBEDSW-7754: Performance estimator is not using write/read shapes (tag: 3.9.0.rc1)	William Isaksson
	- npu_performance now uses write/read shapes instead of using ifm/ofms for memory cycle estimations. - Also fixes a would-be bug in the tflite_graph_optimiser, where one read shape is not Shape4D. Change-Id: I2067069a713d2cf9e65a5cc227e803de79940fff Signed-off-by: William Isaksson <william.isaksson@arm.com>
2023-07-12	MLBEDSW-7756: MLCE: Grouped convolutions runtime problem	Tim Hall
	- Added graph optimiser function to convert convolution groups into a split followed by separate convolutions and then a concat - Added semantic check for convolution groups - Added unit tests for convolution groups semantic checks - Fixed a minor typing issue with test_constraint_stride_range Change-Id: I78ade408aa23469a79c9f517c4751da8619b77a9 Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-07-11	MLBEDSW-7653: Extend Mean support for depth axis	Alexander Hansson
	If any of H,W axes have shape 1, the IFM can be reshaped to support reduction over the depth axis. Signed-off-by: Alexander Hansson <Alexander.Hansson@arm.com> Change-Id: I432ff1c399b7cee4ca5f0a8f4461e9c0a936d804
2023-07-11	MLBEDSW-7652: Add mean support for batch and channel when shape is 1	Alexander Hansson
	- Add support for batch and depth channels when shape is 1 - Refactor reshaping in convert_mean_to_depthwise_conv Signed-off-by: Alexander Hansson <Alexander.Hansson@arm.com> Change-Id: If663395934ab58c76ba92b6ebaaf484a389ae699
2023-06-20	MLBEDSW-7449: Add function description and type annotations	Raul Farkas
	Add function description and type annotations to the optimization functions missing them. Fix type annotation issue when re-assigning variable value to a different type. Change-Id: I1ee442ff7a29cc07708fdd013430131eff599dd5 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-06-19	MLBEDSW-7654: Extend support for Mean where HxW > 4096	Alexander Hansson
	* Convert Means with large IFMs to several DepthwiseConv2DBias and Add operations. * Update tflite supported operator check with new height and width constraints. * Update unit-tests to verify supported operator changes. * Fix output-diff for 2D IFMs (MLBEDSW-7772) Signed-off-by: Alexander Hansson <Alexander.Hansson@arm.com> Change-Id: Ifae6fb1cdac475ae7dac5116c5f13631ff82108a
2023-06-16	MLBEDSW-7315: Add support for AvgPool with stride_width > 3	Raul Farkas
	* Convert AvgPool with stride_width > 3 and Valid padding to Conv2D to optimize it to run on NPU. Change-Id: I06ab412357f0b09b1498f9019a9d1963a324ad34 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-06-16	MLBEDSW-7648: Fix bug with filter padding in conv2d	Raul Farkas
	* Fix bug that caused filter padding to not be added proportionally compared to the hardware padding added to IFM. * Update needed_total_padding function that calculates hardware padding to also account for the cases in which IFM width is not divisible by the stride width. * Update supported ops constraint on strides for conv2d to mark ops with stride width > 3 and IFM width that is not divisible by the optimization resize factor as not supported. * Update unit tests that verify correct functionality when checking whether ops are supported or not. Change-Id: I62f14cca890b779ca787a9603fa37c873ad522f8 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-06-14	MLBEDSW-7748: Add RSQRT support	Johan Alfven
	- Added RSQRT int8 support, implemented as LUT. - Added test to supported operators - Updated SUPPORTED_OPS.md Change-Id: I34904772e044be8d22a6dfe426edf85358a205b7 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-05-17	MLBEDSW-7223: Fusing Pad and AvgPool causes diff	Tim Hall
	- Fixed an issue with the fusing of PAD and AVERAGE_POOL_2D whereby the rounding away from zero didn't work because it requires the zero point to be at zero but the input padding required it to be set to the desired zero point. This affected both int8 and int16. The solution was to remove it by using the bias prior to the scaling - Refactored the rounding away from zero mode Change-Id: I8f2df69df06d2a9722315c346646e5a901cb2c3b Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-05-10	MLBEDSW-7283: Add opt cases for strided CONV2D	Raul Farkas
	* Implement a general optimization solution for strided CONV2D that supports a stride_w with no upper bound. * Implement filter zero padding to allow for optimization in those cases in which the filter width is not divisible by the stride width. E.g.: Filter width = 8, stride width = 3 -> Filter width = 8 + 1 (0 padding) = 9, stride width = 3 * Implement partial optimization to reduce the stride to hw supported strides (i.e. 2 and 3) when optimizing to reach a stride = 1 is not possible due to the IFM width not being divisible by the stride width. * Implement optimization for when SAME padding is used. If the pre-opt and post-opt padding do not match, add zero padding to the filter so that the post-opt IFM padding matches. Change-Id: Ia66b0d107281fa9993f6bf4d0c26627ee743253b Signed-off-by: Raul Farkas <raul.farkas@arm.com>
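The filter zero-padding arithmetic in the example above can be written out as a short sketch. The function name `padded_filter_width` is hypothetical, and this covers only the width rounding, not the rest of the optimisation.

```python
def padded_filter_width(filter_w, stride_w):
    """Smallest width >= filter_w that is a multiple of stride_w.

    Hypothetical helper illustrating the zero-padding arithmetic only.
    """
    remainder = filter_w % stride_w
    if remainder == 0:
        return filter_w
    # Append (stride_w - remainder) zero columns to reach the next multiple.
    return filter_w + (stride_w - remainder)


# Filter width 8 with stride width 3 needs one zero column: 8 + 1 = 9.
print(padded_filter_width(8, 3))
```

A filter whose width is already divisible by the stride width is returned unchanged, so no padding is added in that case.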
2023-05-10	Revert "MLBEDSW-6343: Remove op_index constraint"	Raul Farkas
	This reverts commit 72c6a2414205e033279f80b622cdf479c05a4f5b. Reason for revert: Fix performance regression caused by breaking cascades in certain models Change-Id: I5aba6e3c59ab27c5129f4a3f0c320ed18df78943 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-05-02	MLBEDSW-2082: Add Exp support	Johan Alfven
	- Added int8 and int16 Exp support, implemented as LUT. - Added generic 8bit and 16bit LUT table functions following the implementation in the latest reference. If new ops are added by the reference, they can easily be implemented in Vela using the generic functions. - Moved convert_to_lut to lut.py to have all LUT related code in one file. - Updated SUPPORTED_OPS.md Change-Id: I388e76ea4b39162313599a5341cfb9bad71a782c Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-27	MLBEDSW-7527: Mean operator output diff	Rickard Bolin
	Mean operators with height larger than 64 are reshaped but the IFM shape was then reset to the original value, causing an output diff. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I3a89d4efac53173cbd6fe0a5c0542e028bed42ad
2023-04-21	MLBEDSW-7408: MLCE: Crash when serialising model LSTM	Tim Hall
	- Added checking and reporting of missing operator attributes when reading and writing TFLite file - Added a TFLite semantic check to ensure that all required attribute fields of builtin operators are read - Added some sanity checks for RESHAPE operators that run on the Ethos-U - Stopped CPU operators from having their attributes modified Change-Id: I05700681acdb09554f5945819717c08a9457295c Signed-off-by: Tim Hall <tim.hall@arm.com>
2023-04-19	MLBEDSW-7487: Updated implementation for the Mean op	Johan Alfven
	- Latest reference has changed implementation for the Mean op and now only contains one variant. - Updated Vela implementation to match reference. The full sum is first calculated and then divided by the number of elements. - Removed the avg pool variant and test case. - Updated SUPPORTED_OPS.md Change-Id: I4275e36e3697fa837f119f2cefd7c0ff94231605 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-17	MLBEDSW-7196 Add LSTM support	Fredrik Svedberg
	Added int8 and int16 UNIDIRECTIONAL_SEQUENCE_LSTM support. The implementation does not include support for: * CIFG * Peephole * Projection * Normalisation This change also: * Removed unused Op.BlockLSTM operation type. * Removed the single-consumer limitation on putting the SplitSliceRead on the tensor consumer(s), if all consumers fulfill the requirements * Added Op.VariableTensorWrite as an Operation.memory_function to make sure writes to variable tensors: * Always use linear mode * Are not moved to fast scratch * Are not fused with other elementwise operation tensor ranges Change-Id: Ief831738924ac3d1f2ba6d41f10bd6dc969911f3 Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
2023-04-12	MLBEDSW-7437: Add 64-bit output support for ArgMax	Johan Alfven
	- Added 64-bit support for ArgMax - Updated constraints for ArgMax and regenerated SUPPORTED_OPS.md Change-Id: I4ef7d2e6fccab0088b87757f6afe40a006c77bbd Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-04-04	MLBEDSW-7442: Removed ofm quantization for ArgMax	Johan Alfven
	- Quantization for the OFM was added for the ArgMax operator as a workaround in order to avoid a crash in the weight compressor. This quantization is now removed. - The weight compressor expects that all tensors have a quantization. Updated code to use scale = 1.0 and zero point = 0 for tensor without quantization. Change-Id: I6816dce2db55f7d795d19f88d7fbe7ee419347fc Signed-off-by: Johan Alfven <johan.alfven@arm.com>
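The fallback described above can be sketched as follows. The `Quant` class and `effective_quant` helper are hypothetical stand-ins, not Vela's own quantization types; only the default values (scale 1.0, zero point 0) come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quant:
    """Hypothetical stand-in for a tensor's quantization parameters."""
    scale: float
    zero_point: int


def effective_quant(quant: Optional[Quant]) -> Quant:
    """Return the tensor's quantization, or the identity one if it has none."""
    if quant is None:
        # Identity quantization: scale 1.0 and zero point 0 leave values unchanged.
        return Quant(scale=1.0, zero_point=0)
    return quant


print(effective_quant(None))
print(effective_quant(Quant(scale=0.5, zero_point=3)))
```

Supplying an identity default at the point of use lets the consumer keep assuming every tensor is quantized, without attaching artificial quantization to the tensor itself.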
2023-03-31	MLBEDSW-7439: Add support for input dims < 4 for ArgMax	Johan Alfven
	- Updated ARG_MAX to support IFM rank less than 4 - Regenerated SUPPORTED_OPS.md Change-Id: Icd8e72733279413cbea49021325e1ab06fdc6011 Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-03-27	MLBEDSW-6343: Remove op_index constraint	Raul Farkas
	Remove op_index constraint and force linear format for all Conv2D that have strides that can be optimised. Change-Id: Idef3508ab074ea9abeacac030eaaa15a00ad1211 Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-03-22	MLBEDSW-6435: Implement support for ArgMax along depth dimension	Rickard Bolin
	- Add support for ArgMax along depth dimension with a depth limit of 127. - Only supports 8-bit input and 32-bit output Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I5f6f0503135bebabbb1ca637f9729587b7c60740
2023-03-16	MLBEDSW-7312: Refactoring bypass_memory_only_ops	Johan Alfven
	- The logic when bypassing memory only ops is complicated and it still does not fix all corner cases. - This patch simplifies the logic by always bypassing the op by replacing the IFM with the OFM. If that is not possible the memory only op is changed to a memcpy op. - The bypassing was previously done in two steps but is now reduced to one. Change-Id: I545dd65e0ec77c70be479a5ada2d277cac3a027c Signed-off-by: Johan Alfven <johan.alfven@arm.com>
2023-02-15	MLBEDSW-7211: Convert fixup_asymmetric_weights to supported ops check	wilisa01
	Changed default behaviour to place int8 ops with asymmetric quantization on cpu, and added an option to force symmetric quantization Change-Id: Ib9b717aaf61eae78833254ca3dfa745f4f253dc6 Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-02-15	MLBEDSW-7342: Regression: Output diff for mlperf_deeplabv3_mnv2_ade20k_int8	wilisa01
	Swapped order of ifms to add in tflite graph optimiser. The output diff was caused by the second input tensor being placed on sram, despite there being no dma request to move it there. Change-Id: I2e83b669ba226c7e96a0bb0d46ba811434cf7bb6 Signed-off-by: wilisa01 <william.isaksson@arm.com>
2023-02-10	MLBEDSW-4960: convert_resizebilinear_1x1_to_add creates constant output tensor	wilisa01
	Sets second input tensor of resize op to be constant and refactored function. Signed-off-by: wilisa01 <william.isaksson@arm.com> Change-Id: I496764f18b4c1ae0fa1a828dd7a90e937a42d41b
2023-02-07	MLBEDSW-7237: CONV_2D stride 4 optimisation	Raul Farkas
	* Extend stride range from (1,3) to (1,4) * Add stride 4 support when optimising CONV_2D * Add some tests for various strides Change-Id: Iddaeb42c4a6e02695ecdd3740bc8b9dd59a7eb3c Signed-off-by: Raul Farkas <raul.farkas@arm.com>
2023-01-20	MLBEDSW-7151: MLCE: Difference in model output between x86 & aarch64	Tim Hall
	- The issue is due to undefined behaviour when casting a NumPy float to a NumPy unsigned integer which occurs in create_const_tensor() - The fix is to make sure that the values are first cast to a Python float - In addition, the values datatype argument has been removed from create_const_tensor() to stop the tensor and values datatypes getting out of sync Change-Id: I134b9be8c941b361929a5ae7db8cb35f2e9728f2 Signed-off-by: Tim Hall <tim.hall@arm.com>
2022-12-22	MLBEDSW-7203: Data type alias deprecations	Rickard Bolin
	Deprecation of some data type aliases in NumPy version 1.24.0 caused Vela to crash when using Python version 3.8 or above. Replaced the deprecated aliases. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ide167ee864a340194ec5e69537c8718192c78ace
2022-11-17	MLBEDSW-6915: MLCE - Missing operators in Debug DB (tag: 3.6.0.rc2)	wilisa01
	- Adds missing operators and type conversion recording to DebugDB Change-Id: If76b0b430bbe73ae1469024c3160ecf0eea26abe Signed-off-by: wilisa01 <william.isaksson@arm.com>
2022-11-16	MLBEDSW-6620: Update copyright notice and years	Rickard Bolin
	- Update copyright notices to use SPDX format and add OSS mail as contact. - Update years on files where it had been missed. Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f