|
- There is a latent bug in the memory usage calculation done in
parallel with the sub schedule. The error is in the calculation done
when optimizing the sub schedules: there, the cascade size is
subtracted from the snapshot usage to determine the non-local memory
usage. The problem is that the cascade memory usage actually also
includes non-local memory, so the end result is zero. This is normally
not a problem, but it becomes one when starting to optimize sub
schedules when optimizing for Size.
- The solution is to not include the non-local usage in the cascade
info; the scheduler already has this information.
- Corrected the usage of the persistent initial IFM. This size should not be
included for Dedicated SRAM, since only intermediate buffers are placed in SRAM.
- Added some comments to clarify the code in the cascade builder.
Change-Id: I473b36e0d69550ab6565f4ef028195636b362997
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
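The corrected arithmetic can be illustrated with a minimal sketch; the helper name and the flat integer sizes are hypothetical, not Vela's actual API:

```python
def non_local_usage(snapshot_usage: int, cascade_local_size: int) -> int:
    """Non-local memory = total snapshot usage minus the cascade's
    *local* footprint only (hypothetical helper).

    If cascade_local_size wrongly included the non-local buffers as
    well, the subtraction would cancel out and always return zero.
    """
    assert cascade_local_size <= snapshot_usage
    return snapshot_usage - cascade_local_size
```

For example, with a 100 KB snapshot and a 60 KB cascade-local footprint, 40 KB is non-local; including the 40 KB in the cascade size would wrongly report 0.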
|
|
Refactored move_constant_data in the scheduler. The use case currently
only works for LUT tensors, so the logic has been simplified. In order
to make it work for other tensors, one would also have to take memory
usage into consideration when building cascades, and
use_fast_storage_for_feature_maps would also be affected.
Change-Id: Ic8de53b65a2c17d34515002d7f184d0ab1830222
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The logic for bypassing memory-only ops is
complicated and still does not handle all corner cases.
- This patch simplifies the logic by always bypassing
the op by replacing the IFM with the OFM. If that is not
possible, the memory-only op is changed to a memcpy op.
- The bypassing was previously done in two steps but
is now reduced to one.
Change-Id: I545dd65e0ec77c70be479a5ada2d277cac3a027c
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- Reshape ops can be bypassed and do not need to be processed by the NPU.
However, there are use cases where the IFM must be preserved, so a memcpy
is needed. This was implemented with an AvgPool.
- In order to reduce the cost of the AvgPool, the IFM can instead be copied
by DMA. This is faster, and it can be turned into a real NOP in cases where
the IFM and the OFM can use the same memory space.
- Added a new memcpy op. Only the NHWC format is supported, since DMA cannot
change the format on the fly.
- Allow the OFM to reuse the IFM for the memcpy op.
- Make sure the DMA copy size is 16-byte aligned.
Change-Id: I3605a48d47646ff60d2bb3644dd3a23f872235a7
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
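The 16-byte alignment of the DMA copy size is the standard round-up-to-a-power-of-two idiom; a minimal sketch (helper name hypothetical):

```python
def align_up(size: int, alignment: int = 16) -> int:
    # Round size up to the next multiple of alignment.
    # alignment must be a power of two for the bit trick to work.
    return (size + alignment - 1) & ~(alignment - 1)
```

So a 17-byte copy is padded to 32 bytes, while an already-aligned 16-byte copy is left unchanged.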
|
|
Fixed scale calculations for FullyConnected to match the reference.
Also removed unused low_precision_scaling.
Change-Id: I4b766febff4a0010acd3de708bb49be458d22bf3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- There is a problem with large networks containing many NPU
subgraphs: scheduling takes too long, since the snapshot
memory calculation always does a complete update of the
full graph.
- A complete run is still needed at the end to calculate all the
time indexes correctly. However, when scheduling an NPU subgraph,
it is enough to extract live ranges for the current schedule
and its operators.
Change-Id: Iccb7d6728119c1428ad0b45a2ac34e92158c15bd
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The assert was caused by a faulty optimization done
in the pass packing when trying to group CPU passes. The code
did not take into account that a CPU op can have many outputs.
- The fix is to make sure that the pass that follows the CPU pass is
not dependent on any of the outputs from the CPU pass. If there is a
dependency, the CPU pass cannot be moved.
Change-Id: Ia0c90bae1ed97d503a97e7bc353f834a0fa75130
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
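The dependency check described above reduces to a set intersection; a minimal sketch with hypothetical names, using tensor names as stand-ins for Vela's pass structures:

```python
def can_move_past(cpu_pass_outputs: list[str], next_pass_inputs: list[str]) -> bool:
    # The pass that follows the CPU pass may only be regrouped past it
    # if it consumes none of the CPU pass's outputs (any of them, since
    # a CPU op can have many outputs).
    return not (set(cpu_pass_outputs) & set(next_pass_inputs))
```

If the CPU pass produces `t0` and `t1` and the next pass reads `t1`, there is a dependency and the move must be rejected.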
|
|
Added a constraint for faulty reshape operators: the number of elements
for the IFM and OFM must be the same.
Change-Id: I2e31e9d1e39b5aa3a0c595032a66e14374a0719e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
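The constraint is a simple element-count comparison; a minimal sketch (function name hypothetical):

```python
from math import prod

def reshape_is_valid(ifm_shape: tuple, ofm_shape: tuple) -> bool:
    # A reshape only relabels dimensions; it cannot create or drop
    # elements, so the element counts must match exactly.
    return prod(ifm_shape) == prod(ofm_shape)
```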
|
|
- The problem was that when the split slice read was moved
to the tensor consumer, in this case an elementwise operator,
this was not taken into account when the NPU op for the
elementwise operator was created. The NPU op was created
with the wrong ifm_width, and ifm and ifm2 ended up with different
sizes. As a result, broadcasting was expected but not present,
so the assert was triggered.
- The fix is to use the ifm box in order to set the correct
ifm_width for the NPU operator.
Change-Id: I3291d34e7f8e7add9caf2296cca600c60e96bf7e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
shrinking an axis
- The problem was that the end values of STRIDED_SLICE operators
did not take the shrink_axis_mask into account
- The fix is simply to ignore the end value set on the operator
and calculate one based on shrinking the axis
Change-Id: I2e5f2d3c9b08035dfd9b1629c775408f2356d1cf
Signed-off-by: Tim Hall <tim.hall@arm.com>
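Under the TFLite shrink_axis_mask semantics, a flagged axis selects exactly one element, so the recomputed end is begin + 1 on that axis; a minimal sketch of that recalculation (helper name hypothetical):

```python
def effective_end(begin: list, end: list, shrink_axis_mask: int) -> list:
    # For each axis whose bit is set in shrink_axis_mask, ignore the
    # operator's stored end value and slice exactly one element
    # starting at begin; other axes keep their original end.
    out = list(end)
    for axis in range(len(begin)):
        if shrink_axis_mask & (1 << axis):
            out[axis] = begin[axis] + 1
    return out
```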
|
|
Changed the default behaviour to place int8 ops with asymmetric quantization on the CPU, and added an option to force symmetric quantization
Change-Id: Ib9b717aaf61eae78833254ca3dfa745f4f253dc6
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
Swapped the order of the IFMs to the add in the TFLite graph optimiser.
The output diff was caused by the second input tensor being placed in SRAM despite there being no DMA request to move it there.
Change-Id: I2e83b669ba226c7e96a0bb0d46ba811434cf7bb6
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- The problem was that networks with resource variables had
not been considered. The major issue was the graph traversal,
where these ops were not visited, resulting in an empty subgraph
that caused the crash.
- Fixed the problem by attaching virtual tensors to the ops, simulating
subgraph output. These tensors are only used to make the graph
traversal work.
- Fixed serializing of the attribute container and shared_name
- Fixed the subgraph index for the CallOnce operator
- All resource variable ops are pushed to the CPU
Change-Id: I815f9c81baf7a3fbb686e895980b462f58208b6e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Added support for Variable Tensor, including offline planning.
Change-Id: I39f33fee207f1f1a4574a0f53f7377eec8709e15
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Since the test works by creating an overflow, NumPy is set to ignore overflow for this test case
Change-Id: I74d03e8d73455295168352542dcb844283d54d33
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
Some copyright years of files in the mlw_codec had not been updated
during changes in late 2022.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iebab154127e5868202a805aff0125154ac1d3beb
|
|
Set the second input tensor of the resize op to be constant and refactored the function.
Signed-off-by: wilisa01 <william.isaksson@arm.com>
Change-Id: I496764f18b4c1ae0fa1a828dd7a90e937a42d41b
|
|
- Move all static information from setup.py to the newly added
pyproject.toml
- Add setup.cfg for static information that cannot be added to
pyproject.toml due to that support still being in beta.
- Modify mlw_codec to throw a real Python exception when importing
NumPy arrays instead of just printing errors to stdout.
- Surround the mlw_codec import with a try/except statement to catch NumPy C
API mismatch errors and re-raise them with a more detailed message.
- Update README.md with documentation about a known issue with changing
the used NumPy version after installing ethos-u-vela.
Change-Id: I1eeee5536be7c1744e30d6088f7069fbb1403e06
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
|
|
Reinstate constraint for stride height to (1,3) instead of (1,4) for
Conv2D and update unit tests.
Change-Id: I17389ee040eeff0cea08279cab1c038e951569ea
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
|
|
- Additional overflow checks are performed when running under
Microsoft Windows compared to Linux. These checks happen when
converting from Python int to NumPy int/uint
- The problem is that the lut activation values are int32 type,
however they are defined as Python ints. If these are converted to
numpy.int32 it could result in an overflow error
- The fix is to convert these values to uint32 but keep the
operator's IFM tensor type the same (as this will allow them to be
interpreted correctly)
- Fixing this highlighted another problem where convert_to_lut
always calls create_lut_tensor() with an int8 datatype, whereas it
should be using the IFM datatype
Change-Id: I781a9d850f654267aa4a67754438607c4bb95685
Signed-off-by: Tim Hall <tim.hall@arm.com>
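The int32-to-uint32 conversion described above can be sketched as a bit-level reinterpretation via masking; `to_uint32` is a hypothetical helper, not Vela's actual code:

```python
import numpy as np

def to_uint32(values: list) -> np.ndarray:
    # LUT activation values are conceptually int32, but arrive as
    # Python ints; converting negatives straight to np.int32 can raise
    # an overflow error on Windows. Masking to 32 bits and storing as
    # uint32 keeps the same bit pattern, and the operator's IFM dtype
    # still tells the consumer how to interpret those bits.
    return np.array([v & 0xFFFFFFFF for v in values], dtype=np.uint32)
```

For instance, -1 is stored as 0xFFFFFFFF, which reads back as -1 when interpreted as int32.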
|
|
* Extend stride range from (1,3) to (1,4)
* Add stride 4 support when optimising CONV_2D
* Add some tests for various strides
Change-Id: Iddaeb42c4a6e02695ecdd3740bc8b9dd59a7eb3c
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
|
|
- An assert in Vela is triggered when the number of splits does
not evenly divide the input.shape[axis] value and the split offsets
are calculated incorrectly.
- The fix is to add the same constraints as in the reference kernel
and only run the Split op on the NPU when the criteria are fulfilled.
- Modified test to reflect the new constraints
- Updated SUPPORTED_OPS.md
Change-Id: I4103ff4a3fdf9a813f5fcb7f51081b859e611100
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
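The reference-kernel constraint reduces to a divisibility check; a minimal sketch (function name hypothetical):

```python
def split_supported(ifm_shape: list, axis: int, num_splits: int) -> bool:
    # Mirror the reference kernel: the size of the split axis must be
    # evenly divisible by the number of splits, otherwise the split
    # offsets cannot be computed correctly and the op stays on the CPU.
    return ifm_shape[axis] % num_splits == 0
```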
|
|
This reverts commit 9d254b6f9e76ccf266a0f72a0171e73bc8d435c9.
Reason for revert: Due to 0-size constants being treated differently (MLTOOLS-2043)
Change-Id: Ie1150fb2dd9092050a7fd44708a893d52ffe59f8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
|
|
Updated FlatBuffers autogenerated files to TensorFlow 2.11
Change-Id: Ied60f9fbacdcf91ec8d289cafbde0d88169bb349
Signed-off-by: wilisa01 <william.isaksson@arm.com>
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- The issue is due to undefined behaviour when casting a NumPy float
to a NumPy unsigned integer which occurs in create_const_tensor()
- The fix is to make sure that the values are first cast to a Python
float
- In addition, the values datatype argument has been removed from
create_const_tensor() to stop the tensor and values datatypes getting
out of sync
Change-Id: I134b9be8c941b361929a5ae7db8cb35f2e9728f2
Signed-off-by: Tim Hall <tim.hall@arm.com>
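The fix described above routes the value through a Python float before the NumPy cast; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def safe_cast(value, dtype) -> np.ndarray:
    # Casting a NumPy float directly to a NumPy unsigned integer is
    # undefined behaviour for out-of-range values. Converting to a
    # Python float first (then truncating to int) makes the value a
    # plain Python number before the final, well-defined integer cast.
    return np.array(int(float(value))).astype(dtype)
```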
|
|
- Previously a feature was added in order to reduce SRAM usage
when optimizing for Size. An investigation has now been done
that shows that this feature is also beneficial when optimizing for
Performance and hence this patch removes the Size only limitation.
Change-Id: I5b130db43cbda47e09d4196ab1daa5a21e35ae00
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Fixed an assert caused by a model with a reshape operator
followed by another reshape operator. This structure had not been
considered. However, since there is no need for the first reshape, it is
simply removed from the path while traversing the graph.
Change-Id: I2a939df37502028ffc07115ac87e85375484efee
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The uncascaded SRAM usage for an op in the cascade builder did not
take into account that the OFM will reuse the IFM for elementwise ops,
which resulted in wrong values for the uncascaded memory.
- Changed the code to use _estimate_sram_usage, since this
function does the calculation correctly.
Change-Id: I681bcf6e45ee869bbfb92306869b18ee4a838325
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Deprecation of some data type aliases in NumPy version 1.24.0 caused Vela
to crash when using Python version 3.8 or above. Replaced the deprecated
aliases.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ide167ee864a340194ec5e69537c8718192c78ace
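The aliases removed in NumPy 1.24 were the Python-builtin shims (`np.int`, `np.float`, `np.bool`, ...); the fixed-width types remain the drop-in replacements:

```python
import numpy as np

# Before NumPy 1.24, dtype=np.int and dtype=np.float were deprecated
# aliases for the Python builtins; using them now raises AttributeError.
# The explicit fixed-width dtypes are the supported replacements:
x = np.array([1, 2, 3], dtype=np.int32)   # instead of dtype=np.int
y = np.array([1.0, 2.0], dtype=np.float64)  # instead of dtype=np.float
```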
|
|
- Fixed a problem where buffered weights were only used
in the first stripe that was produced. The following stripes
read the weights from permanent storage.
Change-Id: I176909fa0e2edbecf80e8ec8ac136f42d5d3bcd4
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- When operators are cascaded, rolling buffers are
used between the producer and the consumer operator.
Depending on attributes like strides, there was a use
case where the allocated intermediate buffer was too small,
resulting in a buffer overflow. The problem was that
the producer OFM stripe width was greater than the consumer
IFM stripe width.
- Changed the allocation to use the max of the producer width
and the consumer width
Change-Id: I5aa20795eac5591d254b2163deec329cf9325a1b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
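The corrected sizing rule is a one-liner; a minimal sketch with hypothetical parameter names:

```python
def rolling_buffer_width(producer_ofm_stripe_w: int, consumer_ifm_stripe_w: int) -> int:
    # Size the intermediate rolling buffer for the wider of the two
    # sides of the cascade, so a producer stripe that is wider than
    # the consumer's read stripe cannot overflow the buffer.
    return max(producer_ofm_stripe_w, consumer_ifm_stripe_w)
```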
|
|
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I026facce572ddce4249e05529f2bb1d285552ab9
|
|
IFMs in persistent memory should not be included in the memory-op
SRAM calculation.
Change-Id: Iaac4d2ad8b206c5fb727e5815477cb3611a13e0e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- When support for reversed operands was introduced, the NPU performance
estimate was not updated. The result is larger numbers (an apparent
degradation) from the performance estimator compared to the previous
release. In reality there is no degradation and the real performance is
the same.
- Updated the NPU performance estimate to reflect the behavior implemented
by the reversed operands attribute.
Change-Id: I1b37a07f25def8f7a8adbdaadcf931bfe49165cb
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- Only a 1D bias shape is supported
- Modified the test to reflect the constraint
- Updated SUPPORTED_OPS.md
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I00ae4b229d5f89512cb94f87f276af61cc66a6fd
|
|
- The cascade builder estimates how much SRAM an operator
uses when calculating the cascades. If an elementwise operator
is included in a cascade, the IFM2 will always be a constant/scalar
residing in permanent memory, so the size of the IFM2
should not be included in the SRAM estimate.
- The scheduler did not take into account that the IFM can be reused
for the OFM when calculating the op memory usage, resulting in
a negative number for non-local memory usage. Corrected the
calculation and added an assert to detect future problems.
Change-Id: Id7ec8fe1ec5560290f34579a7b9203a75067aba2
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Investigated all code linter output and fixed non-intentional
reports.
Change-Id: If49d6df8fe1a8a6ae4f1e28de4889a8c5763a0b3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Added references to performance CSVs and documented per-layer performance.
Also removed a space that caused black in pre-commit to fail.
Change-Id: Ia20cb381654cc6344c68bcaad0a7dfc517d55e63
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Adds missing operators and type conversion recording to DebugDB
Change-Id: If76b0b430bbe73ae1469024c3160ecf0eea26abe
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Update copyright notices to use SPDX format and add OSS mail as contact.
- Update years on files where it had been missed.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
|
|
- Added graph optimisation pass to support dilations greater than 2
in either dimension
- Removed supported operators restrictions
- Removed erroneous dilation on TRANSPOSE_CONV
- Updated unit tests and documentation
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ide302374b0d5eff25c20501383a63f6aa7625c52
|
|
- Removed unused variable total_npu_weights to fix summary csv error
Change-Id: Id3c94166a787d2bb094ac6c6612fc866811515c2
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Fixed the reporting of the input network operator to correctly
report the original operator type rather than the current one
- Fixed a divide by zero bug when calculating percentages
- Refactored the verbose-performance code so that console and csv
outputs use a single definition of the header and data
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibd3fa99b65f0602dcdcff696f2d565ac13453306
|
|
Fixed by adding an operation to copy the statically optimised
data to the subgraph output.
Change-Id: Ica757e37d5460237973444ffd39c7d2850f319e3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
The reference kernel for the MEAN operator has changed.
As a result, the mean implementation can be simplified
and the constraint for mean int8 can be removed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I318e9b495eefea99e7ac4aea4b8c436c83753405
|
|
- A bug was introduced by using the original_shape attribute that
causes CPU CONV2D ops to fail to run due to an incorrect weight
tensor shape
- This was due to the original_shape not being modified when a
transpose was performed on the weight tensor
- The fix was to transpose the original_shape just like the current
shape
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ied72316463d26c502cf931b9dd5784041c42ab66
|
|
- The CPU side always needs to work with the original tensor shape.
Due to a bypass memory optimization, the IFM produced by the CPU
was stored with the wrong shape in the optimized file.
- Store the original tensor shape so it can be correctly
written to the optimized file.
Change-Id: I666dbcb0acd806ad208c0f925a51dfc25421688b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The previous patch, which always replaced the IFM with the OFM,
introduced unnecessary avg pool ops in some cases.
That patch has been reverted and this is a new solution.
- Replace the IFM with the OFM under the following conditions:
a) For ops that depend on the original IFM tensor shape
not being changed by the bypass memory op function.
b) When the memory op has different IFM and OFM ranks.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I16a023e169ae64c5db46f6f88516a5e1ca7ed7ef
|
|
This reverts commit 5060ff53f5ac2382e04a68d7772bd71a36f63845.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8dd7e9ed8325fd2e8c17509fd9757292706f5ee7
|
|
Always make sure the bias is a 1D tensor.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic0cb85d4fb9d2e07b4d1b7ac6059bffa432e28a3
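Normalising a bias to 1D is a single reshape; a minimal sketch (helper name hypothetical):

```python
import numpy as np

def to_1d_bias(bias: np.ndarray) -> np.ndarray:
    # Flatten e.g. a [1, 1, 1, C] bias tensor to the [C] shape, so the
    # rest of the compiler can always assume a 1D bias.
    return bias.reshape(-1)
```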
|