|
Updated FlatBuffers autogenerated files to TensorFlow 2.11
Change-Id: Ied60f9fbacdcf91ec8d289cafbde0d88169bb349
Signed-off-by: wilisa01 <william.isaksson@arm.com>
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
- The issue is due to undefined behaviour when casting a NumPy float
to a NumPy unsigned integer which occurs in create_const_tensor()
- The fix is to make sure that the values are first cast to a Python
float
- In addition, the values datatype argument has been removed from
create_const_tensor() to stop the tensor and values datatypes from
getting out of sync
Change-Id: I134b9be8c941b361929a5ae7db8cb35f2e9728f2
Signed-off-by: Tim Hall <tim.hall@arm.com>
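The failure mode and the fix can be sketched as follows (a hedged illustration only: the helper name and the clamping behaviour are assumptions, not Vela's actual create_const_tensor() code). Casting an out-of-range NumPy float directly to a NumPy unsigned integer is undefined behaviour, whereas converting through a Python float first keeps the arithmetic well defined:

```python
import numpy as np

def to_const_values(values, dtype):
    # Hypothetical helper illustrating the fix: convert each value to a
    # Python float first, then quantize with well-defined Python
    # semantics, instead of e.g. np.float32(v).astype(np.uint8), whose
    # result is platform-dependent (undefined behaviour) for negative
    # or out-of-range inputs. The clamping here is an added assumption.
    info = np.iinfo(dtype)
    clipped = [min(max(int(round(float(v))), info.min), info.max)
               for v in values]
    return np.array(clipped, dtype=dtype)
```

For example, `to_const_values([-1.5, 3.7, 300.0], np.uint8)` yields deterministic values on every platform instead of whatever the C-level float-to-unsigned cast happens to produce.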
|
|
- Previously, a feature was added to reduce SRAM usage
when optimizing for Size. An investigation has now shown
that this feature is also beneficial when optimizing for
Performance, hence this patch removes the Size-only limitation.
Change-Id: I5b130db43cbda47e09d4196ab1daa5a21e35ae00
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Fixed an assert that was caused by a model that has a reshape operator
followed by another reshape operator. This structure had not been
anticipated. However, since the first reshape is not needed, it is
simply removed from the path while traversing the graph.
Change-Id: I2a939df37502028ffc07115ac87e85375484efee
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The uncascaded SRAM usage for an op in the cascade builder did not
take into account that the OFM will reuse the IFM for elementwise ops,
which resulted in wrong values for the uncascaded memory.
- Changed the code to use _estimate_sram_usage, since this
function performs the calculation correctly.
Change-Id: I681bcf6e45ee869bbfb92306869b18ee4a838325
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
Deprecation of some data type aliases in NumPy version 1.24.0 caused Vela
to crash when using Python version 3.8 or above. Replaced the deprecated
aliases.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ide167ee864a340194ec5e69537c8718192c78ace
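The aliases removed in NumPy 1.24 and their replacements can be sketched as follows (a minimal illustration of the class of fix, not Vela's actual code):

```python
import numpy as np

# The aliases np.int, np.float, np.bool, np.object and np.str were
# deprecated in NumPy 1.20 and removed in NumPy 1.24. Replacements:
#   np.int    -> int    (or an explicit width such as np.int64)
#   np.float  -> float  (or np.float64)
#   np.bool   -> bool   (or np.bool_)

ages = np.array([1, 2, 3], dtype=np.int64)       # was: dtype=np.int
scales = np.array([0.5, 2.0], dtype=np.float64)  # was: dtype=np.float
```

With the explicit dtypes, the same code runs on NumPy versions both before and after 1.24.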
|
|
- Fixed a problem where buffered weights were only used
in the first stripe that was produced. The following stripes
read the weights from permanent storage.
Change-Id: I176909fa0e2edbecf80e8ec8ac136f42d5d3bcd4
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- When operators are cascaded, rolling buffers are used
between the producer and the consumer operator.
Depending on the attributes, such as strides, there was a use
case where the allocated intermediate buffer was too small,
resulting in a buffer overflow: the producer
OFM stripe width was greater than the consumer
IFM stripe width.
- Changed the allocation to use the max of the producer width
and the consumer width.
Change-Id: I5aa20795eac5591d254b2163deec329cf9325a1b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
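The sizing rule described above can be sketched in one line (an illustrative helper, not Vela's actual allocation code):

```python
def rolling_buffer_width(producer_stripe_w: int, consumer_stripe_w: int) -> int:
    """Hypothetical sketch of the fix: the intermediate rolling buffer
    between two cascaded operators must hold the wider of the producer
    OFM stripe and the consumer IFM stripe. Sizing it from only one
    side overflows whenever the producer writes wider stripes than the
    consumer reads (or vice versa)."""
    return max(producer_stripe_w, consumer_stripe_w)
```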
|
|
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I026facce572ddce4249e05529f2bb1d285552ab9
|
|
IFMs in persistent memory should not be included in the memory
op SRAM calculation.
Change-Id: Iaac4d2ad8b206c5fb727e5815477cb3611a13e0e
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- When support for reversed operands was introduced, the NPU
performance estimation was not updated. The result is larger numbers
(an apparent degradation) from the performance estimator compared to
the previous release. In reality there is no degradation and the real
performance is the same.
- Updated the NPU performance estimation to reflect the behavior
implemented by the reversed operands attribute.
Change-Id: I1b37a07f25def8f7a8adbdaadcf931bfe49165cb
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- Only a 1D bias shape is supported
- Modified the test to reflect the constraint
- Updated SUPPORTED_OPS.md
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I00ae4b229d5f89512cb94f87f276af61cc66a6fd
|
|
- The cascade builder estimates how much SRAM an operator
uses when calculating the cascades. If an elementwise operator
is included in a cascade, the IFM2 will always be a constant/scalar
residing in permanent memory, so the size of the IFM2 should not be
included in the SRAM estimate.
- The scheduler did not take into account that the IFM can be reused
for the OFM when calculating the op memory usage, resulting in
a negative number for non-local memory usage. Corrected the
calculation and added an assert to detect future problems.
Change-Id: Id7ec8fe1ec5560290f34579a7b9203a75067aba2
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
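The two corrections above can be sketched together (a hypothetical helper under assumed names; Vela's real _estimate_sram_usage takes different arguments):

```python
def estimate_op_sram(ifm_bytes: int, ifm2_bytes: int, ofm_bytes: int,
                     ifm2_is_scalar_or_const: bool = False,
                     elementwise: bool = False) -> int:
    # Hypothetical sketch of the corrected estimate.
    # 1) A constant/scalar IFM2 lives in permanent memory, so it does
    #    not count towards SRAM usage.
    total = 0 if ifm2_is_scalar_or_const else ifm2_bytes
    # 2) For elementwise ops the OFM can reuse the IFM buffer, so only
    #    the larger of the two is counted; otherwise both are needed.
    if elementwise:
        total += max(ifm_bytes, ofm_bytes)
    else:
        total += ifm_bytes + ofm_bytes
    # Guard against regressions like the negative non-local usage bug.
    assert total >= 0, "SRAM estimate must never be negative"
    return total
```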
|
|
Investigated all code linter output and fixed unintentional
reports.
Change-Id: If49d6df8fe1a8a6ae4f1e28de4889a8c5763a0b3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Added references to performance CSVs and documented per-layer performance.
Also removed a space that caused black in pre-commit to fail.
Change-Id: Ia20cb381654cc6344c68bcaad0a7dfc517d55e63
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Adds missing operators and type conversion recording to DebugDB
Change-Id: If76b0b430bbe73ae1469024c3160ecf0eea26abe
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Update copyright notices to use the SPDX format and add the OSS mail address as contact.
- Update years on files where they had been missed.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7e9715ea4e17b76252728c708e46df12ad67ab1f
|
|
- Added graph optimisation pass to support dilations greater than 2
in either dimension
- Removed supported operators restrictions
- Removed erroneous dilation on TRANSPOSE_CONV
- Updated unit tests and documentation
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ide302374b0d5eff25c20501383a63f6aa7625c52
|
|
- Removed unused variable total_npu_weights to fix summary csv error
Change-Id: Id3c94166a787d2bb094ac6c6612fc866811515c2
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
- Fixed the reporting of the input network operator to correctly
report the original operator type rather than the current one
- Fixed a divide by zero bug when calculating percentages
- Refactored the verbose-performance code so that console and csv
outputs use a single definition of the header and data
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibd3fa99b65f0602dcdcff696f2d565ac13453306
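The divide-by-zero fix amounts to guarding the percentage calculation (an illustrative helper, not the actual Vela function):

```python
def percentage(part: float, total: float) -> float:
    # Guard against the divide-by-zero case described above: report
    # 0.0 when the total is zero instead of crashing the report.
    return 100.0 * part / total if total else 0.0
```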
|
|
Fixed by adding an operation to copy the statically optimised
data to the subgraph output.
Change-Id: Ica757e37d5460237973444ffd39c7d2850f319e3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
The reference kernel for the MEAN operator has changed.
As a result, the mean implementation can be simplified
and the constraint for mean int8 can be removed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I318e9b495eefea99e7ac4aea4b8c436c83753405
|
|
- A bug was introduced by using the original_shape attribute that
causes CPU CONV2D ops to fail to run due to an incorrect weight
tensor shape
- This was due to the original_shape not being modified when a
transpose was performed on the weight tensor
- The fix was to transpose the original_shape just like the current
shape
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ied72316463d26c502cf931b9dd5784041c42ab66
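Keeping a recorded shape in sync with a transposed tensor can be sketched as follows (illustrative names, not Vela's API):

```python
def transpose_shape(shape, perm):
    """Apply the same axis permutation to a recorded shape as was
    applied to the tensor itself, so that e.g. an original_shape
    attribute stays consistent with the transposed weight tensor."""
    return [shape[axis] for axis in perm]
```

For a HWIO-to-OHWI style permutation `[3, 0, 1, 2]`, a weight shape `[3, 3, 8, 16]` becomes `[16, 3, 3, 8]`, matching the transposed data.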
|
|
- The CPU side always needs to work with the original tensor shape.
Due to a bypass memory optimization, the IFM produced by the CPU
was stored with the wrong shape in the optimized file.
- Store the original tensor shape so it can be correctly
written to the optimized file.
Change-Id: I666dbcb0acd806ad208c0f925a51dfc25421688b
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
|
|
- The previous patch, which always replaced the ifm with the ofm,
introduced unnecessary avg pool ops in some cases.
That patch has been reverted and this is a new solution.
- Replace the ifm with the ofm under the following conditions:
a) Ops that depend on the original ifm tensor shape not being
changed by the bypass memory op function.
b) When the memory op has different IFM and OFM rank.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I16a023e169ae64c5db46f6f88516a5e1ca7ed7ef
|
|
This reverts commit 5060ff53f5ac2382e04a68d7772bd71a36f63845.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I8dd7e9ed8325fd2e8c17509fd9757292706f5ee7
|
|
Always make sure the bias is a 1D tensor.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic0cb85d4fb9d2e07b4d1b7ac6059bffa432e28a3
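The invariant can be sketched as a small normalization step (an assumed helper name, not Vela's code):

```python
import numpy as np

def ensure_1d_bias(bias: np.ndarray) -> np.ndarray:
    # Sketch of the invariant above: flatten any higher-rank bias
    # (e.g. shape (1, 1, 8)) down to rank 1 before it is consumed.
    return bias if bias.ndim == 1 else bias.reshape(-1)
```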
|
|
- Due to a SPLIT op, the following ADD op got an IFM shape
that is bigger than its original shape, but that is handled
by read_offset and read_shapes. The problem was that
the IFM was considered not to be primary and an erroneous
swap was performed.
- Made it even clearer when the swap is allowed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I0aefa04234f66c935f269267ae8ed1d77da64c81
|
|
Corrected the offset calculation for the Slice operator. All values
in the begin and size tensors must be used to calculate the
offset range in order to read the correct data.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic463d8f72a2167f8129109b8dcf005f034cce6ed
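The offset-range calculation can be sketched as follows (a hypothetical row-major illustration, not Vela's actual code): every dimension of `begin` and `size` contributes to the range of elements read, not just the outermost one.

```python
def slice_offset_range(ifm_shape, begin, size):
    """Hypothetical sketch: compute the [start, end) element-offset
    range read by a Slice op on a row-major tensor, using all values
    of the begin and size tensors."""
    # Row-major strides for the input shape.
    strides, acc = [], 1
    for dim in reversed(ifm_shape):
        strides.insert(0, acc)
        acc *= dim
    # First element read.
    start = sum(b * s for b, s in zip(begin, strides))
    # Last element read is at begin + size - 1 in every dimension.
    end = sum((b + sz - 1) * s
              for b, sz, s in zip(begin, size, strides)) + 1
    return start, end
```

For a 2x4 input, `begin=[1, 1]`, `size=[1, 2]` reads flat elements 5 and 6, so the range is `(5, 7)`; using only the outer dimension would underestimate it.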
|
|
- Remove very long live ranges that stand out compared to
their neighbors. This can be seen in large networks with a complex
structure. If they are chosen instead of shorter live ranges,
it will be difficult for the HillClimb allocator to find a good
fit in the final allocation.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I6cf23adfdc06c1e93e12e9cf816453d940ff31f7
|
|
- Refactored an erroneous if statement that allowed illegal
swapping between ifm1 and ifm2 for elementwise operators.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iec571f710824432edac9104d960f199f33a1b241
|
|
- The algorithm that tries out different stripes in order
to optimize a sub-schedule/cascade had a problem: it
could split the initial cascade into several smaller cascades.
This increases IFM/OFM DRAM bandwidth and performance drops.
- Changed the stripe algorithm to prefer long cascades.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I4f38b381597b7094819e9dd463aa1876e4e6bc62
|
|
- The cascade builder uses the ifm_ifm2_correct_order
function to decide whether an operator is cascadable or not.
The problem is that this function expects a full shape or no shape,
and the cascade builder did not provide that, so the operator was
reported as non-cascadable.
- The fix is to provide a full 4D shape, also refactoring
ifm_ifm2_correct_order to use 4D shapes to avoid confusion
in the future.
- Refactored the code so that the scheduler can perform a
correct ifm and ifm2 swap.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9a86c4690612f332afa428456a07e67698852495
|
|
Change code in cascade builder to instead
use common functionality in live range.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I7bbd7ea3d1e7e085813e9d93256a54e6bab2267b
|
|
- Vela failed to compile networks with multiple subgraphs because
only cascaded passes in the root subgraph were used when
extracting the live ranges. The fix is to also extract live
ranges on ops that have connected subgraphs.
- The tf_writer did not handle multiple subgraphs correctly,
resulting in corrupt buffer data in the optimized tflite file. The buffer
index must be unique for every tensor.
- Added support for multiple subgraphs in the OfflineMemoryAllocation
metadata. The change does not alter behavior for single graphs.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2328dfc1f07e2e4faf43a75423ea95423096ffa3
|
|
- The op contained supported-operator checks both for the stride being
in the range 1 to 3 and for it being equal to 2. Whilst both are correct,
only the latter is needed.
- Removed the stride in the range 1 to 3 check for TRANSPOSE_CONV
- Regenerated the documentation
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I9789cdbd3ed65ce310f1529036abbac62296d2ca
|
|
If the IFM operator shape is rewritten so that the batch
is greater than one for fully connected, the OFM batch
must also be recalculated. This change fixes output diffs
for networks that have a fully connected OFM with rank greater
than 2.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I5009edc647a1449a02c8116b45808c1c68beffe6
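The shape handling can be sketched in NumPy terms (an illustrative model of the rewrite, not Vela's code): an IFM of rank greater than 2 is flattened to a 2D batch of vectors, and the OFM batch must be recomputed from the same outer dimensions before reshaping back.

```python
import numpy as np

def fully_connected(ifm: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Sketch of the batch handling described above. All outer
    dimensions of the IFM are folded into the batch; the OFM batch is
    recomputed the same way, then the result is reshaped back to the
    original outer dimensions."""
    units = weights.shape[-1]
    batch = int(np.prod(ifm.shape[:-1]))       # recomputed OFM batch
    ofm2d = ifm.reshape(batch, ifm.shape[-1]) @ weights
    return ofm2d.reshape(*ifm.shape[:-1], units)
```

A rank-3 IFM of shape (2, 3, 4) with a (4, 5) weight matrix yields an OFM of shape (2, 3, 5); forgetting to recompute the batch is exactly the kind of mismatch that produces output diffs.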
|
|
- Removed the half-pixel-centers constraint for resize nearest neighbor.
- Added support for 2x, 4x and 8x scaling.
- Removed test_constraint_resize_half_pixel_centers
- Regenerated SUPPORTED_OPS.md
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic3e02e9c2b2034d537c9a9841b8fb4ee433c96dc
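One common way to express the larger power-of-two factors is as chained 2x nearest-neighbour upscalings; the sketch below illustrates that idea and is an assumption, not necessarily how Vela maps it to hardware.

```python
import numpy as np

def upscale_nearest_2x(x: np.ndarray) -> np.ndarray:
    # 2x nearest-neighbour upscale on the H and W axes of an NHWC tensor.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def upscale_nearest(x: np.ndarray, scale: int) -> np.ndarray:
    # Illustrative sketch: 4x and 8x expressed as chained 2x steps
    # (scale must be a power of two: 2, 4 or 8).
    assert scale in (2, 4, 8)
    while scale > 1:
        x = upscale_nearest_2x(x)
        scale //= 2
    return x
```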
|
|
Fixed output diff when cascading elementwise operators with
reversed operand order.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iac2e28cfb53037b929459af213f4fa7715b3e6de
|
|
The problem was that the updated conditions for elementwise
cascading were too permissive after the RescaleAdd removal.
The conditions for elementwise cascading have been updated, and
transpose convolution has been removed from cascading since it has issues.
Change-Id: I0151256c4e3905fad39152941eec44bc76035d30
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
The palette variable located on the stack was not properly
initialized and could potentially overwrite stack memory
when the palette size was increased to 2.
Make sure the LUT value is initialized.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9fecfe218dc39c0157d1af015e725d1e4becf2f0
|
|
Removed RescaleAdd and RescaleMul operators in favour of
Operation.explicit_scale and removed Operation.rescale.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Idccd8851731d4bb8d4e84970e0fd6b409d7d4e45
|
|
- Updated to TensorFlow 2.10 and FlatBuffers 2.0.7
- Changed absolute to relative imports in the auto-generated code
- Updated Vela's TFLite writer to support FlatBuffer builder's internal
number of elements count
- Removed use of deprecated numElems argument to FlatBuffer builder's
EndVector()
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: If447778134db81ae0ac374c7397e1140082372fd
|
|
Added unit tests for scaling including saturated multiplier test.
Change-Id: I87bb3a4bed8f62f5ef5cf3851b97f09ce42bf2b6
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Cleaned up bias tensor use in graph optimiser for Mean operator.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ibcbfa010a4de67d97181df664b420168d6883d1e
|
|
- In order to solve output diffs, the Reshape op was pushed
to the CPU. The problem was that the Mean op ifm shape
was replaced by the Reshape op ifm shape.
- This limitation is now removed. Changed the implementation of
how memory-only ops are bypassed: always replace the memory-only
op's ifm tensor with its ofm tensor. By doing this,
the ifm tensor of the operator that follows the memory-only
op is never changed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibcdebf33fd9b7a37f90984a129500b5dac52e5ea
|
|
Fixed a bug when the height is greater than the max kernel height: the
shape of the weights must match the ifm shape.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I901a8af2edd5858bb15d53d85ef8e2389049ada7
|
|
Make the address_for_coordinate function a bit easier to read
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I854e1643a39108edc8b1de95198d30a1891fdfd1
|
|
The test failed since the tanh had batch size > 1.
Added checks for batch size for all supported operators.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I3570352740c40eb96bd9db965dfa3c91c81ff2ad
|
|
Added LeakyRelu to supported activation ops.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Icca27730946d02ec16159f988782567be716b594
|