Age | Commit message | Author |
|
- Due to a SPLIT op, the following ADD op got an IFM shape
that is bigger than its original shape, but that is handled
by read_offset and read_shapes. The problem was that
the IFM was not considered to be primary and an erroneous
swap was done.
- Make it even clearer when the swap is allowed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I0aefa04234f66c935f269267ae8ed1d77da64c81
|
|
Corrected offset calculation for operator Slice. All values
in tensor begin and tensor size must be used to calculate the
offset range in order to read the correct data.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic463d8f72a2167f8129109b8dcf005f034cce6ed
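The Slice fix above says that every value in the begin and size tensors must contribute to the offset range. A minimal sketch of that idea follows; the function name `slice_read_range` and its return format are hypothetical and are not taken from Vela. It assumes the TensorFlow Lite convention that a size of -1 means "everything from begin to the end of that dimension".

```python
def slice_read_range(ifm_shape, begin, size):
    """Return (start, end) coordinates, end exclusive, of the region a Slice reads.

    Hypothetical helper for illustration only; not Vela's implementation.
    """
    if not (len(ifm_shape) == len(begin) == len(size)):
        raise ValueError("ifm_shape, begin and size must have the same rank")
    start, end = [], []
    for dim, b, s in zip(ifm_shape, begin, size):
        # size == -1 selects all remaining elements in this dimension
        extent = dim - b if s == -1 else s
        if b < 0 or extent < 0 or b + extent > dim:
            raise ValueError("slice is out of bounds")
        # Every dimension contributes, not just the first or last one
        start.append(b)
        end.append(b + extent)
    return start, end


start, end = slice_read_range([1, 8, 8, 16], begin=[0, 2, 1, 0], size=[1, 4, -1, 16])
```

Using all dimensions matters because a slice that starts at a non-zero offset in more than one dimension would otherwise read from the wrong region.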
|
|
- Remove very long live ranges that stand out compared to
their neighbors. This can be seen on large networks with complex
structure. If they are chosen instead of shorter live ranges,
it will be difficult for the HillClimb Allocator to find a perfect
fit in the final allocation.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I6cf23adfdc06c1e93e12e9cf816453d940ff31f7
|
|
- Refactored erroneous if statement that allowed illegal
swapping between ifm1 and ifm2 for elementwise operators.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iec571f710824432edac9104d960f199f33a1b241
|
|
- The algorithm for trying out different stripes in order
to optimize a sub schedule/cascade has a problem in that it
can split the initial cascade into several smaller cascades.
The problem with this is that it will increase IFM/OFM DRAM
bandwidth and performance will drop.
- Changed the stripe algorithm to prefer long cascades.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I4f38b381597b7094819e9dd463aa1876e4e6bc62
|
|
- The cascade builder is using the ifm_ifm2_correct_order
function in order to decide if the operator is cascadable or not.
The problem is that this function expects a full shape or no shape
and the cascade builder did not provide that, so the operator was
reported to be non-cascadable.
- The fix is to provide a full 4D shape, also refactoring
ifm_ifm2_correct_order to use 4D shape to avoid confusion
in the future.
- Refactoring code so that the scheduler can perform a
correct ifm and ifm2 swap.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9a86c4690612f332afa428456a07e67698852495
|
|
Change code in cascade builder to instead
use common functionality in live range.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I7bbd7ea3d1e7e085813e9d93256a54e6bab2267b
|
|
- Vela failed to compile networks with multiple subgraphs because
only cascaded passes in the root subgraph were used when
extracting the live ranges. The fix is to also extract the live
ranges of subgraphs on Ops that have connected subgraphs.
- The tf_writer did not handle multiple subgraphs correctly,
resulting in corrupt buffer data in the optimized tflite file. The buffer
index must be unique for every tensor.
- Added support to handle multiple subgraphs for the OfflineMemoryAllocation
metadata. This does not change behavior for single graphs.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2328dfc1f07e2e4faf43a75423ea95423096ffa3
|
|
- The op contained supported operator checks for both the stride being
in the range 1 to 3, and being equal to 2. Whilst both are correct, only
the latter is needed
- Removed the stride in the range 1 to 3 check for TRANSPOSE_CONV
- Regenerated the documentation
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I9789cdbd3ed65ce310f1529036abbac62296d2ca
|
|
In order to be able to add your SSH key there must
exist a valid email address in your account.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I60c70e63ea6ad015d5a10d8e9efec6d61d56cbad
|
|
If the IFM operator shape is rewritten so that batching
is greater than one for fully connected, the OFM batch
must also be calculated. This change fixes output diffs
for networks that have a fully connected OFM with rank greater
than 2.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I5009edc647a1449a02c8116b45808c1c68beffe6
|
|
- Removed half pixel centers constraint for resize nearest neighbor.
- Supported scale 2x, 4x and 8x.
- Removed test_constraint_resize_half_pixel_centers
- Regenerated SUPPORTED_OPS.md
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ic3e02e9c2b2034d537c9a9841b8fb4ee433c96dc
|
|
Fixed output diff when cascading elementwise operators with
reversed operand order.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iac2e28cfb53037b929459af213f4fa7715b3e6de
|
|
The problem was that the updated conditions for elementwise
cascading were too permissive after the RescaleAdd removal.
Conditions for elementwise updated and transpose convolution
removed from cascading since it has issues.
Change-Id: I0151256c4e3905fad39152941eec44bc76035d30
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
The palette variable located on the stack was not properly
initialized and could potentially overwrite the stack memory
when palette size was increased to 2.
Make sure lut value is initialized.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9fecfe218dc39c0157d1af015e725d1e4becf2f0
|
|
Removed RescaleAdd and RescaleMul operators in favour of
Operation.explicit_scale and removed Operation.rescale.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Idccd8851731d4bb8d4e84970e0fd6b409d7d4e45
|
|
- The issue is due to the numpy version needed when installing on
aarch64 with Python 3.8 and TensorFlow
- The fix is to use the python_version variable when specifying the
numpy version
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I6134b6dbccefc3be0b87feb17e3176b7f42641b3
|
|
- Updated to TensorFlow 2.10 and FlatBuffers 2.0.7
- Changed absolute to relative imports in the auto-generated code
- Updated Vela's TFLite writer to support FlatBuffer builder's internal
number of elements count
- Removed use of deprecated numElems argument to FlatBuffer builder's
EndVector()
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: If447778134db81ae0ac374c7397e1140082372fd
|
|
Added unit tests for scaling including saturated multiplier test.
Change-Id: I87bb3a4bed8f62f5ef5cf3851b97f09ce42bf2b6
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Cleaned up bias tensor use in graph optimiser for Mean operator.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ibcbfa010a4de67d97181df664b420168d6883d1e
|
|
- In order to solve output diffs, the Reshape op was pushed
to the CPU. The problem was that the Mean op ifm shape
was replaced by the Reshape op ifm shape.
- This limitation is now removed. Changed the implementation of
how memory-only ops are bypassed: always replace the memory-only
op's ifm tensor with its ofm tensor. By doing this,
the ifm tensor of the operator that follows the memory-only
op is never changed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ibcdebf33fd9b7a37f90984a129500b5dac52e5ea
|
|
Fixed bug when height is greater than max kernel height. The shape
of the weight must match the ifm shape.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I901a8af2edd5858bb15d53d85ef8e2389049ada7
|
|
Make the address_for_coordinate function a bit easier to read
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I854e1643a39108edc8b1de95198d30a1891fdfd1
|
|
The test failed since the tanh had batch size > 1.
Added checks for batch size for all supported operators.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I3570352740c40eb96bd9db965dfa3c91c81ff2ad
|
|
Added LeakyRelu to supported activation ops.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Icca27730946d02ec16159f988782567be716b594
|
|
Setting bias tensor dtype to DataType.int32 solves rounding issues for
RB HPC int16.
Removing the input data type check also solves the issue of resize
nearest neighbor int16 ops incorrectly getting placed on the CPU.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iee352bcb78e581c0cde3c203dfbe866f1f6fae18
|
|
- Added support for Resize Bilinear with half pixel centers for int8 and
uint8.
- Utilizes the new "TILE" padding mode.
- Utilizes ofm stride multipliers and modified tile base offsets to
write OFMs interleaved.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I37fa77c022a368f05fda0ead75d8696c9205f833
|
|
The issue was that the AveragePool in these test cases was
translated to DepthwiseConv2DBias and int16 convolutions
always run with reduced scale. Fixed so that reduced scale
is not used in this case.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ice956eabbb37c8aa1991464870006971c6ecec43
|
|
Fixed PReLU optimisation to LeakyReLU with negative alpha.
Added optimisation of LeakyReLU to ReLU when alpha is zero.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I5e66f79b29908fffd95b6115799021138ebb401a
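The two rewrites named in the PReLU/LeakyReLU commit above can be sketched as a small decision function. This is illustrative only: the function name, the string tags, and the condition that PReLU is reducible when all alpha values are equal are assumptions made for the example, not Vela's actual graph optimiser logic.

```python
def simplify_activation(kind, alphas):
    """Pick a simpler activation when the alpha values allow it.

    Hypothetical sketch: returns (kind, alpha) where alpha is None for RELU,
    a single float for LEAKY_RELU, and a list for an unchanged PRELU.
    """
    # Assumption: PReLU with one shared alpha behaves like LeakyReLU,
    # including when that alpha is negative.
    if kind == "PRELU" and len(alphas) > 0 and all(a == alphas[0] for a in alphas):
        kind, alphas = "LEAKY_RELU", [alphas[0]]
    # LeakyReLU with alpha == 0 zeroes every negative input, which is ReLU.
    if kind == "LEAKY_RELU" and alphas[0] == 0:
        return "RELU", None
    if kind == "LEAKY_RELU":
        return kind, alphas[0]
    return kind, list(alphas)
```

Chaining the two checks means a PReLU whose alpha values are all zero is reduced to ReLU in one pass.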
|
|
Allow sparse writing of OFM by multiplying H/W/C of the OFM with the
values of ofm_stride_multiplier
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I65d742ad36ad3154e9914cdd22e2da928ad1f095
|
|
Fixed LeakyReLU regressions for int16 due to scaling introduced
for handling negative alpha.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I84a494fedf54bd4b47c4632645ded7d6cda445f8
|
|
Removed duplicate code and moved constraint to
the correct file.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2da3c5b88e1af351751c481217b8183b5948f0f8
|
|
Remove Pipfile support due to lack of testing and maintenance.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I93786cdbf22bfa2130601291d23cead177bd8f81
|
|
Added support for int16 LeakyRelu for negative alpha and alpha
greater than one.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I7f522ebfe014786d0a1d96172e75c7d9bdd76921
|
|
Implement new padding mode which pads two edges of the IFM with the
current values of those edges
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I8523e0cabdac80b48710703859003e33050cc150
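As a rough illustration of the padding mode described above, the sketch below pads a 2D feature map by repeating its edge values. The commit does not say which two edges are padded; the bottom and right edges are assumed here, and the helper name and parameters are hypothetical.

```python
def pad_edges_replicate(ifm, pad_bottom=1, pad_right=1):
    """Pad the bottom and right edges of a 2D list with the current edge values.

    Hypothetical sketch; the choice of bottom/right is an assumption.
    """
    # Extend every row to the right by repeating its last element
    rows = [row + [row[-1]] * pad_right for row in ifm]
    # Extend downwards by repeating the (already widened) last row
    rows += [list(rows[-1]) for _ in range(pad_bottom)]
    return rows


padded = pad_edges_replicate([[1, 2], [3, 4]])
```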
|
|
Changed acc type from int16 to int32. This will solve
saturation problems and the constraint added in
commit "MLBEDSW-5029: Output diff for Mean op"
can be removed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I05ec8835b43313b1a264d61a2b147fa62da123fe
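To see why a wider accumulator removes the saturation problem mentioned above, consider a worst-case sum of signed elements. The sketch assumes int8 elements and a plain sum; how Vela actually accumulates for the Mean op is not stated in the commit, so the function and its defaults are hypothetical.

```python
def accumulator_can_saturate(num_elements, elem_bits=8, acc_bits=16):
    """Return True if summing num_elements signed values could overflow the accumulator.

    Hypothetical worst-case check for illustration only.
    """
    elem_min, elem_max = -(2 ** (elem_bits - 1)), 2 ** (elem_bits - 1) - 1
    acc_min, acc_max = -(2 ** (acc_bits - 1)), 2 ** (acc_bits - 1) - 1
    # Worst case: every element is at the same extreme
    return num_elements * elem_max > acc_max or num_elements * elem_min < acc_min
```

Under these assumptions an int16 accumulator is only safe for up to 256 int8 elements, while an int32 accumulator handles far larger reductions.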
|
|
- Ethos-U65-512 requires the input to REDUCE_SUM to use NHWC format
- Updated the graph optimiser format check to cover this condition
- Added an exception check to the backend of the compiler to verify that
this condition has not been violated by the external api or Vela internals
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2f1fabcbd264daf77d5822349d855a3a32b12c64
|
|
Added optimisations for PReLU when the alpha values allow it.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iff9124e691663ee495379f89900e7c35dbc5f948
|
|
Fixed three test cases causing output diff compared to
the reference kernel for the Mean operator.
- If there is a possibility that the accumulator could saturate,
the Mean op must run on the CPU
- Use correct rounding for the bias term
- If a Reshape op is followed by a Mean op, push the Reshape op
to the CPU since this cannot be handled by the NPU
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I734465730372105821a5e2f73a6a125b9eb7d7f4
|
|
Dump the current per-layer performance estimation information
that appears on the terminal to a CSV file.
Change-Id: I00e94168704be8c3c674c8779fb807ed28607ccd
Signed-off-by: wilisa01 <william.isaksson@arm.com>
|
|
Added PReLU support in graph optimiser.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I3a188675e3edcdf0b4a4bfcdd134fda0bf8a560f
|
|
- The optimisation of the SHAPE operator resulted in a divide by zero
when printing the percentage of npu/cpu operators in the final output
summary
- The fix is to detect when there are no operators in the output tflite
and then avoid the division
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I5bd2342335e9468a8b7028e6e2291a03960e2e55
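The guard described in the commit above amounts to checking for an empty operator list before dividing. A minimal sketch, with a hypothetical function name and return format that are not taken from Vela:

```python
def npu_cpu_percentages(npu_ops, cpu_ops):
    """Return (npu_percent, cpu_percent) without dividing by zero.

    Hypothetical helper; returns zeros when there are no operators at all.
    """
    total = npu_ops + cpu_ops
    if total == 0:
        # No operators left in the output, so there is nothing to report
        return 0.0, 0.0
    return 100.0 * npu_ops / total, 100.0 * cpu_ops / total
```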
|
|
- Updated SUPPORT_OPERATORS.md with Resize operators
- Updated release notes with the main changes and bug fixes
- Updated version numbers
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: If25b5fab708098bc3e7eb243924b55a50f148c3a
|
|
Mypy and pylint were previously not included in TESTING.md.
Also, installation of pre-commit, pytest and pytest-cov outside
of a virtual environment was not detailed.
CONTRIBUTIONS.md had an old Python version listed in the coding standard section.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Idff9454083e41d719e6d75e90cb2be2861500eb9
|
|
Remove resize ops completely from being cascaded since there
are corner cases which are not currently handled.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I9923f8e119af7bdc0e93b0e69b521b399e0629af
|
|
Output diffs were found to be caused by odd input stripe heights,
despite the input being an upscaling operator.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ia3791d815250364cfe7a38c3ed0e30768d64ca08
|
|
- When compiling for shared SRAM, the old scheduler has an option that
makes it use less SRAM than the new scheduler manages to. The old
scheduler was able to create more/longer cascades.
In order to improve the new scheduler, the following has been
implemented:
- Take persistent IFMs into account when creating the min schedule.
- Choose longer cascades when it is possible to reduce the total
SRAM usage compared to using shorter cascades.
- Updated calculation for estimated SRAM usage for elementwise ops.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I209bbf2d94425e4f6aacb1d151b3b2aa65c0870b
|
|
- The compiler will assert when compiling a faulty concat op.
In the reported use case, there were 3 inputs with shape 1x1x2
but the output shape was 1x1x2 (expected to be 1x1x6)
- The solution is to add constraints to the concat operator.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I94a505c51a9fd54d1aa92531a0415031db52378a
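A constraint of the kind described in the concat commit above could check that the output shape matches the inputs: equal in every dimension except the concatenation axis, where it must be the sum. The sketch below is an assumption about what such a check might look like; the function name and signature are hypothetical, and it assumes the reported case concatenates along the last axis.

```python
def concat_shapes_valid(input_shapes, output_shape, axis):
    """Return True if output_shape is consistent with concatenating input_shapes on axis.

    Hypothetical constraint check for illustration only.
    """
    rank = len(output_shape)
    if not input_shapes or any(len(s) != rank for s in input_shapes):
        return False
    if not -rank <= axis < rank:
        return False
    axis %= rank  # normalise negative axis values
    for d in range(rank):
        if d == axis:
            # Sizes add up along the concatenation axis
            expected = sum(s[d] for s in input_shapes)
        else:
            # All other dimensions must agree across inputs
            expected = input_shapes[0][d]
            if any(s[d] != expected for s in input_shapes):
                return False
        if output_shape[d] != expected:
            return False
    return True
```

With three 1x1x2 inputs, an output of 1x1x2 is rejected and an output of 1x1x6 is accepted.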
|
|
There is an issue with using NumPy 1.21.4 or above in setup.py with
Python 3.7. The restriction can most likely be removed when upgrading to
Python 3.8.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I9f826201d68bb5ab61f5bf76c7796442d34447b9
|
|
Limit relative cost to 1 for elementwise operations since increasing
block size when the full ofm already fits gives no additional benefits.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ib6128f6346834fd916efa59adbe07a069dbda0ae
|