Age | Commit message | Author |
|
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.8
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
|
|
*Added generic function which checks if underlying shape of
FullyConnected operation is 2D and performs shape reduction
*Fully connected operations with >2 dimensions now run on the NPU if the
above case is satisfied
*constraint_fc_output_2d and rewrite_fully_connected_input refactored
*Added unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
|
|
- This is due to calling range() on a non-integer value which in turn is due
to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of the round() to be an integer and
thereby stop whole number floating point values propagating into the kernel
dimensions which later feed into the range().
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
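
A minimal sketch of the kind of fix described in this entry. The helper names `to_kernel_dim` and `rows` are hypothetical and do not come from the Vela source. Wrapping `round()` in `int()` guarantees a Python integer whatever type `round()` returns for its input, so the result is always safe to pass to `range()`.

```python
def to_kernel_dim(value):
    """Hypothetical helper: force a rounded dimension to a Python int."""
    return int(round(value))


def rows(kernel_height):
    # range() raises TypeError for a float argument, even a whole
    # number such as 3.0, so the dimension is converted first.
    return list(range(to_kernel_dim(kernel_height)))
```

Converting at the point where the dimension is computed keeps float values out of every later consumer, rather than patching each `range()` call.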
|
|
- Modify the operator clone function to also clone resampling mode
attribute.
A previous patch changed the ifm resampling mode to be an attribute of
an operator rather than a tensor but did not modify the operator clone
function to clone the new attribute.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
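
The failure mode in this entry can be illustrated with a simplified, hypothetical `Operation` class (not the Vela class; the field names and the `"NONE"`/`"NEAREST"` values are assumptions for illustration). When an attribute moves from a tensor onto the operator, the clone function has to copy it explicitly, otherwise the clone silently falls back to the default.

```python
class Operation:
    """Hypothetical, simplified operator used only for illustration."""

    def __init__(self, op_type, name):
        self.type = op_type
        self.name = name
        self.attrs = {}
        # Per-operator attribute (assumed default value)
        self.ifm_resampling_mode = "NONE"

    def clone(self, suffix="_clone"):
        res = Operation(self.type, self.name + suffix)
        res.attrs = dict(self.attrs)
        # The step that is easy to miss after moving the attribute
        # from the tensor to the operator:
        res.ifm_resampling_mode = self.ifm_resampling_mode
        return res
```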
|
|
Corrected calculation for used buffering depth. Before the change there
were scenarios where it was set to a smaller size than needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I162859ade78487e848510c6a605685e4568c7068
|
|
- Changed comments to docstring on QuantizationParams
- Simplified op type to op name conversion
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2fdf5922cc17944c9bd37917a85fdfe50a1e651d
|
|
- Added optional name attributes to operators and tensors
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3b5d881a7b1043a6ba4b58fff5d7532b271ba536
|
|
Update version of Black to 22.3.0 due to updated dependencies.
Updates to fix reported issues due to new version.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I60056aae452093ce8dcea1f499ecced22b25eef1
|
|
Uses separate tensors for the individual weight buffers
in case of weight double buffering.
Each weight buffer tensor gets its own individual live range.
Change-Id: I724a8c61a7045615fbd2ed9535663076ac8edd13
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added a mechanism that reduces the risk for getting stuck
if the current best allocation cannot be improved by only
swapping 2 indices.
Change-Id: Ife379757752f0c1ed54af7bd826e0a9390d54267
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added checks in the cascade builder to ensure that scheduled operations
are in the correct order.
Change-Id: Ic1765a6a1cb8335ff222bfe3b2d2e642980967d7
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed a bug due to ResizeBilinear modifying the attributes of a
shared IFM
- The ifm_resampling_mode is now an attribute of an operator rather
than a tensor
- Changed all calls to try_block_config() to use the attribute rather
than recalculating it in multiple places
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4641e9cd6b049bd4186776d98e3e751c5e5bcc06
|
|
Add mypy to pre-commit and clean up all reported errors.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: If7dc869f5fecdb0e2db40f14e7d9db21aa33df71
|
|
- The number of accumulators is doubled in an Ethos-U configuration with
2 cores
- Likewise, for elementwise, depthwise and pooling operations
the IFM buffer depth capacity is doubled
- FindBlock: step the search space depth in multiples of ublock * ncores
Change-Id: I923cc347a2f252876d405ed93095d39181103f81
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
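
A rough sketch of the stepping rule from this entry, under stated assumptions: the function names are hypothetical, and linear scaling of accumulator capacity with core count is an extrapolation from the two-core case, which is the only one the entry describes.

```python
def candidate_depths(max_depth, ublock_depth, ncores):
    """Hypothetical helper: depths to try, in steps of ublock_depth * ncores."""
    step = ublock_depth * ncores
    return list(range(step, max_depth + 1, step))


def accumulators(base_accumulators, ncores):
    # Assumption for illustration: capacity scales with the core count,
    # which gives the doubling described for a 2-core configuration.
    return base_accumulators * ncores
```

For example, with a micro-block depth of 8 and 2 cores, the search visits depths 16, 32, 48 and 64 rather than every multiple of 8.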
|
|
Added check that horizontal padding is unaffected when applying
graph optimization "optimise_strided_conv".
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I7032a44163e300cdf62cf615b4b10a1417e38eaa
|
|
Fast storage allocator did not always return an optimal
allocation.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: Ic758b6c4a82dc2633c4752b0c204a27ed36f651b
|
|
Fix bug when storing the encoded NPU weight UUID in the
NPU performance estimation.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I92127b0020f12352d923c0c9aa2b6f47e6110764
|
|
- Extend ifm/ofm dimensions explicitly in mean op
This fixes a bug when the ifm/ofm shapes have different dimensions,
e.g. IFM=1x19x18x25 axis=2 OFM=1x19x25,
the ofm_shape should be 1x19x1x25, not 1x1x19x25
- Fix wrong weight shape
Change-Id: I269eb71ea56c09deee2aa6c6433d9b2baa98a113
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
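
The shape example in this entry can be reproduced with a short sketch. Both function names are hypothetical. The first keeps the reduced axis in place as size 1. The second shows how padding a 3-D OFM shape with leading 1s puts the unit dimension in the wrong position.

```python
def keep_dims_shape(ifm_shape, axis):
    """Return the OFM shape with the reduced axis kept as size 1."""
    shape = list(ifm_shape)
    shape[axis] = 1
    return shape


def naive_pad_to_4d(shape):
    # Prepending 1s ignores which axis was reduced
    return [1] * (4 - len(shape)) + list(shape)


print(keep_dims_shape([1, 19, 18, 25], axis=2))  # [1, 19, 1, 25]
print(naive_pad_to_4d([1, 19, 25]))              # [1, 1, 19, 25]
```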
|
|
- Corrected rounding error
- Number of elements depends on ofm format
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I568d660b7571b6e0ffb131211b3a89c8be4b9295
|
|
Update the version of flake8 used in pre-commit to facilitate
adding mypy to pre-commit.
Signed-off-by: Jonas Ohlsson <jonas.ohlsson@arm.com>
Change-Id: I457dec87b77487ca6f14ff4a679c4cc927b272b0
|
|
- The bug is that TransposeConv does not support explicit padding
which is needed in order to combine it with a preceding Pad op
- The fix is to exclude such combinations
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ide03d034dc32b5fc9bcaaf291ab713482223a042
|
|
*Corrected calculation where use of the
_estimate_memory_transfer_efficiency function when calculating the
scaled bandwidth for LUT transfers resulted in a divide by zero error.
Change-Id: I2356e924d9ca2f315ca1988f465f58b13a8fa4c9
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
*Original weights and encoded NPU weight now report correct size instead
of zero when running vela with --verbose-weights flag
(Code to update the aforementioned attributes was missing)
*Removed print references to unencoded NPU weight size
Change-Id: I6d3e41c04cc46d24eeb54cab89818a35e5df27be
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com>
|
|
Reduce memory footprint when using optimization strategy Size
for elementwise operations.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I30380aed587c31adbf7615f74179b4c5da686773
|
|
- Combine two MEAN operator checks for single axis averages into one
- Only apply that check if the single axis is the height dimension
(previously checks were also applied to width averages)
- Rephrase some MEAN operator constraint descriptions
Signed-off-by: James Peet <james.peet@arm.com>
Change-Id: Ie0577f2b99aba1f3d6a4c39f8934eafe3813b736
|
|
Make sure output from subgraph is write protected and
not overwritten by an element wise op.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Ie26979913843c62794c5346a315b7089206850e0
|
|
Fixed problem when ofm is produced by different NPU nodes by
making sure that output is always in NHWC format.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I00e55c989d5860499fbaf4f4318661b17b4bda7e
|
|
Ported the improved spilling behaviour from Regor
into Vela. This replaces use_fast_storage_for_feature_maps
with allocate_feature_maps and introduces the class called
FastStorageComponentAllocator.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I34785840c905a79750a62863773015b00fb43387
|
|
This change will allow the subgraph's input tensor
to be reused/overwritten by the output from an elementwise op
if there is only one consumer attached to the input tensor.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I317188af11a5470614770e18dc8973462fd5f21c
|
|
The root cause of this diff is precision errors caused by rounding
several times when performing a resize bilinear upscaling to more than
twice the initial size. This is solved by rewriting the algorithm to
perform nearest neighbour upscaling to the correct size and then
applying one larger average pool instead of several 2x2 pools. Avgpool
with padding is limited to kernel size 8x8, which constrains the
largest possible bilinear upscaling to 8 times the input size.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I846232f309ba26aab6c385e593cbe25b646c6668
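
A 1-D, pure-Python sketch of the idea in this entry: nearest neighbour upscaling followed by a single average pool whose kernel equals the upscaling factor. This is an illustration only, not the Vela implementation. It uses a stride-1 pool without padding, whereas the entry describes an average pool with padding, and all function names are hypothetical.

```python
def nearest_upscale(values, factor):
    """Repeat each element `factor` times (1-D nearest neighbour)."""
    return [v for v in values for _ in range(factor)]


def avg_pool_1d(values, kernel):
    """Stride-1 average pool without padding."""
    return [
        sum(values[i:i + kernel]) / kernel
        for i in range(len(values) - kernel + 1)
    ]


def upscale_linear(values, factor):
    # One pool of size `factor` replaces a chain of 2-wide pools,
    # so the result is rounded once instead of several times.
    return avg_pool_1d(nearest_upscale(values, factor), factor)


print(upscale_linear([0, 4], 4))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Because the pool kernel grows with the upscaling factor, a hardware limit on kernel size translates directly into a limit on the factor, as the entry notes.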
|
|
- Issue was due to a previous patch to fix MLBEDSW-5582
- Revert fix for MLBEDSW-5582
commit 849ff81f82c10a68898e5101930b92372bec5565,
- Made new fix for MLBEDSW-5582 that enforces
output tensor from NPU graphs to be in NHWC format.
This information is otherwise lost in the case when
parts of a concatenation are placed in different custom operators
resulting in a mismatch between NHWC and NHCWB16.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: Iab3ba29d348353c854f357836e6aa7c338ae1572
|
|
Only the first half of weight double buffers was used
on dual core configurations, which caused degraded performance.
Change-Id: I49972c00343bbffbae28ed11c645e993ed61d43f
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- This bug was due to an interaction between multiple Ethos-U custom
operators and concatenation of constant tensors
- It resulted in different parts of the concatenation being placed in
different custom operators
- The fix involves placing all parts of the concatenation into the
same custom operator by switching to a breadth first search in pass
packing
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ic47613cfd7bf675b4674dc91d6f9765849ba3130
|
|
Update the flatbuffers generated code to comply with TensorFlow 2.7
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: Iff29b05a6e145245861329b4ff9fc9fbd968da53
|
|
By not comparing items that have already been compared with
each other, the number of iterations for the loop is reduced.
For large networks with long live ranges, this improves compile
time significantly.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I298cd6f109527fc32f6db77ffffca9e765a84ce0
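
A sketch of the general technique in this entry: visit each unordered pair once instead of comparing every item with every other item in both directions. The entry does not say what is compared. Treating the items as live ranges with inclusive `(start, end)` bounds and testing for overlap is an assumption for illustration, as is the function name.

```python
from itertools import combinations


def overlapping_pairs(live_ranges):
    """live_ranges: list of (start, end) tuples, end inclusive."""
    pairs = []
    # combinations() yields each pair (i, j) with i < j exactly once,
    # so n items need n * (n - 1) / 2 comparisons instead of n * n.
    for (i, a), (j, b) in combinations(enumerate(live_ranges), 2):
        if a[0] <= b[1] and b[0] <= a[1]:
            pairs.append((i, j))
    return pairs


print(overlapping_pairs([(0, 3), (2, 5), (6, 7)]))  # [(0, 1)]
```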
|
|
The output diff is caused by not including the kernel dilation when
calculating the bottom padding to be used on the last h_stripe. This
only shows up when using dedicated_sram since shared_sram does not split
into multiple h_stripes and thus uses the padding specified by the skirt
instead.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I7f643748b153004d65be2124c0ac6c9d21cd803f
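
A hedged sketch of why dilation matters for bottom padding. The dilated kernel extent `dilation * (kernel - 1) + 1` is the standard formula. The `bottom_padding` helper and its parameters are hypothetical and only approximate the kind of calculation this entry refers to.

```python
def dilated_kernel_size(kernel, dilation):
    """Extent covered by a kernel once dilation is applied."""
    return dilation * (kernel - 1) + 1


def bottom_padding(ifm_height, ofm_height, kernel, stride, dilation, top_pad):
    """Hypothetical helper: rows of padding needed below the IFM."""
    needed = (ofm_height - 1) * stride + dilated_kernel_size(kernel, dilation)
    return max(0, needed - ifm_height - top_pad)


# Same 3-row kernel, with and without dilation taken into account
print(bottom_padding(8, 8, 3, 1, dilation=2, top_pad=2))  # 2
print(bottom_padding(8, 8, 3, 1, dilation=1, top_pad=2))  # 0
```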
|
|
Signed-off-by: Jonny Svärd <jonny.svaerd@arm.com>
Change-Id: Ib398024c2f41beb4f93f7976c678a9fd54af94a5
|
|
Fixed a crash caused by loading a network containing
operators with empty constant tensors.
This could occur when a branched network is split
before said branches have converged.
We now put the affected operator on the CPU.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I63e9cd13cecf86d976c5750c727e218c334c32b5
|
|
When an LUT tensor address is updated with another existing LUT tensor
address, also make sure to update the equivalence id.
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I5ce8c608d9ff6d31e16212b1a725b4147dd3f6f1
|
|
- This bug causes a regression in the use of unpack and split operators
- The bug is due to the read_shapes attribute being an absolute calculation
for slice and strided_slice, but a relative one for unpack and split
- The fix is to consistently treat the attribute as a shape relative to the
read_offset
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I4504b161be507ea22ca6ee40fbe7808bfe049405
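
The distinction between an absolute and a relative read shape can be sketched as follows. The function names and the example coordinates are hypothetical. The point is only that, once every operator stores a shape relative to its read offset, end coordinates are always derived the same way.

```python
def relative_read_shape(begin, end):
    """Convert absolute slice bounds into a shape relative to the offset."""
    return [e - b for b, e in zip(begin, end)]


def read_region_end(read_offset, read_shape):
    # End coordinates are derived from offset + shape, never stored
    return [o + s for o, s in zip(read_offset, read_shape)]


shape = relative_read_shape([0, 2, 0, 0], [1, 6, 8, 16])
print(shape)                                 # [1, 4, 8, 16]
print(read_region_end([0, 2, 0, 0], shape))  # [1, 6, 8, 16]
```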
|
|
- This bug causes an exception to occur when trying to index split
shape in Box.transform_with_strides_and_skirt()
- The bug was due to the read shapes not being initialised when creating
a primary op in pass packing
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I3ebd7fc4c7ef5c06488a36d8340a17ae6afd4609
|
|
- Issue was due to a previous patch to fix MLBEDSW-4350
- Manually reverted that fix 5fabfcaa2b636b02899b4d6e0ccf95d853986475
- Made a new fix for MLBEDSW-4350 that calculates the padding and
skirt by taking into account the split read offsets and shapes
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I96010c1b977011aecbc411a3c91ab3e61af22db4
|
|
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: I87dc5963972a7ef91db467b2ff8e0261e9899372
|
|
Fixed issue with sigmoid int16 with 1/2048 scaling.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I32718757e3776e6be89fe94a9b38368c78f0006b
|
|
This commit updates the release notes for Vela
version 3.2.0.
It also updates the SUPPORTED_OPS.md file with new
constraints.
Updated the API version as a result of the bug fix
commit 399c4a2d77df791e5d988c51d7fb1824ac4f266f.
Updated Vela version in setup.py.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I181e89f639a1da6013e8511ebe2d7e4f81242916
|
|
- Removed the passes information as this was no longer correct
or useful
- Fixed the reporting of the number of CPU operators
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I80bf3f023de7d470af9aa5c6fe7bcb58c60ccd0b
|
|
- The failing tests contain operations with dynamic tensors which
are not supported and therefore they should be placed on the CPU.
However, a bug in the removal of RESHAPEs which contain a dynamic
shape prevented this happening.
- This change adds a check to make sure that RESHAPE ops with a
dynamic shape tensor are not removed and instead are placed on the
CPU.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2d7481f7f80f99a0f01df100d956933777e6875a
|
|
Change-Id: I645496536a6bddf2bd289a87be9d7cef11693954
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
* 1D optimised block_config was incorrectly being set to the ArchitectureBlockConfig in try_block_config()
* Write external API test for the reduced block height case (on H256)
Signed-off-by: James Ward <james.ward@arm.com>
Change-Id: I9ced7eb31b23730e4423aabbaf769bc72fac8fc9
|