Age | Commit message | Author |
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit restores a control flow path in which
already modified StridedSlice operators are left
untouched. Without it, Vela would recurse
infinitely and crash.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iaf3ae916325bedd3dd1edd3395fb4a9ecf832590
|
|
- Added a mechanism to track input-to-output graph transforms for
  debugging the resultant command stream.
- Provides the base implementation for MLBEDSW-2661
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2dfe8a409fbde7ad0282bfab5acb11ba1c8b82d8
|
|
Change-Id: I9e00afe0eef0e13fe990e021bcbe3dd0eda4c471
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
Change-Id: I8f139381d0e01e8ac70d89c4a312ee3000fb5fa1
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
|
|
- DMA ops cycle estimation for the first pass
- Fixed a bug in the ifm_blk_depth calculation
- Fixed a bug in the SRAM bandwidth calculation
- Merged DPU and elementwise cycles into NPU cycles
- Use str.format() in the performance print
Change-Id: I78895416f47fc3c652743c5da13fc45630322371
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
(cherry picked from commit 5245e97a62c2fe54250f99b06e778f3e0c6dc376)
(cherry picked from commit 16e415677403fc04a90b1a7ec554761d38315640)
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: Ic6ae795a1626d1cdf63a69d2ff86f7cd898f3134
|
|
For IFM-streamed cascades, bias tensors are read several times.
This moves those tensors to fast storage and adds DMA commands.
Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
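The mechanism described above can be sketched roughly as follows. This is an illustrative stand-in, not Vela's actual code: `Tensor`, `DmaOp` and `move_to_fast_storage` are hypothetical names, and the real scheduler tracks memory areas quite differently.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    mem_area: str = "Flash"


@dataclass
class DmaOp:
    src: Tensor
    dst: Tensor


def move_to_fast_storage(bias: Tensor, command_stream: list) -> Tensor:
    """Clone a repeatedly read bias tensor into SRAM and prepend a DMA
    copy, so later reads hit fast storage instead of Flash."""
    sram_copy = Tensor(name=bias.name + "_sram", mem_area="SRAM")
    command_stream.insert(0, DmaOp(src=bias, dst=sram_copy))
    return sram_copy
```

The point is that the one-time DMA cost is paid up front, while every subsequent read in the cascade hits SRAM.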
|
|
This commit removes the constraint that all tensor
shapes must match the OFM shape.
The motivation is that this constraint essentially
only checks that the fixup function has run, which
removes the possibility of running the fixup
function after the supported operator check; this
effectively means that any StridedSlice operator
that would be placed on the CPU is still modified
by the fixup function.
Because the fixup function is now moved to after
the supported operators check, some unreachable
cases are removed from it.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I7a82126b7de73bd67873b4e6daf53a6767e33d16
|
|
Added an option to set whether a Tensor clone should be
treated as unique or not.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ie51c1a5e84b535380d498b105aa18ccba1c8b27c
|
|
Improve u65 softmax performance by selecting more feature map
tensors as SRAM candidates.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I239c9dbebbf2a929004eb01bb0f3efe77f5b97aa
|
|
Previously the internal operator type was printed by the supported
operator checks. This now converts it back to the external type name.
Additionally, removed dead code and changed the message for CPU-only ops.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ib2b0cbcb49fdf63edb835828e266b079e63bae37
|
|
Removed the CLI option ifm-ofm-overlap
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I23faa0d10c3e71972c543e22e8155086fce73556
|
|
All existing constraints have now been refactored using the new
framework.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ic9ba0d7040cb9f114b959a949bfdf777f86752c7
|
|
Added a supported_operators check for Relu activation functions. If the
scaling value overflows to infinity, the operator will be placed on the CPU.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I66b7bec062599609aadcbb7531caebbc45a7451f
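As a rough illustration of such a check (the function and parameter names here are hypothetical, not Vela's actual API), the requantisation multiplier is computed and tested for finiteness; an infinite result means the op must fall back to the CPU:

```python
import math


def relu_scale_is_valid(ifm_scale: float, ofm_scale: float) -> bool:
    """Return False if the Relu rescale factor overflows to infinity
    (or is undefined), in which case the op is placed on the CPU."""
    if ofm_scale == 0.0:
        return False
    rescale = ifm_scale / ofm_scale  # combined requantisation multiplier
    return math.isfinite(rescale)
```

An extremely small OFM scale makes the multiplier overflow, which this check catches before code generation.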
|
|
Set the actual size of the Scratch and Fast Scratch buffer and remove both
Scratch buffers from the subgraph inputs.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I9e4213f48289d9136cdd4cd43c668d37c6af8530
|
|
Separate scale+bias tensors by different equivalence_id.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I674341950bc001ac6e4015206995f048a0dfee75
|
|
- Copy the bandwidth compression rate when the weight tensor is cloned
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I41c4c1f7001e8dc12af35695f5f5d02815e28351
|
|
Enable overlap of elementwise input/output
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I6e6f11953319c843c8203bf038f96778df194332
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I91a3b277cda91dca3bad38908d4ed11a4f5d7d5f
|
|
- Fixed typo in Tensor.is_quantized()
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I36156a6aa5aaff01c4f271a6a8325636173225f3
|
|
- Normalise kernel availability by requiring that all operators offer a
  kernel describing how much data they consume from the source, per OFM
  element, regardless of whether a kernel is relevant to the operation.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idbcff64879fc2eccf292b6208a7d2038eb388017
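One way to picture this normalisation (a minimal sketch; `Kernel` and `Operation` here are illustrative, not Vela's actual classes) is a default 1x1 kernel assigned to ops that have no natural kernel, so downstream code can always query `op.kernel`:

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """How much source data an op consumes per OFM element."""
    width: int = 1
    height: int = 1
    stride_w: int = 1
    stride_h: int = 1


class Operation:
    def __init__(self, op_type, kernel=None):
        self.type = op_type
        # Ops with no natural kernel (e.g. elementwise) get a default
        # 1x1 kernel, so callers never need a None check.
        self.kernel = kernel if kernel is not None else Kernel()


conv = Operation("Conv2D", Kernel(width=3, height=3))
add = Operation("Add")  # elementwise: default 1x1 kernel
```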
|
|
- Fixed and documented both tensor and quant params scaling checks
- Added quant params validity check and tensor quantisation check
- Added valid tensor checks to some graph optimisation functions
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I8d6e8f03a603d28886dde511672c8399c85b794c
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I9f3671041c2b1497519cf42b5f52e3cd01d9c10a
(cherry picked from commit e8c989f5236cce12d07a6644329935dbbf0ee8e6)
|
|
- Refactored mark_tensor_purpose
- Initial weight compression is now always done in insert_dma
- Removed mark_tensor_format
Change-Id: Ic719b9bcd1d27e1390d7b9ce8cd21795139ec814
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
Change-Id: Ie404a0c13e7c7de0eff649f77e0147a0f3d73acd
|
|
Replaced the existing functionality for checking conv-like ops
with the new system for reporting constraints.
This new system allows all constraints to be reported regardless of
any input network.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: If81177deca2a3b57c9dd9a3a08868cbc9cef0c23
|
|
This commit fixes a bug where a rewritten Unpack
operator is placed on the CPU and crashes Vela
during serialisation, because the op type has
changed and there is no mapping for the modified
type.
The solution is to move the fixup_unpack_output
function to graph optimisation pass B, allowing
the supported operator check to run before it.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Ic6bd4c70a478fd61adf377cb487f5b9253130314
|
|
Suppress info print that Const/Placeholder/SubgraphInput are not supported
on the NPU.
Change-Id: I6f323b64185b01b619b584c1473ae61d010ab3a4
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This reverts commit 04986c0016e59993563490fe67052371fc0e1ad2.
Reason for revert: Merged by mistake
Change-Id: I150ad9ba7074ad1e80f21180aeba56a454d9f748
|
|
Suppress info print that Const/Placeholder/SubgraphInput are not supported
on the NPU.
Change-Id: I689d25481df0cd10487484c9f639e4253df081ee
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Keeping the constraint functions consistent with each other.
Added specific tensor names in the extra info.
Added the operator name to the generated warning.
This should help easily identify specific problematic nodes in a graph
and give a good enough explanation as to why they are placed on the CPU.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ie5bbdd31e5e75fe37e3d8bb8fee1d260080bce83
|
|
Added info print for unsupported operator
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I1002d1c2249661bff17ef86d9500d1aeb2a1e38e
|
|
Vela supports batching of FC layers; the restriction has been removed.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Ica56738f1b2676628644fc44f2039a24807f5ccb
|
|
Vela could crash in operator serialization if "fused_activation_function"
was not set.
Change-Id: I7f2364b0849fd371dee87e26c6d33d44ce8cec26
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Fixed an incorrect length check in the high level command stream generator
- Improved tensor names related to LUT based operations
Change-Id: Ib8844a35a986e2dbef095df23f143f4633b255f9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit changes and amends some parts of the
restriction functions in order to make sure
operators are correctly placed.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I336cf33a874c9078a5bbf81ce129ff917dbc5e9a
|
|
Change-Id: Idcf1665f95ddecc2a12ff0e714f645263981d501
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added check so that inputs with no values are not reshaped.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Id5e53b093508583c2d70ba7e337869db3de32701
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
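The benefit of enum operator types over strings can be shown with a small sketch (this `Op` subset and `is_conv_like` are illustrative, not the actual Vela definitions): a mistyped enum member fails at import time, whereas a mistyped string silently never matches.

```python
from enum import Enum, auto


class Op(Enum):
    """Illustrative subset of operator types; members replace bare strings."""
    Conv2D = auto()
    DepthwiseConv2D = auto()
    Relu = auto()
    StridedSlice = auto()


def is_conv_like(op_type: Op) -> bool:
    # Membership tests compare enum identities, not string contents.
    return op_type in (Op.Conv2D, Op.DepthwiseConv2D)
```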
|
|
Fixed the axis check for concat; axis 0 is now allowed.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I85a5fc3dacdfc66dc01b0e05048dd100254fddff
|
|
- The presence of accumulators in validation was preventing some elementwise
  configurations from being chosen. This commit sets the accumulator
  requirement to zero before validating the shared buffer config.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Id79f80afb12f77274ade53f7678c3b2e56aef059
|
|
A new mechanism to report generic restrictions/constraints for
operators has been implemented.
Each check is its own defined function, and has a general reason for
the constraint defined as its docstring.
This allows us to query all reasons up front and report this without
having to run through real data to trigger the checks.
This is part of a larger refactoring and the specific restrictions will
be replaced by a similar mechanism.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Id3fb2639f91cfac5fc5b8c14f7620de1a85972b2
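The shape of this mechanism can be sketched as follows, assuming hypothetical check names (the real constraint functions in Vela differ): each check is a plain function whose docstring is the human-readable reason, so all reasons can be listed without running any network through the checks.

```python
def constraint_has_inputs(op):
    "Operators must have at least one input tensor"
    return len(op.get("inputs", [])) > 0


def constraint_batch_size_one(op):
    "The batch size of the IFM must be 1"
    return op.get("batch", 1) == 1


# Each check is its own function; its docstring states the constraint.
GENERIC_CHECKS = [constraint_has_inputs, constraint_batch_size_one]


def list_all_reasons():
    """All constraints can be reported up front, with no input network."""
    return [check.__doc__ for check in GENERIC_CHECKS]


def is_supported(op):
    return all(check(op) for check in GENERIC_CHECKS)
```

Querying `list_all_reasons()` gives the complete, always-current constraint documentation for free.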
|
|
Part of larger refactoring. The sets of operators do not need to be
instance attributes and are not expected to be modified at runtime.
This in turn allows almost all functions to become class methods.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I7dc24d65cdd6c4bda641b3d6133b3134302a552f
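A minimal sketch of the idea (class and set names are illustrative, not the actual attributes): operator sets live on the class, fixed at import time and shared by all callers, so the checks need no instance state and can be classmethods.

```python
class SupportedOperators:
    # Class attributes: defined once, never modified at runtime.
    npu_ops = frozenset(("Conv2D", "DepthwiseConv2D", "FullyConnected"))
    cpu_ops = frozenset(("Custom",))

    @classmethod
    def is_npu_op(cls, op_type: str) -> bool:
        # No instance needed; the sets are immutable class-level data.
        return op_type in cls.npu_ops
```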
|
|
When deciding if weights fit in SRAM:
a compression of the weights has been added for the case where the
weight-compression test limit makes it impossible to fit the
weights in a double buffer in SRAM.
The worst compression ratio from this compression is used
to decide whether the weights can fit in SRAM.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I9458769866b3f9fc15659185aae09658ed10fb38
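The decision can be sketched like this (a simplified model with hypothetical names; the actual accounting in Vela is more involved): the worst observed compression ratio gives a pessimistic compressed size, and double buffering means two copies must be resident at once.

```python
def weights_fit_sram(raw_weight_bytes: int, sram_budget: int,
                     compression_ratios: list) -> bool:
    """Decide whether double-buffered weights fit in SRAM.

    The worst (largest) compression ratio observed is used, so the
    decision is pessimistic: if even the worst case fits, streaming
    the weights through SRAM is safe.
    """
    worst_ratio = max(compression_ratios)   # e.g. 0.6 = 60% of raw size
    compressed = raw_weight_bytes * worst_ratio
    return 2 * compressed <= sram_budget    # double buffer: two copies
```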
|
|
Overflow could occur in the calculation of the sigmoid LUT for
large negative inputs.
Change-Id: I62a33c68de03e9a7a7e4fe2cbd5835c384dc3643
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
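The failure mode is easy to reproduce: for large negative x, the naive `1/(1+exp(-x))` asks `math.exp()` for a value above its overflow limit (about exp(709)). A standard remedy, shown here as an illustrative sketch rather than Vela's actual fix, is to rewrite the negative branch so `exp()` only ever sees a non-positive argument:

```python
import math


def stable_sigmoid(x: float) -> float:
    """Numerically safe sigmoid: the naive 1/(1+exp(-x)) overflows for
    large negative x, since math.exp() raises OverflowError above ~709."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite so exp() sees a non-positive argument;
    # exp(x) then underflows harmlessly to 0.0 instead of overflowing.
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_lut(scale: float, zero_point: int = 0):
    """256-entry LUT for int8 inputs quantised as real = scale*(q - zp)."""
    return [stable_sigmoid(scale * (q - zero_point)) for q in range(-128, 128)]
```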
|
|
Min and Max operations were not passed through
the checking of elementwise ops in the supported
operator checking.
Changed so they are passed through this check as well.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I358a121de33882802415d97d9ed5dbee53233f77
|
|
Fixed crash in networks with 5D tensors.
Fixed crash for (int32) tensors without quantization.
Added validity checks for concatenation.
Moved unfusing of activation function from tflite_reader to graph_optimiser.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ib9ba8891dc95ef5491e15d0feedef44331a26393
|
|
SHRAM is removed from performance reports, as the SHRAM numbers only
include LUT usage.
Change-Id: I5d92bb3be9c8e38dad26ac8ef97c84ecb0aff2fa
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Fixed an issue in the removal of reshapes.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Id6081de8d6b7b6815cc5e56881c20e075214c407
|