- 256 and 512 configuration variants execute 1D convolutions
in an optimised manner compared to their 2x2 microblock
dimensions. This commit takes this into account to improve
Conv1D throughput on these configurations.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I6ecdf6e4a219e356327b22f8393f50ee8817af23
|
|
- Merged dev/scheduler at 83639f90e8c828f70de6e29142355a940224959b
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I0050529d4b42da93768c7264296434dd877fb5b4
|
|
This commit fixes a regression caused by a recent
commit where io_ranges and elementwise_broadcast
were failing with off-by-one errors.
The culprit was the incorrect usage of NATURAL
rounding in cases of elementwise ADD and SUB
where the input and output scales were equal and
advanced scaling was not used.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I35d56298e911a4d1bbca7d201bcde6044c8a5490
|
|
For 8 bit arithmetic we cannot guarantee reproducibility in the general
case since precision differs, affecting rounding near half integers.
It should be safe when the ratio between output and input scales has
its 12 LSBs all set to 0, however.
For 16 bit arithmetic it should be sufficient to adjust the input and
output scalings by a factor of 2 to get the same rounding.
Signed-off-by: Henrik G Olsson <henrik.olsson@arm.com>
Change-Id: I809c0042615d16c5488d61f0c7d88e1a1315e6eb
|
|
Bug fix in generation of register command offsets that do not fit in 32 bits.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: Iabb99cf6536c0f77b934691f8744df61f1eab3ed
|
|
- Tensor allocation verification was O(N^2), is now closer to O(N)
- Removed a sort in HillClimb allocator
Change-Id: I286a269881490c485cc2b0eeab3b1ecffa8f3df0
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
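
The commit message does not say how the verification was sped up. As a rough sketch only, one common way to avoid comparing every pair of tensors is to sort allocations by address and compare each one only against allocations whose address range is still open. The `Alloc` type and `find_conflicts` function are hypothetical names, not taken from the code base, and this sweep is O(N log N) plus the number of address overlaps rather than strictly O(N).

```python
from typing import List, NamedTuple, Tuple


class Alloc(NamedTuple):
    """Hypothetical record of one allocated tensor."""
    name: str
    address: int      # start address in bytes
    size: int         # size in bytes (assumed > 0)
    start_time: int   # first time step the tensor is live
    end_time: int     # last time step the tensor is live (inclusive)


def find_conflicts(allocs: List[Alloc]) -> List[Tuple[str, str]]:
    """Return pairs of tensors that overlap in both address and lifetime."""
    conflicts = []
    active: List[Alloc] = []  # earlier allocations that may still overlap in address
    for cur in sorted(allocs, key=lambda a: a.address):
        # Drop allocations that end at or before the current start address.
        active = [a for a in active if a.address + a.size > cur.address]
        for a in active:
            # Address ranges overlap here; check whether lifetimes overlap too.
            if a.start_time <= cur.end_time and cur.start_time <= a.end_time:
                conflicts.append((a.name, cur.name))
        active.append(cur)
    return conflicts
```

Two tensors only conflict when they share addresses while both are live, so tensors that reuse the same memory at different times pass the check.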
|
|
Added special handling of power-of-two input scales for
16-bit tanh/sigmoid to align with the reference.
Change-Id: I87831bcd587623d7db7100e768905355c2c98e9d
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
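
How the power-of-two case is detected is not stated in the message. A minimal sketch of such a test, assuming the scale is available as a Python float, could look like this; `is_power_of_two` is a hypothetical helper name.

```python
import math


def is_power_of_two(scale: float) -> bool:
    """Return True if scale is an exact power of two, e.g. 0.5 or 2**-15."""
    if scale <= 0.0 or math.isinf(scale) or math.isnan(scale):
        return False
    # frexp returns (m, e) with scale == m * 2**e and 0.5 <= m < 1,
    # so an exact power of two always has m == 0.5.
    mantissa, _ = math.frexp(scale)
    return mantissa == 0.5
```

For example, `is_power_of_two(1 / 32768)` is true, while `is_power_of_two(0.003)` is false.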
|
|
Added checks during command stream generation to make sure
that address boundaries are respected.
Change-Id: I4dbc693b42d54e35c8fcc785e8be88059e409eec
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
If the command stream size exceeds a certain threshold,
a VelaError will now be raised.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: I9b9383f4c298a778b160cd527374e9244e4cae26
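
A sketch of what such a size check might look like follows. The `VelaError` class here is a local stand-in, and both the limit value and the function name are assumptions made for illustration; the actual threshold is not given in the commit message.

```python
class VelaError(Exception):
    """Stand-in for the compiler's error type; the real class may differ."""


# Hypothetical limit in bytes; the real threshold is not stated in the message.
MAX_COMMAND_STREAM_BYTES = 1 << 24


def check_command_stream_size(stream: bytes, limit: int = MAX_COMMAND_STREAM_BYTES) -> None:
    """Raise VelaError if the encoded command stream is larger than limit."""
    if len(stream) > limit:
        raise VelaError(
            f"Command stream size {len(stream)} bytes exceeds limit of {limit} bytes"
        )
```

Raising a dedicated exception type lets the caller report the problem cleanly instead of producing an output file the target cannot run.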
|
|
- The architecture supports address extensions wider than 32b via the cmd1.param
Change-Id: I7a01b4596f7a54f6be05b8e2c454494e6751757b
Signed-off-by: Mauricio Briceno <mauricio.briceno@arm.com>
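
As an illustration of the idea, an address wider than 32 bits can be split so that the low 32 bits travel in the command payload and the remaining high bits travel in the param field. The field widths and the `split_address` name below are assumptions for the sketch, not the documented register layout.

```python
from typing import Tuple


def split_address(address: int, payload_bits: int = 32, param_bits: int = 16) -> Tuple[int, int]:
    """Split an address into (param, payload).

    payload holds the low payload_bits of the address and param holds the
    bits above them. Both widths are assumed values for this sketch.
    """
    if address < 0 or address >= 1 << (payload_bits + param_bits):
        raise ValueError(f"address {address:#x} does not fit in the command")
    payload = address & ((1 << payload_bits) - 1)
    param = address >> payload_bits
    return param, payload
```

With these assumed widths, `split_address(0x1_2345_6789)` gives a param of `0x1` and a payload of `0x23456789`.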
|
|
All files which have been updated in 2021 and contain a copyright header have had their headers updated.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com>
Change-Id: Ia682111a719d16e690433398ccfb69c7e93c1cd1
|
|
Added RescaleAdd operation to avoid non-standard attribute
"rescale" for Add operation. Also changed ResizeBilinear
in the same way.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: I1d286f63890585c06b8a161df1ff77e3f844a4b9
|
|
This commit corrects a number of type errors
reported by mypy and refactors some parts of
the code which are no longer necessary after
making adjustments to satisfy mypy.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I16b880b228e57f2a92fb8936f53e94886e0f9f44
|
|
Moved blockdep calculation and other helper functions for
code generation to a separate file.
Change-Id: I2f8ccea478654272ebf42217fc5c1800e9ad177a
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Blockdep calculation can now handle different sized IFM/OFM.
Change-Id: I898a3c1c3a6778916802f3dbfa658328e5093096
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added public API function npu_find_block_configs.
Change-Id: Ib0925a62d7c5d19a9b9fbd8d808943c2ea2df02f
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Added external API to add driver actions to a command stream.
Change-Id: Ie4779c1c745defc5769fa694358470cd6aea191c
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
All external APIs are now exposed by api.py.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I33f480e424692ac30e9c7d791f583199f31164a7
|
|
- Added sample vela.ini config file
- Changed vela config format, split into system config and memory mode
- Removed unused CPU cycle performance estimation
- Added new CLI options for --memory-mode and --verbose-config
- Changed CLI option --config to take multiple files
- Removed CLI option --global-memory-clock-scales
- Changed error helper functions to raise a VelaError exception
- Refactored to create a new is_spilling_enabled function
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d
|
|
- Also changed to use Ethos-U where appropriate
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ie45ba2bb3935b305abe897b78b498681296cb7c1
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Added mechanism to track input to output graph transforms for
debugging the resultant command stream.
- Provides base implementation for MLBEDSW-2661
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2dfe8a409fbde7ad0282bfab5acb11ba1c8b82d8
|
|
For IFM streamed cascades, bias tensors are read several times.
Move these tensors to fast storage and add DMA commands.
Change-Id: I630f6275986c1b5e3f126c925b11e22500fb1128
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
- Normalise kernel availability by requiring all operators offer a kernel
describing how much data they consume from the source, per OFM element,
regardless of whether kernels are relevant to the operation.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Idbcff64879fc2eccf292b6208a7d2038eb388017
|
|
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
Change-Id: Ie404a0c13e7c7de0eff649f77e0147a0f3d73acd
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
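
To show why an enum helps here, the sketch below uses a small hypothetical `Op` enum and operator set; the member names and the `is_elementwise` helper are illustrative and do not reproduce the real definitions.

```python
from enum import Enum, auto


class Op(Enum):
    """Illustrative subset of operator types; the real enum is much larger."""
    Add = auto()
    Sub = auto()
    Conv2D = auto()


# Operator sets become sets of enum members instead of sets of strings.
ELEMENTWISE_OPS = frozenset({Op.Add, Op.Sub})


def is_elementwise(op_type: Op) -> bool:
    """Return True if the operator type belongs to the elementwise set."""
    return op_type in ELEMENTWISE_OPS
```

A misspelled member such as `Op.Ad` fails immediately with an `AttributeError`, whereas a misspelled string like `"Ad"` would simply never match.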
|
|
Fixed crash in networks with 5D tensors.
Fixed crash for (int32) tensors without quantization.
Added validity checks for concatenation.
Moved unfusing of activation function from tflite_reader to graph_optimiser.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ib9ba8891dc95ef5491e15d0feedef44331a26393
|
|
Added support to convert batched FC to conv.
This enables choosing a suitable block-size.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Idc49e4fb6d29c554f10a38ece7996a7b7795ffad
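
The equivalence behind this conversion can be sketched in plain Python: a fully connected layer applied to N batch entries gives the same result as a 1x1 convolution over a feature map with N spatial positions. The exact reshape chosen by the compiler is not stated in the message; mapping the batch onto the height axis, and all function names below, are assumptions.

```python
def fully_connected(x, w):
    """x: [N][C_in] inputs, w: [C_in][C_out] weights -> [N][C_out]."""
    c_in, c_out = len(w), len(w[0])
    return [[sum(row[c] * w[c][o] for c in range(c_in)) for o in range(c_out)]
            for row in x]


def conv2d_1x1(ifm, w):
    """ifm: [H][W][C_in] feature map, w: [C_in][C_out] -> [H][W][C_out]."""
    c_in, c_out = len(w), len(w[0])
    return [[[sum(px[c] * w[c][o] for c in range(c_in)) for o in range(c_out)]
             for px in row]
            for row in ifm]


def batched_fc_as_conv(x, w):
    """Compute the batched FC by viewing the batch as the height of a 1x1 conv."""
    ifm = [[row] for row in x]        # shape [N][1][C_in]: H = N, W = 1
    ofm = conv2d_1x1(ifm, w)          # shape [N][1][C_out]
    return [row[0] for row in ofm]    # back to [N][C_out]
```

Once the operation is expressed as a convolution, the output has a spatial extent that a block configuration can be chosen for.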
|
|
Allows fusing of LUT with a preceding operator regardless of
input/output scale.
Change-Id: Ia378adbb3fe61d71299feb085f7313377e0efa39
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- Corrected the rounding mode for softmax
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: If136491c7668e85fba1e2c56c8cff11aa32db328
|
|
Fixed a zero point issue for int32 ifm.
Change-Id: I9149cb24d5b030ea5216a028a113518e458a8d15
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Enables LUT for LeakyRelu with int8/uint8 even if input scale
is different from the output scale.
Fusing LUT with a previous operator for this situation
requires further work.
Change-Id: I9eddfe36f457e763d44eb3e05fbe240eac7cfec9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
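
A lookup table can absorb the rescaling because every possible int8 input is dequantised, passed through LeakyRelu, and requantised when the table is built. The sketch below is an assumption about how such a table could be generated; the function name, parameter names, and the round-half-up rule are illustrative, and the reference rounding may differ.

```python
import math


def leaky_relu_lut(alpha, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    """Build a 256-entry int8 table; entry i corresponds to input value qmin + i."""
    lut = []
    for q in range(qmin, qmax + 1):
        real = (q - in_zp) * in_scale                 # dequantise the input
        y = real if real >= 0 else alpha * real       # LeakyRelu in real values
        q_out = math.floor(y / out_scale + 0.5) + out_zp  # requantise (round half up)
        lut.append(max(qmin, min(qmax, q_out)))       # saturate to the int8 range
    return lut
```

Because the input and output scales only appear while the table is built, they are free to differ without any extra work at run time.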
|
|
For int16, using LeakyRelu (with bug fix) gives exactly
the same results as Mul+Max if input/output scales are the same.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I4f4db464d77b0aaf0d25ddfca534f91d08db548d
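
The identity behind this equivalence is that for a slope alpha between 0 and 1, LeakyRelu(x) equals max(x, alpha * x). The sketch below checks this in real-valued arithmetic only; it does not model the int16 quantised rounding that the commit refers to, and the function names are illustrative.

```python
def leaky_relu(x: float, alpha: float) -> float:
    """Direct definition: pass positives through, scale negatives by alpha."""
    return x if x >= 0 else alpha * x


def mul_max(x: float, alpha: float) -> float:
    """The same function expressed as a Mul followed by a Max (valid for 0 <= alpha <= 1)."""
    return max(x, alpha * x)
```

For negative x the scaled value alpha * x is the larger of the two, and for non-negative x the input itself is, so both forms pick the same branch.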
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ida307afc33cd7963bdeb505df400732a3efcc846
|
|
Implemented LUT generation for softmax uint8/int8 to match the
reference.
Change-Id: Ib9acaa295ee1066591e800023d75f364520b44c1
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ia83ab5ba28d193215e3f8fbc52552b0356111723
|
|
Added graph rewrite of Softmax for uint8/int8.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Iecdd5d2cd3156a601b3313debba4a3562e6be5d7
|
|
- This commit removes unnecessary dependency checks and implements
on-demand calculation of the NPU/DMA dependencies.
Signed-off-by: <tim.hall@arm.com>
Change-Id: I85e681d1ab133bd88f64296dc00500f3c188e777
|
|
- Support for more than one 256-byte LUT in SHRAM
- No DMA is performed for a LUT that is already located in SHRAM
- Added MemArea.Shram, used for LUT, to avoid false address collision
asserts during SRAM tensor allocation
- Added read access to LUT in memory access calculation
Change-Id: If4d1eded5ed029d253f4f5efb2d80495fc3eac99
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Id762ee2c03cd8f162cd0c450511ee5b2e0624586
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I5b8db6430e79ec7a5836d8dd00a03413647de8ba
|
|
For binary elementwise ops with broadcasting in first IFM.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I25af67be8d3a852247989bc3ddc8e08e946f6bfa
|
|
Added graph rewrite of Softmax for int16.
Change-Id: Id7885af6056a23e8b8362fb61ae94283251eb398
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I44428d77b2e8e44a477e5c4dfe28ab8dd1792838
|
|
A newer version of numpy gives a deprecation warning. This patch
resolves the deprecation warning so the user should never see it clutter
their output.
Tested on numpy version 1.19.0
Change-Id: I0c468818de4a2e5e2fcb109c45f51b2f1801b7b5
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Iaf4d7ab9c32b0d783072c5f131a61bfebe77cc16
|
|
Automatically generated, no functional changes.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Ia6a791f7dbadc352bc8a7b528afa070e8540b4d0
|
|
- Parallelism mode register was being written for non-Yoda targets.
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I31b50031dab4d615733c4c3790dec8934117f275
|
|
- If blockdepth or core count resulted in empty or non-existent substreams, the
command generator generated an error. This commit changes the command stream
generator to only program cores that have streams and are enabled for the
configuration.
Change-Id: I4e724b19de14d3a12e886ec6b17d0038593dfb59
Signed-off-by: Tim Hall <tim.hall@arm.com>
|
|
- Multicore weight and scale stream interleaving for
multicore hardware architecture.
Change-Id: Ic82850463391c629d90d08c26cf0c48dd438286d
Signed-off-by: Tim Hall <tim.hall@arm.com>
|