Do not use DMA for weights of a FullyConnected op that has
been converted to a Conv2D.
Change-Id: Ibf6710c0a1723c8b48c563ca204f274af5ca88ce
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This reverts commit 15a8e803844b286fe9533e1cf703c76a77b090a8.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I64169443f473c9ba42551281ad6ac4b45856f420
|
|
- Fixed a bug caused by a typo in the Op.type refactor
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I55916d90bf792648f496a45c358b7e897c6730ba
|
|
- Improved tensor and scaling query functions
- Fixed a bug in convert_batched_fc_to_conv
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: Ibc3d14036540f27cf5e993beb2163d3e0f5e5933
|
|
Do not convert batched fully connected operators, to avoid
moving weights from flash to SRAM.
Change-Id: I873c9ce05377de3f16e4cee9a0863f29d9ec3ad4
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Added external API to generate register command streams.
Existing code generation has been refactored to make
use of this API.
Change-Id: Ibb4c2b167809869f16470b14da24f08a65c82b7b
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
This commit reinstates a control flow path where
already modified StridedSlice operators are
left untouched.
Without it, Vela would recurse infinitely and crash.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Iaf3ae916325bedd3dd1edd3395fb4a9ecf832590
|
|
- Added a mechanism to track input-to-output graph transforms
  for debugging the resultant command stream.
- Provides base implementation for MLBEDSW-2661
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I2dfe8a409fbde7ad0282bfab5acb11ba1c8b82d8
|
|
This commit removes the constraint that all
tensor shapes must match the OFM shape.
The motivation is that this constraint
essentially only checks that the fixup function
has run, which rules out running the fixup
function after the supported operator check; as
a result, any StridedSlice operator that would
be placed on the CPU was still modified by the
fixup function.
Because the fixup function is now moved to after
the supported operators check, some unreachable
cases have been removed from it.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I7a82126b7de73bd67873b4e6daf53a6767e33d16
|
|
Signed-off-by: Diqing Zhong <diqing.zhong@arm.com>
Change-Id: I91a3b277cda91dca3bad38908d4ed11a4f5d7d5f
|
|
- Fixed and documented both tensor and quant params scaling checks
- Added quant params validity check and tensor quantisation check
- Added valid tensor checks to some graph optimisation functions
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I8d6e8f03a603d28886dde511672c8399c85b794c
|
|
This commit fixes a bug where a rewritten Unpack
operator is placed on the CPU and crashes Vela
during serialisation, because the op type has
changed and there is no mapping for the modified
type.
The solution is to move the fixup_unpack_output
function to graph optimisation pass B, allowing
the supported op check to run before it.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: Ic6bd4c70a478fd61adf377cb487f5b9253130314
|
|
- Fixed an incorrect length check in the high-level command stream generator
- Improved tensor names related to LUT based operations
Change-Id: Ib8844a35a986e2dbef095df23f143f4633b255f9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
- op.type is now an enum instead of a string
- Removed unused operator codes
- Refactored some attributes like npu_block_type, fused_activation_function
- Refactored operator index calculation
- Refactored a number of operator sets
Change-Id: I641f65ee375794b7aec42abc0664251ae37d78e8
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
When deciding if weights fit in SRAM:
a compression of the weights has been added for
the case where a weight compression test limit
makes it impossible to fit the weights in a
double buffer in SRAM.
The worst compression ratio from that compression
is used to decide whether the weights can fit in
SRAM.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I9458769866b3f9fc15659185aae09658ed10fb38
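
A minimal sketch of the fit decision described above, assuming the
compressed sub-stream sizes are already known (hypothetical names and
buffer model, not Vela's actual scheduler code):

    def weights_fit_sram(sub_streams, sram_budget_bytes):
        # sub_streams: list of (compressed_bytes, raw_bytes) per weight slice.
        # Be conservative: use the worst (largest) compression ratio seen.
        worst_ratio = max(compressed / raw for compressed, raw in sub_streams)
        largest_raw = max(raw for _, raw in sub_streams)
        # Double buffering needs room for two weight buffers at once.
        return 2 * worst_ratio * largest_raw <= sram_budget_bytes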
|
|
Overflow could occur in the calculation of the sigmoid LUT
for large negative inputs.
Change-Id: I62a33c68de03e9a7a7e4fe2cbd5835c384dc3643
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Fixed crash in networks with 5D tensors.
Fixed crash for (int32) tensors without quantization.
Added validity checks for concatenation.
Moved unfusing of activation function from tflite_reader to graph_optimiser.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
Change-Id: Ib9ba8891dc95ef5491e15d0feedef44331a26393
|
|
Fixed issue in removal of reshapes
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Id6081de8d6b7b6815cc5e56881c20e075214c407
|
|
Uses LUT for int8/uint8 based tanh/sigmoid.
Change-Id: Ib6ac5a5c958ab9a17e47f620b22c3e22d8d60321
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
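
A minimal sketch of how such a 256-entry LUT can be built for int8
tanh (the quantisation handling here is an illustration, not Vela's
implementation):

    import numpy as np

    def tanh_lut_int8(ifm_scale, ifm_zp, ofm_scale, ofm_zp):
        x = np.arange(-128, 128, dtype=np.int32)      # every int8 input value
        real = ifm_scale * (x - ifm_zp)               # dequantise
        y = np.tanh(real)                             # activation in real space
        q = np.round(y / ofm_scale) + ofm_zp          # requantise
        return np.clip(q, -128, 127).astype(np.int8)  # one LUT entry per input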
|
|
Rescaling ops are now added to ReLus with different scaling
even if the IFM/OFM is not 4D.
Change-Id: I631d44fc8a51fb476b9f62ef90eda26eef3d35f3
Signed-off-by: Andreas Nevalainen <andreas.nevalainen@arm.com>
|
|
Fixed issue with checking if axis corresponds to C-dim
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I72d9fd2c9fca642b5ab326324a63111b01c5de98
|
|
Added support to convert batched FC to conv.
This enables choosing a suitable block-size.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Idc49e4fb6d29c554f10a38ece7996a7b7795ffad
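
A sketch of the shape mapping this enables: spread the FC batch over
H and W so the resulting 1x1 Conv2D has a spatial extent the scheduler
can block over (the factorisation heuristic is an assumption):

    def batched_fc_ifm_shape(n, c):
        # Most square factorisation of the batch: h * w == n, h <= sqrt(n).
        h = next(i for i in range(int(n ** 0.5), 0, -1) if n % i == 0)
        return [1, h, n // h, c]  # NHWC IFM of the equivalent 1x1 Conv2D

    assert batched_fc_ifm_shape(4, 64) == [1, 2, 2, 64]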
|
|
A Split operation with num_splits=1 is essentially
a NOP. This commit adds a graph optimisation function
which replaces said Split ops with an Identity op for
later pruning.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com>
Change-Id: I0b0c535f214f54ee4c255662a18c37543bdc6d64
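
A sketch of the rewrite (operator and attribute names are
illustrative, not Vela's internal types):

    def rewrite_trivial_split(op):
        # A Split with num_splits == 1 forwards its input unchanged, so
        # turn it into an Identity that a later pruning pass can remove.
        if op.type == "Split" and op.attrs.get("num_splits") == 1:
            op.type = "Identity"
            op.attrs = {}
        return op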
|
|
In the event that we have a ReLU op with different input and
output scales, we need to fuse it with a NOP AvgPool.
Also refactored the existing AvgPool NOP code into a common
function.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Iedf4513e7595ee4ee1777ba0b1eb38a8df8aed5e
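
Why a NOP AvgPool works here: a 1x1 kernel at stride 1 averages a
single element, so the values are untouched and the op only carries
the input-to-output requantisation. A small numerical sketch of that
rescale (parameters chosen for illustration):

    def requantise(q_in, in_scale, in_zp, out_scale, out_zp):
        real = in_scale * (q_in - in_zp)       # identity on real values
        return round(real / out_scale) + out_zp

    # The real value 0.5 re-encoded under a coarser output scale:
    assert requantise(64, 1 / 128, 0, 1 / 64, 0) == 32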
|
|
Added functionality for removing Reshape ops
that enclose an elementwise op with only one
non-constant tensor.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: Idaac50cfbd732e2667668be2baa059673236cc56
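
A sketch of the pattern being matched (the IR attribute names are
illustrative, not Vela's):

    def is_removable_reshape_pair(elemwise_op):
        # The elementwise op must have exactly one non-constant input,
        # and that input's producer plus all output consumers must be
        # Reshapes; the enclosing Reshape pair can then be removed.
        variable = [t for t in elemwise_op.inputs if not t.is_const]
        if len(variable) != 1 or variable[0].producer is None:
            return False
        return (variable[0].producer.type == "Reshape"
                and all(c.type == "Reshape"
                        for c in elemwise_op.output.consumers))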
|
|
We have a number of sets for grouping specific ops together,
but they aren't used much in the code. This updates the file
to make better use of these sets.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I719212671f8bdebc32576278f703549f0937ff65
|
|
Allows fusing of LUT with a preceding operator regardless of
input/output scale.
Change-Id: Ia378adbb3fe61d71299feb085f7313377e0efa39
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I75aad9bf59ad76ee6a0c0feb4d7299b50d787fe8
|
|
Enables LUT for LeakyRelu with int8/uint8 even if input scale
is different from the output scale.
Fusing the LUT with a preceding operator in this situation
requires further work.
Change-Id: I9eddfe36f457e763d44eb3e05fbe240eac7cfec9
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I287c24725126c169afec779b921e43c3ab26f739
|
|
For int16, using LeakyRelu (with bug fix) gives exactly
the same results as Mul+Max if input/output scales are the same.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I4f4db464d77b0aaf0d25ddfca534f91d08db548d
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I2cb3f6639e4bb8a984fa3647ee7b4678ed6f5890
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ida307afc33cd7963bdeb505df400732a3efcc846
|
|
Replaces LeakyRelu operations with a LUT activation function
when possible, otherwise with a combination of multiplication
and maximisation.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I3d2eb2dba7145997c3cc711d0ef18ab355fbb416
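
The multiplication/maximisation fallback relies on the identity
LeakyRelu(x) = max(x, alpha * x) for 0 <= alpha <= 1; a numpy
reference check (not the generated NPU ops):

    import numpy as np

    def leaky_relu_mul_max(x, alpha):
        return np.maximum(x, alpha * x)  # x when x >= 0, else alpha * x

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    assert np.allclose(leaky_relu_mul_max(x, 0.1), [-0.2, -0.05, 0.0, 1.5])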
|
|
Includes a number of changes:
* Handle non-existing optional inputs
* Handle disabled optional inputs (-1 indexed)
* Add unit tests for parsing operators
* Add a bias tensor to the different Convolution and
  FullyConnected ops if it is missing
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: Ib88d2b610314b1c886fc0aef4f9da87430ce6ae5
|
|
NHCWB16 is avoided for the input tensor of SplitSliceRead
when any of the consumers has a start offset in the
C-dimension that is not a multiple of 16.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I333e2acfbeb02b9c34ee5ea28074baff12ea7b24
|
|
Four dimensions were assumed in the check of whether NHCWB16
should be avoided.
Changed the check so that NHCWB16 is avoided if the axis
corresponds to the C-dimension.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: I7784a7a813a3c3438d6142523bf0a3ba81742aca
|
|
Avoid usage of NHCWB16 when Stack/Pack/Concat is performed
along axis 3 and the "concat start" of each slice to be
combined is not a multiple of 16.
Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: If3f7b4a3424be3c86fc2dc48e8649ce4c4f49485
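
A sketch of the check (NHCWB16 stores channels in bricks of 16, so a
concat along C only lines up with the brick layout if every slice
starts on a brick boundary; the helper name is an assumption):

    BRICK_DEPTH = 16  # channels per NHCWB16 brick

    def concat_allows_nhcwb16(concat_starts):
        # concat_starts: C-dimension offset of each slice being combined
        return all(start % BRICK_DEPTH == 0 for start in concat_starts)

    assert concat_allows_nhcwb16([0, 16, 48])
    assert not concat_allows_nhcwb16([0, 8])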
|
|
add_input_tensor, set_output_tensor, create_const_tensor and
create_reshape_tensor have recently been added.
This replaces all existing instances found with these new
helper functions.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: If33be8dbf237b2087b562b03cdeb51da1f99a786
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: Ibd0cd152fbc46dea0c92fd1bf7da1ffc9803fdba
|
|
Added graph rewrite of Softmax for int16.
Change-Id: Id7885af6056a23e8b8362fb61ae94283251eb398
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>
|
|
Signed-off-by: Charles Xu <charles.xu@arm.com>
Change-Id: I44428d77b2e8e44a477e5c4dfe28ab8dd1792838
|
|
By converting certain Conv2D ops (where the kernel size is
1x1 and the IFM H and W are both 1) to FullyConnected ops,
Vela can better determine whether the weights need to be
cached/double buffered or not.
This change decreases the number of NPU_OP_DMA_START commands
in the resulting command stream.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: I928150d9f360578dde75a83986bea1560d83cbdd
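
A sketch of the eligibility test described above (NHWC shape
convention assumed; not Vela's actual code):

    def conv2d_acts_as_fully_connected(kernel_hw, ifm_shape):
        kh, kw = kernel_hw
        _, ih, iw, _ = ifm_shape
        # A 1x1 kernel over a 1x1 spatial IFM reads each input exactly
        # once per output channel, which is exactly a FullyConnected.
        return (kh, kw) == (1, 1) and (ih, iw) == (1, 1)

    assert conv2d_acts_as_fully_connected((1, 1), [1, 1, 1, 256])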
|
|
There is a repeating pattern of setting the three different
shapes in a tensor to a single shape value.
This adds a new function to the tensor class that does this
for you.
Existing instances of manually setting the shapes have been
changed to use the new function.
Signed-off-by: Michael McGeagh <michael.mcgeagh@arm.com>
Change-Id: Ibc74e741ea47cec473e6be42cc102f721ec63b11
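
The helper presumably collapses the three assignments into one call,
along these lines (a sketch; the real Tensor class carries more state):

    class Tensor:  # minimal stand-in for the real class
        def set_all_shapes(self, shape):
            # The logical, storage and bandwidth shapes are set together,
            # replacing the repeated three-assignment pattern.
            self.shape = shape
            self.storage_shape = shape
            self.bandwidth_shape = shape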
|
|
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I39cff126dda89d71426ab731427ca1d64d02590d
|
|
This gives a worst-case estimate of the double buffer size in
the Scheduler, which will no longer be able to choose
strategies that end up with a buffer that doesn't fit in SRAM.
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I763731f63c7672679f3b8cd6db65dad03b946ae5
|
|
- No functional change
Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I5ab1198b9d092cd041fa9b85b2dee9900d299bfc
|
|
Tensors that are the result of an operation were incorrectly marked
as scalars.
Also fixes a bug for IFM2 of shape [*,*,*,1] in elementwise operations.
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
Change-Id: I82a0e643b12e93c7158e4aca3185415c59033a73
|
|
Change-Id: Ie6d8d6de9f3447f19ba06aafa9fa480fc96a973b
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
|
|
- Dilation added to SET_KERNEL_STRIDE instruction
- Kernel height/width adjusted for dilation
- Updated padding calculation
- Updated weight compression
Change-Id: I0c8190223e223b039a305aba0f37896ae1de2b80
Signed-off-by: Louis Verhaard <louis.verhaard@arm.com>
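
The kernel height/width adjustment follows the standard
dilated-kernel extent:

    def dilated_extent(kernel_size, dilation):
        # Effective spatial span of a dilated kernel, as used for the
        # padding calculation.
        return (kernel_size - 1) * dilation + 1

    assert dilated_extent(3, 2) == 5  # a 3-tap kernel with dilation 2 spans 5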
|