Diffstat (limited to 'chapters/introduction.adoc')
-rw-r--r--  chapters/introduction.adoc | 157
1 file changed, 55 insertions(+), 102 deletions(-)
diff --git a/chapters/introduction.adoc b/chapters/introduction.adoc
index b369070..cae23d4 100644
--- a/chapters/introduction.adoc
+++ b/chapters/introduction.adoc
@@ -79,7 +79,7 @@ The following principles govern the selection of operators within TOSA.
|Consistent sub-operation definition reduces the operator implementation cost.
|P4
-|The valid input and output ranges for all arguments shall be specified.
+|The valid input and output ranges for all operands shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.
|P5
@@ -108,11 +108,11 @@ The following table summarizes the three profiles:
=== Levels
-A TOSA level defines operator argument ranges that an implementation shall support.
+A TOSA level defines operator parameter ranges that an implementation shall support.
This is distinct from a profile that defines the operations and data-types supported.
This version of the specification defines two TOSA levels:
-* No level : allows the full range of arguments specified by the operations according to the operation data types.
+* No level : allows the full range of parameters specified by the operations according to the operation data types.
* Level 8K : ranges are expected to be sufficient for applications with frame sizes up to 8K.
Later versions of the specification may define additional levels.
@@ -120,7 +120,11 @@ The following table defines the value ranges for Level 1.0.
These ranges are checked using the LEVEL_CHECK() function with the operator descriptions.
.Level maximums
-include::{generated}/levels.adoc[]
+|===
+| Level | tosa_level_t | MAX_RANK | MAX_KERNEL | MAX_STRIDE | MAX_SCALE
+| None | tosa_level_none | NA | NA | NA | NA
+| 8K | tosa_level_8k | 6 | 8192 | 8192 | 64
+|===
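+
+For illustration, the following non-normative sketch shows one way the level maximums might be represented and consulted by LEVEL_CHECK(); the structure layout and helper name here are hypothetical:
+
+[source,c++]
+----
+// Hypothetical representation of the level maximums from the table above.
+struct tosa_level_t {
+    int32_t MAX_RANK;
+    int32_t MAX_KERNEL;
+    int32_t MAX_STRIDE;
+    int32_t MAX_SCALE;
+};
+
+const tosa_level_t tosa_level_8k = { 6, 8192, 8192, 64 };
+
+// Example of the kind of check LEVEL_CHECK() performs: at Level 8K a
+// tensor rank must not exceed MAX_RANK = 6.
+bool level_check_rank(const tosa_level_t& level, int32_t rank) {
+    return rank <= level.MAX_RANK;
+}
+----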
=== Status
@@ -135,10 +139,7 @@ The TOSA specification is a work in progress.
=== Compliance
This section defines when a TOSA implementation is compliant to a given TOSA specification profile and level.
-To be compliant an implementation must achieve the results and accuracy defined by this specification.
-TOSA also defines a set of conformance tests.
-A compliant implementation must pass the conformance tests.
-The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests.
+The term conformant means the same as compliant.
==== Base Inference Profile Compliance
@@ -180,7 +181,7 @@ bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level
}
----
-==== Main Inference Profile Compliance
+==== Main Inference Profile
A Main Inference compliant implementation must satisfy the following:
@@ -219,7 +220,7 @@ The following criteria apply to all operations:
| Operation | Accuracy bound
| <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<CONST>>, <<IDENTITY>>
-| Non NaN results must be exact.
+| The result must be exact.
| <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>>
| The result must be exact with: +
@@ -231,25 +232,19 @@ The following criteria apply to all operations:
The dot product must meet the <<Dot product accuracy requirements>>
| <<FFT2D>>, <<RFFT2D>>
-| Each output can be expressed as a dot product of an input vector with a constant coefficient vector. +
+| Each output can be expressed as a dot product of an input vector with a constant vector. +
The dot product must meet the <<Dot product accuracy requirements>>
-| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>
+| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>, <<CAST>>
| Floating-point result overflows must be set to infinity of the correct sign. +
Floating-point result underflows must be set to zero of the correct sign. +
+Integer result overflows must be saturated. +
Addition of infinites of different signs must produce a NaN. +
Subtraction of infinities of the same sign must produce a NaN. +
Multiplication of an infinity by a zero must produce a NaN. +
Otherwise for fp32_t the result must be rounded to the nearest representable value using the round to nearest, ties to even rounding mode. +
Otherwise for fp16_t and bf16_t the result must be within 0.5 ulp of the mathematical result.
-| <<CAST>>
-| Floating-point result overflows must be set to infinity of the correct sign. +
-Floating-point result underflows must be set to zero of the correct sign. +
-Cast from floating-point to integer result overflows must be saturated. +
-Otherwise for fp32_t the result must be rounded to the nearest representable value using the round to nearest, ties to even rounding mode. +
-Otherwise for fp16_t and bf16_t the result must be within 0.5 ulp of the mathematical result.
-
| <<RECIPROCAL>>
| If the input is a zero or the result overflows the output must be an infinity of the same sign. +
If the input is an infinity or the result underflows the output must be a zero of the same sign. +
@@ -273,7 +268,7 @@ Otherwise the result must be within 5 ulp of the mathematical result.
This dot product must meet the <<Dot product accuracy requirements>>
| <<AVG_POOL2D>>
-| Each output can be expressed as a dot product of an input vector with a vector with elements 1/KS where KS is the kernel size. +
+| Each output can be expressed as a dot product of an input vector with a vector whose elements are 1/d, where d is the kernel size. +
This dot product must meet the <<Dot product accuracy requirements>>
| <<REDUCE_PRODUCT>>
@@ -286,65 +281,36 @@ where `E = pow(1 + pow(2, -M-1), N) - 1`. In this expression M is the number of
===== Dot product accuracy requirements
-This section assumes an operation acting on two tensors named 'input' and 'weight'.
-Each output tensor element can be expressed as a dot product of elements between the input and weight tensors.
-The dot product has length KS, the kernel size.
-Note: KS is defined for each relevant operator in the appendix section <<Main Inference operator test data>>.
-
-In other words each output element `out` can be expressed as a dot product between input elements `in[k]` and weight elements `w[k]`:
-
-`out = in[0] * w[0] + in[1] * w[1] + ... + in[KS-1] * w[KS-1]`
-
-The positions of `in[k]` and `w[k]` in the input and weight tensors depends on the operation being performed (for example a convolution).
-
-This section defines the accuracy required for these operations.
-The term "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<Other publications>>[1].
-
-For an operation with given sizes and attributes to be compliant the following must hold for each data set S defined in <<Appendix A>>:
-
-* Let input be the input tensor generated by <<Main Inference operator test data>> for test set S
-* Let weight be the weight tensor generated by <<Main Inference operator test data>> for test set S
-* Let output_ref be the output tensor calculated by the operation using fp64 arithemic
-* Let output_imp be the output tensor calculated by the implementation to test
-* Let input_abs be the input tensor with each element replaced with its absolute value
-* Let weight_abs be the weight tensor with each element replaced with its absolute value
-* Let output_bnd be the output tensor calculated using fp64 arithmetic on input_abs and weight_abs
-
-The following checks must then pass:
-
-[source,c++]
-----
-size_t T = tensor_size(output_shape) // number dot product results
-fp64_t out_err_sum = 0.0;
-fp64_t out_err_sumsq = 0.0;
-fp64_t acc_prec; // 1<<(M+1) where M is the number of mantissa bits
-switch (acc_t) {
- case fp32_t: acc_prec = (fp64_t)(1<<24); break;
- case fp16_t: acc_prec = (fp64_t)(1<<11); break;
- default: ERROR_IF(true);
-}
-for_each(index in output_shape) {
- fp64_t out_bnd = tensor_read<fp64_t>(output_bnd, output_shape, index);
- fp64_t out_ref = tensor_read<fp64_t>(output_ref, output_shape, index);
- acc_t out_imp = tensor_read<acc_t> (output_imp, output_shape, index);
- fp64_t out_err;
- if (out_bnd == 0.0) {
- REQUIRE(out_ref == 0.0 && out_imp == 0.0);
- out_err = 0.0;
- } else { // out_bnd > 0.0
- out_err = ((fp64_t)out_imp - out_ref)*acc_prec/out_bnd;
- REQUIRE(abs(out_err) <= KS);
- }
- out_err_sum += out_err;
- out_err_sumsq += out_err * out_err;
-}
-if (S!=1 && S!=2) {
- // check output error bias magnitude for data sets S which are not positive biased
- REQUIRE(abs(out_err_sum) <= 2*sqrt(KS*T));
-}
-// check output error variance magnitude
-REQUIRE(out_err_sumsq <= 0.4*KS*T)
-----
+This section gives accuracy constraints for operations where the result is a length-N sum of products of floating-point inputs:
+
+`y = x[0] * w[0] + x[1] * w[1] + ... + x[N-1] * w[N-1]`
+
+Let M be the number of mantissa bits in the accumulator.
+So M=23 for an `fp32_t` accumulator and M=10 for an `fp16_t` accumulator.
+
+In this section "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<Other publications>>[1].
+
+Appendix A defines a number of <<Dot product floating-point test data sets>>.
+For each test data set (S, N) consisting of T tests, the following must hold:
+
+* For each test t in the range 0 to T-1, calculate:
+** `y_imp[t] = x[0] * w[0] + ... + x[N-1] * w[N-1]` calculated by the implementation
+** `y_ref[t] = x[0] * w[0] + ... + x[N-1] * w[N-1]` calculated using fp64 arithmetic
+** `y_bnd[t] = abs(x[0] * w[0]) + ... + abs(x[N-1] * w[N-1])` calculated using fp64 arithmetic
+* If `y_bnd[t] == 0` then:
+** `y_imp[t]` must be zero and set `y_err[t] = 0`
+* If `y_bnd[t] > 0` then set:
+** `y_err[t] = (y_imp[t] - y_ref[t]) * (1<<(M+1)) / y_bnd[t]` calculated using fp64 arithmetic
+* For each test t the following must be satisfied:
+** `y_ref[t], y_bnd[t], y_imp[t]` must be finite
+** `abs(y_err[t]) \<= N`
+* Calculate the sum of y_err using fp64 arithmetic:
+** `y_err_sum = y_err[0] + ... + y_err[T-1]`
+* Calculate the sum of y_err squared using fp64 arithmetic:
+** `y_err_sumsq = y_err[0] * y_err[0] + ... + y_err[T-1] * y_err[T-1]`
+* The error sum and sum of squares must satisfy the following. The first equation bounds the bias and the second bounds the error variance (a non-normative code sketch of these checks follows this list).
+** `abs(y_err_sum) \<= 2*sqrt(N*T)`
+** `y_err_sumsq \<= 0.4*N*T`
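+
+The following non-normative sketch expresses these checks in code. It assumes y_imp, y_ref, and y_bnd have been computed for all T tests as described, and that is_finite() is an available helper:
+
+[source,c++]
+----
+// Non-normative sketch of the accuracy checks for one data set (S, N) of T tests.
+fp64_t y_err_sum = 0.0;
+fp64_t y_err_sumsq = 0.0;
+for (size_t t = 0; t < T; t++) {
+    // All reference, bound, and implementation values must be finite.
+    REQUIRE(is_finite(y_ref[t]) && is_finite(y_bnd[t]) && is_finite(y_imp[t]));
+    fp64_t y_err;
+    if (y_bnd[t] == 0.0) {
+        REQUIRE(y_imp[t] == 0.0);
+        y_err = 0.0;
+    } else { // y_bnd[t] > 0.0
+        y_err = ((fp64_t)y_imp[t] - y_ref[t]) * (fp64_t)(1 << (M + 1)) / y_bnd[t];
+        REQUIRE(abs(y_err) <= N);
+    }
+    y_err_sum   += y_err;
+    y_err_sumsq += y_err * y_err;
+}
+REQUIRE(abs(y_err_sum) <= 2 * sqrt((fp64_t)N * T)); // bias bound
+REQUIRE(y_err_sumsq <= 0.4 * N * T);                // variance bound
+----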
=== Tensor Definitions
@@ -369,13 +335,10 @@ Tensor elements are addressed using dim_t values, where each element of the vect
==== Tensor size limit
-The tensor overall size is limited by the data type size_t.
-This type must be able to hold integers in the range 0 to (1++<<++(MAX_LOG2_SIZE+1)) - 1 where MAX_LOG2_SIZE is defined in <<Levels>>.
-For each tensor, the number of tensor elements multiplied by the element size in bytes (which is taken to be 1 for elements smaller than a 8-bit) must be less than or equal to (1<<(MAX_LOG2_SIZE+1)) - 1.
-
-The size of tensors along each of their dimensions is limited by the data type index_t.
-This type must be able to hold integers in the range 0 to (1++<<++MAX_LOG2_SIZE) - 1 where MAX_LOG2_SIZE is defined in <<Levels>>.
-This means that the maximum size of a tensor along each dimension is (1<<MAX_LOG2_SIZE) - 1 and therefore the maximum coordinate value is (1<<MAX_LOG2_SIZE) - 2.
+The overall tensor size in elements is limited by the data type size_t.
+In this version of the specification, size_t is defined as an unsigned 32-bit integer representing sizes from 1 to (1<<32) - 1.
+A tensor dimension coordinate is limited by the data type index_t.
+In this version of the specification, index_t is defined as a signed 32-bit integer.
Indices used to access tensors must be non-negative.
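+
+A non-normative sketch of these limits as a shape check follows; shape_is_valid and its arguments are hypothetical:
+
+[source,c++]
+----
+// Hypothetical validation of a tensor shape against the size_t and index_t
+// limits described above.
+bool shape_is_valid(const int64_t* shape, int32_t rank) {
+    uint64_t elements = 1;
+    for (int32_t i = 0; i < rank; i++) {
+        // Each dimension size must be representable in index_t (signed 32-bit).
+        if (shape[i] < 1 || shape[i] > ((int64_t)1 << 31) - 1) return false;
+        elements *= (uint64_t)shape[i];
+        // The total element count must stay within size_t (unsigned 32-bit);
+        // checking each step also keeps the uint64_t product from overflowing.
+        if (elements > ((uint64_t)1 << 32) - 1) return false;
+    }
+    return true;
+}
+----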
==== Data Layouts
@@ -480,17 +443,7 @@ Signed zero must be supported.
|fp32_t
| -infinity
| +infinity
-| 32-bit single-precision floating-point defined by <<Other publications>>[1]. +
-Normal values must be supported. +
-Denormal values must either be supported or flushed to zero. +
-Positive and negative infinity must be supported. +
-At least one NaN encoding must be supported. +
-Signed zero must be supported.
-
-|fp64_t
-| -infinity
-| + infinity
-| 64-bit double-precision floating-point defined by <<Other publications>>[1]. +
+| 32-bit single-precision floating-point defined by <<Other publications>>[1]. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
@@ -522,7 +475,7 @@ To convert a network containing quantized tensors to TOSA, generate explicit RES
This reduces quantized operations to purely integer operations.
As an example, an ADD between two quantized tensors requires that the integer values represent the same range.
-The scale arguments for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
+The scale parameters for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
Then the ADD is performed, and a RESCALE can be used to ensure that the result is scaled properly.
RESCALE provides support for per-tensor and per-channel scaling values to ensure compatibility with a range of possible quantization implementations.
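+
+As a non-normative sketch of this sequence (tensor_t and the multiplier/shift names here are hypothetical placeholders for values derived from the tensors' quantization scales):
+
+[source,c++]
+----
+// Lowering a quantized ADD of two int8 tensors with different scales.
+tensor_t a32 = RESCALE(a8, mul_a, shift_a);  // int8 -> int32, align ranges
+tensor_t b32 = RESCALE(b8, mul_b, shift_b);  // int8 -> int32, align ranges
+tensor_t sum = ADD(a32, b32);                // purely integer addition
+tensor_t out = RESCALE(sum, mul_o, shift_o); // int32 -> int8 output scaling
+----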
@@ -554,7 +507,7 @@ The values to achieve a scaling of 1.0 are shift=30, multiplier=1<<30 for apply_
[source,c++]
----
-int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t double_round=false) {
+int32_t apply_scale_32(int32_t value, int32_t multiplier, uint6_t shift, bool_t double_round=false) {
REQUIRE(multiplier >= 0);
REQUIRE(2 <= shift && shift <= 62);
REQUIRE(value >= (-1 << (shift - 1)) && value < (1 << (shift - 1)));
@@ -569,7 +522,7 @@ int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t d
return (int32_t)result;
}
-int32_t apply_scale_16(int48_t value, int16_t multipler, int8_t shift) {
+int32_t apply_scale_16(int48_t value, int16_t multiplier, uint6_t shift) {
REQUIRE(multiplier >= 0);
REQUIRE(2 <= shift && shift <= 62);
int64_t round = (1 << (shift - 1));
@@ -586,7 +539,7 @@ In some functions, the multiplier and shift are combined into a scale_t structur
----
typedef struct {
int32_t multiplier;
- int8_t shift;
+ uint6_t shift;
} scale_t;
----
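+
+For example, the unity (1.0) scaling for apply_scale_32 noted above could be held as (non-normative):
+
+[source,c++]
+----
+// Unity scaling: multiplier = 1 << 30 with shift = 30, as given above.
+scale_t unity_scale = { 1 << 30, 30 };
+----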
@@ -662,4 +615,4 @@ void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
The following publications are referred to in this specification, or provide more information:
-. IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.
+. IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.
\ No newline at end of file