1 files changed, 144 insertions, 54 deletions
diff --git a/chapters/introduction.adoc b/chapters/introduction.adoc
index 5a2b9a1..5765f22 100644
--- a/chapters/introduction.adoc
+++ b/chapters/introduction.adoc
@@ -1,7 +1,7 @@
 //
 // This confidential and proprietary software may be used only as
 // authorised by a licensing agreement from ARM Limited
-// (C) COPYRIGHT 2020-2022 ARM Limited
+// (C) COPYRIGHT 2020-2023 ARM Limited
 // ALL RIGHTS RESERVED
 // The entire notice above must be reproduced on all authorised
 // copies and copies may only be made to the extent permitted
@@ -141,7 +141,7 @@ The TOSA specification is a work in progress.
 This section defines when a TOSA implementation is compliant to a given TOSA specification profile and level.
 The term conformant will mean the same as compliant.
 
-==== Baseline Inference Profile Compliance
+==== Base Inference Profile Compliance
 
 The <<Operator Graphs>> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph.
 This behavior is captured in the pseudo-code function tosa_execute_graph().
@@ -181,16 +181,133 @@ bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level
 }
 ----
 
-==== Main Inference and Main Training Profile
+==== Main Inference Profile
 
-An implementation is compliant to the Main Inference or Main Training profiles if the following both hold for that respective profile:
+A Main Inference compliant implementation must satisfy the following:
+
+* The implementation must meet <<Base Inference Profile Compliance>> for all Base Inference compliant graphs
+* The implementation must support all Main Inference operations using the datatype fp32_t
+** The operations must meet the precision requirements of <<Main Inference precision requirements>>
+* The implementation must support all Main Inference operations using the datatype fp16_t
+** The operations must meet the precision requirements of <<Main Inference precision requirements>>
+** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype
+* The implementation must support all Main Inference operations using the datatype bf16_t
+** The operations must meet the precision requirements of <<Main Inference precision requirements>>
+** Note: These requirements allow bf16_t operations to be implemented using the fp32_t datatype
+
+As with <<Base Inference Profile Compliance>>, the pseudo-code function tosa_execute_graph() can return one of three possible results.
+A compliant implementation must satisfy the following:
 
 * For a graph returning tosa_error the implementation must also return an error
 * For a graph returning tosa_valid the implementation must execute the entire graph without error
 * For a graph returning tosa_valid and consisting only of integer operators the results must match exactly
-* The implementation must report the maximum relative error on a set of standard graphs that contain floating point operators. These graphs will be provided as a future appendix to this specification.
 
-Note that for graphs containing floating point there is no strict precision requirement that must be met, but that the precision achieved must be reported.
+===== Main Inference precision requirements
+
+In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds
+listed in the table below. In the table, _ulp_ means unit in the last place.
+
+NOTE: The error criteria in this section are at an early draft stage and are likely to change during conformance test development.
+
+The following criteria apply to all operations:
+
+* If any input is a NaN and the result is floating-point then the result must be a NaN
+* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal) then the result must be false
+* If any input is a NaN and the operation is a conversion to an integer or boolean then the result is unpredictable
+
+[cols="1,3"]
+|===
+| Operation | Accuracy bound
+
+| <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<CONST>>, <<IDENTITY>>
+| The result must be exact.
+
+| <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>>
+| The result must be exact with: +
+(1) The sign of the zero is ignored +
+(2) Infinities of the same sign compare as equal
+
+| <<CONV2D>>, <<CONV3D>>, <<DEPTHWISE_CONV2D>>, <<FULLY_CONNECTED>>, <<MATMUL>>, <<TRANSPOSE_CONV2D>>
+| Each output can be expressed as a dot product of two input vectors. +
+The dot product must meet the <<Dot product accuracy requirements>>
+
+| <<FFT2D>>, <<RFFT2D>>
+| Each output can be expressed as a dot product of an input vector with a constant vector. +
+The dot product must meet the <<Dot product accuracy requirements>>
+
+| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>, <<CAST>>
+| Floating-point result overflows must be set to infinity of the correct sign. +
+Floating-point result underflows must be set to zero of the correct sign. +
+Integer result overflows must be saturated. +
+Addition of infinities of different signs must produce a NaN. +
+Subtraction of infinities of the same sign must produce a NaN. +
+Multiplication of an infinity by a zero must produce a NaN. +
+Otherwise for fp32_t the result must be rounded to the nearest representable value using the round to nearest, ties to even rounding mode. +
+Otherwise for fp16_t and bf16_t the result must be within 0.5 ulp of the mathematical result.
+
+| <<RECIPROCAL>>
+| If the input is a zero or the result overflows, the output must be an infinity of the same sign. +
+If the input is an infinity or the result underflows, the output must be a zero of the same sign. +
+Otherwise the result must be within 1 ulp of the mathematical result.
+
+| <<RSQRT>>
+| If the input is less than zero the result must be a NaN. +
+Otherwise if the input is a zero the output must be an infinity of the same sign. +
+Otherwise the result must be within 1 ulp of the mathematical result.
+
+| <<SIGMOID>>, <<TANH>>, <<POW>>, <<EXP>>, <<LOG>>
+| If the input to LOG is less than zero then the result must be a NaN. +
+If the inputs to POW are both zero then the result must be a NaN. +
+If the first input to POW is less than zero and the second input is not an integer then the result must be a NaN. +
+If the result overflows the output must be an infinity of the correct sign. +
+If the result underflows the output must be a zero of the correct sign. +
+Otherwise the result must be within 5 ulp of the mathematical result.
+
+| <<REDUCE_SUM>>
+| Each output can be expressed as a dot product of an input vector with a vector of ones. +
+This dot product must meet the <<Dot product accuracy requirements>>
+
+| <<AVG_POOL2D>>
+| Each output can be expressed as a dot product of an input vector with a vector with elements 1/d where d is the kernel size. +
+This dot product must meet the <<Dot product accuracy requirements>>
+
+| <<REDUCE_PRODUCT>>
+| Result overflows must be set to an infinity of the correct sign. +
+Result underflows must be set to a zero of the correct sign. +
+Otherwise if the final product and all sub-products are within the normal range then the result `R` must have an absolute error of at most `E*abs\(R)`
+where `E = pow(1 + pow(2, -M-1), N) - 1`. In this expression M is the number of mantissa bits of the floating-point format and N is the number of elements in the product.
+
+|===
+
+===== Dot product accuracy requirements
+
+This section gives accuracy constraints for operations where the result is a sum of products of N floating-point inputs:
+
+`y = x[0] * w[0] + x[1] * w[1] + ... + x[N-1] * w[N-1]`
+
+Let M be the number of mantissa bits in the accumulator.
+So M=23 for an `fp32_t` accumulator and M=10 for an `fp16_t` accumulator.
+
+In this section "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<Other publications>>[1].
+
+Appendix A defines a number of <<Dot product floating-point test data sets>>.
+For each test data set (S, N) consisting of T tests the following must hold:
+
+* For each test t in the range 0 to T-1, calculate:
+** `y_imp[t] = x[0] * w[0] + ... + x[N-1] * w[N-1]` calculated by the implementation
+** `y_ref[t] = x[0] * w[0] + ... + x[N-1] * w[N-1]` calculated using fp64 arithmetic
+** `y_bnd[t] = abs(x[0] * w[0]) + ... + abs(x[N-1] * w[N-1])` calculated using fp64 arithmetic
+* If `y_bnd[t] == 0` then
+** `y_imp[t]` must be zero and set `y_err[t] = 0`
+* If `y_bnd[t] > 0` then set:
+** `y_err[t] = abs(y_imp[t] - y_ref[t]) * (1<<(M+1)) / y_bnd[t]` calculated using fp64 arithmetic
+* For each test t the following must be satisfied:
+** `y_ref[t], y_bnd[t], y_imp[t]` must be finite
+** `y_err[t] \<= N`
+* Calculate the mean-square (variance from 0) of y_err over the T tests using fp64 arithmetic:
+**  `y_var = (y_err[0] * y_err[0] + ... + y_err[T-1] * y_err[T-1]) / T`
+* The mean-square must satisfy:
+**  `y_var \<= N/3`
 
 === Tensor Definitions
 
@@ -303,17 +420,32 @@ The number formats supported by a given operator are listed in its table of supp
 |fp16_t
 | -infinity
 | +infinity
-| 16-bit floating-point value.
+| 16-bit half-precision floating-point defined by <<Other publications>>[1]. +
+Normal values must be supported. +
+Denormal values must either be supported or flushed to zero. +
+Positive and negative infinity must be supported. +
+At least one NaN encoding must be supported. +
+Signed zero must be supported.
 
 |bf16_t
 | -infinity
 | +infinity
-| 16-bit brain float value.
+| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
+Normal values must be supported. +
+Denormal values must either be supported or flushed to zero. +
+Positive and negative infinity must be supported. +
+At least one NaN encoding must be supported. +
+Signed zero must be supported.
 
 |fp32_t
 | -infinity
 | +infinity
-| 32-bit floating-point value.
+| 32-bit single-precision floating-point defined by <<Other publications>>[1]. +
+Normal values must be supported. +
+Denormal values must either be supported or flushed to zero. +
+Positive and negative infinity must be supported. +
+At least one NaN encoding must be supported. +
+Signed zero must be supported.
 |===
 
 Note: In this specification minimum<type> and maximum<type> will denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
@@ -476,50 +608,8 @@ void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
 }
 ----
 
-=== Floating-point
-
-Floating-point support is included in the main inference profile.
-TOSA does not define bit-exact behavior of the floating-point type, since floating-point operation results can vary according to operation order (floating-point addition is not associative in general) and rounding behavior.
-If a bit-exact answer is required then integer operations should be used.
-TOSA does define that the floating-point type must support the following list of features.
-These features ensure that detection of overflow and other exceptional conditions can be handled consistently.
+=== Other publications
 
-* The floating-point type must have at least 16 total bits including the sign bit
-* The floating-point type must support positive and negative infinity values
-* The floating-point type must support at least one Not-a-Number encoding (NaN)
-* The floating-point type must support signed zero
-* The floating-point type must support handling of infinities, NaNs, zeros as in the following table
-
-.floating-point behavior
-|===
-|Case|Result
-
-|Operators other than explicitly mentioned by other rules: Any input operand is a NaN | a NaN
-
-|Comparisons (EQUAL, GREATER, GREATER_EQUAL), where either or both operands is NaN | False
-
-|Comparisons ignore the sign of 0|
-
-|RSQRT (reciprocal square root) of negative numbers | a NaN
-|(&#177; 0) &#215; (&#177; infinity), (&#177; infinity) &#215; (&#177; 0) | a NaN
-
-|LOG of negative numbers | a NaN
-
-|nonzero numbers / (&#177; 0) | (&#177; infinity)
-
-|(&#177; 0) / (&#177; 0), (&#177; infinity) / (&#177; infinity) | a NaN
-
-|(&#177; infinity) * 0 | a NaN
-
-| (+infinity) - (+infinity),  (+infinity) + (-infinity) | a NaN
-
-| Any positive overflow | + infinity
-
-| Any negative overflow | - infinity
-
-| Any positive underflow | + 0
-
-| Any negative underflow | - 0
-
-|===
+The following publications are referred to in this specification, or provide more information:
 
+. IEEE Std 754-2008, _IEEE Standard for Floating-Point Arithmetic_, August 2008.
+\ No newline at end of file
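
Review note: the dot product accuracy check added in this change can be sketched in a few lines of Python. This is only an illustrative reading of the bullet list, not part of the specification. The names `to_fp32`, `dot_fp32`, and `check_dot_product` are hypothetical, the fp32 accumulator is emulated by rounding Python floats (which are fp64), and the mean-square is assumed to be taken over the T tests of one test set.

```python
import math
import struct


def to_fp32(v):
    """Round a Python float (fp64) to the nearest fp32 value.

    Assumes v is within fp32 range; struct raises OverflowError otherwise.
    """
    return struct.unpack("<f", struct.pack("<f", v))[0]


def dot_fp32(x, w):
    """Hypothetical implementation under test: sequential fp32 accumulation."""
    acc = 0.0
    for xi, wi in zip(x, w):
        acc = to_fp32(acc + to_fp32(xi * wi))
    return acc


def check_dot_product(tests, y_imp, mantissa_bits):
    """Apply the per-test and mean-square criteria to one test set.

    tests: list of (x, w) pairs, every vector having the same length N
    y_imp: results produced by the implementation, one per test
    mantissa_bits: M, e.g. 23 for an fp32 accumulator
    """
    n = len(tests[0][0])
    y_err = []
    for (x, w), imp in zip(tests, y_imp):
        # Reference value and bound, both computed in fp64.
        y_ref = sum(xi * wi for xi, wi in zip(x, w))
        y_bnd = sum(abs(xi * wi) for xi, wi in zip(x, w))
        if not all(math.isfinite(v) for v in (y_ref, y_bnd, imp)):
            return False
        if y_bnd == 0:
            if imp != 0:
                return False
            err = 0.0
        else:
            err = abs(imp - y_ref) * (1 << (mantissa_bits + 1)) / y_bnd
        if err > n:  # per-test bound: y_err[t] <= N
            return False
        y_err.append(err)
    # Mean-square of the scaled errors over all T tests.
    y_var = sum(e * e for e in y_err) / len(y_err)
    return y_var <= n / 3
```

A caller would build `tests` from one of the Appendix A data sets, collect the implementation's outputs into `y_imp`, and pass M=23 for an fp32 accumulator or M=10 for fp16.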