//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2020-2024 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Introduction

=== Overview

Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor operations commonly employed by Deep Neural Networks.
The intent is to enable a variety of implementations running on a diverse range of processors, with the results at the TOSA level consistent across those implementations.
Applications or frameworks which target TOSA can therefore be deployed on a wide range of different processors, such as SIMD CPUs, GPUs and custom hardware such as NPUs/TPUs, with defined accuracy and compatibility constraints.
Most operators from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible in TOSA.
It is expected that there will be tools to lower from ML frameworks into TOSA.

=== Goals

The goals of TOSA include the following:

* A minimal and stable set of tensor-level operators to which machine learning framework operators can be reduced.
* Full support for both quantized integer and floating-point content.
* Precise functional description of the behavior of every operator, including their numerical behavior in the case of precision, saturation, scaling, and range as required by quantized datatypes.
* Independent of any single high-level framework, compiler backend stack or particular implementation.
* The detailed functional and numerical description enables precise code construction for a diverse range of targets – SIMD CPUs, GPUs and custom hardware such as NPUs/TPUs.

=== Specification

The TOSA Specification is written as a combination of XML, AsciiDoc mark-up, and pseudocode files.
The content is managed through a git repository here: https://git.mlplatform.org/tosa/specification.git/.
The specification is developed and versioned much like software.
The pseudocode (.tosac files) is written in a style similar to C++, however it is not guaranteed to be valid or compile as it exists.
While the AsciiDoc content is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML.
The build process will also create the tables in the specification from the XML.
To do this, please follow the instructions in the README.md in the root of the specification repository.

=== Operator Selection Principles

TOSA defines a set of primitive operators to which higher level operators can be lowered in a consistent way.
To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed.
The following principles govern the selection of operators within TOSA.

.Principles
[cols="1,5,5"]
|===
|ID|Principle|Reason for this

|P0
|An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole tensor operations.
|If the operator can be broken down, then we should look at the component operators.

|P1
|An operator shall be usable as a component out of which more than one type of complex operation can be constructed.
|Single use operators have a high architectural cost and a more reusable version should be considered instead.

|P2
|Precision should be appropriate for the input and output data types.
|Precision higher than that needed to calculate the result leads to extra implementation complexity.

|P3
|Numerical definition of common sub-operations should be consistent between operators (for example: value scaling).
|Consistent sub-operation definition reduces the operator implementation complexity.

|P4
|The valid input and output ranges for all arguments shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.

|P5
|Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets.
|Reduces implementation cost and gives consistent inference results.
|===

=== Versioning

TOSA follows a semantic versioning policy with a major.minor.patch.draft scheme.
See below for the TOSA definition of backward compatibility.

* Major version changes may break backwards compatibility.
* Minor numbers may add functionality in a backwards compatible way.
* Patch versions are for bug fixes, clarifications, or trivial changes.
* The draft flag notes whether the version referenced is finalized.

Major, minor, and patch numbers are limited to eight bits.
Draft is a single bit flag.
If stored in a 32-bit value, the remaining bits are reserved for future use.

==== Backwards Compatibility

TOSA graphs created with previous minor versions within a major version must continue to work.
The following portions of the specification and implementation will not change within a major version:

* Operator Names
* Arguments including ordering, input/attribute/output, name, rank
* ERROR_IF statements
* Functionality of the pseudocode for each operator
* Level definitions and checks
* Supported Data Type tables
* Conformance test definitions
* Enumerated types and values

Changes to the following do not break compatibility:

* Order of operation definitions within the XML specification
* Operator section names
* Descriptive text that does not affect functionality
* Non-functional changes to pseudocode (for example: cleanup, variable name changes)

Minor versions are allowed to add new operators or other functionality as long as the above guarantees hold.

In addition, new extensions may be added to the specification between TOSA releases.
They may not change anything that would break backward compatibility according to the above definitions.

=== Profiles

TOSA profiles enable efficient implementation on different classes of device.
Each profile is an independent set of operations and data type combinations.
TOSA profile extensions define optional operation and data type combinations.

Each operator's Supported Data Types table defines which profile or extension includes that operator with different data types.
An operator / data type combination may be part of multiple profiles or extensions.
If so, each profile and extension will be listed in the Supported Data Types table.
In addition, a table listing all operations for each profile can be found in Appendix B.

The following are required for compliant TOSA implementations:

* A TOSA implementation must implement at least one profile.
* A TOSA implementation may choose to implement any extensions.
* If a TOSA implementation chooses to implement an extension, it must implement the complete extension.
* If an operator / data type combination requires multiple extensions, the combination is only required to be implemented if all extensions are implemented
** For example, a CAST from bf16 to fp8 is only required if both extensions are implemented.

.Profiles
include::{generated}/profiles.adoc[]

.Profile Extensions
include::{generated}/profile_extensions.adoc[]

=== Levels

A TOSA level defines operator argument ranges that an implementation shall support.
This is distinct from a profile that defines the operations and data-types supported.
One level must apply to all profiles and extensions supported by an implementation.

This version of the specification defines two TOSA levels:

* No level : allows the full range of arguments specified by the operations according to the operation data types.
* Level 8K : ranges are expected to be sufficient for applications with frame sizes up to 8K.

Later versions of the specification may define additional levels.
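
As a rough illustration of how a level constrains operator arguments, the sketch below models the two levels as a lookup from a level to a set of limits. The names `Level`, `LevelLimits`, `limits_for` and `within_level`, the choice of fields, and the numeric limits are all assumptions made for this example. The normative maxima are those given in the level table of this specification.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of the two levels named in this section.
enum class Level { None, Level8K };

// Hypothetical limits record. The field names and the numbers used below
// are illustrative placeholders, not the maxima from the specification.
struct LevelLimits {
    std::int32_t max_rank;
    std::int32_t max_kernel;
};

// One level applies to the whole implementation, so a single lookup is enough.
// "No level" is modelled as the full range of the argument's data type.
constexpr LevelLimits limits_for(Level level) {
    return level == Level::Level8K
               ? LevelLimits{6, 8192}  // placeholder values
               : LevelLimits{std::numeric_limits<std::int32_t>::max(),
                             std::numeric_limits<std::int32_t>::max()};
}

// True when an argument value lies within the range the level covers.
constexpr bool within_level(std::int32_t value, std::int32_t limit) {
    return value <= limit;
}
```

With these placeholder limits, `within_level(8193, limits_for(Level::Level8K).max_kernel)` is false, while the same value passes under `Level::None`.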
The following table defines the value ranges for each level.
These ranges are checked using the LEVEL_CHECK() function with the operator descriptions.

.Level maxima
include::{generated}/levels.adoc[]

=== Status

This specification is the release candidate for TOSA 1.0.
The specific status of each profile and extension is contained in the tables in <>.
Possible values for status are:

* Complete : All operators are specified, conformance tests are provided, no changes are expected.
* Unstable : Operators are specified, conformance tests provided, but less content has been tested.
* Incomplete : Operators or conformance tests may be missing. Changes are likely in future versions of the specification.

=== Supported Number Formats

The following number formats are defined in TOSA.
The number formats supported by a given operator are listed in its table of supported types.
A TOSA implementation must support the number formats listed in the supported data types for operators contained in that profile.
Number formats not required for any operators in a profile do not need to be implemented.

.Number formats
[cols="1,1,1,5"]
|===
|Format|Minimum|Maximum|Description

|bool_t
| -
| -
|Boolean value that is either `true` or `false`. Size is implementation defined. The TOSA reference model implements this as int8_t with 0 for `false` and 1 for `true`. All non-zero values are accepted on input as `true`.

|i4_t
| -
| -
|Signless 4-bit integer type. Will be interpreted as int4_t by all operators.

|int4_t
| -7
| +7
|Signed 4-bit two's-complement value. Excludes -8 to keep the range symmetric about zero for weights.

|i8_t
| -
| -
|Signless 8-bit integer value. Will be interpreted as int8_t unless otherwise specified by an operator.

|int8_t
| -128
| +127
|Signed 8-bit two's-complement value.

|uint8_t
| 0
| 255
|Unsigned 8-bit integer value.

|i16_t
| -
| -
|Signless 16-bit integer type. Will be interpreted as int16_t unless otherwise specified by an operator.

|int16_t
| -32768
| +32767
|Signed 16-bit two's-complement value.

|uint16_t
| 0
| 65535
|Unsigned 16-bit value.

|i32_t
| -
| -
|Signless 32-bit integer value. Will be interpreted as int32_t by all operators.

|int32_t
| -(1<<31)
| (1<<31)-1
|Signed 32-bit two's-complement value.

|i48_t
| -
| -
|Signless 48-bit integer value. Will be interpreted as int48_t by all operators.

|int48_t
| -(1<<47)
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp8e4m3_t
| -448
| 448
| 8-bit floating-point defined by <> with four bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
NaN encodings must be supported. +
Signed zero must be supported. +
This format has no encoding for infinities. +
The range is extended by using a mantissa-exponent bit pattern to encode NaN instead of sacrificing an exponent value.

|fp8e5m2_t
| -infinity
| +infinity
| 8-bit floating-point defined by <> with five bits of exponent and two bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
Positive and negative infinity must be supported. +
NaN encodings must be supported. +
Signed zero must be supported.

|fp16_t
| -infinity
| +infinity
| 16-bit half-precision floating-point defined by <>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|bf16_t
| -infinity
| +infinity
| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp32_t
| -infinity
| +infinity
| 32-bit single-precision floating-point defined by <>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp64_t
| -infinity
| +infinity
| 64-bit double-precision floating-point defined by <>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.
|===

Note: In this specification, minimum and maximum will denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
The minimum and maximum values for each type are given in the preceding table.

Note: Integer number formats smaller than 8 bits may be used provided that the numerical result is the same as using a sequence of 8-bit TOSA operations.
For example, the result of a convolution with low precision data must equal that of running the convolution at 8 bits and then clipping the result to the permitted output range.
This ensures that a Base Inference profile TOSA implementation can calculate the same result.

=== Compliance

This section defines when a TOSA implementation is compliant to a given TOSA specification profile and level.
To be compliant an implementation must achieve the results and accuracy defined by this specification.
TOSA also defines a set of conformance tests.
A compliant implementation must pass the conformance tests.
The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests.

==== TOSA Graph Compliance

The <> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph.
This behavior is captured in the pseudocode function tosa_execute_graph().
For a given input graph (with attributes) and input tensors there are three possible tosa_graph_result values after executing the graph:

* tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon.
* tosa_error: The graph does not meet the specification and is recognised as an illegal graph.
* tosa_valid: The result is defined and predictable and the list of output tensors defines the result.

An implementation must behave as follows given the above tosa_graph_result values:

* For tosa_unpredictable, the implementation can return whatever result it chooses (including error)
* For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any)
* For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification.

In terms of pseudocode, if *graph* is a TOSA graph consisting of TOSA operators and *input_list* is a list of input tensors then the following test must pass.

[source,c++]
----
// Global result status value
// Will be updated by REQUIRE and ERROR_IF statements when evaluating the TOSA graph
tosa_result_t tosa_graph_result;

// Tracks the nesting depth of TOSA operators to allow a limit on nesting depth to be checked.
int32_t tosa_nesting_depth;

bool tosa_test_compliance(tosa_graph_t graph, tensor_list_t input_list, tosa_level_t level) {
    shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph));
    shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph));
    tosa_graph_result = tosa_valid;    // result starts as valid
    tosa_nesting_depth = 0;            // if/while nesting level
    // Run the specification's reference behavior; this updates tosa_graph_result
    tosa_execute_graph(graph, input_list, output_list_spec, level);
    if (tosa_graph_result == tosa_unpredictable) {
        return true;    // No requirement to match an unpredictable result
    }
    // Run the implementation under test on the same graph and inputs
    tosa_result_t result_test = execute_implementation_under_test(graph, input_list, output_list_test);
    if (tosa_graph_result == tosa_error) {
        return result_test == tosa_error;   // result must be an error
    }
    if (exact_tensor_match(output_list_spec, output_list_test)) {
        // Predictable bit-exact value match required
        return true;
    }
    return false;
}
----

==== Base Inference Profile Compliance

A Base Inference compliant implementation must satisfy the following:

* The implementation must support all operator and data type combinations listed in <>
** The operations must meet the <>
* The implementation must follow the <> behavior

===== Base Inference Precision Requirements

In a compliant implementation, individual integer operations within the graph must match exactly.

==== Main Inference Profile Compliance

A Main Inference compliant implementation must satisfy the following:

* The implementation must support all operator and data type combinations listed in <>
** The operations must meet the <>
** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype
* The implementation must follow the <> behavior

===== Main Inference Precision Requirements

In a compliant implementation, individual integer operations must match exactly.

In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table, for all operations where no input is a NaN.
In the table, _ulp_ means unit of the last place.
The function tosa_reference_check_fp() defines the error range permitted by a given number of units of last place in this specification.

The following criteria apply to all operations:

* If any input is a NaN and the result is floating-point then the result must be a NaN
* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal) then the result must be false
* If any input is a NaN and the operation is conversion to an integer or Boolean then the result is unpredictable

[cols="1,3"]
|===
| Operation | Accuracy bound

| <>, <>, <>, <