//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2020-2024 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Introduction

=== Overview

Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor operations commonly employed by Deep Neural Networks.
The intent is to enable a variety of implementations running on a diverse range of processors, with the results at the TOSA level consistent across those implementations.
Applications or frameworks which target TOSA can therefore be deployed on a wide range of different processors, such as SIMD CPUs, GPUs and custom hardware such as NPUs/TPUs, with defined accuracy and compatibility constraints.
Most operators from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible in TOSA.
It is expected that there will be tools to lower from ML frameworks into TOSA.

=== Goals

The goals of TOSA include the following:

* A minimal and stable set of tensor-level operators to which machine learning framework operators can be reduced.
* Full support for both quantized integer and floating-point content.
* Precise functional description of the behavior of every operator, including the treatment of their numerical behavior in the case of precision, saturation, scaling, and range as required by quantized datatypes.
* Independent of any single high-level framework, compiler backend stack or particular target.
* The detailed functional and numerical description enables precise code construction for a diverse range of targets – SIMD CPUs, GPUs and custom hardware such as NPUs/TPUs.

=== Specification

The TOSA Specification is written as AsciiDoc mark-up and developed in its raw mark-up form, managed through a git repository here: https://git.mlplatform.org/tosa/specification.git/.
The specification is developed and versioned much like software.
While the mark-up is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML.
To do this, please follow the instructions in the README.md in the root of the specification repository.

=== Operator Selection Principles

TOSA defines a set of primitive operators to which higher level operators can be lowered in a consistent way.
To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed.
The following principles govern the selection of operators within TOSA.

.Principles
[cols="1,5,5"]
|===
|ID|Principle|Reason for this

|P0
|An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole tensor operations.
|If the operator can be broken down, then we should look at the component operators.

|P1
|An operator shall be usable as a component out of which more complex operations can be constructed.
|Single use operators have a high architectural cost and a more reusable version should be considered instead.

|P2
|Precision should be appropriate for the input and output data types.
|Precision higher than that needed to calculate the result leads to extra implementation complexity.

|P3
|Numerical definition of common sub-operations should be consistent between operators (for example: value scaling).
|Consistent sub-operation definition reduces the operator implementation complexity.

|P4
|The valid input and output ranges for all arguments shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.
|P5
|Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets.
|Reduces implementation cost and gives consistent inference results.
|===

=== Versioning

TOSA follows a semantic versioning policy with a major.minor.patch.draft scheme.
See below for the TOSA definition of backward compatibility.

* Major version changes may break backwards compatibility.
* Minor numbers may add functionality in a backwards compatible way.
* Patch versions are for bug fixes, clarifications, or trivial changes.
* The draft flag notes whether the version referenced is finalized.

Major, minor, and patch numbers are limited to eight bits.
Draft is a single bit flag.
If stored in a 32-bit value, the remaining bits are reserved for future use.

==== Backwards Compatibility

TOSA graphs created with previous minor versions within a major version must continue to work.
The following portions of the specification and implementation will not change within a major version:

* Operator Names
* Arguments including ordering, input/attribute/output, name, rank
* ERROR_IF statements
* Functionality of the pseudocode for each operator
* Level definitions and checks
* Supported Data Type tables
* Conformance test definitions
* Enumerated types and values

Changes to the following do not break compatibility:

* Order of operations within the XML
* Operator section names
* Descriptive text that does not affect functionality
* Non-functional changes to pseudocode (for example: cleanup, local variable name changes)

Minor versions are allowed to add new operators or other functionality as long as the above guarantees hold.

In addition, new extensions may be added to the specification between TOSA releases.
They may not change anything that would break backward compatibility according to the above definitions.

=== Profiles

TOSA profiles enable efficient implementation on different classes of device.
Each profile is an independent set of operations and data type combinations.

TOSA profile extensions define optional operation and data type combinations.

Each operator's Supported Data Types table will define which profile or extension an operator and data type is in.
An operator / data type combination may be part of multiple profiles or extensions.
If so, each profile and extension will be listed in the Supported Data Types table.
In addition, a table listing all operations for each profile can be found in Appendix B.

The following are required for compliant TOSA implementations:

* A TOSA implementation must implement at least one profile.
* A TOSA implementation may choose to implement any extensions.
* If a TOSA implementation chooses to implement an extension, it must implement the complete extension.
* If an operator / data type combination requires multiple extensions, the combination is only required to be implemented if all extensions are implemented.
** For example, a CAST from bf16 to fp8 is only required if both extensions are implemented.

.Profiles
include::{generated}/profiles.adoc[]

.Profile Extensions
include::{generated}/profile_extensions.adoc[]

=== Levels

A TOSA level defines operator argument ranges that an implementation shall support.
This is distinct from a profile that defines the operations and data-types supported.
One level must apply to all profiles and extensions supported by an implementation.

This version of the specification defines two TOSA levels:

* No level : allows the full range of arguments specified by the operations according to the operation data types.
* Level 8K : ranges are expected to be sufficient for applications with frame sizes up to 8K.

Later versions of the specification may define additional levels.
The following table defines the value ranges for each level.
These ranges are checked using the LEVEL_CHECK() function with the operator descriptions.
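
As a rough illustration of how a level constrains operator arguments, the sketch below models a level as a set of maxima and tests two arguments against them.
It is a minimal sketch under stated assumptions: the names `Level`, `LevelLimits`, `limits_for` and `within_level` are hypothetical, the two limit fields and their numeric values are placeholders rather than values taken from the level table, and the behavior of the specification's own LEVEL_CHECK() when a check fails is not modelled.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of a TOSA level; none of these names come from the specification.
enum class Level { None, Level8K };

// Two illustrative maxima. The real set of limits is defined by the level table.
struct LevelLimits {
    std::int64_t max_rank;
    std::int64_t max_kernel;
};

// Stand-in for "no limit imposed by the level" (an approximation: with no level,
// the range is bounded only by the operation and its data types).
constexpr std::int64_t kUnbounded = std::numeric_limits<std::int64_t>::max();

// Returns the limits for a level.
// The 8K numbers are placeholders chosen for this example, not normative values.
constexpr LevelLimits limits_for(Level level) {
    return level == Level::None ? LevelLimits{kUnbounded, kUnbounded}
                                : LevelLimits{6, 8192};
}

// True when both arguments fall inside the range that the level asks an
// implementation to support.
constexpr bool within_level(Level level, std::int64_t rank, std::int64_t kernel) {
    return rank <= limits_for(level).max_rank &&
           kernel <= limits_for(level).max_kernel;
}
```

With these placeholder limits, `within_level(Level::Level8K, 4, 8193)` is false, while the same arguments pass under `Level::None`.
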

.Level maxima
include::{generated}/levels.adoc[]

=== Status

This specification is the release candidate for TOSA 1.0.
The specific status of each profile and extension is contained in the tables in <>.
Possible values for status are:

* Complete : All operators are specified, conformance tests are provided, no changes are expected.
* Unstable : Operators are specified, conformance tests provided, but less content has been tested.
* Incomplete : Operators or conformance tests may be missing.

Changes are likely in future versions of the specification.

=== Compliance

This section defines when a TOSA implementation is compliant to a given TOSA specification profile and level.
To be compliant an implementation must achieve the results and accuracy defined by this specification.
TOSA also defines a set of conformance tests.
A compliant implementation must pass the conformance tests.
The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests.

==== Base Inference Profile Compliance

The <> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph.
This behavior is captured in the pseudo-code function tosa_execute_graph().
For a given input graph (with attributes) and input tensors there are three possible tosa_graph_result values after executing the graph:

* tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon.
* tosa_error: The graph does not meet the specification and is recognised as an illegal graph.
* tosa_valid: The result is defined and predictable and the list of output tensors defines the result.

An implementation is compliant to the TOSA Baseline Inference Profile if it matches the above results as follows:

* For tosa_unpredictable, the implementation can return whatever result it chooses (including error)
* For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any)
* For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification.

In terms of pseudo-code, if *graph* is a TOSA graph consisting of Baseline Inference Profile operators and *input_list* is a list of input tensors, then the following test must pass.

[source,c++]
----
bool tosa_test_compliance(tosa_graph_t graph, tensor_list_t input_list, tosa_level_t level) {
    shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph));
    shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph));
    tosa_graph_result = tosa_valid;    // result starts as valid
    tosa_nesting_depth = 0;            // if/while nesting level
    tosa_execute_graph(graph, input_list, output_list_spec, level);
    if (tosa_graph_result == tosa_unpredictable) {
        return true;    // No requirement to match an unpredictable result
    }
    result_test = execute_implementation_under_test(graph, input_list, output_list_test);
    if (tosa_graph_result == tosa_error) {
        return result_test == tosa_error;    // result must be an error
    }
    if (exact_tensor_match(output_list_spec, output_list_test)) {
        // Predictable bit-exact value match required
        return true;
    }
    return false;
}
----

==== Main Inference Profile Compliance

A Main Inference compliant implementation must satisfy the following:

* The implementation must meet <> for all Base inference compliant graphs
* The implementation must support all Main Inference operations using the datatype fp32_t
** The operations must meet the precision requirements of <>
* The implementation must support all Main Inference operations using the datatype fp16_t
** The operations must meet the precision requirements of <>
** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype

As with <> the pseudo-code function tosa_execute_graph() can return one of three possible results.
A compliant implementation must satisfy the following:

* For a graph returning tosa_error the implementation must also return an error
* For a graph returning tosa_valid the implementation must execute the entire graph without error
* For a graph returning tosa_valid and consisting only of integer operators the results must match exactly

===== Main Inference precision requirements

In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table.
In the table _ulp_ means unit of the last place.
The function tosa_reference_check_fp() defines the error range permitted by a given number of units of last place in this specification.

The following criteria apply to all operations:

* If any input is a NaN and the result is floating-point then the result must be a NaN
* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal) then the result must be false
* If any input is a NaN and the operation is conversion to an integer or boolean then the result is unpredictable

[cols="1,3"]
|===
| Operation | Accuracy bound

| <>, <>, <>, <>, <>, <>, <>, <