//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2023 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Appendix A

NOTE: This appendix is at an early stage of development.

=== Random data generation

The following function generates a pseudo-random floating-point value in the range -1.0 to +1.0 for use as test data.
It uses a linear recurrence modulo (1<<32), with a multiplier derived from "TOSASETS" and the set number.

[source,c++]
----
#include <cstdint>

float set_data(uint32_t set, uint32_t index)
{
    uint32_t m = (8*set + 1) * 0x705A5E75;   // mod (1<<32) calculation
    uint32_t r = m + 1;                      // mod (1<<32) calculation
    for (uint32_t i = 0; i < index; i++) {
        r = r * m + 1;                       // mod (1<<32) calculation
    }
    float sign = (r>>31)==0 ? +1 : -1;       // top bit of r selects the sign
    return sign * (float)(r & 0x7FFFFFFF) / (float)(0x7FFFFFFF);
}
----
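
Calling `set_data` once per element restarts the recurrence from index 0 each time, so filling a tensor of n elements costs on the order of n*n steps.
The following is a minimal sketch of a single-pass alternative that carries the recurrence state forward.
The helper name `fill_set_data` is hypothetical and is not part of this specification; `set_data` is repeated only to keep the example self-contained.

[source,c++]
----
#include <cstdint>
#include <vector>

// Reference definition, repeated from above.
float set_data(uint32_t set, uint32_t index)
{
    uint32_t m = (8*set + 1) * 0x705A5E75;
    uint32_t r = m + 1;
    for (uint32_t i = 0; i < index; i++) {
        r = r * m + 1;
    }
    float sign = (r>>31)==0 ? +1 : -1;
    return sign * (float)(r & 0x7FFFFFFF) / (float)(0x7FFFFFFF);
}

// Hypothetical helper: returns set_data(set, 0) .. set_data(set, count-1)
// in one pass over the recurrence.
std::vector<float> fill_set_data(uint32_t set, uint32_t count)
{
    std::vector<float> out;
    out.reserve(count);
    uint32_t m = (8*set + 1) * 0x705A5E75;
    uint32_t r = m + 1;                      // state for index 0
    for (uint32_t i = 0; i < count; i++) {
        float sign = (r>>31)==0 ? +1.0f : -1.0f;
        out.push_back(sign * (float)(r & 0x7FFFFFFF) / (float)(0x7FFFFFFF));
        r = r * m + 1;                       // advance to the state for index i+1
    }
    return out;
}
----

Each element of the returned vector is computed with the same expression as `set_data`, so the two are intended to agree value for value.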

=== Main Inference test data generator

This section describes the function tosa_mi_data(S, KS, p, k, i) that generates test data for main inference compliance.
This function takes the following arguments:

* S is the test set number, which selects the generator to use
* KS is the kernel size
* p is the parameter number:
** 0 for the first input (usually data)
** 1 for the second input (usually weights)
** 2 for the third input if present (usually bias)
* k is the index within the kernel in the range 0 \<= k < KS
* i is the index within the tensor to write

Some test data values are scaled by the bound parameter B which is defined in the table below.
B is set to be the largest value that is both representable by the input type and such that B*B does not overflow the accumulator precision.

|===
| input type  | accumulator type | B value
| fp16        | fp16             | (1<<8)  - (1/8)  = 255.875
| fp16        | fp32             | (1<<16) - (1<<5) = 65504
| bf16        | fp32             | (1<<64) - (1<<56)
| fp32        | fp32             | (1<<64) - (1<<40)
|===
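
As a rough check of the table above, the sketch below evaluates each bound in double precision and tests whether its square fits in the accumulator type.
The helper names and the `FP16_MAX` constant are assumptions of this example; the bound values themselves are taken from the table.

[source,c++]
----
#include <cfloat>
#include <cmath>

// Hypothetical helpers returning the B values from the table.
// Double precision represents each of these values, and their squares, exactly.
double bound_fp16_fp16() { return std::ldexp(1.0, 8)  - 0.125; }               // 255.875
double bound_fp16_fp32() { return std::ldexp(1.0, 16) - std::ldexp(1.0, 5); }  // 65504
double bound_bf16_fp32() { return std::ldexp(1.0, 64) - std::ldexp(1.0, 56); }
double bound_fp32_fp32() { return std::ldexp(1.0, 64) - std::ldexp(1.0, 40); }

// Largest finite fp16 value, (2 - 2^-10) * 2^15.
const double FP16_MAX = 65504.0;

// True if b*b does not exceed the largest finite accumulator value.
bool square_fits(double b, double acc_max)
{
    return b * b <= acc_max;
}
----

For example, `square_fits(bound_fp16_fp16(), FP16_MAX)` holds, whereas the next fp16 value above 255.875, which is 256.0, squares to 65536 and no longer fits.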

==== Test set S=0 generator

The aim of this generator is to check that a sum of products with zero gives a zero result.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | set_data(2*S, i) < 0 ? 0.0 : set_data(2*S+1, i)
| 1 | set_data(2*S, i) < 0 ? set_data(2*S+1, i) : 0.0
| 2 | 0.0
|===

==== Test set S=1

The aim of this test set is to check values with large exponents.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | (B/sqrt(KS+1))*(0.75 + 0.25*set_data(3*S+0, i))
| 1 | (B/sqrt(KS+1))*(0.75 + 0.25*set_data(3*S+1, i))
| 2 | (B*B/(KS+1))*(0.75 + 0.25*set_data(3*S+2, i))
|===

==== Test set S=2

The aim of this test set is to check rounding error when accumulating small values onto a large value.
In this case the small values are of similar magnitude.
If the implementation changes the order of the sum, then the test data must also be reordered so that the largest values occur first in the sum.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | (k==0) ? 1.0 : set_data(2*S+0, i)/sqrt(KS)
| 1 | (k==0) ? 1.0 : set_data(2*S+1, i)/sqrt(KS)
| 2 | 0.0
|===
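
To illustrate why the order of the sum matters here, the sketch below accumulates the same values in fp32 in two different orders.
The function names and the example values (1.0 and 1e-8) are chosen for illustration only and are not test data from this appendix.
It assumes that fp32 expressions are evaluated in fp32, as is typical on current 64-bit targets.

[source,c++]
----
// Adds `count` copies of `small` onto `large`, with the large value first.
float sum_large_first(float large, float small, int count)
{
    float acc = large;
    for (int i = 0; i < count; i++) {
        acc += small;        // each addition is rounded to fp32
    }
    return acc;
}

// Accumulates the small values first and adds the large value last.
float sum_large_last(float large, float small, int count)
{
    float acc = 0.0f;
    for (int i = 0; i < count; i++) {
        acc += small;
    }
    return acc + large;
}
----

With `large = 1.0f` and `small = 1e-8f`, each small value is below half a unit in the last place of 1.0, so the large-first sum stays at exactly 1.0, while the large-last sum retains the accumulated contribution.
Placing the largest value first therefore exercises the rounding behaviour that this test set is designed to check.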

==== Test set S=3

The aim of this test set is to check rounding error when accumulating small values onto a large value.
In this case the small values are of varying magnitude.
If the implementation changes the order of the sum, then the test data must also be reordered so that the largest values occur first in the sum.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | (k==0) ? 16.0 : exp(2*set_data(2*S+0, 2*i+0)) * set_data(2*S+0, 2*i+1)
| 1 | (k==0) ? 16.0 : exp(2*set_data(2*S+1, 2*i+0)) * set_data(2*S+1, 2*i+1)
| 2 | 0.0
|===

==== Test set S=4

The aim of this test set is to check a mixture of zero and non-zero products.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | (k==KS/2) ? (set_data(2*S, i) < 0 ? -0.5 : +0.5) : (set_data(2*S, i) < 0 ? 0.0 : (B/sqrt(KS))*set_data(2*S+1, i))
| 1 | (k==KS/2) ? (set_data(2*S, i) < 0 ? +0.5 : -0.5) : (set_data(2*S, i) < 0 ? (B/sqrt(KS))*set_data(2*S+1, i) : 0.0)
| 2 | 0.0
|===

==== Test set S=5

The aim of this test set is to check signed inputs of large range.

[cols="1,9"]
|===
| p | tosa_mi_data(S, KS, p, k, i) =
| 0 | (B/sqrt(KS))*set_data(3*S+0, i)
| 1 | (B/sqrt(KS))*set_data(3*S+1, i)
| 2 | 0.0
|===
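
The tables for test sets 0 to 5 can be collected into a single function, sketched below.
This is an illustration rather than a reference implementation: passing the bound B as an explicit argument and returning `double` are assumptions of this example, and `set_data` is repeated only to keep it self-contained.

[source,c++]
----
#include <cmath>
#include <cstdint>

// set_data as defined in "Random data generation".
float set_data(uint32_t set, uint32_t index)
{
    uint32_t m = (8*set + 1) * 0x705A5E75;
    uint32_t r = m + 1;
    for (uint32_t i = 0; i < index; i++) {
        r = r * m + 1;
    }
    float sign = (r>>31)==0 ? +1 : -1;
    return sign * (float)(r & 0x7FFFFFFF) / (float)(0x7FFFFFFF);
}

// Sketch only: B is an explicit argument here (an assumption of this example).
double tosa_mi_data(uint32_t S, uint32_t KS, uint32_t p, uint32_t k, uint32_t i, double B)
{
    if (p == 2 && S != 1) {
        return 0.0;                          // third input is zero except in test set 1
    }
    const double root_ks = std::sqrt((double)KS);
    switch (S) {
    case 0: {
        bool neg = set_data(2*S, i) < 0;
        // At most one of the two operands is non-zero, so every product is zero.
        return (neg == (p == 1)) ? set_data(2*S+1, i) : 0.0;
    }
    case 1: {
        double v = 0.75 + 0.25*set_data(3*S+p, i);
        return (p == 2) ? (B*B/(KS+1.0))*v : (B/std::sqrt(KS+1.0))*v;
    }
    case 2:
        return (k == 0) ? 1.0 : set_data(2*S+p, i)/root_ks;
    case 3:
        return (k == 0) ? 16.0
                        : std::exp(2*set_data(2*S+p, 2*i+0)) * set_data(2*S+p, 2*i+1);
    case 4: {
        bool neg = set_data(2*S, i) < 0;
        if (k == KS/2) {
            return (neg == (p == 0)) ? -0.5 : +0.5;
        }
        return (neg == (p == 1)) ? (B/root_ks)*set_data(2*S+1, i) : 0.0;
    }
    case 5:
        return (B/root_ks)*set_data(3*S+p, i);
    default:
        return 0.0;                          // no generator is defined beyond S=5
    }
}
----

A caller would pick B from the table above according to the input and accumulator types, then convert each returned value to the input type.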

=== Main Inference operator test data

For each operator, this section defines how to generate test data for test set S.
For the results to be statistically significant, the operation must calculate at least MIN_DOT_PRODUCTS dot products.
For most operations, this means that the output tensor must have at least MIN_DOT_PRODUCTS output values.
For most operations, the batch size can be increased if necessary so that this holds.
For this version of the specification, MIN_DOT_PRODUCTS is set to 1000.
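
As a small illustration of increasing the batch size to meet this requirement, the sketch below computes the smallest batch size for a given number of output values per batch.
The helper name `min_batch_size` is hypothetical.

[source,c++]
----
#include <cstdint>

const uint64_t MIN_DOT_PRODUCTS = 1000;

// Hypothetical helper: smallest N such that
// N * outputs_per_batch >= MIN_DOT_PRODUCTS.
// outputs_per_batch must be non-zero.
uint64_t min_batch_size(uint64_t outputs_per_batch)
{
    // Integer division rounded up.
    return (MIN_DOT_PRODUCTS + outputs_per_batch - 1) / outputs_per_batch;
}
----

For a CONV2D with OH=8, OW=8 and OC=4, there are 256 output values per batch, so `min_batch_size(256)` gives N=4 and 1024 dot products in total.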

==== CONV2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OH*OW*OC >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = KW*KH*IC;
for (0 <= n < N, 0 <= iy < IH, 0 <= ix < IW, 0 <= ic < IC) {
  input [ n, iy, ix, ic] = tosa_mi_data(S, KS, 0, ((iy % KH)*KW+(ix % KW))*IC+ic, ((n*IH+iy)*IW+ix)*IC+ic);
}
for (0 <= oc < OC, 0 <= ky < KH, 0 <= kx < KW, 0 <= ic < IC) {
  weight[oc, ky, kx, ic] = tosa_mi_data(S, KS, 1, (ky*KW+kx)*IC+ic, ((oc*KH+ky)*KW+kx)*IC+ic);
}
for (0 <= oc < BC) {
  bias[oc] = tosa_mi_data(S, KS, 2, 0, oc);
}
----
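
The loop notation above is shorthand for nested loops.
As a sketch of how the input loop could be written out, the following fills a flattened NHWC input tensor.
The `Conv2dDims` structure and the `gen` callable, which stands in for `tosa_mi_data(S, KS, p, k, i)`, are assumptions of this example.

[source,c++]
----
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the CONV2D dimensions used below.
struct Conv2dDims {
    uint32_t N, IH, IW, IC;   // input tensor shape (NHWC)
    uint32_t KH, KW;          // kernel height and width
};

// Fills the CONV2D input tensor in flattened NHWC order.
// `gen` stands in for tosa_mi_data(S, KS, p, k, i) and is supplied by the caller.
template <typename Gen>
std::vector<double> conv2d_input(uint32_t S, const Conv2dDims& d, Gen gen)
{
    const uint32_t KS = d.KW * d.KH * d.IC;
    std::vector<double> input((size_t)d.N * d.IH * d.IW * d.IC);
    for (uint32_t n = 0; n < d.N; n++)
    for (uint32_t iy = 0; iy < d.IH; iy++)
    for (uint32_t ix = 0; ix < d.IW; ix++)
    for (uint32_t ic = 0; ic < d.IC; ic++) {
        // Position within the kernel, always in the range 0 <= k < KS.
        uint32_t k = ((iy % d.KH) * d.KW + (ix % d.KW)) * d.IC + ic;
        // Flat offset of input[n, iy, ix, ic] within the tensor.
        uint32_t i = ((n * d.IH + iy) * d.IW + ix) * d.IC + ic;
        input[i] = gen(S, KS, 0u, k, i);
    }
    return input;
}
----

Because `i` is the flat offset of the element, it also serves as the index into the output vector; the weight and bias loops would follow the same pattern.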

==== CONV3D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OD*OH*OW*OC >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = KD*KW*KH*IC;
for (0 <= n < N, 0 <= id < ID, 0 <= iy < IH, 0 <= ix < IW, 0 <= ic < IC) {
  input [ n, id, iy, ix, ic] = tosa_mi_data(S, KS, 0, (((id % KD)*KH+(iy % KH))*KW+(ix % KW))*IC+ic, (((n*ID+id)*IH+iy)*IW+ix)*IC+ic);
}
for (0 <= oc < OC, 0 <= kd < KD, 0 <= ky < KH, 0 <= kx < KW, 0 <= ic < IC) {
  weight[oc, kd, ky, kx, ic] = tosa_mi_data(S, KS, 1, ((kd*KH+ky)*KW+kx)*IC+ic, (((oc*KD+kd)*KH+ky)*KW+kx)*IC+ic);
}
for (0 <= oc < BC) {
  bias[oc] = tosa_mi_data(S, KS, 2, 0, oc);
}
----

==== DEPTHWISE_CONV2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OH*OW*C*M >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = KW*KH;
for (0 <= n < N, 0 <= iy < IH, 0 <= ix < IW, 0 <= c < C) {
  input [ n, iy, ix, c] = tosa_mi_data(S, KS, 0, (iy % KH)*KW+(ix % KW), ((n*IH+iy)*IW+ix)*C+c);
}
for (0 <= ky < KH, 0 <= kx < KW, 0 <= c < C, 0 <= m < M) {
  weight[ky, kx,  c, m] = tosa_mi_data(S, KS, 1, (ky*KW+kx), ((ky*KW+kx)*C+c)*M+m);
}
for (0 <= oc < C*M) {
  bias[oc] = tosa_mi_data(S, KS, 2, 0, oc);
}
----

==== FULLY_CONNECTED

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OC >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = IC;
for (0 <= n < N, 0 <= ic < IC) {
  input [ n, ic] = tosa_mi_data(S, KS, 0, ic,  n*IC+ic);
}
for (0 <= oc < OC, 0 <= ic < IC) {
  weight[oc, ic] = tosa_mi_data(S, KS, 1, ic, oc*IC+ic);
}
for (0 <= oc < BC) {
  bias[oc] = tosa_mi_data(S, KS, 2, 0, oc);
}
----

==== MATMUL

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*H*W >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = C;
for (0 <= n < N, 0 <= y < H, 0 <= c < C) {
  A[n, y, c] = tosa_mi_data(S, KS, 0, c, (n*H+y)*C+c);
}
for (0 <= n < N, 0 <= c < C, 0 <= x < W) {
  B[n, c, x] = tosa_mi_data(S, KS, 1, c, (n*C+c)*W+x);
}
----

==== TRANSPOSE_CONV2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OH*OW*OC >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = KW*KH*IC;
for (0 <= n < N, 0 <= iy < IH, 0 <= ix < IW, 0 <= ic < IC) {
  input [ n, iy, ix, ic] = tosa_mi_data(S, KS, 0, ((iy % KH)*KW+(ix % KW))*IC+ic, ((n*IH+iy)*IW+ix)*IC+ic);
}
for (0 <= oc < OC, 0 <= ky < KH, 0 <= kx < KW, 0 <= ic < IC) {
  weight[oc, ky, kx, ic] = tosa_mi_data(S, KS, 1, (ky*KW+kx)*IC+ic, ((oc*KH+ky)*KW+kx)*IC+ic);
}
for (0 <= oc < BC) {
  bias[oc] = tosa_mi_data(S, KS, 2, 0, oc);
}
----

==== FFT2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*H*W >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = 2*H*W;
for (0 <= n < N, 0 <= y < H, 0 <= x < W) {
  input_real[n, y, x] = tosa_mi_data(S, KS, 0, y*W+x, ((0*N+n)*H+y)*W+x);
  input_imag[n, y, x] = tosa_mi_data(S, KS, 0, y*W+x, ((1*N+n)*H+y)*W+x);
}
for (0 <= y < H, 0 <= x < W, 0 <= m < H, 0 <= n < W) {
  weight_real[y, x, m, n] = real(exp(2*pi*i*((m*y/H) + (n*x/W))));  // i is the imaginary unit
  weight_imag[y, x, m, n] = imag(exp(2*pi*i*((m*y/H) + (n*x/W))));
}
----
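
The weight expressions above use complex arithmetic, with `i` denoting the imaginary unit rather than a tensor index.
The sketch below shows one way to evaluate a single weight with `std::complex`.
It takes the phase to be `m*y/H + n*x/W` in terms of the loop indices and uses real-valued division; both are assumptions of this example, as is the function name `fft2d_weight`.

[source,c++]
----
#include <cmath>
#include <complex>
#include <cstdint>

// Returns exp(2*pi*i*(m*y/H + n*x/W)), where i is the imaginary unit.
// The real and imaginary parts correspond to weight_real and weight_imag.
std::complex<double> fft2d_weight(uint32_t y, uint32_t x, uint32_t m, uint32_t n,
                                  uint32_t H, uint32_t W)
{
    const double pi = std::acos(-1.0);
    // Real-valued division: integer division would discard the fractional phase.
    double angle = 2.0 * pi * (((double)m * y) / H + ((double)n * x) / W);
    return std::polar(1.0, angle);   // unit magnitude, phase `angle`
}
----

Every weight has unit magnitude, and the weight for y=0, x=0 is 1 for all m and n.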

==== RFFT2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*H*W >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = H*W;
for (0 <= n < N, 0 <= y < H, 0 <= x < W) {
  input_real[n, y, x] = tosa_mi_data(S, KS, 0, y*W+x, ((0*N+n)*H+y)*W+x);
}
for (0 <= y < H, 0 <= x < W, 0 <= m < H, 0 <= n < W) {
  weight_real[y, x, m, n] = real(exp(2*pi*i*((m*y/H) + (n*x/W))));  // i is the imaginary unit
  weight_imag[y, x, m, n] = imag(exp(2*pi*i*((m*y/H) + (n*x/W))));
}
----

==== REDUCE_SUM

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`tensor_size(shape) >= MIN_DOT_PRODUCTS`

[source,c++]
----
KS = shape1[axis];
for (index in shape1) {
  input[index] = tosa_mi_data(S, KS, 0, index[axis], tensor_index_to_offset(index));
}
for (0 <= c < KS) {
  weight[c] = 1;
}
----

==== AVG_POOL2D

The following generates input test data for test set S.
For a compliant implementation, the test must pass whenever the attributes satisfy:
`N*OH*OW*C >= MIN_DOT_PRODUCTS`

[source,c++]
----
KX = kernel_x;
KY = kernel_y;
KS = KX*KY;
for (0 <= n < N, 0 <= iy < IH, 0 <= ix < IW, 0 <= c < C) {
  input [ n, iy, ix, c] = tosa_mi_data(S, KS, 0, (iy % KY)*KX+(ix % KX), ((n*IH+iy)*IW+ix)*C+c);
}
for (0 <= ky < KY, 0 <= kx < KX) {
  weight[ky, kx] = 1.0/KS;
}
----