namespace arm_compute
{
namespace test
{
/**
@page tests Validation and benchmark tests

@tableofcontents

@section tests_overview Overview

Benchmark and validation tests are based on the same framework to set up and run
the tests. In addition to running simple, self-contained test functions, the
framework supports fixtures and data test cases. The former make it possible to
share common setup routines between various backends, thus reducing the amount of
duplicated code. The latter can be used to parameterize tests or fixtures with
different inputs, e.g. different tensor shapes. One limitation is that
tests/fixtures cannot be parameterized based on the data type if static type
information is needed within the test (e.g. to validate the results).

@subsection tests_overview_structure Directory structure

    .
    |-- computer_vision <- Legacy tests. No new tests must be added. <!-- FIXME: Remove before release -->
    `-- tests <- Top level test directory. All files in here are shared among validation and benchmark.
        |-- framework <- Underlying test framework.
        |-- CL   \
        |-- NEON -> Backend specific files with helper functions etc.
        |-- VX   / <!-- FIXME: Remove VX -->
        |-- benchmark <- Top level directory for the benchmarking files.
        |   |-- fixtures <- Fixtures for benchmark tests.
        |   |-- CL <- OpenCL backend test cases on a function level.
        |   |   `-- SYSTEM <- OpenCL system tests, e.g. whole networks
        |   `-- NEON <- Same for NEON
        |       `-- SYSTEM
        |-- datasets <- Datasets for benchmark and validation tests.
        |-- main.cpp <- Main entry point for the tests. Currently shared between validation and benchmarking.
        |-- networks <- Network classes for system level tests.
        |-- validation_old <- Old validation framework. No new tests must be added! <!-- FIXME: Remove before release -->
        |   |-- dataset <- Old datasets for boost. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- model_objects <- Old helper files for system level validation. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- CL   \
        |   |-- DEMO  \
        |   |-- NEON --> Backend specific test cases
        |   |-- UNIT  /
        |   |-- VX   / <!-- FIXME: Remove VX -->
        |   `-- system_tests -> System level tests
        |       |-- CL
        |       `-- NEON
        `-- validation -> Top level directory for validation files.
            |-- CPP -> C++ reference code
            |-- CL   \
            |-- NEON -> Backend specific test cases
            |-- VX   / <!-- FIXME: Remove VX -->
            `-- fixtures -> Fixtures shared among all backends. Used to setup target function and tensors.

@subsection tests_overview_fixtures Fixtures

Fixtures can be used to share common setup, teardown or even run tasks among
multiple test cases. For that purpose a fixture can define a `setup`,
`teardown` and `run` method. Additionally, the constructor and destructor can
also be customized.

An instance of the fixture is created immediately before the actual test is
executed. After construction the @ref framework::Fixture::setup method is called. Then the test
function or the fixture's `run` method is invoked. After test execution the
@ref framework::Fixture::teardown method is called and finally the fixture is destroyed.

@subsubsection tests_overview_fixtures_fixture Fixture

Fixtures for non-parameterized tests are straightforward. The custom fixture
class has to inherit from @ref framework::Fixture and may implement any of the
`setup`, `teardown` or `run` methods. None of these methods takes any arguments
or returns anything.

    class CustomFixture : public framework::Fixture
    {
        void setup()
        {
            _ptr = malloc(4000);
        }

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsubsection tests_overview_fixtures_data_fixture Data fixture

The advantage of a parameterized fixture is that arguments can be passed to the setup method at runtime. To make this possible, the setup method has to be a template with a type parameter for every argument (though the template parameter doesn't have to be used). All other methods remain the same.

    class CustomFixture : public framework::Fixture
    {
    #ifdef ALTERNATIVE_DECLARATION
        template <typename ...>
        void setup(size_t size)
        {
            _ptr = malloc(size);
        }
    #else
        template <typename T>
        void setup(T size)
        {
            _ptr = malloc(size);
        }
    #endif

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsection tests_overview_test_cases Test cases

All of the following test case macros can optionally be prefixed with
`EXPECTED_FAILURE_` or `DISABLED_`.
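
For example, prefixing the `TEST_CASE` macro described below marks a test as disabled; the body shown here is only illustrative:

    DISABLED_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }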

@subsubsection tests_overview_test_cases_test_case Test case

A simple test case function taking no inputs and having no (shared) state.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.


    TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_fixture_test_case Fixture test case

A simple test case function taking no inputs that inherits from a fixture. The
test case will have access to all public and protected members of the fixture.
Only the setup and teardown methods of the fixture will be used. The body of
this function will be used as the test function.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
        public:
            void setup() override
            {
                _one = 1;
            }

        protected:
            int _one;
    };

    FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_register_fixture_test_case Registering a fixture as a test case

Allows a fixture to be used directly as a test case. Instead of defining a new
test function, the `run` method of the fixture will be executed.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
        public:
            void setup() override
            {
                _one = 1;
            }

            void run() override
            {
                ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
            }

        protected:
            int _one;
    };

    REGISTER_FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT);


@subsubsection tests_overview_test_cases_data_test_case Data test case

A parameterized test case function that has no (shared) state. The dataset will
be used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.
- Third argument is the dataset.
- Further arguments specify names of the arguments to the test function. The
  number must match the arity of the dataset.


    DATA_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT, framework::make("Numbers", {1, 2, 3}), num)
    {
        ARM_COMPUTE_ASSERT(num < 4);
    }

@subsubsection tests_overview_test_cases_fixture_data_test_case Fixture data test case

A parameterized test case that inherits from a fixture. The test case will have
access to all public and protected members of the fixture. Only the setup and
teardown methods of the fixture will be used. The setup method of the fixture
needs to be a template and has to accept inputs from the dataset as arguments.
The body of this function will be used as the test function. The dataset will be
used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
        public:
            template <typename T>
            void setup(T num)
            {
                _num = num;
            }

        protected:
            int _num;
    };

    FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::make("Numbers", {1, 2, 3}))
    {
        ARM_COMPUTE_ASSERT(_num < 4);
    }

@subsubsection tests_overview_test_cases_register_fixture_data_test_case Registering a fixture as a data test case

Allows a fixture to be used directly as a parameterized test case. Instead of
defining a new test function, the `run` method of the fixture will be executed.
The setup method of the fixture needs to be a template and has to accept inputs
from the dataset as arguments. The dataset will be used to generate versions of
the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
        public:
            template <typename T>
            void setup(T num)
            {
                _num = num;
            }

            void run() override
            {
                ARM_COMPUTE_ASSERT(_num < 4);
            }

        protected:
            int _num;
    };

    REGISTER_FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::make("Numbers", {1, 2, 3}));

@section writing_tests Writing validation tests

Before starting a new test case, have a look at the existing ones. They should
provide a good overview of how test cases are structured.

- The C++ reference needs to be added to `tests/validation/CPP/`. The
  reference function is typically a template parameterized by the underlying
  value type of the `SimpleTensor`. This makes it easy to specialise for
  different data types.
- If all backends have a common interface it makes sense to share the setup
  code. This can be done by adding a fixture in
  `tests/validation/fixtures/`. Inside the `setup` method of a fixture
  the tensors can be created and initialised and the function can be configured
  and run. The actual test will only have to validate the results. To be shared
  among multiple backends the fixture class is usually a template that accepts
  the specific types (data, tensor class, function class etc.) as parameters.
  A minimal sketch of these pieces is shown after this list.
- The actual test cases need to be added for each backend individually.
  Typically there will be multiple tests for different data types and for
  different execution modes, e.g. precommit and nightly.
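
As a rough sketch, assuming a hypothetical operator called "Foo" (the names `reference_foo` and `FooValidationFixture` and the setup arguments are made up for illustration; the real signatures depend on the function under test):

    // tests/validation/CPP/: reference implementation, templated on the
    // underlying value type of the SimpleTensor.
    template <typename T>
    SimpleTensor<T> reference_foo(const SimpleTensor<T> &src);

    // tests/validation/fixtures/: shared setup code, templated on the backend
    // specific tensor, accessor and function types.
    template <typename TensorType, typename AccessorType, typename FunctionType, typename T>
    class FooValidationFixture : public framework::Fixture
    {
    public:
        template <typename...>
        void setup(TensorShape shape, DataType data_type)
        {
            // Create and initialise the tensors, configure and run FunctionType,
            // compute the expected output with reference_foo and keep both
            // results so that the backend specific test case can validate them.
        }
    };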

<!-- FIXME: Remove before release -->
@section building_test_dependencies Building dependencies

@note Only required when tests from the old validation framework need to be run.

The tests currently make use of Boost (Test and Program options) for
validation. Below are instructions on how to build these third-party
libraries.

@note By default the build of the validation and benchmark tests is disabled. To enable it, use the `validation_tests=1` and `benchmark_tests=1` scons options.
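
For example (a sketch only; `neon=1` here stands in for whatever backend and platform options you normally build with):

    scons neon=1 validation_tests=1 benchmark_tests=1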

@subsection building_boost Building Boost

First follow the instructions from the Boost library on how to set up the Boost
build system
(http://www.boost.org/doc/libs/1_64_0/more/getting_started/index.html).
Afterwards the required libraries can be built with:

    ./b2 --with-program_options --with-test link=static \
    define=BOOST_TEST_ALTERNATIVE_INIT_API

Additionally, depending on your environment, it might be necessary to specify
the `toolset=` option to choose the right compiler. Moreover,
`address-model=32` can be used to force building for 32-bit and
`target-os=android` must be specified to build for Android.
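
For example, a static 32-bit Android build might look as follows (the toolset name is an assumption and depends on your cross compiler setup):

    ./b2 toolset=clang address-model=32 target-os=android \
    --with-program_options --with-test link=static \
    define=BOOST_TEST_ALTERNATIVE_INIT_API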

After executing the build command, the libraries `libboost_program_options.a`
and `libboost_unit_test_framework.a` can be found in `./stage/lib`.
<!-- FIXME: end remove -->

@section tests_running_tests Running tests
@subsection tests_running_tests_benchmarking Benchmarking
@subsubsection tests_running_tests_benchmarking_filter Filter tests
All tests can be run by invoking

    ./arm_compute_benchmark ./data

where `./data` contains the assets needed by the tests.

If only a subset of the tests has to be executed, the `--filter` option takes a
regular expression to select matching tests.

    ./arm_compute_benchmark --filter='NEON/.*AlexNet' ./data

Additionally, each test has a test id which can be used as a filter, too.
However, the test id is not guaranteed to be stable when new tests are added.
Only within a specific build will a test keep its id.

    ./arm_compute_benchmark --filter-id=10 ./data

All available tests can be displayed with the `--list-tests` switch.

    ./arm_compute_benchmark --list-tests

More options can be found in the `--help` message.

@subsubsection tests_running_tests_benchmarking_runtime Runtime
By default every test is run once on a single thread. The number of iterations
can be controlled via the `--iterations` option and the number of threads via
`--threads`.
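
For example, to run each test ten times using four threads (the numbers are arbitrary):

    ./arm_compute_benchmark --iterations=10 --threads=4 ./data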

@subsubsection tests_running_tests_benchmarking_output Output
By default the benchmarking results are printed in a human-readable format on
the command line. The colored output can be disabled via `--no-color-output`.
As an alternative output format, JSON is supported and can be selected via
`--log-format=json`. To write the output to a file instead of stdout the
`--log-file` option can be used.
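
For example, to write JSON formatted results to a file (the file name is arbitrary):

    ./arm_compute_benchmark --log-format=json --log-file=benchmark.json ./data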

@subsubsection tests_running_tests_benchmarking_mode Mode
Tests contain different datasets of different sizes, some of which will take several hours to run.
You can select which datasets to use with the `--mode` option; we recommend you use `--mode=precommit` to start with.
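
For example:

    ./arm_compute_benchmark --mode=precommit ./data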

@subsubsection tests_running_tests_benchmarking_instruments Instruments
You can use the `--instruments` option to select one or more instruments to measure the execution time of the benchmark tests.

`PMU` will try to read the CPU PMU events from the kernel (they need to be enabled on your platform).

`MALI` will try to collect Mali hardware performance counters (you need a recent enough Mali driver).

`WALL_CLOCK` will measure time using `gettimeofday`: this should work on all platforms.

You can pass a combination of these instruments: `--instruments=PMU,MALI,WALL_CLOCK`
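
For example, to collect both PMU events and wall clock time for each benchmark:

    ./arm_compute_benchmark --instruments=PMU,WALL_CLOCK ./data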

@note You need to make sure the instruments have been selected at compile time using the `pmu=1` or `mali=1` scons options.

<!-- FIXME: Remove before release and change above to benchmark and validation -->
@subsection tests_running_tests_validation Validation

@note The new validation tests have the same interface as the benchmarking tests.

@subsubsection tests_running_tests_validation_filter Filter tests
All tests can be run by invoking

    ./arm_compute_validation -- ./data

where `./data` contains the assets needed by the tests.

As running all tests can take a lot of time, the suite is split into "precommit" and "nightly" tests. The precommit tests will be fast to execute but still cover the most important features. In contrast, the nightly tests offer more extensive coverage but take longer. The different subsets can be selected from the command line as follows:

    ./arm_compute_validation -t @precommit -- ./data
    ./arm_compute_validation -t @nightly -- ./data

Additionally, it is possible to select specific suites or tests:

    ./arm_compute_validation -t CL -- ./data
    ./arm_compute_validation -t NEON/BitwiseAnd/RunSmall/_0 -- ./data

All available tests can be displayed with the `--list_content` switch.

    ./arm_compute_validation --list_content -- ./data

For a complete list of possible selectors, please see: http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/runtime_config/test_unit_filtering.html

@subsubsection tests_running_tests_validation_verbosity Verbosity
There are two separate flags to control the verbosity of the test output. `--report_level` controls the verbosity of the summary produced after all tests have been executed. `--log_level` controls the verbosity of the information generated during the execution of tests. All available settings can be found in the Boost documentation for [--report_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/report_level.html) and [--log_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/log_level.html), respectively.
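
For example (these are standard Boost.Test settings; see the linked documentation for all possible values):

    ./arm_compute_validation --report_level=short --log_level=all -- ./data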
<!-- FIXME: end remove -->
*/
} // namespace test
} // namespace arm_compute