Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.
-We used the upstream `TF master as of 31/01/2017`_, building the platform using
-the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:
-
-.. code:: shell
-
- make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
- SCP_BL2=<path/to/scp-fw.bin> \
- BL33=<path/to/test-fw.bin> \
- all fip
-
-When using the debug build of TF, there was no noticeable difference in the
-results.
-
-The tests are based on an ARM-internal test framework. The release build of this
-framework was used because the results in the debug build became skewed; the
-console output prevented some of the tests from executing in parallel.
-
-The tests consist of both parallel and sequential tests, which are broadly
-described as follows:
-
-- **Parallel Tests** This type of test powers on all the non-lead CPUs and
- brings them and the lead CPU to a common synchronization point. The lead CPU
- then initiates the test on all CPUs in parallel.
+Given that runtime instrumentation using PMF is invasive, there is a small
+(unquantified) overhead on the results. PMF uses the generic counter for
+timestamps, which runs at 50 MHz on Juno.
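
As a rough illustration of what the 50 MHz counter implies for the figures
reported here, the sketch below converts a tick delta into microseconds. The
helper name ``ticks_to_us`` and the input values are hypothetical; only the
counter frequency comes from the text. At 50 MHz one tick is 20 ns, so
latencies derived this way have a resolution of 0.02 µs.

```shell
#!/bin/sh
# Assumption taken from the text: the generic counter runs at 50 MHz on Juno.
COUNTER_HZ=50000000

# Hypothetical helper: convert a tick delta ($1) to microseconds.
ticks_to_us() {
    awk -v ticks="$1" -v hz="$COUNTER_HZ" \
        'BEGIN { printf "%.2f\n", ticks * 1000000 / hz }'
}

ticks_to_us 1       # a single tick, i.e. 20 ns
ticks_to_us 12188   # an illustrative delta, not a value read from PMF
```
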
-- **Sequential Tests** This type of test powers on each non-lead CPU in
- sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
- test to complete before proceeding to the next non-lead CPU. The lead CPU then
- executes the test on itself.
+The following source trees and binaries were used:
+
+- TF-A [`v2.9-rc0`_]
+- TFTF [`v2.9-rc0`_]
+
+Please see the Runtime Instrumentation `Testing Methodology`_ page for more
+details.
+
+Procedure
+---------
+
+#. Build TFTF with runtime instrumentation enabled:
+
+ .. code:: shell
+
+ make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+ TESTS=runtime-instrumentation all
+
+#. Fetch Juno's SCP binary from TF-A's archive:
+
+ .. code:: shell
+
+ curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
+ https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin
+
+#. Build TF-A with the following build options:
+
+ .. code:: shell
+
+ make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+ BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
+ ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip
+
+#. Load the following images onto the development board: ``fip.bin``,
+ ``scp_bl2.bin``.
+
+Results
+-------
+
+``CPU_SUSPEND`` to deepest power level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+ parallel
+
+ +---------+------+-----------+---------+-------------+
+ | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+ +=========+======+===========+=========+=============+
+ | 0 | 0 | 243.76 | 239.92 | 6.32 |
+ +---------+------+-----------+---------+-------------+
+ | 0 | 1 | 663.5 | 30.32 | 167.82 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 0 | 105.12 | 22.84 | 5.88 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 1 | 384.16 | 19.06 | 4.7 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 2 | 523.98 | 270.46 | 4.74 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 3 | 950.54 | 220.9 | 89.2 |
+ +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+ serial
+
+ +---------+------+-----------+---------+-------------+
+ | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+ +=========+======+===========+=========+=============+
+ | 0 | 0 | 266.96 | 31.74 | 167.92 |
+ +---------+------+-----------+---------+-------------+
+ | 0 | 1 | 266.9 | 31.52 | 167.82 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 0 | 279.86 | 23.42 | 87.52 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 1 | 101.38 | 18.8 | 4.64 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 2 | 101.18 | 19.28 | 4.64 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 3 | 101.32 | 19.02 | 4.62 |
+ +---------+------+-----------+---------+-------------+
+
+``CPU_SUSPEND`` to power level 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
+ parallel
+
+ +---------+------+-----------+---------+-------------+
+ | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+ +=========+======+===========+=========+=============+
+ | 0 | 0 | 661.94 | 22.88 | 9.66 |
+ +---------+------+-----------+---------+-------------+
+ | 0 | 1 | 801.64 | 23.38 | 9.62 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 0 | 105.56 | 16.02 | 8.12 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 1 | 245.42 | 16.26 | 7.78 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 2 | 384.42 | 16.1 | 7.84 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 3 | 523.74 | 15.4 | 8.02 |
+ +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial
+
+ +---------+------+-----------+---------+-------------+
+ | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+ +=========+======+===========+=========+=============+
+ | 0 | 0 | 102.16 | 23.64 | 6.7 |
+ +---------+------+-----------+---------+-------------+
+ | 0 | 1 | 101.66 | 23.78 | 6.6 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 0 | 277.74 | 15.96 | 4.66 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 1 | 98.0 | 15.88 | 4.64 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 2 | 97.66 | 15.88 | 4.62 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 3 | 97.76 | 15.38 | 4.64 |
+ +---------+------+-----------+---------+-------------+
+
+``CPU_OFF`` on all non-lead CPUs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
+core to the deepest power level.
+
+.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs
+
+ +---------+------+-----------+---------+-------------+
+ | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+ +=========+======+===========+=========+=============+
+ | 0 | 0 | 265.38 | 34.12 | 167.36 |
+ +---------+------+-----------+---------+-------------+
+ | 0 | 1 | 265.72 | 33.98 | 167.48 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 0 | 185.3 | 23.18 | 87.42 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 1 | 101.58 | 23.46 | 4.48 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 2 | 101.66 | 22.02 | 4.72 |
+ +---------+------+-----------+---------+-------------+
+ | 1 | 3 | 101.48 | 22.22 | 4.52 |
+ +---------+------+-----------+---------+-------------+
+
+``CPU_VERSION`` in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores
+
+ +-------------+--------+--------------+
+ | Cluster | Core | Latency |
+ +=============+========+==============+
+ | 0 | 0 | 1.22 |
+ +-------------+--------+--------------+
+ | 0 | 1 | 1.2 |
+ +-------------+--------+--------------+
+ | 1 | 0 | 0.6 |
+ +-------------+--------+--------------+
+ | 1 | 1 | 1.08 |
+ +-------------+--------+--------------+
+ | 1 | 2 | 1.04 |
+ +-------------+--------+--------------+
+ | 1 | 3 | 1.04 |
+ +-------------+--------+--------------+
+
+Annotated Historic Results
+--------------------------
+
+The following results are based on the upstream `TF master as of 31/01/2017`_.
+TF-A was built using the same build instructions as detailed in the procedure
+above.
In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.
-``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
-to the point the hardware enters the low power state (WFI). Referring to the TF
-runtime instrumentation points, this corresponds to:
-``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.
-
-``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
-power state to exiting the TF PSCI implementation. This corresponds to:
-``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.
-
-``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
-caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
-
-Note there is very little variance observed in the values given (~1us), although
-the values for each CPU are sometimes interchanged, depending on the order in
-which locks are acquired. Also, there is very little variance observed between
-executing the tests sequentially in a single boot or rebooting between tests.
-
-Given that runtime instrumentation using PMF is invasive, there is a small
-(unquantified) overhead on the results. PMF uses the generic counter for
-timestamps, which runs at 50MHz on Juno.
-
-Results and Commentary
-----------------------
+``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the
+wakeup latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush
+operation.
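
The three historic metrics are differences between pairs of runtime
instrumentation timestamps, as the earlier revision of this text defined them.
The sketch below shows that arithmetic under stated assumptions: the
``delta_us`` helper is hypothetical, and the tick values are invented for
illustration and are not measured data. Only the ``RT_INSTR_*`` point names and
the 50 MHz counter frequency come from the text.

```shell
#!/bin/sh
# Assumption taken from the text: the generic counter runs at 50 MHz on Juno.
COUNTER_HZ=50000000

# Hypothetical helper: print (end - start) ticks as microseconds.
delta_us() {
    awk -v s="$1" -v e="$2" -v hz="$COUNTER_HZ" \
        'BEGIN { printf "%.2f\n", (e - s) * 1000000 / hz }'
}

# Invented timestamps in counter ticks, for illustration only.
RT_INSTR_ENTER_PSCI=1000
RT_INSTR_ENTER_CFLUSH=2000
RT_INSTR_EXIT_CFLUSH=2500
RT_INSTR_ENTER_HW_LOW_PWR=6000
RT_INSTR_EXIT_HW_LOW_PWR=900000
RT_INSTR_EXIT_PSCI=901000

# Powerdown latency: PSCI entry to hardware low-power entry.
PSCI_ENTRY=$(delta_us "$RT_INSTR_ENTER_PSCI" "$RT_INSTR_ENTER_HW_LOW_PWR")
# Wakeup latency: hardware low-power exit to PSCI exit.
PSCI_EXIT=$(delta_us "$RT_INSTR_EXIT_HW_LOW_PWR" "$RT_INSTR_EXIT_PSCI")
# Cache flush latency: the part of PSCI_ENTRY spent flushing caches.
CFLUSH_OVERHEAD=$(delta_us "$RT_INSTR_ENTER_CFLUSH" "$RT_INSTR_EXIT_CFLUSH")

echo "PSCI_ENTRY=${PSCI_ENTRY} PSCI_EXIT=${PSCI_EXIT} CFLUSH_OVERHEAD=${CFLUSH_OVERHEAD}"
```

Because the cache flush happens inside the powerdown path, ``CFLUSH_OVERHEAD``
is a component of ``PSCI_ENTRY`` and not an additional cost on top of it.
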
``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
+.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
+.. _Testing Methodology: ../perf/psci-performance-methodology.html