From a3077ae1e96e9a3ae690cad2c5497f4d0374635e Mon Sep 17 00:00:00 2001
From: Harrison Mutai
Date: Wed, 17 May 2023 13:09:16 +0100
Subject: [PATCH] docs: add Juno runtime instrumentation data

Add results from running the TFTF test suite Runtime Instrumentation on
Juno.

Change-Id: I4c5b64e1a80b5b88e42835f0700294a02edc8032
Signed-off-by: Harrison Mutai
---
 docs/perf/psci-performance-juno.rst | 229 ++++++++++++++++++++++------
 1 file changed, 179 insertions(+), 50 deletions(-)

diff --git a/docs/perf/psci-performance-juno.rst b/docs/perf/psci-performance-juno.rst
index 741866922..7a484b88e 100644
--- a/docs/perf/psci-performance-juno.rst
+++ b/docs/perf/psci-performance-juno.rst
@@ -25,62 +25,189 @@ x Cortex-A57 clusters running at the following frequencies:
 Juno supports CPU, cluster and system power down states, corresponding to power
 levels 0, 1 and 2 respectively. It does not support any retention states.
 
-We used the upstream `TF master as of 31/01/2017`_, building the platform using
-the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:
-
-.. code:: shell
-
-    make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
-        SCP_BL2= \
-        BL33= \
-        all fip
-
-When using the debug build of TF, there was no noticeable difference in the
-results.
-
-The tests are based on an ARM-internal test framework. The release build of this
-framework was used because the results in the debug build became skewed; the
-console output prevented some of the tests from executing in parallel.
-
-The tests consist of both parallel and sequential tests, which are broadly
-described as follows:
-
-- **Parallel Tests** This type of test powers on all the non-lead CPUs and
-  brings them and the lead CPU to a common synchronization point. The lead CPU
-  then initiates the test on all CPUs in parallel.
+Given that runtime instrumentation using PMF is invasive, there is a small
+(unquantified) overhead on the results. PMF uses the generic counter for
+timestamps, which runs at 50MHz on Juno.
 
-- **Sequential Tests** This type of test powers on each non-lead CPU in
-  sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
-  test to complete before proceeding to the next non-lead CPU. The lead CPU then
-  executes the test on itself.
+The following source trees and binaries were used:
+
+- TF-A [`v2.9-rc0`_]
+- TFTF [`v2.9-rc0`_]
+
+Please see the Runtime Instrumentation `Testing Methodology`_ page for more
+details.
+
+Procedure
+---------
+
+#. Build TFTF with runtime instrumentation enabled:
+
+    .. code:: shell
+
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+            TESTS=runtime-instrumentation all
+
+#. Fetch Juno's SCP binary from TF-A's archive:
+
+    .. code:: shell
+
+        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin
+
+#. Build TF-A with the following build options:
+
+    .. code:: shell
+
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
+            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip
+
+#. Load the following images onto the development board: ``fip.bin``,
+   ``scp_bl2.bin``.
+
+Results
+-------
+
+``CPU_SUSPEND`` to deepest power level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+    +=========+======+===========+=========+=============+
+    | 0       | 0    | 243.76    | 239.92  | 6.32        |
+    +---------+------+-----------+---------+-------------+
+    | 0       | 1    | 663.5     | 30.32   | 167.82      |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 0    | 105.12    | 22.84   | 5.88        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 1    | 384.16    | 19.06   | 4.7         |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 2    | 523.98    | 270.46  | 4.74        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 3    | 950.54    | 220.9   | 89.2        |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+    +=========+======+===========+=========+=============+
+    | 0       | 0    | 266.96    | 31.74   | 167.92      |
+    +---------+------+-----------+---------+-------------+
+    | 0       | 1    | 266.9     | 31.52   | 167.82      |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 0    | 279.86    | 23.42   | 87.52       |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 1    | 101.38    | 18.8    | 4.64        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 2    | 101.18    | 19.28   | 4.64        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 3    | 101.32    | 19.02   | 4.62        |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_SUSPEND`` to power level 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+    +=========+======+===========+=========+=============+
+    +---------+------+-----------+---------+-------------+
+    | 0       | 0    | 661.94    | 22.88   | 9.66        |
+    +---------+------+-----------+---------+-------------+
+    | 0       | 1    | 801.64    | 23.38   | 9.62        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 0    | 105.56    | 16.02   | 8.12        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 1    | 245.42    | 16.26   | 7.78        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 2    | 384.42    | 16.1    | 7.84        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 3    | 523.74    | 15.4    | 8.02        |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+    +=========+======+===========+=========+=============+
+    | 0       | 0    | 102.16    | 23.64   | 6.7         |
+    +---------+------+-----------+---------+-------------+
+    | 0       | 1    | 101.66    | 23.78   | 6.6         |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 0    | 277.74    | 15.96   | 4.66        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 1    | 98.0      | 15.88   | 4.64        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 2    | 97.66     | 15.88   | 4.62        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 3    | 97.76     | 15.38   | 4.64        |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_OFF`` on all non-lead CPUs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
+core to the deepest power level.
+
+.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+    +=========+======+===========+=========+=============+
+    | 0       | 0    | 265.38    | 34.12   | 167.36      |
+    +---------+------+-----------+---------+-------------+
+    | 0       | 1    | 265.72    | 33.98   | 167.48      |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 0    | 185.3     | 23.18   | 87.42       |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 1    | 101.58    | 23.46   | 4.48        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 2    | 101.66    | 22.02   | 4.72        |
+    +---------+------+-----------+---------+-------------+
+    | 1       | 3    | 101.48    | 22.22   | 4.52        |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_VERSION`` in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores
+
+    +-------------+--------+--------------+
+    | Cluster     | Core   | Latency      |
+    +=============+========+==============+
+    | 0           | 0      | 1.22         |
+    +-------------+--------+--------------+
+    | 0           | 1      | 1.2          |
+    +-------------+--------+--------------+
+    | 1           | 0      | 0.6          |
+    +-------------+--------+--------------+
+    | 1           | 1      | 1.08         |
+    +-------------+--------+--------------+
+    | 1           | 2      | 1.04         |
+    +-------------+--------+--------------+
+    | 1           | 3      | 1.04         |
+    +-------------+--------+--------------+
+
+Annotated Historic Results
+--------------------------
+
+The following results are based on the upstream `TF master as of 31/01/2017`_.
+TF-A was built using the same build instructions as detailed in the procedure
+above.
 
 In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
 CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
 CPU.
 
-``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
-to the point the hardware enters the low power state (WFI). Referring to the TF
-runtime instrumentation points, this corresponds to:
-``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.
-
-``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
-power state to exiting the TF PSCI implementation. This corresponds to:
-``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.
-
-``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
-caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
-
-Note there is very little variance observed in the values given (~1us), although
-the values for each CPU are sometimes interchanged, depending on the order in
-which locks are acquired. Also, there is very little variance observed between
-executing the tests sequentially in a single boot or rebooting between tests.
-
-Given that runtime instrumentation using PMF is invasive, there is a small
-(unquantified) overhead on the results. PMF uses the generic counter for
-timestamps, which runs at 50MHz on Juno.
-
-Results and Commentary
-----------------------
+``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup latency, and
+``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
 
 ``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -290,3 +417,5 @@ effects, given that these measurements are at the nano-second level.
 
 .. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
 .. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
+.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
+.. _Testing Methodology: ../perf/psci-performance-methodology.html
-- 
2.39.5