Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.
The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.
early_init_devtree: <- early_radix_enabled() = false
early_init_dt_scan_cpus: <- early_radix_enabled() = false
...
check_cpu_pa_features: <- early_radix_enabled() = false
... ^ <- early_radix_enabled() = TRUE
allocate_paca: | <- early_radix_enabled() = TRUE
... |
ppc64_bolted_size: | <- early_radix_enabled() = TRUE
if (early_radix_enabled())| <- early_radix_enabled() = TRUE
return ULONG_MAX; |
... |
... | <- early_radix_enabled() = TRUE
... | <- early_radix_enabled() = TRUE
mmu_early_init_devtree() V
... <- early_radix_enabled() = false
This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.
The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.
At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.
Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.
Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().
The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.
[1]: https://lore.kernel.org/r/
20211018084434.217772-2-sourabhjain@linux.ibm.com
[2]: https://lore.kernel.org/r/
20211018084434.217772-3-sourabhjain@linux.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
be32_to_cpu(intserv[found_thread]));
boot_cpuid = found;
+ // Pass the boot CPU's hard CPU id back to our caller
+ *((u32 *)data) = be32_to_cpu(intserv[found_thread]);
+
/*
* PAPR defines "logical" PVR values for cpus that
* meet various levels of the architecture:
cur_cpu_spec->cpu_features &= ~CPU_FTR_SMT;
else if (!dt_cpu_ftrs_in_use())
cur_cpu_spec->cpu_features |= CPU_FTR_SMT;
- allocate_paca(boot_cpuid);
#endif
- set_hard_smp_processor_id(found, be32_to_cpu(intserv[found_thread]));
return 0;
}
void __init early_init_devtree(void *params)
{
+ u32 boot_cpu_hwid;
phys_addr_t limit;
DBG(" -> early_init_devtree(%px)\n", params);
* FIXME .. and the initrd too? */
move_device_tree();
- allocate_paca_ptrs();
-
DBG("Scanning CPUs ...\n");
dt_cpu_ftrs_scan();
/* Retrieve CPU related informations from the flat tree
* (altivec support, boot CPU ID, ...)
*/
- of_scan_flat_dt(early_init_dt_scan_cpus, NULL);
+ of_scan_flat_dt(early_init_dt_scan_cpus, &boot_cpu_hwid);
if (boot_cpuid < 0) {
printk("Failed to identify boot CPU !\n");
BUG();
mmu_early_init_devtree();
+ // NB. paca is not installed until later in early_setup()
+ allocate_paca_ptrs();
+ allocate_paca(boot_cpuid);
+ set_hard_smp_processor_id(boot_cpuid, boot_cpu_hwid);
+
#ifdef CONFIG_PPC_POWERNV
/* Scan and build the list of machine check recoverable ranges */
of_scan_flat_dt(early_init_dt_scan_recoverable_ranges, NULL);