The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online. Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.
When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes. The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array. This results in:
[ 0.881970] Unable to handle kernel paging request at virtual address
fffffb1008b926a4
[ 1.970095] pgd =
fffffc00094b0000
[ 1.973530] [
fffffb1008b926a4] *pgd=
0000000000000000, *pud=
0000000000000000, *pmd=
0000000000000000
[ 1.982610] Internal error: Oops:
96000004 [#1] SMP
[ 1.987541] Modules linked in:
[ 1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G W 4.8.0-rc6-preempt-vol+ #9
[ 1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 2.005159] task:
fffffe0fe89cc300 task.stack:
fffffe0fe8b8c000
[ 2.011158] PC is at try_to_wake_up+0x194/0x34c
[ 2.015737] LR is at try_to_wake_up+0x150/0x34c
[ 2.020318] pc : [<
fffffc00080e7468>] lr : [<
fffffc00080e7424>] pstate:
600000c5
[ 2.027803] sp :
fffffe0fe8b8fb10
[ 2.031149] x29:
fffffe0fe8b8fb10 x28:
0000000000000000
[ 2.036522] x27:
fffffc0008c63bc8 x26:
0000000000001000
[ 2.041896] x25:
fffffc0008c63c80 x24:
fffffc0008bfb200
[ 2.047270] x23:
00000000000000c0 x22:
0000000000000004
[ 2.052642] x21:
fffffe0fe89d25bc x20:
0000000000001000
[ 2.058014] x19:
fffffe0fe89d1d00 x18:
0000000000000000
[ 2.063386] x17:
0000000000000000 x16:
0000000000000000
[ 2.068760] x15:
0000000000000018 x14:
0000000000000000
[ 2.074133] x13:
0000000000000000 x12:
0000000000000000
[ 2.079505] x11:
0000000000000000 x10:
0000000000000000
[ 2.084879] x9 :
0000000000000000 x8 :
0000000000000000
[ 2.090251] x7 :
0000000000000040 x6 :
0000000000000000
[ 2.095621] x5 :
ffffffffffffffff x4 :
0000000000000000
[ 2.100991] x3 :
0000000000000000 x2 :
0000000000000000
[ 2.106364] x1 :
fffffc0008be4c24 x0 :
ffffff0ffffada80
[ 2.111737]
[ 2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020)
[ 2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000)
[ 2.125914] fb00:
fffffe0fe8b8fb80 fffffc00080e7648
.
.
.
[ 2.442859] Call trace:
[ 2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70)
[ 2.451843] f940:
fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468
[ 2.459767] f960:
fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64
[ 2.467690] f980:
fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000
[ 2.475614] f9a0:
fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000
[ 2.483540] f9c0:
fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4
[ 2.491465] f9e0:
ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000
[ 2.499387] fa00:
0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040
[ 2.507309] fa20:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.515233] fa40:
0000000000000000 0000000000000000 0000000000000000 0000000000000018
[ 2.523156] fa60:
0000000000000000 0000000000000000
[ 2.528089] [<
fffffc00080e7468>] try_to_wake_up+0x194/0x34c
[ 2.533723] [<
fffffc00080e7648>] wake_up_process+0x28/0x34
[ 2.539275] [<
fffffc00080d3764>] create_worker+0x110/0x19c
[ 2.544824] [<
fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0
[ 2.550724] [<
fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4
[ 2.557066] [<
fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c
[ 2.563234] [<
fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168
[ 2.569398] [<
fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4
[ 2.575210] [<
fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148
[ 2.581027] [<
fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8
[ 2.586929] [<
fffffc00080dbd64>] kthread+0xdc/0xf0
[ 2.591776] [<
fffffc0008083380>] ret_from_fork+0x10/0x50
[ 2.597147] Code:
b00057e1 91304021 91005021 b8626822 (
b8606821)
[ 2.603464] ---[ end trace
58c0cd36b88802bc ]---
[ 2.608138] Kernel panic - not syncing: Fatal exception
Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init(). Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.
Suggested-by: Robert Richter <rric@kernel.org>
Fixes: d04da22c1066 ("arm64, numa: Add NUMA support for arm64 platforms.")
Cc: <stable@vger.kernel.org> # 4.7.x-
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
return ret;
}
-static void smp_store_cpu_info(unsigned int cpuid)
-{
- store_cpu_topology(cpuid);
- numa_store_cpu_info(cpuid);
-}
-
/*
* This is the secondary CPU boot entry. We're using this CPUs
* idle thread stack, but a set of temporary page tables.
*/
notify_cpu_starting(cpu);
- smp_store_cpu_info(cpu);
+ store_cpu_topology(cpu);
/*
* OK, now it's safe to let the boot CPU continue. Wait for
{
int err;
unsigned int cpu;
+ unsigned int this_cpu;
init_cpu_topology();
- smp_store_cpu_info(smp_processor_id());
+ this_cpu = smp_processor_id();
+ store_cpu_topology(this_cpu);
+ numa_store_cpu_info(this_cpu);
/*
* If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
continue;
set_cpu_present(cpu, true);
+ numa_store_cpu_info(cpu);
}
}