arm64: cmpxchg_double*: hazard against entire exchange variable
[ Upstream commit
031af50045ea97ed4386eb3751ca2c134d0fc911 ]
The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.
GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.
This is similar to what we fixed back in commit:
6026c9bade14e865 ("arm64: xchg: hazard against entire exchange variable")
... but we forgot to adjust cmpxchg_double*() similarly at the same
time.
The same problem applies, as demonstrated with the following test:
| struct big {
| u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
| u64 hi_old, hi_new;
|
| hi_old = b->hi;
| cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
| hi_new = b->hi;
|
| return hi_old ^ hi_new;
| }
... which GCC 12.1.0 compiles as:
|
0000000000000000 <foo>:
| 0:
d503233f paciasp
| 4:
aa0003e4 mov x4, x0
| 8:
1400000e b 40 <foo+0x40>
| c:
d2800240 mov x0, #0x12 // #18
| 10:
d2800681 mov x1, #0x34 // #52
| 14:
aa0003e5 mov x5, x0
| 18:
aa0103e6 mov x6, x1
| 1c:
d2800ac2 mov x2, #0x56 // #86
| 20:
d2800f03 mov x3, #0x78 // #120
| 24:
48207c82 casp x0, x1, x2, x3, [x4]
| 28:
ca050000 eor x0, x0, x5
| 2c:
ca060021 eor x1, x1, x6
| 30:
aa010000 orr x0, x0, x1
| 34:
d2800000 mov x0, #0x0 // #0 <--- BANG
| 38:
d50323bf autiasp
| 3c:
d65f03c0 ret
| 40:
d2800240 mov x0, #0x12 // #18
| 44:
d2800681 mov x1, #0x34 // #52
| 48:
d2800ac2 mov x2, #0x56 // #86
| 4c:
d2800f03 mov x3, #0x78 // #120
| 50:
f9800091 prfm pstl1strm, [x4]
| 54:
c87f1885 ldxp x5, x6, [x4]
| 58:
ca0000a5 eor x5, x5, x0
| 5c:
ca0100c6 eor x6, x6, x1
| 60:
aa0600a6 orr x6, x5, x6
| 64:
b5000066 cbnz x6, 70 <foo+0x70>
| 68:
c8250c82 stxp w5, x2, x3, [x4]
| 6c:
35ffff45 cbnz w5, 54 <foo+0x54>
| 70:
d2800000 mov x0, #0x0 // #0 <--- BANG
| 74:
d50323bf autiasp
| 78:
d65f03c0 ret
Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().
This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.
With this change, GCC 12.1.0 compiles the above test as:
|
0000000000000000 <foo>:
| 0:
f9400407 ldr x7, [x0, #8]
| 4:
d503233f paciasp
| 8:
aa0003e4 mov x4, x0
| c:
1400000f b 48 <foo+0x48>
| 10:
d2800240 mov x0, #0x12 // #18
| 14:
d2800681 mov x1, #0x34 // #52
| 18:
aa0003e5 mov x5, x0
| 1c:
aa0103e6 mov x6, x1
| 20:
d2800ac2 mov x2, #0x56 // #86
| 24:
d2800f03 mov x3, #0x78 // #120
| 28:
48207c82 casp x0, x1, x2, x3, [x4]
| 2c:
ca050000 eor x0, x0, x5
| 30:
ca060021 eor x1, x1, x6
| 34:
aa010000 orr x0, x0, x1
| 38:
f9400480 ldr x0, [x4, #8]
| 3c:
d50323bf autiasp
| 40:
ca0000e0 eor x0, x7, x0
| 44:
d65f03c0 ret
| 48:
d2800240 mov x0, #0x12 // #18
| 4c:
d2800681 mov x1, #0x34 // #52
| 50:
d2800ac2 mov x2, #0x56 // #86
| 54:
d2800f03 mov x3, #0x78 // #120
| 58:
f9800091 prfm pstl1strm, [x4]
| 5c:
c87f1885 ldxp x5, x6, [x4]
| 60:
ca0000a5 eor x5, x5, x0
| 64:
ca0100c6 eor x6, x6, x1
| 68:
aa0600a6 orr x6, x5, x6
| 6c:
b5000066 cbnz x6, 78 <foo+0x78>
| 70:
c8250c82 stxp w5, x2, x3, [x4]
| 74:
35ffff45 cbnz w5, 5c <foo+0x5c>
| 78:
f9400480 ldr x0, [x4, #8]
| 7c:
d50323bf autiasp
| 80:
ca0000e0 eor x0, x7, x0
| 84:
d65f03c0 ret
... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.
For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.
I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.
Fixes: 3e190551a196 ("arm64: xchg: Implement cmpxchg_double")
Fixes: 4bc5e410420c ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>