kernel.git
2 years agopowerpc/inst: Add __copy_inst_from_kernel_nofault()
Christophe Leroy [Mon, 9 May 2022 05:36:18 +0000 (07:36 +0200)]
powerpc/inst: Add __copy_inst_from_kernel_nofault()

On the same model as get_user() versus __get_user(),
introduce __copy_inst_from_kernel_nofault() which doesn't
check address.

To be used by callers that have already checked that the adress
is a kernel address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f3702890d6dbd64702b61834753bcc96851c18c.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Minimise number of #ifdefs
Christophe Leroy [Mon, 9 May 2022 05:36:17 +0000 (07:36 +0200)]
powerpc/ftrace: Minimise number of #ifdefs

A lot of #ifdefs can be replaced by IS_ENABLED()

Do so.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fold in changes suggested by Naveen and Christophe on list]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18ce6708d6f8c71d87436f9c6019f04df4125128.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Simplify expected_nop_sequence()
Christophe Leroy [Mon, 9 May 2022 05:36:16 +0000 (07:36 +0200)]
powerpc/ftrace: Simplify expected_nop_sequence()

Avoid ifdefs around expected_nop_sequence().

While at it make it a bool.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/305d22472f1f92127fba09692df6bb5d079a8cd0.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use size macro instead of opencoding
Christophe Leroy [Mon, 9 May 2022 05:36:15 +0000 (07:36 +0200)]
powerpc/ftrace: Use size macro instead of opencoding

0x80000000 is SZ_2G. Use it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix comparison against unsigned -SZ_2G]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bb6626e884acffe87b58736291df57db3deaa9b9.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.
Christophe Leroy [Mon, 9 May 2022 05:36:14 +0000 (07:36 +0200)]
powerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.

PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.

Use them in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9292094c9a69cef6d29ee83f435a557b59c45065.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use BRANCH_SET_LINK instead of value 1
Christophe Leroy [Mon, 9 May 2022 05:36:13 +0000 (07:36 +0200)]
powerpc/ftrace: Use BRANCH_SET_LINK instead of value 1

To make it explicit, use BRANCH_SET_LINK instead of value 1
when calling create_branch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d57847063ac93660a5af620d4df1847f10edf61a.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Remove ftrace_plt_tramps[]
Christophe Leroy [Mon, 9 May 2022 05:36:12 +0000 (07:36 +0200)]
powerpc/ftrace: Remove ftrace_plt_tramps[]

ftrace_plt_tramps table is never filled so it is useless.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeeb618a6619e3a7e3f82f1bd83ca7c25af6330.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE
Christophe Leroy [Mon, 9 May 2022 05:36:11 +0000 (07:36 +0200)]
powerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE

Since commit 16b642fcedb5 ("powerpc: Only support DYNAMIC_FTRACE not
static"), CONFIG_DYNAMIC_FTRACE is always selected when
CONFIG_FUNCTION_TRACER is selected.

To avoid confusion and have the reader wonder what's happen when
CONFIG_FUNCTION_TRACER is selected and CONFIG_DYNAMIC_FTRACE is not,
use CONFIG_FUNCTION_TRACER in ifdefs instead of CONFIG_DYNAMIC_FTRACE.

As CONFIG_FUNCTION_GRAPH_TRACER depends on CONFIG_FUNCTION_TRACER,
ftrace.o doesn't need to appear for both symbols in Makefile.

Then as ftrace.o is built only when CONFIG_FUNCTION_TRACER is selected
ifdef CONFIG_FUNCTION_TRACER is not needed in ftrace.c, and since it
implies CONFIG_DYNAMIC_FTRACE, CONFIG_DYNAMIC_FTRACE is not needed
in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/628d357503eb90b4a034f99b7df516caaff4d279.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS
Christophe Leroy [Mon, 9 May 2022 05:36:10 +0000 (07:36 +0200)]
powerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS

Since commit 760de60d077a ("powerpc/syscalls: Fix syscall tracing")
ftrace.o is not needed anymore for CONFIG_FTRACE_SYSCALLS.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/275932a5d61543b825ff9a64f61abed6da5d4a2a.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
Christophe Leroy [Mon, 9 May 2022 05:36:09 +0000 (07:36 +0200)]
powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64

Since a36a7ff2dd16 ("powerpc/ftrace: Add module_trampoline_target()
for PPC32"), __ftrace_make_nop() for PPC32 is very similar to the
one for PPC64.

Same for __ftrace_make_call().

Make them common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f53c237316dab4b1b8c682685266faa92da816.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Finalise cleanup around ABI use
Christophe Leroy [Mon, 9 May 2022 05:36:08 +0000 (07:36 +0200)]
powerpc: Finalise cleanup around ABI use

Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2,
get rid of all indirect detection of ABI version.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/709d9d69523c14c8a9fba4486395dca0f2d675b1.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}
Christophe Leroy [Mon, 9 May 2022 05:36:07 +0000 (07:36 +0200)]
powerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}

Replace all uses of PPC64_ELF_ABI_v1 and PPC64_ELF_ABI_v2 by
resp CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba13d59e8c50bc9aa6328f1c7f0c0d0278e0a3a7.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2
Christophe Leroy [Mon, 9 May 2022 05:36:06 +0000 (07:36 +0200)]
powerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2

At the time being, we use CONFIG_CPU_LITTLE_ENDIAN and
CONFIG_CPU_BIG_ENDIAN to pass -mabi=elfv1 or elfv2 to
compiler, then define a PPC64_ELF_ABI_v1 or PPC64_ELF_ABI_v2
macro in asm/types.h based on _CALL_ELF define set by the compiler.

Make it more straight forward with a CONFIG option that
is directly usable.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1eca1addbc550167da9841c7340a010d0c4b2200.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use patch_instruction() return directly
Christophe Leroy [Mon, 9 May 2022 05:36:05 +0000 (07:36 +0200)]
powerpc/ftrace: Use patch_instruction() return directly

Instead of returning -EPERM when patch_instruction() fails,
just return what patch_instruction returns.

That simplifies ftrace_modify_code():

   0: 94 21 ff c0  stwu    r1,-64(r1)
   4: 93 e1 00 3c  stw     r31,60(r1)
   8: 7c 7f 1b 79  mr.     r31,r3
   c: 40 80 00 30  bge     3c <ftrace_modify_code+0x3c>
  10: 93 c1 00 38  stw     r30,56(r1)
  14: 7c 9e 23 78  mr      r30,r4
  18: 7c a4 2b 78  mr      r4,r5
  1c: 80 bf 00 00  lwz     r5,0(r31)
  20: 7c 1e 28 40  cmplw   r30,r5
  24: 40 82 00 34  bne     58 <ftrace_modify_code+0x58>
  28: 83 c1 00 38  lwz     r30,56(r1)
  2c: 7f e3 fb 78  mr      r3,r31
  30: 83 e1 00 3c  lwz     r31,60(r1)
  34: 38 21 00 40  addi    r1,r1,64
  38: 48 00 00 00  b       38 <ftrace_modify_code+0x38>
38: R_PPC_REL24 patch_instruction

Before:

   0: 94 21 ff c0  stwu    r1,-64(r1)
   4: 93 e1 00 3c  stw     r31,60(r1)
   8: 7c 7f 1b 79  mr.     r31,r3
   c: 40 80 00 4c  bge     58 <ftrace_modify_code+0x58>
  10: 93 c1 00 38  stw     r30,56(r1)
  14: 7c 9e 23 78  mr      r30,r4
  18: 7c a4 2b 78  mr      r4,r5
  1c: 80 bf 00 00  lwz     r5,0(r31)
  20: 7c 08 02 a6  mflr    r0
  24: 90 01 00 44  stw     r0,68(r1)
  28: 7c 1e 28 40  cmplw   r30,r5
  2c: 40 82 00 48  bne     74 <ftrace_modify_code+0x74>
  30: 7f e3 fb 78  mr      r3,r31
  34: 48 00 00 01  bl      34 <ftrace_modify_code+0x34>
34: R_PPC_REL24 patch_instruction
  38: 80 01 00 44  lwz     r0,68(r1)
  3c: 20 63 00 00  subfic  r3,r3,0
  40: 83 c1 00 38  lwz     r30,56(r1)
  44: 7c 63 19 10  subfe   r3,r3,r3
  48: 7c 08 03 a6  mtlr    r0
  4c: 83 e1 00 3c  lwz     r31,60(r1)
  50: 38 21 00 40  addi    r1,r1,64
  54: 4e 80 00 20  blr

It improves ftrace activation/deactivation duration by about 3%.

Modify patch_instruction() return on failure to -EPERM in order to
match with ftrace expectations. Other users of patch_instruction()
do not care about the exact error value returned.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/49a8597230713e2633e7d9d7b56140787c4a7e20.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Inline ftrace_modify_code()
Christophe Leroy [Mon, 9 May 2022 05:36:04 +0000 (07:36 +0200)]
powerpc/ftrace: Inline ftrace_modify_code()

Inlining ftrace_modify_code(), it increases a bit the
size of ftrace code but brings 5% improvment on ftrace
activation.

Usually in C files we let gcc decide what to do but here
it really help to 'help' gcc to decide to inline, thought
we don't want to force it with an __always_inline that
would be too much for CONFIG_CC_OPTIMIZE_FOR_SIZE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1597a06d57cfc80e6853838c4066e799bf6c7977.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Inline create_branch()
Christophe Leroy [Mon, 9 May 2022 05:36:03 +0000 (07:36 +0200)]
powerpc/code-patching: Inline create_branch()

create_branch() is a good candidate for inlining because:
- Flags can be folded in.
- Range tests are likely to be already done.

Hence reducing the create_branch() to only a set of instructions.

So inline it.

It improves ftrace activation by 10%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69851cc9a7bf8f03d025e6d29e165f2d0bd3bb6e.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use is_offset_in_branch_range()
Christophe Leroy [Mon, 9 May 2022 05:36:02 +0000 (07:36 +0200)]
powerpc/ftrace: Use is_offset_in_branch_range()

Use is_offset_in_branch_range() instead of create_branch()
to check if a target is within branch range.

This patch together with the previous one improves
ftrace activation time by 7%

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/912ae51782f5a53c44e435497c8c3fb5cc632387.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Inline is_offset_in_{cond}_branch_range()
Christophe Leroy [Mon, 9 May 2022 05:36:01 +0000 (07:36 +0200)]
powerpc/code-patching: Inline is_offset_in_{cond}_branch_range()

Test in is_offset_in_branch_range() and is_offset_in_cond_branch_range()
are simple tests that are worth inlining.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a05be0ccb7373e6a9789a1988fcd0c810f5f9269.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Remove redundant create_branch() calls
Christophe Leroy [Mon, 9 May 2022 05:36:00 +0000 (07:36 +0200)]
powerpc/ftrace: Remove redundant create_branch() calls

Since commit 55afba3325e1 ("powerpc/code-patching: Fix patch_branch()
return on out-of-range failure") patch_branch() fails with -ERANGE
when trying to branch out of range.

No need to perform the test twice. Remove redundant create_branch()
calls.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa45fbad0b4b7493080835d8276c0cb4ce146503.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Refactor prepare_ftrace_return()
Christophe Leroy [Mon, 9 May 2022 05:35:59 +0000 (07:35 +0200)]
powerpc/ftrace: Refactor prepare_ftrace_return()

When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS,
prepare_ftrace_return() is called by ftrace_graph_func()
otherwise prepare_ftrace_return() is called from assembly.

Refactor prepare_ftrace_return() into a static
__prepare_ftrace_return() that will be called by both
prepare_ftrace_return() and ftrace_graph_func().

It will allow GCC to fold __prepare_ftrace_return() inside
ftrace_graph_func().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d42deafe353980c66cf19d3132638c05ba9f4a9.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/rtas: enture rtas_call is called with MMU enabled
Nicholas Piggin [Tue, 8 Mar 2022 13:50:46 +0000 (23:50 +1000)]
powerpc/rtas: enture rtas_call is called with MMU enabled

rtas_call must not be called with the MMU disabled because in case
of rtas error, log_error is called which requires MMU enabled. Add
a test and warning for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-14-npiggin@gmail.com
2 years agopowerpc/rtas: Leave MSR[RI] enabled over RTAS call
Nicholas Piggin [Tue, 8 Mar 2022 13:50:42 +0000 (23:50 +1000)]
powerpc/rtas: Leave MSR[RI] enabled over RTAS call

PAPR specifies that RTAS may be called with MSR[RI] enabled if the
calling context is recoverable, and RTAS will manage RI as necessary.
Call the rtas entry point with RI enabled, and add a check to ensure
the caller has RI enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-10-npiggin@gmail.com
2 years agopowerpc/rtas: PACA can be restored directly from SPRG
Nicholas Piggin [Tue, 8 Mar 2022 13:50:40 +0000 (23:50 +1000)]
powerpc/rtas: PACA can be restored directly from SPRG

On 64-bit, PACA is saved in a SPRG so it does not need to be saved on
stack. We also don't need to mask off the top bits for real mode
addresses because the architecture does this for us.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-8-npiggin@gmail.com
2 years agopowerpc/rtas: Call enter_rtas with MSR[EE] disabled
Nicholas Piggin [Tue, 8 Mar 2022 13:50:37 +0000 (23:50 +1000)]
powerpc/rtas: Call enter_rtas with MSR[EE] disabled

Disable MSR[EE] in C code rather than asm.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-5-npiggin@gmail.com
2 years agopowerpc/rtas: Fix whitespace in rtas_entry.S
Nicholas Piggin [Tue, 8 Mar 2022 13:50:36 +0000 (23:50 +1000)]
powerpc/rtas: Fix whitespace in rtas_entry.S

The code was moved verbatim including whitespace cruft. Fix that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-4-npiggin@gmail.com
2 years agopowerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bit
Nicholas Piggin [Tue, 8 Mar 2022 13:50:35 +0000 (23:50 +1000)]
powerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bit

This symbol is marked nokprobe on 32-bit but not 64-bit, add it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-3-npiggin@gmail.com
2 years agopowerpc/rtas: Move rtas entry assembly into its own file
Nicholas Piggin [Tue, 8 Mar 2022 13:50:34 +0000 (23:50 +1000)]
powerpc/rtas: Move rtas entry assembly into its own file

This makes working on the code a bit easier.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-2-npiggin@gmail.com
2 years agopowerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ
Nicholas Piggin [Mon, 7 Mar 2022 18:27:34 +0000 (04:27 +1000)]
powerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ

Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
dynamically size stack allocations in a manner forward-compatible with
new processor state saved in the signal frame

For now these statically find the maximum signal frame size rather than
doing any runtime testing of features to minimise the size.

glibc 2.34 will take advantage of this, as will applications that use
use _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
References: 9c59c095080b ("arm64: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-2-npiggin@gmail.com
2 years agopowerpc/64: Bump SIGSTKSZ and MINSIGSTKSZ
Nicholas Piggin [Mon, 7 Mar 2022 18:27:33 +0000 (04:27 +1000)]
powerpc/64: Bump SIGSTKSZ and MINSIGSTKSZ

The sad tale of SIGSTKSZ and MINSIGSTKSZ is documented in glibc.git
commit f7c399cff5bd ("PowerPC SIGSTKSZ"), which explains why glibc
does not use the kernel defines for these constants.

Since then in fact there has been a further expansion of the signal
stack frame size on little-endian with linux commit
1db982d5dc65 ("powerpc: Increase stack redzone for 64-bit userspace to
512 bytes"), which has caused it to exceed even the glibc defines.

See kernel commit 22e607f57960 ("powerpc: Allow 4224 bytes of stack
expansion for the signal frame") for more details of the history of the
expansion.

Increase MINSIGSTKSZ to 8192 which is double the current glibc value and
fits the current stack frame with room to grow. SIGSTKSZ is set to 4x
the minimum as convention.

glibc will have to be updated as well.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-1-npiggin@gmail.com
2 years agopowerpc/vdso: Link with ld.lld when requested
Nathan Chancellor [Wed, 11 May 2022 18:50:01 +0000 (11:50 -0700)]
powerpc/vdso: Link with ld.lld when requested

The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:

  $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
  ...

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
because "ld" is the default linker for clang on most Linux platforms.

This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place.

Ultimately, the PowerPC vDSO should be converted to compiling and
linking with $(CC) and $(LD) respectively but there were issues last
time this was tried, potentially due to older but supported tool
versions. To avoid regressing GCC + binutils, use the compiler option
'-fuse-ld', which tells the compiler which linker to use when it is
invoked as both the compiler and linker. Use '-fuse-ld=lld' when
LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.

Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
Link: https://lore.kernel.org/r/20220511185001.3269404-3-nathan@kernel.org
2 years agopowerpc/vdso: Remove unused ENTRY in linker scripts
Nathan Chancellor [Wed, 11 May 2022 18:50:00 +0000 (11:50 -0700)]
powerpc/vdso: Remove unused ENTRY in linker scripts

When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:

  ld.lld: warning: cannot find entry symbol _start; not setting start address
  ld.lld: warning: cannot find entry symbol _start; not setting start address

Looking at GCC + GNU ld, the entry point address is 0x0:

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

This matches what ld.lld emits:

  $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg

  File: vdso32.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: vdso64.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511185001.3269404-2-nathan@kernel.org
2 years agopowerpc: Export mmu_feature_keys[] as non-GPL
Kevin Hao [Tue, 29 Mar 2022 08:57:09 +0000 (16:57 +0800)]
powerpc: Export mmu_feature_keys[] as non-GPL

When the mmu_feature_keys[] was introduced in the commit c0f90d2745f3
("powerpc: Add option to use jump label for mmu_has_feature()"),
it is unlikely that it would be used either directly or indirectly in
the out of tree modules. So we exported it as GPL only.

But with the evolution of the codes, especially the PPC_KUAP support, it
may be indirectly referenced by some primitive macro or inline functions
such as get_user() or __copy_from_user_inatomic(), this will make it
impossible to build many non GPL modules (such as ZFS) on ppc
architecture. Fix this by exposing the mmu_feature_keys[] to the non-GPL
modules too.

Fixes: fe40a6bcea3a ("powerpc/64s/kuap: Use mmu_has_feature()")
Reported-by: Nathaniel Filardo <nwfilardo@gmail.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220329085709.4132729-1-haokexin@gmail.com
2 years agopowerpc/setup: Refactor/untangle panic notifiers
Guilherme G. Piccoli [Wed, 27 Apr 2022 22:49:02 +0000 (19:49 -0300)]
powerpc/setup: Refactor/untangle panic notifiers

The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
2 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Thu, 19 May 2022 13:10:42 +0000 (23:10 +1000)]
Merge branch 'topic/ppc-kvm' into next

Merge our KVM topic branch.

2 years agoKVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
Fabiano Rosas [Mon, 28 Mar 2022 21:58:31 +0000 (18:58 -0300)]
KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint

We removed most of the vcore logic from the P9 path but there's still
a tracepoint that tried to dereference vc->runner.

Fixes: db4e28aadb40 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2 years agoKVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers
Alexey Kardashevskiy [Mon, 9 May 2022 07:11:50 +0000 (17:11 +1000)]
KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers

Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.

This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3s: PR: Enable default TCE hypercalls
Alexey Kardashevskiy [Fri, 6 May 2022 07:37:37 +0000 (17:37 +1000)]
KVM: PPC: Book3s: PR: Enable default TCE hypercalls

When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
were already implemented and enabled by default; however H_GET_TCE
was missed out on PR KVM (probably because the handler was in
the real mode code at the time).

This enables H_GET_TCE by default. While at this, this wraps
the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
Alexey Kardashevskiy [Fri, 6 May 2022 05:37:55 +0000 (15:37 +1000)]
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers

LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2 years agoMerge branch 'fixes' into topic/ppc-kvm
Michael Ellerman [Wed, 18 May 2022 14:43:04 +0000 (00:43 +1000)]
Merge branch 'fixes' into topic/ppc-kvm

Merge our fixes branch. In parciular this brings in the KVM TCE handling
fix, which is a prerequisite for a subsequent patch.

2 years agoKVM: PPC: Book3S HV: Initialize AMOR in nested entry
Fabiano Rosas [Mon, 25 Apr 2022 14:21:51 +0000 (11:21 -0300)]
KVM: PPC: Book3S HV: Initialize AMOR in nested entry

The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2 years agoMerge branch 'fixes' into next
Michael Ellerman [Wed, 18 May 2022 14:11:51 +0000 (00:11 +1000)]
Merge branch 'fixes' into next

Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.

2 years agoKVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()
Bo Liu [Fri, 1 Apr 2022 06:52:52 +0000 (02:52 -0400)]
KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()

The return value type defined in the function kvm_age_rmapp() is
"bool", but the return value type defined in the implementation of the
function kvm_age_rmapp() is "int".

Change the return value type to "bool".

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2 years agoKVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
Xiaomeng Tong [Thu, 14 Apr 2022 06:21:03 +0000 (14:21 +0800)]
KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator

The bug is here:
if (!p)
                return ret;

The list iterator value 'p' will *always* be set and non-NULL by
list_for_each_entry(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element is found.

To fix the bug, Use a new value 'iter' as the list iterator, while use
the old value 'p' as a dedicated variable to point to the found element.

Fixes: 7d57ac025eeb ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
2 years agoKVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment
Bagas Sanjaya [Fri, 6 May 2022 07:07:47 +0000 (14:07 +0700)]
KVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment

kernel test robot reported kernel-doc warning for rm_host_ipi_action():

   arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment.
    * Host Operations poked by RM KVM

Since the function is static, remove the extraneous (second) asterisk at
the head of function comment.

Fixes: 3f92524e15aaf3 ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linux-doc/202204252334.Cd2IsiII-lkp@intel.com/
Link: https://lore.kernel.org/r/20220506070747.16309-1-bagasdotme@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
Nicholas Piggin [Thu, 3 Mar 2022 05:33:15 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting

The L1 should not be able to adjust LPES mode for the L2. Setting LPES
if the L0 needs it clear would cause external interrupts to be sent to
L2 and missed by the L0.

Clearing LPES when it may be set, as typically happens with XIVE enabled
could cause a performance issue despite having no native XIVE support in
the guest, because it will cause mediated interrupts for the L2 to be
taken in HV mode, which then have to be injected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
Nicholas Piggin [Thu, 3 Mar 2022 05:33:14 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context

The PowerNV L0 currently pushes the OS xive context when running a vCPU,
regardless of whether it is running a nested guest. The problem is that
xive OS ring interrupts will be delivered while the L2 is running.

At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which
actually makes external interrupts go to the L0. That causes the L2 to
exit and the interrupt taken or injected into the L1, so in some
respects this behaves like an escalation. It's not clear if this was
deliberate or not, there's no comment about it and the L1 is actually
allowed to clear LPES in the L2, so it's confusing at best.

When the L2 is running, the L1 is essentially in a ceded state with
respect to external interrupts (it can't respond to them directly and
won't get scheduled again absent some additional event). So the natural
way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to
run the L2, have it arm the escalation interrupt and don't push the L1
context while running the L2.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Split !nested case out from guest entry
Nicholas Piggin [Thu, 3 Mar 2022 05:33:13 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Split !nested case out from guest entry

The differences between nested and !nested will become larger in
later changes so split them out for readability.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-5-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
Nicholas Piggin [Thu, 3 Mar 2022 05:33:12 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming

Move the cede abort logic out of xive escalation rearming and into
the caller to prepare for handling a similar case with nested guest
entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-4-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
Nicholas Piggin [Thu, 3 Mar 2022 05:33:11 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry

If there is a pending xive interrupt, inject it at guest entry (if
MSR[EE] is enabled) rather than take another interrupt when the guest
is entered. If xive is enabled then LPCR[LPES] is set so this behaviour
should be expected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-3-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS
Nicholas Piggin [Sun, 23 Jan 2022 12:00:43 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS

KVMPPC_NR_LPIDS no longer represents any size restriction on the
LPID space and can be removed. A CPU with more than 12 LPID bits
implemented will now be able to create more than 4095 guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2 years agoKVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
Nicholas Piggin [Sun, 23 Jan 2022 12:00:42 +0000 (22:00 +1000)]
KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum

Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
fix it to 4096 (12-bits) explicitly for now.

kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
because the L1 partition table registration hcall already did that, and
it checks against the partition table size.

This patch also puts all the partition table size calculations into the
same form, using 12 for the architected size field shift and 4 for the
shift corresponding to the partition table entry size.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-of-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
Nicholas Piggin [Sun, 23 Jan 2022 12:00:41 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr

This removes the fixed sized kvm->arch.nested_guests array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
Nicholas Piggin [Sun, 23 Jan 2022 12:00:40 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator

This removes the fixed-size lpid_inuse array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
Nicholas Piggin [Sun, 23 Jan 2022 12:00:39 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested

The LPID allocator init is changed to:
- use mmu_lpid_bits rather than hard-coding;
- use KVM_MAX_NESTED_GUESTS for nested hypervisors;
- not reserve the top LPID on POWER9 and newer CPUs.

The reserved LPID is made a POWER7/8-specific detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2 years agoKVM: PPC: Remove kvmppc_claim_lpid
Nicholas Piggin [Sun, 23 Jan 2022 12:00:38 +0000 (22:00 +1000)]
KVM: PPC: Remove kvmppc_claim_lpid

Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler to
change the underlying implementation in a future patch.

The host LPID is always 0, so that can be a detail of the allocator. If
the allocator range is restricted, that can reserve LPIDs at the top of
the range. This allows kvmppc_claim_lpid to be removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Optimise loads around context switch
Nicholas Piggin [Sun, 23 Jan 2022 11:47:25 +0000 (21:47 +1000)]
KVM: PPC: Book3S HV P9: Optimise loads around context switch

It is better to get all loads for the register values in flight
before starting to switch LPID, PID, and LPCR because those
mtSPRs are expensive and serialising.

This also just tidies up the code for a potential future change
to the context switching sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123114725.3549202-1-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist
Nicholas Piggin [Sat, 22 Jan 2022 10:56:39 +0000 (20:56 +1000)]
KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist

This facility is controlled by FSCR only. Reserved bits should not be
set in the HFSCR register (although it's likely harmless as this
position would not be re-used, and the L0 is forgiving here too).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122105639.3477407-1-npiggin@gmail.com
2 years agopowerpc/rtas: Keep MSR[RI] set when calling RTAS
Laurent Dufour [Wed, 4 May 2022 10:12:44 +0000 (12:12 +0200)]
powerpc/rtas: Keep MSR[RI] set when calling RTAS

RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big
endian mode (MSR[SF,LE] unset).

The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, this leads to the following stack trace:

  watchdog: CPU 24 Hard LOCKUP
  watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  Oops: Unrecoverable System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G            E  X    5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP:  000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100   Tainted: G            E  X     (5.14.21-150400.71.1.bz196362_2-default)
  MSR:  8000000002981000 <SF,VEC,VSX,ME>  CR: 48800002  XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.

RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":

  R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
  its own critical regions from recursion by setting the MSR[RI] bit to
  0 when in the critical regions.

Fixing this by reviewing the way MSR is compute before calling RTAS. Now a
hardcoded value meaning real mode, 32 bits big endian mode and Recoverable
Interrupt is loaded. In the case MSR[S] is set, it will remain set while
entering RTAS as only urfid can unset it (thanks Fabiano).

In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
Power KVM Guest
  P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
  P8 9119-MME (FW860.A1)
  p9 9008-22L (FW950.00)
  P10 9080-HEX (FW1010.00)

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
2 years agopowerpc/8xx: Use kmalloced data structure instead of global static
Christophe Leroy [Wed, 6 Apr 2022 06:23:21 +0000 (08:23 +0200)]
powerpc/8xx: Use kmalloced data structure instead of global static

Use a kmalloced data structure to store interrupt controller internal
data instead of static global variables.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8f0866ee013113d5e28948943cf0586e49f5353.1649226186.git.christophe.leroy@csgroup.eu
2 years agopowerpc/8xx: Remove mpc8xx_pics_init()
Christophe Leroy [Wed, 6 Apr 2022 06:23:20 +0000 (08:23 +0200)]
powerpc/8xx: Remove mpc8xx_pics_init()

mpc8xx_pics_init() is now only a trampoline to
mpc8xx_pic_init().

Remove mpc8xx_pics_init() and use mpc8xx_pic_init()
directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9c55a698adb5ba3b7b77023170fcaf0acb5d2d81.1649226186.git.christophe.leroy@csgroup.eu
2 years agopowerpc/8xx: Convert CPM1 interrupt controller to platform_device
Christophe Leroy [Wed, 6 Apr 2022 06:23:19 +0000 (08:23 +0200)]
powerpc/8xx: Convert CPM1 interrupt controller to platform_device

In the same logic as commit e49be8134be4 ("soc: fsl: qe: convert QE
interrupt controller to platform_device"), convert CPM1 interrupt
controller to platform_device.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb80d0b2077312079c49da0296e25591578771cd.1649226186.git.christophe.leroy@csgroup.eu
2 years agopowerpc/8xx: Convert CPM1 error interrupt handler to platform driver
Christophe Leroy [Wed, 6 Apr 2022 06:23:18 +0000 (08:23 +0200)]
powerpc/8xx: Convert CPM1 error interrupt handler to platform driver

Add CPM error interrupt as a standalone platform driver,
to simplify the init of CPM interrupt handler.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/375a72df6e4a26c5959cc81a6c6d46152efa2306.1649226186.git.christophe.leroy@csgroup.eu
2 years agopowerpc/8xx: Move CPM interrupt controller into a dedicated file
Christophe Leroy [Wed, 6 Apr 2022 06:23:17 +0000 (08:23 +0200)]
powerpc/8xx: Move CPM interrupt controller into a dedicated file

CPM interrupt controller is quite standalone. Move it into a
dedicated file. It will help for next step which will change
it to a platform driver.

This is pure code move, checkpatch report is ignored at this point,
except one parenthesis alignment which would remain at the end of
the series. All other points fly away with following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d3a7dc832d905bed14b35d83410cdb69a7ba20e8.1649226186.git.christophe.leroy@csgroup.eu
2 years agocxl/ocxl: Prepare cleanup of powerpc's asm/prom.h
Christophe Leroy [Sat, 2 Apr 2022 09:52:33 +0000 (11:52 +0200)]
cxl/ocxl: Prepare cleanup of powerpc's asm/prom.h

powerpc's asm/prom.h brings some headers that it doesn't
need itself.

In order to clean it up, first add missing headers in
users of asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a2bae89b280e7a7cb87889635d9911d6a245e780.1648833388.git.christophe.leroy@csgroup.eu
2 years agomacintosh: Prepare cleanup of powerpc's asm/prom.h
Christophe Leroy [Fri, 1 Apr 2022 17:15:53 +0000 (19:15 +0200)]
macintosh: Prepare cleanup of powerpc's asm/prom.h

powerpc's asm/prom.h brings some headers that it doesn't
need itself.

In order to clean it up, first add missing headers in
users of asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/04961364547fe4556e30cb302b0e20a939b83426.1648833027.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Use jump_label to check if poking_init() is done
Christophe Leroy [Tue, 22 Mar 2022 15:40:21 +0000 (16:40 +0100)]
powerpc/code-patching: Use jump_label to check if poking_init() is done

It's only during early startup that poking_init() is not done yet,
for instance when calling ftrace_init().

Once poking_init() has been called there must be a poking area, no
need to check it everytime patch_instruction() is called.

ftrace activation time is reduced by 7% with the change on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8d6088aca7b63247377b6d9e4897d08d935fbe93.1647962456.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Use jump_label for testing freed initmem
Christophe Leroy [Tue, 22 Mar 2022 15:40:20 +0000 (16:40 +0100)]
powerpc/code-patching: Use jump_label for testing freed initmem

Once init is done, initmem is freed forever so no need to
test system_state at every call to patch_instruction().

Use jump_label.

This reduces by 2% the time needed to activate ftrace on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0aee964721cab7316cffde21a2ca223cee14d373.1647962456.git.christophe.leroy@csgroup.eu
2 years agoKVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
Alexander Graf [Tue, 10 May 2022 12:37:17 +0000 (14:37 +0200)]
KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()

Commit afc9eac9b377 ("powerpc/32s: Convert switch_mmu_context() to C")
moved the switch_mmu_context() to C. While in principle a good idea, it
meant that the function now uses the stack. The stack is not accessible
from real mode though.

So to keep calling the function, let's turn on MSR_DR while we call it.
That way, all pointer references to the stack are handled virtually.

In addition, make sure to save/restore r12 on the stack, as it may get
clobbered by the C function.

Fixes: afc9eac9b377 ("powerpc/32s: Convert switch_mmu_context() to C")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220510123717.24508-1-graf@amazon.com
2 years agopowerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
Christophe Leroy [Tue, 22 Mar 2022 15:40:18 +0000 (16:40 +0100)]
powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES

If CONFIG_MODULES is not set, there is no point in checking
whether text is in module area.

This reduced the time needed to activate/deactivate ftrace
by more than 10% on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f3c701cce00a38620788c0fc43ff0b611a268c54.1647962456.git.christophe.leroy@csgroup.eu
2 years agopowerpc: align address to page boundary in change_page_attr()
Christophe Leroy [Mon, 21 Mar 2022 15:44:45 +0000 (16:44 +0100)]
powerpc: align address to page boundary in change_page_attr()

Aligning address to page boundary allows flush_tlb_kernel_range()
to know it's a single page flush and use tlbie instead of tlbia.

On 603 we now have the following code in first leg of
change_page_attr():

  2c: 55 29 00 3c  rlwinm  r9,r9,0,0,30
  30: 91 23 00 00  stw     r9,0(r3)
  34: 7c 00 22 64  tlbie   r4,r0
  38: 7c 00 04 ac  hwsync
  3c: 38 60 00 00  li      r3,0
  40: 4e 80 00 20  blr

Before we had:

  28: 55 29 00 3c  rlwinm  r9,r9,0,0,30
  2c: 91 23 00 00  stw     r9,0(r3)
  30: 54 89 00 26  rlwinm  r9,r4,0,0,19
  34: 38 84 10 00  addi    r4,r4,4096
  38: 7c 89 20 50  subf    r4,r9,r4
  3c: 28 04 10 00  cmplwi  r4,4096
  40: 41 81 00 30  bgt     70 <change_page_attr+0x70>
  44: 7c 00 4a 64  tlbie   r9,r0
  48: 7c 00 04 ac  hwsync
  4c: 38 60 00 00  li      r3,0
  50: 4e 80 00 20  blr
...
  70: 94 21 ff f0  stwu    r1,-16(r1)
  74: 7c 08 02 a6  mflr    r0
  78: 90 01 00 14  stw     r0,20(r1)
  7c: 48 00 00 01  bl      7c <change_page_attr+0x7c>
7c: R_PPC_REL24 _tlbia
  80: 80 01 00 14  lwz     r0,20(r1)
  84: 38 60 00 00  li      r3,0
  88: 7c 08 03 a6  mtlr    r0
  8c: 38 21 00 10  addi    r1,r1,16
  90: 4e 80 00 20  blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6bb118fb2ee89fa3c1f9cf90ed19f88220002cb0.1647877467.git.christophe.leroy@csgroup.eu
2 years agopowerpc/8xx: Simplify flush_tlb_kernel_range()
Christophe Leroy [Mon, 21 Mar 2022 15:44:18 +0000 (16:44 +0100)]
powerpc/8xx: Simplify flush_tlb_kernel_range()

In the same spirit as commit 0dc5f3a1794d ("powerpc/8xx: Simplify TLB
handling"), simplify flush_tlb_kernel_range() for 8xx.

8xx cannot be SMP, and has 'tlbie' and 'tlbia' instructions, so
an inline version of flush_tlb_kernel_range() for 8xx is worth it.

With this page, first leg of change_page_attr() is:

  2c: 55 29 00 3c  rlwinm  r9,r9,0,0,30
  30: 91 23 00 00  stw     r9,0(r3)
  34: 7c 00 22 64  tlbie   r4,r0
  38: 7c 00 04 ac  hwsync
  3c: 38 60 00 00  li      r3,0
  40: 4e 80 00 20  blr

Before the patch it was:

  30: 55 29 00 3c  rlwinm  r9,r9,0,0,30
  34: 91 2a 00 00  stw     r9,0(r10)
  38: 94 21 ff f0  stwu    r1,-16(r1)
  3c: 7c 08 02 a6  mflr    r0
  40: 38 83 10 00  addi    r4,r3,4096
  44: 90 01 00 14  stw     r0,20(r1)
  48: 48 00 00 01  bl      48 <change_page_attr+0x48>
48: R_PPC_REL24 flush_tlb_kernel_range
  4c: 80 01 00 14  lwz     r0,20(r1)
  50: 38 60 00 00  li      r3,0
  54: 7c 08 03 a6  mtlr    r0
  58: 38 21 00 10  addi    r1,r1,16
  5c: 4e 80 00 20  blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2610043419ce3e0e53a85386baf2c3625af5cfb.1647877442.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Use static call for get_irq()
Christophe Leroy [Fri, 11 Mar 2022 12:38:04 +0000 (13:38 +0100)]
powerpc: Use static call for get_irq()

__do_irq() inconditionnaly calls ppc_md.get_irq()

That's definitely a hot path.

At the time being ppc_md.get_irq address is read every time
from ppc_md structure.

Replace that call by a static call, and initialise that
call after ppc_md.init_IRQ() has set ppc_md.get_irq.

Emit a warning and don't set the static call if ppc_md.init_IRQ()
is still NULL, that way the kernel won't blow up if for some
reason ppc_md.get_irq() doesn't get properly set.

With the patch:

00000000 <__SCT__ppc_get_irq>:
   0: 48 00 00 20  b       20 <__static_call_return0> <== Replaced by 'b <ppc_md.get_irq>' at runtime
...
00000020 <__static_call_return0>:
  20: 38 60 00 00  li      r3,0
  24: 4e 80 00 20  blr
...
00000058 <__do_irq>:
...
  64: 48 00 00 01  bl      64 <__do_irq+0xc>
64: R_PPC_REL24 __SCT__ppc_get_irq
  68: 2c 03 00 00  cmpwi   r3,0
...

Before the patch:

00000038 <__do_irq>:
...
  3c: 3d 20 00 00  lis     r9,0
3e: R_PPC_ADDR16_HA ppc_md+0x1c
...
  44: 81 29 00 00  lwz     r9,0(r9)
46: R_PPC_ADDR16_LO ppc_md+0x1c
...
  4c: 7d 29 03 a6  mtctr   r9
  50: 4e 80 04 21  bctrl
  54: 2c 03 00 00  cmpwi   r3,0
...

On PPC64 which doesn't implement static calls yet we get:

00000000000000d0 <__do_irq>:
...
      dc: 00 00 22 3d  addis   r9,r2,0
dc: R_PPC64_TOC16_HA .data+0x8
...
      e4: 00 00 89 e9  ld      r12,0(r9)
e4: R_PPC64_TOC16_LO_DS .data+0x8
...
      f0: a6 03 89 7d  mtctr   r12
      f4: 18 00 41 f8  std     r2,24(r1)
      f8: 21 04 80 4e  bctrl
      fc: 18 00 41 e8  ld      r2,24(r1)
...

So on PPC64 that's similar to what we get without static calls.
But at least until ppc_md.get_irq() is set the call is to
__static_call_return0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afb92085f930651d8b1063e4d4bf0396c80ebc7d.1647002274.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Use rol32() instead of opencoding in csum_fold()
Christophe Leroy [Wed, 9 Mar 2022 07:56:14 +0000 (08:56 +0100)]
powerpc: Use rol32() instead of opencoding in csum_fold()

rol32(x, 16) will do the rotate using rlwinm.

No need to open code using inline assembly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/794337eff7bb803d2c4e67d9eee635390c4c48fe.1646812553.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Add missing headers
Christophe Leroy [Tue, 8 Mar 2022 19:20:25 +0000 (20:20 +0100)]
powerpc: Add missing headers

Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h,
asm/pci.h etc...

Include the needed headers, and remove asm/prom.h when it was
needed exclusively for pulling necessary headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Remove asm/prom.h from all files that don't need it
Christophe Leroy [Tue, 8 Mar 2022 19:20:24 +0000 (20:20 +0100)]
powerpc: Remove asm/prom.h from all files that don't need it

Several files include asm/prom.h for no reason.

Clean it up.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop change to prom_parse.c as reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
Kajol Jain [Thu, 5 May 2022 15:34:51 +0000 (21:04 +0530)]
powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE

With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks for string size which can panic the kernel, like incase
of overflow detection.

In papr_scm, papr_scm_pmu_check_events function uses stat->stat_id with
string operations, to populate the nvdimm_events_map array. Since
stat_id variable is not NULL terminated, the kernel panics with
CONFIG_FORTIFY_SOURCE enabled at boot time.

Below are the logs of kernel panic:

  detected buffer overflow in __fortify_strlen
  ------------[ cut here ]------------
  kernel BUG at lib/string_helpers.c:980!
  Oops: Exception in kernel mode, sig: 5 [#1]
  NIP [c00000000077dad0] fortify_panic+0x28/0x38
  LR [c00000000077dacc] fortify_panic+0x24/0x38
  Call Trace:
  [c0000022d77836e0] [c00000000077dacc] fortify_panic+0x24/0x38 (unreliable)
  [c00800000deb2660] papr_scm_pmu_check_events.constprop.0+0x118/0x220 [papr_scm]
  [c00800000deb2cb0] papr_scm_probe+0x288/0x62c [papr_scm]
  [c0000000009b46a8] platform_probe+0x98/0x150

Fix this issue by using kmemdup_nul() to copy the content of
stat->stat_id directly to the nvdimm_events_map array.

mpe: stat->stat_id comes from the hypervisor, not userspace, so there is
no security exposure.

Fixes: 47b6f0f87f11 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505153451.35503-1-kjain@linux.ibm.com
2 years agopowerpc: Add missing declaration in asm/drmem.h
Christophe Leroy [Tue, 8 Mar 2022 19:20:23 +0000 (20:20 +0100)]
powerpc: Add missing declaration in asm/drmem.h

Don't rely on random inclusion of linux/of.h by users
of asm/drmem.h

Add a forward declaration of struct property and
struct device_node.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5643ec410e51b749db0636471cb7979524f9ed0e.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Include asm/reg.h in asm/svm.h
Christophe Leroy [Tue, 8 Mar 2022 19:20:22 +0000 (20:20 +0100)]
powerpc: Include asm/reg.h in asm/svm.h

is_secure_guest() uses mfmsr().

Don't rely on users to include asm/reg.h, include
it in asm/svm.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/482c82c8a29d5fb3ea279b34f107e0e775001344.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Don't include asm/prom.h in asm/parport.h
Christophe Leroy [Tue, 8 Mar 2022 19:20:21 +0000 (20:20 +0100)]
powerpc: Don't include asm/prom.h in asm/parport.h

parport.h needs only of_irq.h, no need to go via asm/prom.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec796ee56cf61f16ba24e62a9d3525d11931538c.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc/64: Move pci_device_from_OF_node() out of asm/pci-bridge.h
Christophe Leroy [Tue, 8 Mar 2022 19:20:20 +0000 (20:20 +0100)]
powerpc/64: Move pci_device_from_OF_node() out of asm/pci-bridge.h

Move pci_device_from_OF_node() in pci64.c because it needs definition
of struct device_node and is not worth inlining.

ppc32.c already has it in pci32.c.

That way pci-bridge.h doesn't need linux/of.h (Brought by asm/prom.h
via asm/pci.h)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3c88286b55413730d7784133993a46ef4a3607ce.1646767214.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Reduce csum_add() complexity for PPC64
Christophe Leroy [Sat, 12 Feb 2022 07:36:17 +0000 (08:36 +0100)]
powerpc: Reduce csum_add() complexity for PPC64

PPC64 does everything in C, gcc is able to skip calculation
when one of the operands in zero.

Move the constant folding in PPC32 part.

This helps GCC and reduces ppc64_defconfig by 170 bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4ca63dd4c4b09e1906d08fb814af5a41d0f3fcb.1644651363.git.christophe.leroy@csgroup.eu
2 years agopowerpc/64: remove system call instruction emulation
Nicholas Piggin [Wed, 30 Mar 2022 14:07:19 +0000 (19:37 +0530)]
powerpc/64: remove system call instruction emulation

emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.

This patch removes system call emulation and disables stepping system
calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2 years agopowerpc: Reject probes on instructions that can't be single stepped
Naveen N. Rao [Wed, 30 Mar 2022 14:07:18 +0000 (19:37 +0530)]
powerpc: Reject probes on instructions that can't be single stepped

Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.

For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().

Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f56ee979d50b8711fae350fc97870f3ca34acd75.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2 years agopowerpc: Sort and de-dup primary opcodes in ppc-opcode.h
Naveen N. Rao [Wed, 30 Mar 2022 14:07:17 +0000 (19:37 +0530)]
powerpc: Sort and de-dup primary opcodes in ppc-opcode.h

Some of the primary opcodes are duplicated. Remove those, and sort the
rest of the primary opcodes to make it easy to read.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a05edf638a2638d708fc2db0272f6317837b5eab.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2 years agopowerpc: fix typos in comments
Julia Lawall [Sat, 30 Apr 2022 18:56:54 +0000 (20:56 +0200)]
powerpc: fix typos in comments

Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2 years agopowerpc/boot: Stop using RELACOUNT
Alexey Kardashevskiy [Wed, 6 Apr 2022 07:00:38 +0000 (17:00 +1000)]
powerpc/boot: Stop using RELACOUNT

So far the RELACOUNT tag from the ELF header was containing the exact
number of R_PPC_RELATIVE/R_PPC64_RELATIVE relocations. However the LLVM's
recent change [1] make it equal-or-less than the actual number which
makes it useless.

This replaces RELACOUNT in zImage loader with a pair of RELASZ and RELAENT.
The vmlinux relocation code is fixed in commit 7e3e3604ec7b
("powerpc/64: Add UADDR64 relocation support").

To make it more future proof, this walks through the entire .rela.dyn
section instead of assuming that the section is sorter by a relocation
type. Unlike 7e3e3604ec7b, this does not add unaligned UADDR/UADDR64
relocations as we are likely not to see those in practice - the zImage
is small and very arch specific so there is a smaller chance that some
generic feature (such as PRINK_INDEX) triggers unaligned relocations.

[1] https://github.com/llvm/llvm-project/commit/da0e5b885b25cf4

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406070038.3704604-1-aik@ozlabs.ru
2 years agopowerpc: Simplify and move arch_randomize_brk()
Christophe Leroy [Sat, 9 Apr 2022 17:17:37 +0000 (19:17 +0200)]
powerpc: Simplify and move arch_randomize_brk()

arch_randomize_brk() is only needed for hash on book3s/64, for other
platforms the one provided by the default mmap layout is good enough.

Move it to hash_utils.c and use randomize_page() like the generic one.

And properly opt out the radix case instead of making an assumption
on mmu_highuser_ssize.

Also change to a 32M range like most other architectures instead of 8M.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Convert to default topdown mmap layout
Christophe Leroy [Sat, 9 Apr 2022 17:17:36 +0000 (19:17 +0200)]
powerpc/mm: Convert to default topdown mmap layout

Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
remove arch/powerpc/mm/mmap.c

This change reuses the generic framework added by
commit ded4d20ca5d1 ("arm64, mm: move generic mmap layout
functions to mm") without any functional change.

Comparison between powerpc implementation and the generic one:
- mmap_is_legacy() is identical.
- arch_mmap_rnd() does exactly the same allthough it's written
slightly differently.
- MIN_GAP and MAX_GAP are identical.
- mmap_base() does the same but uses STACK_RND_MASK which provides
the same values as stack_maxrandom_size().
- arch_pick_mmap_layout() is identical.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/518f9def87d3c889d5958103e7463cf45a2f673d.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Enable full randomisation of memory mappings
Christophe Leroy [Sat, 9 Apr 2022 17:17:35 +0000 (19:17 +0200)]
powerpc/mm: Enable full randomisation of memory mappings

Do like most other architectures and provide randomisation also to
"legacy" memory mappings, by adding the random factor to
mm->mmap_base in arch_pick_mmap_layout().

See commit bcef8662d23a ("x86/mm/32: Enable full randomization on
i386 and X86_32") for all explanations and benefits of that mmap
randomisation.

At the moment, slice_find_area_bottomup() doesn't use mm->mmap_base
but uses the fixed TASK_UNMAPPED_BASE instead.
slice_find_area_bottomup() being used as a fallback to
slice_find_area_topdown(), it can't use mm->mmap_base
directly.

Instead of always using TASK_UNMAPPED_BASE as base address, leave
it to the caller. When called from slice_find_area_topdown()
TASK_UNMAPPED_BASE is used. Otherwise mm->mmap_base is used.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/417fb10dde828534c73a03138b49621d74f4e5be.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Move get_unmapped_area functions to slice.c
Christophe Leroy [Sat, 9 Apr 2022 17:17:34 +0000 (19:17 +0200)]
powerpc/mm: Move get_unmapped_area functions to slice.c

hugetlb_get_unmapped_area() is now identical to the
generic version if only RADIX is enabled, so move it
to slice.c and let it fallback on the generic one
when HASH MMU is not compiled in.

Do the same with arch_get_unmapped_area() and
arch_get_unmapped_area_topdown().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b5d9c124e82889e0cb115c150915a0c0d84eb960.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Use generic_hugetlb_get_unmapped_area()
Christophe Leroy [Sat, 9 Apr 2022 17:17:33 +0000 (19:17 +0200)]
powerpc/mm: Use generic_hugetlb_get_unmapped_area()

Use the generic version of arch_hugetlb_get_unmapped_area()
which is now available at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05f77014c619061638ecc52a0a4136eb04cc2799.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Use generic_get_unmapped_area() and call it from arch_get_unmapped_area()
Christophe Leroy [Sat, 9 Apr 2022 17:17:32 +0000 (19:17 +0200)]
powerpc/mm: Use generic_get_unmapped_area() and call it from arch_get_unmapped_area()

Use the generic version of arch_get_unmapped_area() which
is now available at all time instead of its copy
radix__arch_get_unmapped_area()

To allow that for PPC64, add arch_get_mmap_base() and
arch_get_mmap_end() macros.

Instead of setting mm->get_unmapped_area() to either
arch_get_unmapped_area() or generic_get_unmapped_area(),
always set it to arch_get_unmapped_area() and call
generic_get_unmapped_area() from there when radix is enabled.

Do the same with radix__arch_get_unmapped_area_topdown()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/393be1fa386446443682fdb74544d733f68ef3bb.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Remove CONFIG_PPC_MM_SLICES
Christophe Leroy [Sat, 9 Apr 2022 17:17:31 +0000 (19:17 +0200)]
powerpc/mm: Remove CONFIG_PPC_MM_SLICES

CONFIG_PPC_MM_SLICES is always selected by hash book3s/64.
CONFIG_PPC_MM_SLICES is never selected by other platforms.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dc2cdc204de8978574bf7c02329b6cfc4db0bce7.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Make slice specific to book3s/64
Christophe Leroy [Sat, 9 Apr 2022 17:17:30 +0000 (19:17 +0200)]
powerpc/mm: Make slice specific to book3s/64

Since commit 7b0ede696b53 ("powerpc/8xx: MM_SLICE is not needed
anymore") only book3s/64 selects CONFIG_PPC_MM_SLICES.

Move slice.c into mm/book3s64/

Move necessary stuff in asm/book3s/64/slice.h and
remove asm/slice.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0d74ef1966a5902b5fd4ac4b513a760a6d675a.1649523076.git.christophe.leroy@csgroup.eu
2 years agopowerpc/mm: Move vma_mmu_pagesize()
Christophe Leroy [Sat, 9 Apr 2022 17:17:29 +0000 (19:17 +0200)]
powerpc/mm: Move vma_mmu_pagesize()

vma_mmu_pagesize() is only required for slices,
otherwise there is a generic weak version doing the
exact same thing.

Move it to slice.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1302e000d529c93d07208f1fae90f938e7a551b4.1649523076.git.christophe.leroy@csgroup.eu
2 years agomm: Add len and flags parameters to arch_get_mmap_end()
Christophe Leroy [Sat, 9 Apr 2022 17:17:28 +0000 (19:17 +0200)]
mm: Add len and flags parameters to arch_get_mmap_end()

Powerpc needs flags and len to make decision on arch_get_mmap_end().

So add them as parameters to arch_get_mmap_end().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b556daabe7d2bdb2361c4d6130280da7c1ba2c14.1649523076.git.christophe.leroy@csgroup.eu
2 years agomm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area...
Christophe Leroy [Sat, 9 Apr 2022 17:17:27 +0000 (19:17 +0200)]
mm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area functions

Unlike most architectures, powerpc can only define at runtime
if it is going to use the generic arch_get_unmapped_area() or not.

Today, powerpc has a copy of the generic arch_get_unmapped_area()
because when selection HAVE_ARCH_UNMAPPED_AREA the generic
arch_get_unmapped_area() is not available.

Rename it generic_get_unmapped_area() and make it independent of
HAVE_ARCH_UNMAPPED_AREA.

Do the same for arch_get_unmapped_area_topdown() versus
HAVE_ARCH_UNMAPPED_AREA_TOPDOWN.

Do the same for hugetlb_get_unmapped_area() versus
HAVE_ARCH_HUGETLB_UNMAPPED_AREA.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/77f9d3e592f1c8511df9381aa1c4e754651da4d1.1649523076.git.christophe.leroy@csgroup.eu
2 years agomm: Allow arch specific arch_randomize_brk() with CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MM...
Christophe Leroy [Sat, 9 Apr 2022 17:17:26 +0000 (19:17 +0200)]
mm: Allow arch specific arch_randomize_brk() with CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT

Commit ad6530519f44 ("arm64, mm: make randomization selected by
generic topdown mmap layout") introduced a default version of
arch_randomize_brk() provided when
CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT is selected.

powerpc could select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
but needs to provide its own arch_randomize_brk().

In order to allow that, define generic version of arch_randomize_brk()
as a __weak symbol.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b222f1ca06c850daf1b2f26afdb46c6dd97d21ba.1649523076.git.christophe.leroy@csgroup.eu
2 years agoMerge tag 'v5.18-rc4' into next
Michael Ellerman [Thu, 5 May 2022 12:09:35 +0000 (22:09 +1000)]
Merge tag 'v5.18-rc4' into next

Merge master into next, to bring in commit db8fab2bf51c ("mm, hugetlb:
allow for "high" userspace addresses"), which is needed as a
prerequisite for the series converting powerpc to the generic mmap
logic.

2 years agopowerpc/vdso: Fix incorrect CFI in gettimeofday.S
Michael Ellerman [Mon, 2 May 2022 12:50:10 +0000 (22:50 +1000)]
powerpc/vdso: Fix incorrect CFI in gettimeofday.S

As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit d59b86f6ede3 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x00007fffffffd960 in ?? ()
  #3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: LR save should be described before a bl. not after)

  3) If asychronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x0000000100015b60 in gettime ()
  #3  0x000000010000c8bc in print_long_format ()
  #4  0x000000010000d180 in print_current_files ()
  #5  0x00000001000054ac in main ()
  (gdb) up
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: d59b86f6ede3 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20220502125010.1319370-1-mpe@ellerman.id.au