Merge our fixes branch. In particular this brings in commit 2f2e917c1ba1 ("powerpc/book3e: Fix PUD allocation size in
map_kernel_page()") which fixes a build failure in next, because commit 96bb43d3c142 ("powerpc/64e: Rewrite p4d_populate() as a static inline
function") depends on it.
powerpc/powernv: delay rng platform device creation until later in boot
The platform device for the rng must be created much later in boot.
Otherwise it tries to connect to a parent that doesn't yet exist,
resulting in this splat:
This patch fixes the issue by doing the platform device creation inside
of machine_subsys_initcall.
Fixes: 555c3a79e2b8 ("powerpc/powernv: wire up rng during setup_arch") Cc: stable@vger.kernel.org Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com>
[mpe: Change "of node" to "platform device" in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220630121654.1939181-1-Jason@zx2c4.com
powerpc/memhotplug: Add add_pages override for PPC
With commit 5952aaa7a933 ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit")
the kernel now validate the addr against high_memory value. This results
in the below BUG_ON with dax pfns.
The fix is to make sure we update high_memory on memory hotplug.
This is similar to what x86 does in commit e6d361888aa0 ("mm/memory_hotplug: introduce add_pages")
Fixes: 5952aaa7a933 ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220629050925.31447-1-aneesh.kumar@linux.ibm.com
Naveen N. Rao [Mon, 27 Jun 2022 19:11:19 +0000 (00:41 +0530)]
powerpc/bpf: Fix use of user_pt_regs in uapi
Trying to build a .c file that includes <linux/bpf_perf_event.h>:
$ cat test_bpf_headers.c
#include <linux/bpf_perf_event.h>
throws the below error:
/usr/include/linux/bpf_perf_event.h:14:28: error: field ‘regs’ has incomplete type
14 | bpf_user_pt_regs_t regs;
| ^~~~
This is because we typedef bpf_user_pt_regs_t to 'struct user_pt_regs'
in arch/powerpc/include/uaps/asm/bpf_perf_event.h, but 'struct
user_pt_regs' is not exposed to userspace.
Powerpc has both pt_regs and user_pt_regs structures. However, unlike
arm64 and s390, we expose user_pt_regs to userspace as just 'pt_regs'.
As such, we should typedef bpf_user_pt_regs_t to 'struct pt_regs' for
userspace.
Within the kernel though, we want to typedef bpf_user_pt_regs_t to
'struct user_pt_regs'.
Remove arch/powerpc/include/uapi/asm/bpf_perf_event.h so that the
uapi/asm-generic version of the header is exposed to userspace.
Introduce arch/powerpc/include/asm/bpf_perf_event.h so that we can
typedef bpf_user_pt_regs_t to 'struct user_pt_regs' for use within the
kernel.
Note that this was not showing up with the bpf selftest build since
tools/include/uapi/asm/bpf_perf_event.h didn't include the powerpc
variant.
Fixes: f17a2af27d08b0 ("powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use typical naming for header include guard] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220627191119.142867-1-naveen.n.rao@linux.vnet.ibm.com
Pali Rohár [Fri, 24 Jun 2022 08:55:50 +0000 (10:55 +0200)]
powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers
CZ.NIC Turris 1.0 and 1.1 are open source routers, they have dual-core
PowerPC Freescale P2020 CPU and are based on Freescale P2020RDB-PC-A board.
Hardware design is fully open source, all firmware and hardware design
files are available at Turris project website:
Juerg Haefliger [Fri, 20 May 2022 11:54:31 +0000 (13:54 +0200)]
KVM: PPC: Kconfig: Fix indentation
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.
powerpc/perf: Update MMCR2 to support event exclude_idle
struct perf_event_attr supports exclude counting of idle task.
This is sent to kernel via perf_event_attr.exclude_idle and
in perf tool, user can use ":I" event modifier to enable this
for specific event.
Monitor Mode Control Register 2 (MMCR2) SPR has control bits
for each PMCs to freeze counting based on the Control Register
CTRL[RUN] state. CTRL[RUN] is not set when idle task is
running. Patch adds a check for event attr.exclude_idle to
set MMCR2[FCnWAIT] bit.
PowerVM has a stricter policy about allocating TCEs for LPARs and
often there is not enough TCEs for 1:1 mapping, this adds the supported
numbers into dev_info() to help analyzing bugreports.
KVM: PPC: Do not warn when userspace asked for too big TCE table
KVM manages emulated TCE tables for guest LIOBNs by a two level table
which maps up to 128TiB with 16MB IOMMU pages (enabled in QEMU by default)
and MAX_ORDER=11 (the kernel's default). Note that the last level of
the table is allocated when actual TCE is updated.
However these tables are created via ioctl() on kvmfd and the userspace
can trigger WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp) in mm/page_alloc.c
and flood dmesg.
Hari Bathini [Fri, 10 Jun 2022 15:55:52 +0000 (21:25 +0530)]
powerpc/bpf/32: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc32, both
of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg
operation fundamentally has 3 operands, but we only have two register
fields. Therefore the operand we compare against (the kernel's API
calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
same for BPF too with return value put in BPF_REG_0.
Hari Bathini [Fri, 10 Jun 2022 15:55:50 +0000 (21:25 +0530)]
powerpc/bpf/64: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc64, both
of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg
operation fundamentally has 3 operands, but we only have two register
fields. Therefore the operand we compare against (the kernel's API
calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
same for BPF too with return value put in BPF_REG_0.
Laurent Dufour [Mon, 23 May 2022 16:43:53 +0000 (18:43 +0200)]
powerpc/64s: Don't read H_BLOCK_REMOVE characteristics in radix mode
There is no need to read the H_BLOCK_REMOVE characteristics when running in
Radix mode because this hcall is never called.
Furthermore since the commit f10b804e100f ("powerpc/64s: Move hash MMU
support code under CONFIG_PPC_64S_HASH_MMU") define
pseries_lpar_read_hblkrm_characteristics as un empty function if
CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed.
Michael Ellerman [Tue, 31 May 2022 06:59:36 +0000 (16:59 +1000)]
powerpc/64: Drop ppc_inst_as_str()
The ppc_inst_as_str() macro tries to make printing variable length,
aka "prefixed", instructions convenient. It mostly succeeds, but it does
hide an on-stack buffer, which triggers stack protector.
More problematically it doesn't compile at all with GCC 12,
with -Wdangling-pointer, due to the fact that it returns the char buffer
declared inside the macro:
arch/powerpc/kernel/trace/ftrace.c: In function '__ftrace_modify_call':
./include/linux/printk.h:475:44: error: using a dangling pointer to '__str' [-Werror=dangling-pointer=]
475 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
...
arch/powerpc/kernel/trace/ftrace.c:567:17: note: in expansion of macro 'pr_err'
567 | pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
| ^~~~~~
./arch/powerpc/include/asm/inst.h:156:14: note: '__str' declared here
156 | char __str[PPC_INST_STR_LEN]; \
| ^~~~~
This could be fixed by having the caller declare the buffer, but in some
places there'd need to be two buffers. In all cases where
ppc_inst_as_str() is used the output is not really meant for user
consumption, it's almost always indicative of a kernel bug.
A simpler solution is to just print the value as an unsigned long. For
normal instructions the output is identical. For prefixed instructions
the value is printed as a single 64-bit quantity, whereas previously the
low half was printed first. But that is good enough for debug output,
especially as prefixed instructions will be rare in kernel code in
practice.
Fabiano Rosas [Fri, 24 Jun 2022 14:27:12 +0000 (11:27 -0300)]
KVM: PPC: Align pt_regs in kvm_vcpu_arch structure
The H_ENTER_NESTED hypercall receives as second parameter the address
of a region of memory containing the values for the nested guest
privileged registers. We currently use the pt_regs structure contained
within kvm_vcpu_arch for that end.
Most hypercalls that receive a memory address expect that region to
not cross a 4K page boundary. We would want H_ENTER_NESTED to follow
the same pattern so this patch ensures the pt_regs structure sits
within a page.
Note: the pt_regs structure is currently 384 bytes in size, so
aligning to 512 is sufficient to ensure it will not cross a 4K page
and avoids punching too big a hole in struct kvm_vcpu_arch.
Fabiano Rosas [Wed, 25 May 2022 13:05:54 +0000 (10:05 -0300)]
KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path
Alter the data collection points for the debug timing code in the P9
path to be more in line with what the code does. The points where we
accumulate time are now the following:
vcpu_entry: From vcpu_run_hv entry until the start of the inner loop;
guest_entry: From the start of the inner loop until the guest entry in
asm;
in_guest: From the guest entry in asm until the return to KVM C code;
guest_exit: From the return into KVM C code until the corresponding
hypercall/page fault handling or re-entry into the guest;
hypercall: Time spent handling hcalls in the kernel (hcalls can go to
QEMU, not accounted here);
page_fault: Time spent handling page faults;
vcpu_exit: vcpu_run_hv exit (almost no code here currently).
Like before, these are exposed in debugfs in a file called
"timings". There are four values:
- number of occurrences of the accumulation point;
- total time the vcpu spent in the phase in ns;
- shortest time the vcpu spent in the phase in ns;
- longest time the vcpu spent in the phase in ns;
Fabiano Rosas [Wed, 25 May 2022 13:05:52 +0000 (10:05 -0300)]
KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
We are currently doing the timing for debug purposes of the P9 entry
path using the accumulators and terminology defined by the old entry
path for P8 machines.
Not only the "real-mode" and "napping" mentions are out of place for
the P9 Radix entry path but also we cannot change them because the
timing code is coupled to the structures defined in struct
kvm_vcpu_arch.
Add a new CONFIG_KVM_BOOK3S_HV_P9_TIMING to enable the timing code for
the P9 entry path. For now, just add the new CONFIG and duplicate the
structures. A subsequent patch will add the P9 changes.
The "rm_exit" is always showing zero because it is the last one and
end_timing does not increment the counter of the previous entry.
We can fix it by calling accumulate_time again instead of
end_timing. That way the counter gets incremented. The rest of the
arithmetic can be ignored because there are no timing points after
this and the accumulators are reset before the next round.
Christophe Leroy [Tue, 28 Jun 2022 14:48:59 +0000 (16:48 +0200)]
powerpc/64e: KASAN Full support for BOOK3E/64
We now have memory organised in a way that allows
implementing KASAN.
Unlike book3s/64, book3e always has translation active so the only
thing needed to use KASAN is to setup an early zero shadow mapping
just after setting a stack pointer and before calling early_setup().
The memory layout is now as follows
+------------------------+ Kernel virtual map end (0xc000200000000000)
| |
| 16TB of KASAN map |
| |
+------------------------+ Kernel KASAN shadow map start
| |
| 16TB of IO map |
| |
+------------------------+ Kernel IO map start
| |
| 16TB of vmemmap |
| |
+------------------------+ Kernel vmemmap start
| |
| 16TB of vmap |
| |
+------------------------+ Kernel virt start (0xc000100000000000)
| |
| 64TB of linear mem |
| |
+------------------------+ Kernel linear (0xc.....)
Christophe Leroy [Tue, 28 Jun 2022 14:48:57 +0000 (16:48 +0200)]
powerpc/64e: Move virtual memory closer to linear memory
Today nohash/64 have linear memory based at 0xc000000000000000 and
virtual memory based at 0x8000000000000000.
In order to implement KASAN, we need to regroup both areas.
Move virtual memmory at 0xc000100000000000.
This complicates a bit TLB miss handlers. Until now, memory region
was easily identified with the 4 higher bits of address:
- 0 ==> User
- c ==> Linear Memory
- 8 ==> Virtual Memory
Now we need to rely on the 20 higher bits, with:
- 0xxxx ==> User
- c0000 ==> Linear Memory
- c0001 ==> Virtual Memory
Christophe Leroy [Tue, 28 Jun 2022 14:48:55 +0000 (16:48 +0200)]
powerpc/64e: Remove MMU_FTR_USE_TLBRSRV and MMU_FTR_USE_PAIRED_MAS
Commit a28b266b4168 ("powerpc: Remove platforms/wsp and associated
pieces") removed the last CPU having features MMU_FTRS_A2 and
commit abe841db5849 ("powerpc: Clean up MMU_FTRS_A2 and
MMU_FTR_TYPE_3E") removed MMU_FTRS_A2 which was the last user of
MMU_FTR_USE_TLBRSRV and MMU_FTR_USE_PAIRED_MAS.
Remove all code that relies on MMU_FTR_USE_TLBRSRV and
MMU_FTR_USE_PAIRED_MAS.
With this change done, TLB miss can happen before the mmu feature
fixups.
Christophe Leroy [Tue, 28 Jun 2022 14:48:54 +0000 (16:48 +0200)]
powerpc/64e: Fix early TLB miss with KUAP
With KUAP, the TLB miss handler bails out when an access to user
memory is performed with a nul TID.
But the normal TLB miss routine which is only used early during boot
does the check regardless for all memory areas, not only user memory.
By chance there is no early IO or vmalloc access, but when KASAN
come we will start having early TLB misses.
Fix it by creating a special branch for user accesses similar to the
one in the 'bolted' TLB miss handlers. Unfortunately SPRN_MAS1 is
now read too early and there are no registers available to preserve
it so it will be read a second time.
Christophe Leroy [Tue, 28 Jun 2022 14:43:35 +0000 (16:43 +0200)]
powerpc/ptdump: Fix display of RW pages on FSL_BOOK3E
On FSL_BOOK3E, _PAGE_RW is defined with two bits, one for user and one
for supervisor. As soon as one of the two bits is set, the page has
to be display as RW. But the way it is implemented today requires both
bits to be set in order to display it as RW.
Instead of display RW when _PAGE_RW bits are set and R otherwise,
reverse the logic and display R when _PAGE_RW bits are all 0 and
RW otherwise.
This change has no impact on other platforms as _PAGE_RW is a single
bit on all of them.
Christophe Leroy [Tue, 14 Jun 2022 10:32:24 +0000 (12:32 +0200)]
powerpc/32: Remove 'noltlbs' kernel parameter
Mapping without large TLBs has no added value on the 8xx.
Mapping without large TLBs is still necessary on 40x when
selecting CONFIG_KFENCE or CONFIG_DEBUG_PAGEALLOC or
CONFIG_STRICT_KERNEL_RWX, but this is done automatically
and doesn't require user selection.
Remove 'noltlbs' kernel parameter, the user has no reason
to use it.
powerpc/irq: Perform stack_overflow detection after switching to IRQ stack
When KASAN is enabled, as shown by the Oops below, the 2k limit is not
enough to allow stack dump after a stack overflow detection when
CONFIG_DEBUG_STACKOVERFLOW is selected:
As the detection is asynchronously performed at IRQs, we could be long
after the limit has been crossed, so increasing the limit would not
solve the problem completely.
In order to be sure that there is enough stack space for the stack
dump, do it after the switch to the IRQ stack. That way it is sure
that the stack is large enough, unless the IRQ stack has been
overfilled in which case the end of life is close.
powerpc/irq: Increase stack_overflow detection limit when KASAN is enabled
When KASAN is enabled, as shown by the Oops below, the 2k limit is not
enough to allow stack dump after a stack overflow detection when
CONFIG_DEBUG_STACKOVERFLOW is selected:
An investigation shows that on PPC32, calling dump_stack() requires
more than 1k when KASAN is not selected and a bit more than 2k bytes
when KASAN is selected.
On PPC64 the registers are twice the size of PPC32 registers, so the
need should be approximately twice the need on PPC32.
In the meantime we have THREAD_SIZE which is twice larger on PPC64
than PPC32, and twice larger when KASAN is selected.
So we can easily use the value of THREAD_SIZE to set the limit.
On PPC32, THREAD_SIZE is 8k without KASAN and 16k with KASAN.
On PPC64, THREAD_SIZE is 16k without KASAN.
To be on the safe side, leave 2k on PPC32 without KASAN, 4k with
KASAN, and 4k on PPC64 without KASAN. It means the limit should be
one fourth of THREAD_SIZE.
Kajol Jain [Fri, 10 Jun 2022 13:41:13 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add test for hardware cache events
The testcase checks if the transalation of a generic hardware cache
event is done properly via perf interface. The hardware cache events has
type as PERF_TYPE_HW_CACHE and each event points to raw event code id.
Testcase checks different combination of cache level, cache event
operation type and cache event result type and verify for a given event
code, whether transalation matches with the current cache event mappings
via perf interface.
Kajol Jain [Fri, 10 Jun 2022 13:41:12 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for MMCRA thresh_sel field
Thresh select bits in the event code is used to program thresh_sel field
in Monitor Mode Control Register A (MMCRA: 45-47). When scheduling
events as a group, all events in that group should match value in these
bits. Otherwise event open for the sibling events will fail.
Testcase uses event code PM_MRK_INST_CMPL (0x401e0) as leader and
another event PM_THRESH_MET (0x101ec) as sibling event, and checks if
group constraint checks for thresh_sel field added correctly via perf
interface.
Kajol Jain [Fri, 10 Jun 2022 13:41:11 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for MMCRA thresh_ctl field
Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). When scheduling
events as a group, all events in that group should match value in these
bits. Otherwise event open for the sibling events will fail.
Testcase uses event code PM_MRK_INST_CMPL (0x401e0) as leader and
another event PM_THRESH_MET (101ec) as sibling event, and checks if
group constraint checks for thresh_ctl field added correctly via perf
interface.
Kajol Jain [Fri, 10 Jun 2022 13:41:10 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint for unit and pmc field in p9
Unit and pmu bits in the event code is used to program unit and pmc
fields in Monitor Mode Control Register 1 (MMCR1). For power9 platform,
incase unit field value is within 6 to 9, one of the event in the group
should use PMC4. Otherwise event_open should fail for that group.
Kajol Jain [Fri, 10 Jun 2022 13:41:09 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for MMCRA thresh_cmp field
Thresh compare bits for a event is used to program thresh compare field
in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9 and
MMCRA: 8-18 bits for power10). When scheduling events as a group, all
events in that group should match value in thresh compare bits.
Otherwise event open for the sibling events will fail.
Testcase uses event code "0x401e0" as leader and another event "0x101ec"
as sibling event, and checks for thresh compare constraint via perf
interface.
Kajol Jain [Fri, 10 Jun 2022 13:41:08 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for MMCR1 cache bits
Data and instruction cache qualifier bits in the event code is used to
program cache select field in Monitor Mode Control Register 1 (MMCR1:
16-17). When scheduling events as a group, all events in that group
should match value in these bits. Otherwise event open for the sibling
events will fail.
Testcase uses event code "0x1100fc" as leader and other events like
"0x23e054" and "0x13e054" as sibling events to checks for l1 cache
select field constraints via perf interface.
Kajol Jain [Fri, 10 Jun 2022 13:41:07 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for MMCR0 l2l3_sel bits
In power10, L2L3 select bits in the event code is used to program
l2l3_sel field in Monitor Mode Control Register 0 (MMCR0: 56-60). When
scheduling events as a group, all events in that group should match
value in these bits. Otherwise event open for the sibling events will
fail.
Testcase uses event code "0x010000046080" as leader and another events
"0x26880" and "0x010000026880" as sibling events, and checks for
l2l3_sel constraints via perf interface for ISA v3.1 platform.
Athira Rajeev [Fri, 10 Jun 2022 13:41:06 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for PERF_TYPE_HARDWARE events valid check
Testcase to ensure that using invalid event in generic event for
PERF_TYPE_HARDWARE will fail. Invalid generic events in power10 are:
- PERF_COUNT_HW_BUS_CYCLES
- PERF_COUNT_HW_STALLED_CYCLES_FRONTEND
- PERF_COUNT_HW_STALLED_CYCLES_BACKEND
- PERF_COUNT_HW_REF_CPU_CYCLES
Invalid generic events in power9 are:
- PERF_COUNT_HW_BUS_CYCLES
- PERF_COUNT_HW_REF_CPU_CYCLES
Testcase does event open for valid and invalid generic events to ensure
event open works for all valid events and fails for invalid events.
Athira Rajeev [Fri, 10 Jun 2022 13:41:05 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for event alternatives for power10
Platform specific PMU supports alternative event for some of the event
codes. During perf_event_open, it any event group doesn't match
constraint check criteria, further lookup is done to find alternative
event. Code checks to see if it is possible to schedule event as group
using alternative events.
Testcase exercises the alternative event find code for power10. Example,
Using PMC1 to PMC4 in a group and again trying to schedule
PM_CYC_ALT (0x0001e) will fail since this exceeds number of programmable
events in group. But since 0x600f4 is an alternative event for 0x0001e,
it is possible to use 0x0001e in the group. Testcase uses such
combination all events in power10 which has alternative event.
Athira Rajeev [Fri, 10 Jun 2022 13:41:04 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for event alternatives for power9
Platform specific PMU supports alternative event for some of the event
codes. During perf_event_open, it any event group doesn't match
constraint check criteria, further lookup is done to find alternative
event. Code checks to see if it is possible to schedule event as group
using alternative events.
Testcase exercises the alternative event find code for power9. Example,
since events in same PMC can't go in as a group, ideally using
PM_RUN_CYC_ALT (0x200f4) and PM_BR_TAKEN_CMPL (0x200fa) will fail. But
since RUN_CYC (0x600f4) is alternative event for 0x200f4, it is possible
to use 0x600f4 and 0x200fa as group. Testcase uses such combination for
all events in power9 which has an alternative event.
Athira Rajeev [Fri, 10 Jun 2022 13:41:03 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for blacklist events check in power9
Some of the events are blacklisted in power9. The list of blacklisted
events are noted in power9-events-list.h When trying to do event open
for any of these blacklisted event will cause a failure. Testcase
ensures that using blacklisted events will cause event_open to fail in
power9. This test is only applicable on power9 DD2.1 and DD2.2 and hence
test adds checks to skip on other platforms.
Athira Rajeev [Fri, 10 Jun 2022 13:41:02 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for reserved bit check for MMCRA thresh_ctl field
Testcase for reserved bits in Monitor Mode Control Register A (MMCRA)
thresh_ctl bits. For MMCRA[48:51]/[52:55]) Threshold Start/Stop, 0b11110000/0b00001111 is reserved.
Athira Rajeev [Fri, 10 Jun 2022 13:41:01 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for checking invalid bits in event code
Some of the bits in the event code is reserved for specific platforms.
Event code bits 52-59 are reserved in power9, whereas in power10, these
are used for programming Monitor Mode Control Register 3 (MMCR3). Bit 9
in event code is reserved in power9, whereas it is used for programming
"radix_scope_qual" bit 18 in Monitor Mode Control Register 1 (MMCR1).
Testcase to ensure that using reserved bits in event code should cause
event_open to fail.
Athira Rajeev [Fri, 10 Jun 2022 13:41:00 +0000 (19:11 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check MMCRA sample bits
Events with different "sample" field values which is used to program
Monitor Mode Control Register A (MMCRA) in a group will fail to
schedule. Testcase uses event with load only sampling mode as group
leader and event with store only sampling as sibling event. So that it
can check that using different sample bits in event code will fail in
event open for group of events
Athira Rajeev [Fri, 10 Jun 2022 13:40:59 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint for MMCRA Sampling Mode field
Testcase for reserved bits in Monitor Mode Control Register A (MMCRA)
Random Sampling Mode (SM) value. As per Instruction Set
Architecture (ISA), the values 0x5, 0x9, 0xD, 0x19, 0x1D, 0x1A, 0x1E are
reserved for sampling mode field. Test that having these reserved bit
values should cause event_open to fail. Input event code in testcases
uses these sampling bits along with 401e0 (PM_MRK_INST_CMPL).
Athira Rajeev [Fri, 10 Jun 2022 13:40:58 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for radix_scope_qual field
Testcase for group constraint check for radix_scope_qual field which is
used to program Monitor Mode Control Register (MMCR1) bit 18. All events
in the group should match radix_scope_qual bit, otherwise event_open for
the group should fail. Testcase uses "0x14242" (PM_DATA_RADIX_PROCESS_L2_PTE_FROM_L2)
with radix_scope_qual bit set for power10.
Athira Rajeev [Fri, 10 Jun 2022 13:40:57 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check when using same PMC
Testcase for group constraint check when using events with same PMC.
Multiple events in a group asking for same PMC should fail. Testcase
uses "0x22C040" on PMC2 as leader and also subling which is expected to
fail. Using PMC1 for sibling event should pass the test.
Athira Rajeev [Fri, 10 Jun 2022 13:40:56 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest to check constraint for number of counters in use.
Testcase for group constraint check for number of counters in use. The
number of programmable counters is from PMC1 to PMC4. Testcase uses four
events with PMC1 to PMC4 and 5th event without any PMC which is expected
to fail since it is exceeding the number of counters in use.
Athira Rajeev [Fri, 10 Jun 2022 13:40:55 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest to check PMC5/6 is excluded from some constraint checks
Events using Performance Monitor Counter 5 (PMC5) and Performance
Monitor Counter 6 (PMC6) should be excluded from constraint check when
scheduled along with group of events. Example, combination of PMC5,
PMC6, and an event with cache bit will succeed to schedule though first
two events doesn't have cache bit set. Testcase use three events, ie,
600f4(cycles), 500fa(instructions), 22C040 with cache bit (dc_ic) set to
test this constraint check.
Athira Rajeev [Fri, 10 Jun 2022 13:40:54 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for group constraint check for PMC5 and PMC6
Events using Performance Monitor Counter 5 (PMC5) and Performance
Monitor Counter 6 (PMC6) can't have other fields in event code like
cache bits, thresholding or marked bit. PMC5 and PMC6 only supports base
events: ie 500fa and 600f4. Other combinations should fail. Testcase
tries setting other bits in event code for 500fa and 600f4 to check this
scenario.
Athira Rajeev [Fri, 10 Jun 2022 13:40:53 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add support for perf event code tests
Add new folder for enabling perf event code tests which includes
checking for group constraints, valid/invalid events, also checks for
event excludes, alternatives so on. A new folder "event_code_tests", is
created under "selftests/powerpc/pmu".
Also updates the corresponding Makefiles in "selftests/powerpc" and
"event_code_tests" folder.
Kajol Jain [Fri, 10 Jun 2022 13:40:52 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for bhrb disable field for non-branch samples
The testcase uses "instructions" event to generate the samples and fetch
Monitor Mode Control Register A (MMCRA) when overflow. Branch History
Rolling Buffer(bhrb) disable bit is part of MMCRA which need to be
verified by perf interface. Incase sample is not of branch type, bhrb
disable bit is explicitly set to 1. Testcase checks if the bhrb disable
bit is set of MMCRA register via perf interface for ISA v3.1 platform
Athira Rajeev [Fri, 10 Jun 2022 13:40:51 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for mmcr1 pmcxsel/unit/cache fields
The testcase uses event code "0x21c040" to verify the settings for
different fields in Monitor Mode Control Register 1 (MMCR1). The fields
include PMCxSEL, PMCXCOMB PMCxUNIT, cache. Checks if these fields are
translated correctly via perf interface to MMCR1
Athira Rajeev [Fri, 10 Jun 2022 13:40:50 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest for checking valid and invalid bhrb filter maps
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type,
ie branch filters are supported. All the branch filters are not
supported in powerpc. Example, power10 platform supports any, ind_call
and cond branch filters. Whereas, it is different in power9. Testcase
checks event open for invalid and valid branch sample types. The branch
types for testcase are picked from "perf_branch_sample_type" in
perf_event.h
Athira Rajeev [Fri, 10 Jun 2022 13:40:49 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest to check PERF_SAMPLE_REGS_INTR option will not crash on any platforms
With sampling, --intr-regs option is used for capturing
interrupt regs. When --intr-regs option is used, PMU code
uses is_sier_available() function which uses PMU flags in
the code. In environment where platform specific PMU is
not registered, PMU flags is not defined. A fix was added
in kernel to address crash while accessing is_sier_available()
function when pmu is not set. commit 6d2cbbc46623 ("powerpc/perf:
Fix crash with is_sier_available when pmu is not set").
Add perf sampling test to exercise this code and make sure
enabling intr_regs shouldn't crash in any platform. Testcase
uses software event cycles since software event will work even
in cases without PMU.
Athira Rajeev [Fri, 10 Jun 2022 13:40:48 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add selftest to check branch stack enablement will not crash on any platforms
While enabling branch stack for an event, BHRB (Branch History
Rolling Buffer) filter is set using bhrb_filter_map() callback.
This callback is not defined for cases like generic_compat_pmu
or in case where there is no PMU registered. A fix was added
in kernel to address a crash issue observed while enabling branch
stack for environments which doesn't have this callback.
commit f65453e6f40d ("powerpc/perf: Fix crashes with
generic_compat_pmu & BHRB").
Add perf sampling test to exercise this code path and make
sure enabling branch stack shouldn't crash in any platform.
Testcase uses software event cycles since software event is
available and can be used even in cases without PMU.
Athira Rajeev [Fri, 10 Jun 2022 13:40:47 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Refactor the platform check and add macros to find array size/PVR
The platform check for selftest support "check_pvr_for_sampling_tests"
is specific to sampling tests which includes PVR check, presence of
PMU and extended regs support. Extended regs support is needed for
sampling tests which tests whether PMU registers are programmed
correctly. There could be other sampling tests which may not need
extended regs, example, bhrb filter tests which only needs validity
check via event open.
Hence refactor the platform check to have a common function
"platform_check_for_tests" that checks only for PVR check
and presence of PMU. The existing function
"check_pvr_for_sampling_tests" will invoke the common function
and also will include checks for extended regs specific for
sampling. The common function can also be used by tests other
than sampling like event code tests.
Add macro to find array size ("ARRAY_SIZE") to sampling
tests "misc.h" file. This can be used in next tests to
find event array size. Also update "include/reg.h" to
add macros to find minor and major version from PVR which
will be used in testcases.
Kajol Jain [Fri, 10 Jun 2022 13:40:46 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for bhrb disable field
The testcase uses "instructions" event to generate the
samples and fetch Monitor Mode Control Register A (MMCRA)
when overflow. Branch History Rolling Buffer(bhrb) disable bit
is part of MMCRA which need to be verified by perf interface.
Testcase checks if the bhrb disable bit of MMCRA register is
programmed correctly via perf interface for ISA v3.1 platform
Also make get_mmcra_ifm return type as u64.
Kajol Jain [Fri, 10 Jun 2022 13:40:45 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for mmcra_ifm field for conditional branch type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for conditional branch type. Testcase checks if IFM bits is
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface for ISA v3.1 platform.
Kajol Jain [Fri, 10 Jun 2022 13:40:44 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for mmcra_ifm field for any branch type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for type any branch. Testcase checks if IFM bits is
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface.
Kajol Jain [Fri, 10 Jun 2022 13:40:43 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for mmcra_ifm field of indirect call type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for indirect branch type. Testcase checks if IFM bits are
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface for ISA v3.1 platform.
Kajol Jain [Fri, 10 Jun 2022 13:40:42 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add support for branch sampling in get_intr_regs function
Add support for sample type as PERF_SAMPLE_BRANCH_STACK in sampling
tests. This change is a precursor/helper for sampling testcases, that
test branck stack feature in perf interface.
Kajol Jain [Fri, 10 Jun 2022 13:40:41 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add interface test for mmcra_thresh_cmp fields
The testcase uses event code 0x35340401e0 for load only sampling, to
verify the settings of thresh compare field in Monitor Mode Control
Register A (MMCRA: 9-18 bits for power9 and MMCRA: 8-18 bits for
power10). Testcase checks if the thresh compare field is programmed
correctly via perf interface to MMCRA register.
Athira Rajeev [Fri, 10 Jun 2022 13:40:40 +0000 (19:10 +0530)]
selftests/powerpc: Add support to fetch "platform" and "base platform" from auxv to detect platform.
The /proc/self/auxv contains information about "platform" on any system.
Also "base platform" which is an indication about platform string
corresponding to the real PVR. When systems are booted in compat mode,
say, power10 booted in power9 mode, "platform" will point to power9
whereas base platform will point to power10. Incase, if the distro
doesn't support platform indicated by real PVR, base platform will have
a default value.
The mismatch of platform/base platform is an indication of system booted
in compat mode. In such cases, distro will have a Generic Compat
registered which supports basic features for performance monitoring.
Some of the selftest needs to be handled differently ( ex: generic
events, alternative events, bhrb filter map) in Generic Compat PMU.
Hence selftest framework needs utility functions to identify such cases.
One way is make sure of auxv information. Below condition can be used to
detect if Generic Compat PMU is registered. ie:
if ((AT_PLATFORM != AT_BASE_PLATFORM) && (AT_BASE_PLATFORM != PVR))
this indicates Generic Compat PMU.
Add utility function in "include/utils.h" to return:
AT_PLATFORM and AT_BASE_PLATFORM from auxv. Also update misc.c in
"sampling_tests" folder to add function to use above check to determine
presence of generic compat pmu.
In other architecture ( like x86 ), pmu_name is exposed via
"/sys/bus/event_source/devices/cpu/caps". The same could be used in
powerpc in future. Since currently we don't have the "caps" support in
powerpc, patch uses auxv information to detect platform type and compat
mode. But as placeholder utility function is added considering
possiblity of getting "caps" information via sysfs. If that doesn't
exist, fallback to using auxv information.
Kajol Jain [Fri, 10 Jun 2022 13:40:39 +0000 (19:10 +0530)]
selftests/powerpc/pmu: Add mask/shift bits for extracting threshold compare field
In power10, threshold compare field is not part of the raw event code
and provided via event attribute config1. Hence add the mask and shift
bits based on event attribute config1, to extract the threshold compare
value for power10
Also add a new function called get_thresh_cmp_val to compute and return
the threshold compare field for a given platform, since incase of
power10, threshold compare value provided is decimal.
Athira Rajeev [Sun, 22 May 2022 14:22:56 +0000 (19:52 +0530)]
powerpc/perf: Optimize clearing the pending PMI and remove WARN_ON for PMI check in power_pmu_disable
commit da9049cf0109 ("powerpc/perf: Fix PMU callbacks to clear
pending PMI before resetting an overflown PMC") added a new
function "pmi_irq_pending" in hw_irq.h. This function is to check
if there is a PMI marked as pending in Paca (PACA_IRQ_PMI).This is
used in power_pmu_disable in a WARN_ON. The intention here is to
provide a warning if there is PMI pending, but no counter is found
overflown.
During some of the perf runs, below warning is hit:
This means that there is no PMC overflown among the active events
in the PMU, but there is a PMU pending in Paca. The function
"any_pmc_overflown" checks the PMCs on active events in
cpuhw->n_events. Code snippet:
<<>>
if (any_pmc_overflown(cpuhw))
clear_pmi_irq_pending();
else
WARN_ON(pmi_irq_pending());
<<>>
Here the PMC overflown is not from active event. Example: When we do
perf record, default cycles and instructions will be running on PMC6
and PMC5 respectively. It could happen that overflowed event is currently
not active and pending PMI is for the inactive event. Debug logs from
trace_printk:
<<>>
any_pmc_overflown: idx is 5: pmc value is 0xd9a
power_pmu_disable: PMC1: 0x0, PMC2: 0x0, PMC3: 0x0, PMC4: 0x0, PMC5: 0xd9a, PMC6: 0x80002011
<<>>
Here active PMC (from idx) is PMC5 , but overflown PMC is PMC6(0x80002011).
When we handle PMI interrupt for such cases, if the PMC overflown is
from inactive event, it will be ignored. Reference commit:
commit 9651d67001ac ("powerpc/perf: Fix finding overflowed PMC in interrupt")
Patch addresses two changes:
1) Fix 1 : Removal of warning ( WARN_ON(pmi_irq_pending()); )
We were printing warning if no PMC is found overflown among active PMU
events, but PMI pending in PACA. But this could happen in cases where
PMC overflown is not in active PMC. An inactive event could have caused
the overflow. Hence the warning is not needed. To know pending PMI is
from an inactive event, we need to loop through all PMC's which will
cause more SPR reads via mfspr and increase in context switch. Also in
existing function: perf_event_interrupt, already we ignore PMI's
overflown when it is from an inactive PMC.
2) Fix 2: optimization in clearing pending PMI.
Currently we check for any active PMC overflown before clearing PMI
pending in Paca. This is causing additional SPR read also. From point 1,
we know that if PMI pending in Paca from inactive cases, that is going
to be ignored during replay. Hence if there is pending PMI in Paca, just
clear it irrespective of PMC overflown or not.
In summary, remove the any_pmc_overflown check entirely in
power_pmu_disable. ie If there is a pending PMI in Paca, clear it, since
we are in pmu_disable. There could be cases where PMI is pending because
of inactive PMC ( which later when replayed also will get ignored ), so
WARN_ON could give false warning. Hence removing it.
Fixes: da9049cf0109 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220522142256.24699-1-atrajeev@linux.vnet.ibm.com
Michael Ellerman [Fri, 17 Jun 2022 08:02:43 +0000 (18:02 +1000)]
powerpc: Update asm-prototypes.h comment
This header was recently cleaned up in commit d7d9e972fe02 ("powerpc:
Move C prototypes out of asm-prototypes.h"), update the comment to
reflect it's proper purpose.
Michael Ellerman [Sun, 19 Jun 2022 23:31:03 +0000 (09:31 +1000)]
selftests/powerpc: Skip energy_scale_info test on older firmware
Older machines don't have the firmware feature that enables the code
this test is testing. Skip the test if the sysfs directory doesn't
exist. Also use the FAIL_IF() macro to provide more verbose error
reporting if an error is encountered.
The buggy address belongs to the object at c00000001d1d0118
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
8-byte region [c00000001d1d0118, c00000001d1d0120)
Memory state around the buggy address: c00000001d1d0000: fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0080: fc fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
>c00000001d1d0100: fc fc fc 02 fc fc fc fc fc fc fc fc fc fc fc fc
^ c00000001d1d0180: fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0200: fc fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc
This happens because the allocation uses the wrong unit (bits) when it
should pass (BITS_TO_LONGS(count) * sizeof(long)) or equivalent. With small
numbers of bits, the allocated object can be smaller than sizeof(long),
which results in invalid accesses.
Use bitmap_zalloc() to allocate and initialize the irq bitmap, paired with
bitmap_free() for consistency.
The platform's RNG must be available before random_init() in order to be
useful for initial seeding, which in turn means that it needs to be
called from setup_arch(), rather than from an init call.
Complicating things, however, is that POWER8 systems need some per-cpu
state and kmalloc, which isn't available at this stage. So we split
things up into an early phase and a later opportunistic phase. This
commit also removes some noisy log messages that don't add much.
Fixes: f017f5f493de ("powerpc: Implement arch_get_random_long/int() for powernv") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add of_node_put(), use pnv naming, minor change log editing] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220621140849.127227-1-Jason@zx2c4.com
Andy Shevchenko [Sat, 7 May 2022 10:01:46 +0000 (13:01 +0300)]
powerpc/52xx: Get rid of of_node assignment
Let GPIO library assign of_node from the parent device.
This allows to move GPIO library and drivers to use fwnode
APIs instead of being stuck with OF-only interfaces.
Andy Shevchenko [Sat, 7 May 2022 10:01:45 +0000 (13:01 +0300)]
powerpc/mpc5xxx: Switch mpc5xxx_get_bus_frequency() to use fwnode
Switch mpc5xxx_get_bus_frequency() to use fwnode in order to help
cleaning up other parts of the kernel from OF specific code.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Chris Packham <chris.packham@alliedtelesis.co.nz> # for i2c-mpc Acked-by: Wolfram Sang <wsa@kernel.org> # for the I2C part Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for mscan/mpc5xxx_can Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220507100147.5802-2-andriy.shevchenko@linux.intel.com