powerpc/eeh: Remove unnecessary config_addr from eeh_dev
The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.
This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.
Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/eeh: Reduce to one the number of places where edev is allocated
arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.
This changes eeh_pe_get() to receive required parameters explicitly.
This removes unnecessary temporary allocation of edev.
This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Arvind Yadav [Mon, 28 Aug 2017 06:05:15 +0000 (11:35 +0530)]
powerpc/512x: Constify clk_div_tables
clk_div_tables are not supposed to change at runtime.
mpc512x_clk_divtable function working with const clk_div_table. So
mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dan Carpenter [Fri, 25 Aug 2017 10:33:40 +0000 (13:33 +0300)]
powerpc/44x: Fix mask and shift to zero bug
My static checker complains that 0x00001800 >> 13 is zero. Looking at
the context, it seems like a copy and paste bug from the line below
and probably 0x3 << 13 or 0x00006000 was intended.
Fixes: 03a30973838b ("[POWERPC] 4xx: Add 405GPr and 405EP support in boot wrapper") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dan Carpenter [Wed, 28 Jun 2017 11:49:07 +0000 (14:49 +0300)]
powerpc/83xx: Use sizeof correct type when ioremapping
There is a cut and paste error here so we use sizeof(struct mpc83xx_pmc)
to remap the memory for "clock_regs". That sizeof() is 20 bytes and we
only need to remap 12 bytes. It presumably doesn't affect run time too
much...
I changed them to both use sizeof(*variable_name) because that's the
preferred kernel style these days.
Fixes: e642c9f57caf ("powerpc/mpc83xx: Power Management support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[mpe: It will map at least one page anyway, but still a good cleanup] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 19 Jul 2017 06:59:12 +0000 (16:59 +1000)]
powerpc: Machine check interrupt is a non-maskable interrupt
Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
helps avoid tripping on ourselves or other CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 19 Jul 2017 06:59:11 +0000 (16:59 +1000)]
powerpc/powernv: Use kernel crash path for machine checks
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.
There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.
As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:
Severe Machine check interrupt [Not recovered]
NIP [ffff000000000000]: 0xffff000000000000
Initiator: CPU
Error type: Real address [Instruction fetch (foreign)]
[ 127.426651616,0] OPAL: Reboot requested due to Platform error.
Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
opal: Reboot type 1 not supported
Kernel panic - not syncing: PowerNV Unrecovered Machine Check
CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35
Call Trace:
[ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
buggy/mising code in OPAL for this BMC
Rebooting in 10 seconds..
Trying to free IRQ 496 from IRQ context!
After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):
Nicholas Piggin [Wed, 19 Jul 2017 06:59:10 +0000 (16:59 +1000)]
powerpc/powernv: Flush console before platform error reboot
Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.
Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.
The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.
Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 5 Jul 2017 03:56:27 +0000 (13:56 +1000)]
powerpc: Do not send system reset request through the oops path
A system reset is a request to crash / debug the system rather than
necessarily caused by encountering a BUG. So there is no need to
serialize all CPUs behind the die lock, adding taints to all
subsequent traces beyond the first, breaking console locks, etc.
The system reset is NMI context which has its own printk buffers to
prevent output being interleaved. Then it's better to have all
secondaries print out their debug as quickly as possible and the
primary will flush out all printk buffers during panic().
So remove the 0x100 path from die, and move it into system_reset. Name
the crash/dump reasons "System Reset".
This gives "not tained" traces when crashing an untainted kernel. It
also gives the panic reason as "System Reset" as opposed to "Fatal
exception in interrupt" (or "die oops" for fadump).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 5 Jul 2017 03:56:26 +0000 (13:56 +1000)]
powerpc/pseries/le: Work around a firmware quirk
Some PowerVM firmware when delivering a system reset interrupt to a
little endian OS will mess up SRR registers. They are byteswapped, and
SRR1 is incorrect. An example from a crash:
It's possible to detect this pattern in SRR1 (that would never happen
in normal operation), and at least fix the NIP. After this patch, the
same interrupt reports NIP properly:
Nicholas Piggin [Wed, 5 Jul 2017 03:56:25 +0000 (13:56 +1000)]
powerpc: Do not call ppc_md.panic in fadump panic notifier
If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.
Currently, system reset injection causes the guest to silently stop.
Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.
Remove ppc_md.panic. Move fadump panic notifier into fadump code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes a couple more bits of fallout from the new hard lockup watchdog
patch.
It restores the required hw_nmi_get_sample_period() function for the
perf watchdog, and removes some function declarations on 64e that are only
defined for 64s. This fixes the 64e build when the hardlockup detector is
enabled.
It restores the default behaviour of disabling the perf watchdog, and also
fixes disabling the 64s watchdog when running as a guest.
Fixes: f5a97cad75 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 25 Aug 2017 04:30:35 +0000 (14:30 +1000)]
powerpc/64s: idle POWER9 can execute stop in virtual mode
The hardware can execute stop in any context, and KVM does not
require real mode because siblings do not share MMU state. This
saves a switch to real-mode when going idle.
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 29 Aug 2017 11:40:35 +0000 (21:40 +1000)]
powerpc/64s: Drop no longer used IDLE_STATE_ENTER_SEQ
There are no longer any callers of IDLE_STATE_ENTER_SEQ, all callers
use IDLE_STATE_ENTER_SEQ_NORET. So drop the former.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch, write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 29 Aug 2017 11:34:40 +0000 (21:34 +1000)]
powerpc/64s: POWER9 can execute stop without a sync sequence
We don't need to use IDLE_STATE_ENTER_SEQ_NORET on Power9.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 29 Aug 2017 11:36:35 +0000 (21:36 +1000)]
powerpc/64s: Move IDLE_STATE_ENTER_SEQ[_NORET] into idle_book3s.S
This macro is only used in idle_book3s.S, move it in there and add a
more descriptive comment.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch and write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 25 Aug 2017 04:30:33 +0000 (14:30 +1000)]
KVM: PPC: Book3S HV: POWER9 does not require secondary thread management
POWER9 CPUs have independent MMU contexts per thread, so KVM does not
need to quiesce secondary threads, so the hwthread_req/hwthread_state
protocol does not have to be used. So patch it away on POWER9, and patch
away the branch from the Linux idle wakeup to kvm_start_guest that is
never used.
Add a warning and error out of kvmppc_grab_hwthread in case it is ever
called on POWER9.
This avoids a hwsync in the idle wakeup path on POWER9.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use WARN(...) instead of WARN_ON()/pr_err(...)] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:38:03 +0000 (15:38 +1000)]
powerpc/configs/6xx: Switch CONFIG_USB_EHCI_FSL to =m
In commit 78e134fb66b5 ("drivers:usb:fsl:Make fsl ehci drv an
independent driver module"), CONFIG_USB_EHCI_FSL was switched from
built-in to modular. Update the defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:38:02 +0000 (15:38 +1000)]
powerpc/configs/6xx: Drop no longer needed CONFIG_BT_HCIUART_H4
Since commit 457ec384b02a ("Bluetooth: bpa10x: Use h4_recv_buf helper
for frame reassembly") we no longer need to set CONFIG_BT_HCIUART_H4
in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:38:01 +0000 (15:38 +1000)]
powerpc/configs/6xx: Drop no longer needed CONFIG_NETFILTER_XT_MATCH_SOCKET
Since commit 67d000b30e13 ("netfilter: move socket lookup
infrastructure to nf_socket_ipv{4,6}.c") we no longer need to set
CONFIG_NETFILTER_XT_MATCH_SOCKET in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit 2bb7f19bd4cf ("cpufreq: stats: Make the stats code
non-modular"), the CPU_FREQ_STAT code was made non-modular. Our
defconfig still said =m though, which meant we no longer got the
code at all. Switch the defconfig to =y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:59 +0000 (15:37 +1000)]
powerpc/configs/6xx: Drop no longer needed CONFIG_NF_CONNTRACK_PROC_COMPAT
Since commit f3055fda3412 ("netfilter: remove ip_conntrack* sysctl
compat code") we no longer need to set CONFIG_NF_CONNTRACK_PROC_COMPAT
in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:55 +0000 (15:37 +1000)]
powerpc/configs/6xx: Turn CONFIG_DRM_RADEON back on
In commit 9e3a74f2b512 ("drm: hide legacy drivers with CONFIG_DRM_LEGACY")
CONFIG_DRM_RADEON was moved behind CONFIG_DRM_LEGACY meaning it
stopped being enabled by ppc6xx_defconfig. Although no one has
noticed, given this is basically a legacy platform, it seems anyone
who is using it probably still wants this driver. So turn it back on
for now.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:52 +0000 (15:37 +1000)]
powerpc/configs: Update for CONFIG_INPUT_MOUSEDEV=n
In commit 428e9da8ebc9 ("Input: mousedev - stop offering PS/2 to
userspace by default") the symbol INPUT_MOUSEDEV went from being
'default y' to 'default n' (implied).
That means we no longer need to explicitly disable it in our
defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:50 +0000 (15:37 +1000)]
powerpc/configs: Turn CONFIG_R128 back in pmac32_defconfig
In commit 9e3a74f2b512 ("drm: hide legacy drivers with
CONFIG_DRM_LEGACY") CONFIG_R128 was moved behind CONFIG_DRM_LEGACY
meaning it stopped being enabled by pmac32_defconfig. Although no one
has noticed, given this is basically a legacy platform, it seems
anyone who is using it probably still wants this driver. So turn it
back on for now.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:45 +0000 (15:37 +1000)]
powerpc/configs: Add CONFIG_RAS now required for CONFIG_EDAC
In commit b87e32d5643a ("EDAC: Remove EDAC_MM_EDAC") CONFIG_EDAC grew
a dependency on CONFIG_RAS. Some of our defconfigs don't have the
latter, which means we lose CONFIG_EDAC, so add CONFIG_RAS to fix
that.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:44 +0000 (15:37 +1000)]
powerpc/configs: Drop no longer needed CONFIG_AUDITSYSCALL
Since commit 1c020e2c2c9d ("audit: always enable syscall auditing when
supported and audit is enabled") we no longer need to set
CONFIG_AUDITSYSCALL in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:43 +0000 (15:37 +1000)]
powerpc/configs: Drop CONFIG_SERIAL_TXX9_* from cell/ppc64
In commit 0f977b272333 ("powerpc: Remove the celleb support") we
dropped the celleb support, which made these symbols unselectable
because we no longer select HAS_TX99_SERIAL. So drop them from the
defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:41 +0000 (15:37 +1000)]
powerpc/configs: Drop unnecessary CONFIG_POWERNV_OP_PANEL
In commit 4b634438fece ("powerpc/powernv: Add driver for operator
panel on FSP machines") we added CONFIG_POWERNV_OP_PANEL=m to the
powernv defconfig, but it's default m so that's no necessary.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:40 +0000 (15:37 +1000)]
powerpc/configs: Drop no longer needed PCI_MSI on powernv
In commit a8c5e285a7f6 ("powerpc/powernv: Make PCI non-optional") we
made PCI (and therefore PCI_MSI) non-optional on powernv, so it
doesn't need to be in the defconfig anymore.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:39 +0000 (15:37 +1000)]
powerpc/configs: Drop no longer needed CONFIG_SMP for pseries/ppc64/powernv
In commit 28511e4b4f13 ("powerpc/powernv: Always enable SMP when
building powernv") and 5d3643f6142e ("powerpc/pseries: Always enable
SMP when building pseries") we forced CONFIG_SMP on for some configs.
Therefore we don't need to set it in those configs anymore.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:37 +0000 (15:37 +1000)]
powerpc/configs: Drop unnecessary CONFIG_NUMA_BALANCING_DEFAULT_ENABLED
In commit f90c908dd96f ("powerpc: Enable NUMA balancing in
pseries[_le]_defconfig") we added CONFIG_NUMA_BALANCING_DEFAULT_ENABLED
to our defconfigs. But it's already enabled by default, so drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:36 +0000 (15:37 +1000)]
powerpc/configs: Drop no longer needed CONFIG_DEVPTS_MULTIPLE_INSTANCES
Since commit b8bc9698d2f4 ("devpts: Make each mount of devpts an
independent filesystem.") we no longer need to set
CONFIG_DEVPTS_MULTIPLE_INSTANCES in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:29 +0000 (15:37 +1000)]
powerpc/configs: Drop no longer needed CONFIG_CRYPTO_DEV_VMX_ENCRYPT
Since commit 0258ccf4849f ("crypto: vmx - Convert to CPU feature based
module autoloading") we no longer need to set
CONFIG_CRYPTO_DEV_VMX_ENCRYPT in our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:28 +0000 (15:37 +1000)]
powerpc/configs: Update for CONFIG_NF_CT_PROTO_(SCTP|UDPLITE)=y
In commit 17d931ceeac1 ("netfilter: conntrack: built-in support for
SCTP"), NF_CT_PROTO_SCTP switched from tristate to bool and became
default y. Similarly in commit f2974481187c ("netfilter: conntrack:
built-in support for UDPlite"), NF_CT_PROTO_UDPLITE switched from
tristate to bool and became default y.
We had a few configs which set them to =m, which is no longer valid.
We don't need to change them to =y because both symbols are default y
and are enabled automatically based on the other symbols in the
affected defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:26 +0000 (15:37 +1000)]
powerpc/configs: Update for CONFIG_DEBUG_FS being selected via CONFIG_RCU_TRACE
In commit 8698132872e7 ("rcu: Enable RCU tracepoints by default to aid
in debugging"), CONFIG_RCU_TRACE was made default y (if CONFIG_TREE_RCU=y,
which it is for some of our configs).
That in turn causes CONFIG_TREE_RCU_TRACE to be enabled, which selects
CONFIG_DEBUG_FS. The end result is that CONFIG_DEBUG_FS is forced on,
meaning we don't have to enable it in some of our configs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 05:37:23 +0000 (15:37 +1000)]
powerpc/configs: Explicitly drop CONFIG_INPUT_MOUSEDEV
In commit 428e9da8ebc9 ("Input: mousedev - stop offering PS/2 to userspace by
default") (Jan 2017), CONFIG_INPUT_MOUSEDEV was switched from default y to
default n, with the explanation:
Evdev interface has been available for many years and by now everyone
is switched to using it, so let's stop offering /dev/input/mouseN
and /dev/psaux by default.
We had a number of configs which had it enabled, but going by the above
explanation probably don't need it enabled anymore.
So drop the last remnants of it from our defconfigs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 13:56:21 +0000 (23:56 +1000)]
powerpc/oops: Print the kernel's endian in the oops
Although the MSR tells you what endian you're in it's possible that
isn't the same endian the kernel was built for, and if that happens
you're usually having a very bad day. So print a marker to make
it 100% clear which endian the kernel was built for.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 13:56:20 +0000 (23:56 +1000)]
powerpc/oops: Fix the oops markers to use pr_cont()
When we oops we print a few markers for significant config options
such as PREEMPT, SMP etc. Currently these appear on separate lines
because we're not using pr_cont() properly. Fix it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rashmica Gupta [Thu, 1 Jun 2017 05:34:40 +0000 (15:34 +1000)]
Add documentation for the powerpc memtrace debugfs files
CONFIG_PPC_MEMTRACE must be set to use this feature. This can only be
used on powernv platforms.
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
[mpe: Update dates and kernel versions, mention size is in bytes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rashmica Gupta [Thu, 1 Jun 2017 05:34:38 +0000 (15:34 +1000)]
powerpc/powernv: Enable removal of memory for in memory tracing
The hardware trace macro feature requires access to a chunk of real
memory. This patch provides a debugfs interface to do this. By
writing an integer containing the size of memory to be unplugged into
/sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to
remove that much memory from the end of each NUMA node.
This patch also adds additional debugsfs files for each node that
allows the tracer to interact with the removed memory, as well as
a trace file that allows userspace to read the generated trace.
Note that this patch does not invoke the hardware trace macro, it
only allows memory to be removed during runtime for the trace macro
to utilise.
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
[mpe: Minor formatting etc fixups] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This helper is used to detect if a uprobe'd function has returned
through a setjmp/longjmp, rather than branching to the LR that was
updated previously by us. This fixes a SIGSEGV that gets generated when
programs use setjmp/longjmp with uretprobes.
We use the arm64 model (arch/arm64/kernel/probes/uprobes.c:
arch_uretprobe_is_alive()) for detecting when stack frames have been
removed from under us.
Reference:
https://marc.info/?l=linux-kernel&m=143748610330073
commit 8e02fca0c9575 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()")
commit 2763f8c19412c ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more
clever")
Tested with the test program from:
https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD
And this script:
$ cat test.sh
#!/bin/bash
perf probe -x ./bz5274 -a bz5274_main_return=main%return
perf probe -x ./bz5274 -a bz5274_funca_return=funca%return
perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return
perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return
perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return
perf record -e 'probe_bz5274:*' -aR ./bz5274
Reported-by: Gustavo Luiz Duarte <gduarte@redhat.com> Reported-by: zsun@redhat.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This happens because we're being called with our affinity mask set to
irq_default_affinity. That in turn was populated using
cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids
worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
mask which has > nr_cpu_ids bits set.
Fix it by limiting the value returned by cpumask_weight().
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add change log details on actual cause] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 11 Aug 2017 16:39:07 +0000 (02:39 +1000)]
powerpc/64: Optimise set/clear of CTRL[RUN] (runlatch)
On modern CPUs the CTRL register is read-only except bit 63 which is
the run latch control. This means it can be updated with a mtspr
rather than mfspr/mtspr.
To accomodate older CPUs (Cell at least), where there are other bits
in the register, we still do a read/modify/write on pre 2.06 CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Update change log to mention 2.06 workaround] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 11 Aug 2017 16:39:04 +0000 (02:39 +1000)]
powerpc/64s: Use the HV handler for external IRQ replay in HV mode on POWER9
POWER9 host external interrupts use the h_virt_irq_common handler, so
use that to replay them rather than using the hardware_interrupt_common
handler. Both call do_IRQ, but using the correct handler reduces
i-cache footprint.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 11 Aug 2017 16:39:03 +0000 (02:39 +1000)]
powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replay
This results in smaller code, and fewer branches. This relies on the
fact that both the 0xe80 and 0xa00 handlers call the same upper level
code, namely doorbell_exception().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 11 Aug 2017 16:39:02 +0000 (02:39 +1000)]
powerpc/64: Cleanup __check_irq_replay()
Move the clearing of irq_happened bits into the condition where they
were found to be set. This reduces instruction count slightly, and
reduces stores into irq_happened.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 11 Aug 2017 16:39:01 +0000 (02:39 +1000)]
powerpc/64s: masked_interrupt() returns to kernel so avoid restoring r13
Places in the kernel where r13 is not the PACA pointer must have
maskable interrupts disabled, so r13 does not have to be restored when
returning from a soft-masked interrupt. We should never have
interrupts soft disabled when we're in user space.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's too big to be inline, there is no reason to keep it
that way.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework to incorporate the comment changes via fixes branch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm: Optimize detection of thread local mm's
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 22 Aug 2017 01:51:37 +0000 (11:51 +1000)]
powerpc/64s: Fix replay interrupt return label name
In __replay_interrupt() we take the address of a local label so we can
return to it later. However the assembler turns the local label into a
symbol with a name like ".L1^B42" - where "^B" is literally "\002".
This does not make for pleasant stack traces. Fix it by giving the
label a sensible name.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rob Herring [Mon, 21 Aug 2017 15:16:49 +0000 (10:16 -0500)]
powerpc: pseries: remove dlpar_attach_node dependency on full path
In preparation to stop storing the full node path in full_name, remove the
dependency on full_name from dlpar_attach_node(). Callers of
dlpar_attach_node() already have the parent device_node, so just pass the
parent node into dlpar_attach_node instead of the path. This avoids doing
a lookup of the parent node by the path.
Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rob Herring [Mon, 21 Aug 2017 15:16:47 +0000 (10:16 -0500)]
powerpc: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anatolij Gustschin <agust@denx.de> Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 22 Aug 2017 05:14:50 +0000 (15:14 +1000)]
powerpc/vio: Use device_type to detect family
Currently in the vio.c code we use a comparision against the parent
device node's full path to decide if the device is a PFO or VIO family
device.
Both the ibm,platform-facilities and vdevice nodes are defined by PAPR,
and must have a matching device_type. So instead of using the path we
can instead compare the device_type.
I've checked Qemu and kvmtool both do this correctly, and all the
PowerVM systems I have access to do also. So it seems to be safe.
This removes the dependency on full_name, which is being removed
upstream.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 23 Aug 2017 12:20:10 +0000 (22:20 +1000)]
Merge branch 'fixes' into next
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
Arvind Yadav [Mon, 19 Jun 2017 05:44:25 +0000 (11:14 +0530)]
macintosh/rack-meter: Make of_device_ids const
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
407 576 0 983 3d7 drivers/macintosh/rack-meter.o
File size after constify rackmeter_match.
text data bss dec hex filename
807 176 0 983 3d7 drivers/macintosh/rack-meter.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.
Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.
The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.
Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comments in the code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>