Kan Liang [Thu, 11 Jan 2018 19:15:43 +0000 (11:15 -0800)]
perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
Perf-fuzzer triggers non-existent MSR access in RAPL driver on
Haswell-EX.
Haswell/Broadwell server and client have differnt RAPL events.
Since 'commit 7f2236d0bf9a ("perf/x86/rapl: Use Intel family macros for
RAPL")', it accidentally assign RAPL client events to server.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-kernel@vger.kernel.org Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Fri, 12 Jan 2018 00:57:32 +0000 (16:57 -0800)]
Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
stable"
* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
rbd: set max_segments to USHRT_MAX
rbd: reacquire lock should update lock owner client id
Linus Torvalds [Fri, 12 Jan 2018 00:54:35 +0000 (16:54 -0800)]
Merge tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"
* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Ingo Molnar [Thu, 11 Jan 2018 05:53:06 +0000 (06:53 +0100)]
Merge tag 'perf-core-for-mingo-4.16-20180110' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- The 'perf test bpf' entry hooked a eBPF proggie to the
SyS_epoll_wait() kernel function and expected it to be hit when calling
the epoll_wait() libc wrapper, which changed recently, in systems such
as Fedora 27, with the glibc wrapper calling instead the epoll_pwait()
syscall, so switch to epoll_pwait() for both the kernel and libc
function, getting it to work both in old and new systems (Arnaldo Carvalho de Melo)
- Beautify 'gettid' syscall result in 'perf trace', and in doing so
noticed that we need to handle namespaces in 'perf trace', will be
dealt with in follow up patches where we'll try to figure out if
the recent support for namespace in tools/perf/ can be used for this
purpose as well. (Arnaldo Carvalho de Melo)
- Introduce 'perf report --mmaps' and 'perf report --tasks' to show
info present in 'perf.data' (Jiri Olsa, Arnaldo Carvalho de Melo)
- Fix a wrong offset issue when using /proc/kcore (Jin Yao)
- Fix bug that prevented annotating symbols in perf.data files
generated with 'perf record --branch-any' (Jin Yao)
- Add infrastructure to record first and last sample time to the
perf.data file header, so that when processing all samples in
a 'perf record' session, such as when doing build-id processing,
or when specifically requesting that that info be recorded, use
that in 'perf report --time', that also got support for percent
slices in addition to absolute ones.
I.e. now it is possible to ask for the samples in the 10%-20%
time slice of a perf.data file (Jin Yao)
- Enable building with libbabeltrace by default (Jiri Olsa)
- Display perf_event_attr::namespaces when duping the attributes
in verbose mode (Jiri Olsa)
- Allocate context task_ctx_data for child event (Jiri Olsa)
- Update comments for PERF_RECORD_ITRACE_START and PERF_RECORD_MISC_* (Jiri Olsa)
- Add support for showing PERF_RECORD_LOST events in 'perf script' (Jiri Olsa)
- Add 'perf report --stats' option to display quick statistics about
metadata events (PERF_RECORD_*) i.e. what we get at the end of 'perf
report -D' (Jiri Olsa)
- Fix compile error with libunwind x86 (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
1) BPF speculation prevention and BPF_JIT_ALWAYS_ON, from Alexei
Starovoitov.
2) Revert dev_get_random_name() changes as adjust the error code
returns seen by userspace definitely breaks stuff.
3) Fix TX DMA map/unmap on older iwlwifi devices, from Emmanuel
Grumbach.
4) From wrong AF family when requesting sock diag modules, from Andrii
Vladyka.
5) Don't add new ipv6 routes attached to the null_entry, from Wei Wang.
6) Some SCTP sockopt length fixes from Marcelo Ricardo Leitner.
7) Don't leak when removing VLAN ID 0, from Cong Wang.
8) Hey there's a potential leak in ipv6_make_skb() too, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: sr: fix TLVs not being copied using setsockopt
ipv6: fix possible mem leaks in ipv6_make_skb()
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
mlxsw: pci: Wait after reset before accessing HW
nfp: always unmask aux interrupts at init
8021q: fix a memory leak for VLAN 0 device
of_mdio: avoid MDIO bus removal when a PHY is missing
caif_usb: use strlcpy() instead of strncpy()
doc: clarification about setting SO_ZEROCOPY
net: gianfar_ptp: move set_fipers() to spinlock protecting area
sctp: make use of pre-calculated len
sctp: add a ceiling to optlen in some sockopts
sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: avoid false sharing of map refcount with max_entries
ipv6: remove null_entry before adding default route
SolutionEngine771x: add Ether TSU resource
SolutionEngine771x: fix Ether platform data
docs-rst: networking: wire up msg_zerocopy
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
...
Al Viro [Wed, 10 Jan 2018 23:47:05 +0000 (18:47 -0500)]
Fix a leak in socket(2) when we fail to allocate a file descriptor.
Got broken by "make sock_alloc_file() do sock_release() on failures" -
cleanup after sock_map_fd() failure got pulled all the way into
sock_alloc_file(), but it used to serve the case when sock_map_fd()
failed *before* getting to sock_alloc_file() as well, and that got
lost. Trivial to fix, fortunately.
Fixes: 8e1611e23579 (make sock_alloc_file() do sock_release() on failures) Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mathieu Xhonneux [Wed, 10 Jan 2018 13:35:49 +0000 (13:35 +0000)]
ipv6: sr: fix TLVs not being copied using setsockopt
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header
to a socket through setsockopt, but the current implementation doesn't
copy possible TLVs at the end of the SRH received from userspace.
Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr))
{ ... } will never complete since the len and type fields of a possible
HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error,
and the HMAC will not be computed.
This commit adds a memcpy in case TLVs have been appended to the SRH.
Fixes: a149e7c7ce81 ("ipv6: sr: add support for SRH injection through setsockopt") Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 10 Jan 2018 11:45:49 +0000 (03:45 -0800)]
ipv6: fix possible mem leaks in ipv6_make_skb()
ip6_setup_cork() might return an error, while memory allocations have
been done and must be rolled back.
Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Reported-by: Mike Maloney <maloney@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Jan 2018 10:42:44 +0000 (11:42 +0100)]
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.
Fixes: 96f17e0776c2 ("mlxsw: spectrum: Support RED qdisc offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 10 Jan 2018 10:42:43 +0000 (11:42 +0100)]
mlxsw: pci: Wait after reset before accessing HW
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.
Wait for 100ms before starting the polling.
Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 10 Jan 2018 02:14:28 +0000 (18:14 -0800)]
nfp: always unmask aux interrupts at init
The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area. The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).
Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.
Always unmask the auxiliary interrupts after request_irq(). On the
remove path add missing PCI write flush before free_irq().
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 9 Jan 2018 21:40:41 +0000 (13:40 -0800)]
8021q: fix a memory leak for VLAN 0 device
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.
Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 9 Jan 2018 12:43:34 +0000 (14:43 +0200)]
of_mdio: avoid MDIO bus removal when a PHY is missing
If one of the child devices is missing the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except ENODEV to avoid it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiongfeng Wang [Tue, 9 Jan 2018 11:58:18 +0000 (19:58 +0800)]
caif_usb: use strlcpy() instead of strncpy()
gcc-8 reports
net/caif/caif_usb.c: In function 'cfusbl_device_notify':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Tue, 9 Jan 2018 03:02:33 +0000 (11:02 +0800)]
net: gianfar_ptp: move set_fipers() to spinlock protecting area
set_fipers() calling should be protected by spinlock in
case that any interrupt breaks related registers setting
and the function we expect. This patch is to move set_fipers()
to spinlock protecting area in ptp_gianfar_adjtime().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Jan 2018 19:53:23 +0000 (14:53 -0500)]
Merge branch 'sctp-Some-sockopt-optlen-fixes'
Marcelo Ricardo Leitner says:
====================
sctp: Some sockopt optlen fixes
Hangbin Liu reported that some SCTP sockopt are allowing the user to get
the kernel to allocate really large buffers by not having a ceiling on
optlen.
This patchset address this issue (in patch 2), replace an GFP_ATOMIC
that isn't needed and avoid calculating the option size multiple times
in some setsockopt.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Some sockopt handling functions were calculating the length of the
buffer to be written to userspace and then calculating it again when
actually writing the buffer, which could lead to some write not using
an up-to-date length.
This patch updates such places to just make use of the len variable.
Also, replace some sizeof(type) to sizeof(var).
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu reported that some sockopt calls could cause the kernel to log
a warning on memory allocation failure if the user supplied a large optlen
value. That is because some of them called memdup_user() without a ceiling
on optlen, allowing it to try to allocate really large buffers.
This patch adds a ceiling by limiting optlen to the maximum allowed that
would still make sense for these sockopt.
Reported-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Prevent out-of-bounds speculation in BPF maps by masking the
index after bounds checks in order to fix spectre v1, and
add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for
removing the BPF interpreter from the kernel in favor of
JIT-only mode to make spectre v2 harder, from Alexei.
2) Remove false sharing of map refcount with max_entries which
was used in spectre v1, from Daniel.
3) Add a missing NULL psock check in sockmap in order to fix
a race, from John.
4) Fix test_align BPF selftest case since a recent change in
verifier rejects the bit-wise arithmetic on pointers
earlier but test_align update was missing, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
None of those changes have an effect on tooling, so do a plain copy.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-qqzcs8ri3vks8cypg0puk0ae@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Similar to --tasks, producing the same output plus /proc/<PID>/maps
similar lines for each mmap record present in a perf.data file.
Please note that not all mmaps are stored, for instance, some of the
non-executable mmaps are only stored when 'perf record --data' is used,
when the user wants to resolve data accesses in addition to asking for
executable mmaps to get the DSO with symtabs.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zulwdlg5rfowogr1qznorvvc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:56 +0000 (17:03 +0100)]
perf report: Add --tasks option to display monitored tasks
Add --tasks option to display monitored tasks stored in perf.data.
Displaying pid/tid/ppid plus the command string aligned to distinguish
parent and child tasks.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-13-jolsa@kernel.org
[ Make it --tasks, plural, --task works as well, as its unambiguous ]
[ Use machine__find_thread(), not findnew(), as pointed out by Namhyung ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-kyg4gz2yy0vkrrh2vtq29u71@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:55 +0000 (17:03 +0100)]
perf report: Add --stats option to display quick data statistics
Add --stats option to display quick data statistics of event numbers,
without any further processing, like the one at the end of the perf
report -D command.
I found this useful when hunting lost events for another change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-12-jolsa@kernel.org
[ Rename it to --stats, plural ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:54 +0000 (17:03 +0100)]
perf tools: Make the tool's warning messages optional
I want to display the pure events status coming in the next patch and
the tool's warnings are superfluous in the output. Making it optional,
enabled by default.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-10-jolsa@kernel.org
[ Use PRIu64 when printing u64 values, fixing the build in some arches ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Since commit f11a04464ae57e8d ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:
rtc-s35390a 0-0030: error resetting chip
rtc-s35390a: probe of 0-0030 failed with error -5
More debug code reveals:
i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
i2c i2c-0: NAK from device addr 0x30 msg #0
s35390a_get_reg: ret = -6
Commit 02e479808b5d62f8 ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.
Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().
Fixes: 02e479808b5d62f8 ("gpio: Alter semantics of *raw* operations to actually be raw") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc] Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Tue, 9 Jan 2018 23:45:06 +0000 (15:45 -0800)]
Merge tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains what I hope are the last RISC-V changes to go into 4.15.
I know it's a bit last minute, but I think they're all fairly small
changes:
- SR_* constants have been renamed to match the latest ISA
specification.
- Some CONFIG_MMU #ifdef cruft has been removed. We've never
supported !CONFIG_MMU.
- __NR_riscv_flush_icache is now visible to userspace. We were hoping
to avoid making this public in order to force userspace to call the
vDSO entry, but it looks like QEMU's user-mode emulation doesn't
want to emulate a vDSO. In order to allow glibc to fall back to a
system call when the vDSO entry doesn't exist we're just
- Our defconfig is no long empty. This is another one that just
slipped through the cracks. The defconfig isn't perfect, but it's
at least close to what users will want for the first RISC-V
development board. Getting closer is kind of splitting hairs here:
none of the RISC-V specific drivers are in yet, so it's not like
things will boot out of the box.
The only one that's strictly necessary is the __NR_riscv_flush_icache
change, as I want that to be part of the public API starting from our
first kernel so nobody has to worry about it. The others are nice to
haves, but they seem sane for 4.15 to me"
* tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
riscv: rename SR_* constants to match the spec
riscv: remove CONFIG_MMU ifdefs
RISC-V: Make __NR_riscv_flush_icache visible to userspace
RISC-V: Add a basic defconfig
Linus Torvalds [Tue, 9 Jan 2018 23:43:13 +0000 (15:43 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.15.
- Maciej Rozycki found another series of FP issues which requires a
seven part series to restructure and fix.
- James fixes a warning about .set mt which gas doesn't like when
building for R1 processors"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: CPS: Fix r1 .set mt assembler warning
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linus Torvalds [Tue, 9 Jan 2018 19:20:55 +0000 (11:20 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes that should go into this release. This contains:
- An NVMe pull request from Christoph, with a few critical fixes for
NVMe.
- A block drain queue fix from Ming.
- The concurrent lo_open/release fix for loop"
* 'for-linus' of git://git.kernel.dk/linux-block:
loop: fix concurrent lo_open/lo_release
block: drain queue before waiting for q_usage_counter becoming zero
nvme-fcloop: avoid possible uninitialized variable warning
nvme-mpath: fix last path removal during traffic
nvme-rdma: fix concurrent reset and reconnect
nvme: fix sector units when going between formats
nvme-pci: move use_sgl initialization to nvme_init_iod()
Daniel Borkmann [Tue, 9 Jan 2018 12:17:44 +0000 (13:17 +0100)]
bpf: avoid false sharing of map refcount with max_entries
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 121, holes: 1, sum holes: 7 */
};
Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was
tested, bouncing modified cache lines between cores is slow,
apparently because the MESI protocol is used for cache coherence
[8]. Changing the reference counter of an eBPF array on one
physical CPU core causes the cache line containing the reference
counter to be bounced over to that CPU core, making reads of the
reference counter on all other CPU cores slow until the changed
reference counter has been written back to memory. Because the
length and the reference counter of an eBPF array are stored in
the same cache line, this also means that changing the reference
counter on one physical CPU core causes reads of the eBPF array's
length to be slow on other physical CPU cores (intentional false
sharing).
While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.
Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.
Wei Wang [Mon, 8 Jan 2018 18:34:00 +0000 (10:34 -0800)]
ipv6: remove null_entry before adding default route
In the current code, when creating a new fib6 table, tb6_root.leaf gets
initialized to net->ipv6.ip6_null_entry.
If a default route is being added with rt->rt6i_metric = 0xffffffff,
fib6_add() will add this route after net->ipv6.ip6_null_entry. As
null_entry is shared, it could cause problem.
In order to fix it, set fn->leaf to NULL before calling
fib6_add_rt2node() when trying to add the first default route.
And reset fn->leaf to null_entry when adding fails or when deleting the
last default route.
syzkaller reported the following issue which is fixed by this commit:
Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 18:53:27 +0000 (21:53 +0300)]
SolutionEngine771x: add Ether TSU resource
After the Ether platform data is fixed, the driver probe() method would
still fail since the 'struct sh_eth_cpu_data' corresponding to SH771x
indicates the presence of TSU but the memory resource for it is absent.
Add the missing TSU resource to both Ether devices and fix the harmless
off-by-one error in the main memory resources, while at it...
Fixes: 4986b996882d ("net: sh_eth: remove the SH_TSU_ADDR") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 18:53:26 +0000 (21:53 +0300)]
SolutionEngine771x: fix Ether platform data
The 'sh_eth' driver's probe() method would fail on the SolutionEngine7710
board and crash on SolutionEngine7712 board as the platform code is
hopelessly behind the driver's platform data -- it passes the PHY address
instead of 'struct sh_eth_plat_data *'; pass the latter to the driver in
order to fix the bug...
Fixes: 71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolai Stange [Mon, 8 Jan 2018 14:54:44 +0000 (15:54 +0100)]
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
Commit 8f659a03a0ba ("net: ipv4: fix for a race condition in
raw_sendmsg") fixed the issue of possibly inconsistent ->hdrincl handling
due to concurrent updates by reading this bit-field member into a local
variable and using the thus stabilized value in subsequent tests.
However, aforementioned commit also adds the (correct) comment that
/* hdrincl should be READ_ONCE(inet->hdrincl)
* but READ_ONCE() doesn't work with bit fields
*/
because as it stands, the compiler is free to shortcut or even eliminate
the local variable at its will.
Note that I have not seen anything like this happening in reality and thus,
the concern is a theoretical one.
However, in order to be on the safe side, emulate a READ_ONCE() on the
bit-field by doing it on the local 'hdrincl' variable itself:
int hdrincl = inet->hdrincl;
hdrincl = READ_ONCE(hdrincl);
This breaks the chain in the sense that the compiler is not allowed
to replace subsequent reads from hdrincl with reloads from inet->hdrincl.
Fixes: 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiongfeng Wang [Mon, 8 Jan 2018 11:43:00 +0000 (19:43 +0800)]
net: caif: use strlcpy() instead of strncpy()
gcc-8 reports
net/caif/caif_dev.c: In function 'caif_enroll_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
net/caif/cfctrl.c: In function 'cfctrl_linkup_request':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
net/caif/cfcnfg.c: In function 'caif_connect_client':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Dryomov [Thu, 21 Dec 2017 14:35:11 +0000 (15:35 +0100)]
rbd: set max_segments to USHRT_MAX
Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped
max_segments (unsigned short) to max_hw_sectors (unsigned int).
max_hw_sectors is set to the number of 512-byte sectors in an object
and overflows unsigned short for 32M (largest possible) objects, making
the block layer resort to handing us single segment (i.e. single page
or even smaller) bios in that case.
1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
Alexander Duyck.
2) Undo unintentional UAPI change in netfilter conntrack, from Florian
Westphal.
3) Revert a change to how error codes are returned from
dev_get_valid_name(), it broke some apps.
4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
dual-stack. From Eli Cooper.
5) Fix missed PMTU updates in geneve, from Xin Long.
6) Cure double free in macvlan, from Gao Feng.
7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
Mohamed Ghannam.
8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
deferral of probe when the regulator is not ready yet).
9) Missing DMA mapping error checks in 3c59x, from Neil Horman.
10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.
11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
isn't initialized. From Andrei Vagin.
12) Fix crashes in fib6_add(), from Wei Wang.
13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
sh_eth: fix TXALCR1 offsets
mdio-sun4i: Fix a memory leak
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
sctp: fix the handling of ICMP Frag Needed for too small MTUs
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
xen-netfront: enable device after manual module load
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
sh_eth: fix SH7757 GEther initialization
net: fec: free/restore resource in related probe error pathes
uapi/if_ether.h: prevent redefinition of struct ethhdr
ipv6: fix general protection fault in fib6_add()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
net: stmmac: enable EEE in MII, GMII or RGMII only
rtnetlink: give a user socket to get_target_net()
MAINTAINERS: Update my email address.
can: ems_usb: improve error reporting for error warning and error passive
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
can: gs_usb: fix return value of the "set_bittiming" callback
...
Linus Torvalds [Tue, 9 Jan 2018 00:17:31 +0000 (16:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
- One line fix to mlx4 error flow (same as mlx5 fix in last pull
request, just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that
they don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/srpt: Fix ACL lookup during login
IB/srpt: Disable RDMA access by the initiator
RDMA/netlink: Fix locking around __ib_get_device_by_index
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.
Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sergei Shtylyov [Sat, 6 Jan 2018 21:26:47 +0000 (00:26 +0300)]
sh_eth: fix TXALCR1 offsets
The TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error. Luckily, the driver never uses this
register. :-)
Fixes: 4a55530f38e4 ("net: sh_eth: modify the definitions of register") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.
Fixes: 4bdcb1dd9feb ("net: Add MDIO bus driver for the Allwinner EMAC") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1463447 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:19:13 +0000 (14:19 -0500)]
Merge branch 'SCTP-PMTU-discovery-fixes'
Marcelo Ricardo Leitner says:
====================
SCTP PMTU discovery fixes
This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.
Please consider these to stable.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: fix the handling of ICMP Frag Needed for too small MTUs
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit 06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.
The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.
Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Otubo [Fri, 5 Jan 2018 08:42:16 +0000 (09:42 +0100)]
xen-netfront: enable device after manual module load
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.
The guest would face errors like:
[root@guest ~]# ethtool -i eth0
Cannot get driver information: No such device
[root@guest ~]# ifconfig eth0
eth0: error fetching interface information: Device not found
This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.
Signed-off-by: Eduardo Otubo <otubo@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:13:45 +0000 (14:13 -0500)]
Merge branch 'bnxt_en_fixes'
Michael Chan says:
====================
bnxt_en: 2 small bug fixes.
The first one fixes the TC Flower flow parameter passed to firmware. The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Challa [Thu, 4 Jan 2018 23:46:54 +0000 (18:46 -0500)]
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.
Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 8 Jan 2018 19:13:08 +0000 (11:13 -0800)]
Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This contains fixes for the following two non-trivial issues:
- The task iterator got broken while adding thread mode support for
v4.14. It was less visible because it only triggers when both
cgroup1 and cgroup2 hierarchies are in use. The recent versions of
systemd uses cgroup2 for process management even when cgroup1 is
used for resource control exposing this issue.
- cpuset CPU hotplug path could deadlock when racing against exits.
There also are two patches to replace unlimited strcpy() usages with
strlcpy()"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
cgroup: Fix deadlock in cpu hotplug path
cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
cgroup: avoid copying strings longer than the buffers
Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
issues to appear on some systems and they are difficult to reproduce,
because there is no guaranteed ordering between subsys_initcall()
calls, so they may occur in different orders on different systems.
In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
creation delayed issue) exposed one of these issues where genl_init()
and acpi_wmi_init() are both called at the same initcall level, but
the former must run before the latter so as to avoid a NULL pointer
dereference.
For this reason, move the acpi_wmi_init() invocation to the
initcall_sync level which should still be early enough for things
to work correctly in the WMI land.
Link: https://marc.info/?t=151274596700002&r=1&w=2 Reported-by: Jonathan McDowell <noodles@earth.li> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Takashi Iwai [Mon, 8 Jan 2018 13:03:53 +0000 (14:03 +0100)]
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
PCM OSS read/write loops keep taking the mutex lock for the whole
read/write, and this might take very long when the exceptionally high
amount of data is given. Also, since it invokes with mutex_lock(),
the concurrent read/write becomes unbreakable.
This patch tries to address these issues by replacing mutex_lock()
with mutex_lock_interruptible(), and also splits / re-takes the lock
at each read/write period chunk, so that it can switch the context
more finely if requested.
Jiri Olsa [Sun, 7 Jan 2018 16:03:52 +0000 (17:03 +0100)]
perf script: Add support to display sample misc field
Adding support to display sample misc field in form
of letter for each bit:
# perf script -F +misc ...
sched-messaging 1414 K 28690.636582: 4590 cycles ...
sched-messaging 1407 U 28690.636600: 325620 cycles ...
sched-messaging 1414 K 28690.636608: 19473 cycles ...
misc field __________/
The misc bits are assigned to following letters:
PERF_RECORD_MISC_KERNEL K
PERF_RECORD_MISC_USER U
PERF_RECORD_MISC_HYPERVISOR H
PERF_RECORD_MISC_GUEST_KERNEL G
PERF_RECORD_MISC_GUEST_USER g
PERF_RECORD_MISC_MMAP_DATA* M
PERF_RECORD_MISC_COMM_EXEC E
PERF_RECORD_MISC_SWITCH_OUT S
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:51 +0000 (17:03 +0100)]
perf: Update PERF_RECORD_MISC_* comment for perf_event_header::misc bit 13
The perf_event_header::misc bit 13 is shared on different events and
next patch is adding yet another bit 13 user. Updating the comment to
make it more structured and clear which events use bit 13.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20180107160356.28203-8-jolsa@kernel.org
[ Update the tools/include/uapi/linux copy ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:47 +0000 (17:03 +0100)]
perf: Allocate context task_ctx_data for child event
Currently we use perf_event_context::task_ctx_data to save and restore
the LBR status when the task is scheduled out and in.
We don't allocate it for child contexts, which results in shorter task's
LBR stack, because we don't save the history from previous run and start
over every time we schedule the task in.
I made a test to generate samples with LBR call stack and got higher
numbers on bigger chain depths:
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 4af57ef28c2c ("perf: Add pmu specific data for perf task context") Link: http://lkml.kernel.org/r/20180107160356.28203-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 7 Jan 2018 16:03:45 +0000 (17:03 +0100)]
perf tools: Enable LIBBABELTRACE by default
There's no reason anymore to treat babel trace in a special way, because
a) we no longer display its state b) the needed babeltrace library is
now out and well adopted among distros.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:46 +0000 (21:13 +0800)]
perf script: Support time percent and multiple time ranges
perf script has a --time option to limit the time range of output. It
only supports absolute time.
Now this option is extended to support multiple time ranges and support
the percent of time.
For example:
1. Select the first and second 10% time slices:
perf script --time 10%/1,10%/2
2. Select from 0% to 10% and 30% to 40% slices:
perf script --time 0%-10%,30%-40%
Changelog:
v6: Fix the merge issue with latest perf/core branch.
No functional changes.
v5: Add checking of first/last sample time to detect if it's recorded
in perf.data. If it's not recorded, returns error message to user.
v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample
v3: Since the definitions of first_sample_time/last_sample_time
are moved from perf_session to perf_evlist so change the
related code.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-7-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:45 +0000 (21:13 +0800)]
perf report: Support time percent and multiple time ranges
perf report has a --time option to limit the time range of output. It
only supports absolute time.
Now this option is extended to support multiple time ranges and support
the percent of time.
For example:
1. Select the first and second 10% time slices:
perf report --time 10%/1,10%/2
2. Select from 0% to 10% and 30% to 40% slices:
perf report --time 0%-10%,30%-40%
Changelog:
v6: Fix the merge issue with latest perf/core branch.
No functional changes.
v5: Add checking of first/last sample time to detect if it's recorded
in perf.data. If it's not recorded, returns error message to user.
v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample
v3: Since the definitions of first_sample_time/last_sample_time
are moved from perf_session to perf_evlist so change the
related code.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao.jin@linux.intel.com
[ Add missing colons at end of examples in the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:44 +0000 (21:13 +0800)]
perf tools: Create function to perform multiple time range checking
Previous patch supports the multiple time range.
For example, select the first and second 10% time slices.
perf report --time 10%/1,10%/2
We need a function to check if a timestamp is in the ranges of
[0, 10%) and [10%, 20%].
Note that it includes the last element in [10%, 20%] but it doesn't
include the last element in [0, 10%). It's to avoid the overlap.
This patch implments a new function perf_time__ranges_skip_sample
for this checking.
Change log:
v4: Let perf_time__ranges_skip_sample be compatible with
perf_time__skip_sample when only one time range.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-5-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:43 +0000 (21:13 +0800)]
perf tools: Create function to parse time percent
Current perf report/script/... have a --time option to limit the time
range of output. But right now it only supports absolute time, add
support for time percentage.
For example:
1. Select the second 10% time slice
perf report --time 10%/2
2. Select from 0% to 10% time slice
perf report --time 0%-10%
It also support the multiple time ranges.
3. Select the first and second 10% time slices
perf report --time 10%/1,10%/2
4. Select from 0% to 10% and 30% to 40% slices
perf report --time 0%-10%,30%-40%
Changelog:
v4: An issue is found. Following passes.
perf script --time 10%/10x12321xsdfdasfdsafdsafdsa
Now it uses strtol to replace atoi.
Committer notes:
This just puts in place the infrastructure, so the examples in this cset
comment will only work later, after more patches in this series are
applied.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-4-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:42 +0000 (21:13 +0800)]
perf record: Record the first and last sample time in the header
In the default 'perf record' configuration, all samples are processed,
to create the HEADER_BUILD_ID table. So it's very easy to get the
first/last samples and save the time to perf file header via the
function write_sample_time().
Later, at post processing time, perf report/script will fetch the time
from perf file header.
Committer testing:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ]
[root@jouet home]# perf report --header | grep "time of "
# time of first sample : 22947.909226
# time of last sample : 22948.910704
#
# perf report -D | grep PERF_RECORD_SAMPLE\(
0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0
0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0
<SNIP>
3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0
0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0
2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0
#
Changelog:
v7: Just update the patch description according to Arnaldo's suggestion.
v6: Currently '--buildid-all' is not enabled at default. So the walking
on all samples is the default operation. There is no big overhead
to calculate the timestamp boundary in process_sample_event handler
once we already go through all samples. So the timestamp boundary
calculation is enabled by default when '--buildid-all' is not enabled.
While if '--buildid-all' is enabled, we creates a new option
"--timestamp-boundary" for user to decide if it enables the
timestamp boundary calculation.
v5: There is an issue that the sample walking can only work when
'--buildid-all' is not enabled. So we need to let the walking
be able to work even if '--buildid-all' is enabled and let the
processing skips the dso hit marking for this case.
At first, I want to provide a new option "--record-time-boundaries".
While after consideration, I think a new option is not very
necessary.
v3: Remove the definitions of first_sample_time and last_sample_time
from struct record and directly save them in perf_evlist.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Fri, 8 Dec 2017 13:13:41 +0000 (21:13 +0800)]
perf header: Add infrastructure to record first and last sample time
perf report/script/... have a --time option to limit the time range of
output. That's very useful to slice large traces, e.g. when processing
the output of perf script for some analysis.
But right now --time only supports absolute time. Also there is no fast
way to get the start/end times of a given trace except for looking at
it. This makes it hard to e.g. only decode the first half of the trace,
which is useful for parallelization of scripts
Another problem is that perf records are variable size and there is no
synchronization mechanism. So the only way to find the last sample
reliably would be to walk all samples. But we want to avoid that in perf
report/... because it is already quite expensive. That is why storing
the first sample time and last sample time in perf record is better.
This patch creates a new header feature type HEADER_SAMPLE_TIME and
related ops. Save the first sample time and the last sample time to the
feature section in perf file header. That will be done when, for
instance, processing build-ids, where we already have to process all
samples to create the build-id table, take advantage of that to further
amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf
report/script' faster when using --time.
Committer testing:
After this patch is applied the header is written with zeroes, we need
the next patch, for "perf record" to actually write the timestamps:
# perf report -D | grep PERF_RECORD_SAMPLE\( 22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21be8c5 period: 1 addr: 0
<SNIP> 22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21ffd50 period: 2828043 addr: 0
# perf report --header | grep "time of "
# time of first sample : 0.000000
# time of last sample : 0.000000
#
Changelog:
v7: 1. Rebase to latest perf/core branch.
2. Add following clarification in patch description according to
Arnaldo's suggestion.
"That will be done when, for instance, processing build-ids,
where we already have to process all samples to create the
build-id table, take advantage of that to further amortize
that processing by storing HEADER_SAMPLE_TIME to make
'perf report/script' faster when using --time."
v4: Use perf script time style for timestamp printing. Also add with
the printing of sample duration.
v3: Remove the definitions of first_sample_time/last_sample_time from
perf_session. Just define them in perf_evlist
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-2-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Takashi Iwai [Mon, 8 Jan 2018 12:58:31 +0000 (13:58 +0100)]
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
The loops for read and write in PCM OSS emulation have no proper check
of pending signals, and they keep processing even after user tries to
break. This results in a very long delay, often seen as RCU stall
when a huge unprocessed bytes remain queued. The bug could be easily
triggered by syzkaller.
As a simple workaround, this patch adds the proper check of pending
signals and aborts the loop appropriately.
Jin Yao [Tue, 26 Dec 2017 10:42:43 +0000 (18:42 +0800)]
perf report: Fix a no annotate browser displayed issue
When enabling '-b' option in perf record, for example,
perf record -b ...
perf report
and then browsing the annotate browser from perf report (press 'A'), it
would fail (annotate browser can't be displayed).
It's because the '.add_entry_cb' op of struct report is overwritten by
hist_iter__branch_callback() in builtin-report.c. But this function doesn't do
something like mapping symbols and sources. So next, do_annotate() will return
directly.
notes = symbol__annotation(act->ms.sym);
if (!notes->src)
return 0;
This patch adds the lost code to hist_iter__branch_callback (refer to
hist_iter__report_callback).
v2:
Fix a crash bug when perform 'perf report --stdio'.
The reason is that we init the symbol annotation only in browser mode, it
doesn't allocate/init resources for stdio mode.
So now in hist_iter__branch_callback(), it will return directly if it's not in
browser mode.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1514284963-18587-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The issue is the value which is passed to parameter 'addr' in
__get_srcline() is the objdump address. It's not correct if we calculate
the offset by using 'addr - sym->start'.
This patch creates a new parameter 'ip' in __get_srcline(). It is not
converted to objdump address.
Make sure you get any valid vmlinux out of the way, using '-v' on the
'perf report' case and deleting it from places where perf searches them,
like your kernel build dir and the build-id cache, in ~/.debug/.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1514564812-17344-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Wed, 6 Dec 2017 01:50:40 +0000 (01:50 +0000)]
perf tools: Fix compile error with libunwind x86
Fix a compile error:
...
CC util/libunwind/x86_32.o
In file included from util/libunwind/x86_32.c:33:0:
util/libunwind/../../arch/x86/util/unwind-libunwind.c: In function 'libunwind__x86_reg_id':
util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: error: 'EINVAL' undeclared (first use in this function)
return -EINVAL;
^
util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: note: each undeclared identifier is reported only once for each function it appears in
mv: cannot stat 'util/libunwind/.x86_32.o.tmp': No such file or directory
make[4]: *** [util/libunwind/x86_32.o] Error 1
make[3]: *** [util] Error 2
make[2]: *** [libperf-in.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
It happens when libunwind-x86 feature is detected.
The 'perf test bpf' was hooking a eBPF program on the SyS_epoll_wait()
kernel function, that was what the epoll_wait() glibc function ended up
calling, but since at least glibc 2.26, the one that comes with, for
instance, Fedora 27, glibc ends up calling SyS_epoll_pwait() when
epoll_wait() is used, causing this 'perf test' entry to fail.
So switch to using epoll_pwait() and hook the eBPF program to the
SyS_epoll_pwait() kernel function to make it work on a wider range of
glibc and kernel versions.
Tested-by: Wang Nan <wangnan0@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-zynvquy63er8s5mrgsz65pto@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test bpf: Use designated struct field initializers
To follow standard practice in the kernel sources, documenting the
initialization better and helping quickly finding the value for some
field in a struct with many entries.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-syn3hz9hz7ukxlxbx5x6hv20@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-403npu7daupv6b2bmxliv5pk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Palmer Dabbelt [Wed, 27 Dec 2017 03:11:22 +0000 (19:11 -0800)]
RISC-V: Make __NR_riscv_flush_icache visible to userspace
We were hoping to avoid making this visible to userspace, but it looks
like we're going to have to because QEMU's user-mode emulation doesn't
want to emulate a vDSO. Having vDSO-only system calls was a bit
unothodox anyway, so I think in this case it's OK to just make the
actual system call number public.
This patch simply moves the definition of __NR_riscv_flush_icache
availiable to userspace, which results in the deletion of the now empty
vdso-syscalls.h.
Changes since v1:
* I've moved the definition into uapi/asm/syscalls.h rathen than
uapi/asm/unistd.h. This allows me to keep asm/unistd.h, so we can
keep the syscall table macros sane.
* As a side effect of the above, this no longer disables all system
calls on RISC-V. Whoops!
Karsten Merker [Thu, 4 Jan 2018 22:37:02 +0000 (23:37 +0100)]
RISC-V: Add a basic defconfig
This patch provides a basic defconfig for the RISC-V
architecture that enables enough kernel features to run a
basic Linux distribution on qemu's "virt" board for native
software development. Features include:
- serial console
- virtio block and network device support
- VFAT and ext2/3/4 filesystem support
- NFS client and NFS rootfs support
- an assortment of other kernel features required for
running systemd
It also enables a number of drivers for physical hardware
that target the "SiFive U500" SoC and the corresponding
development platform. These include:
- PCIe host controller support for the FPGA-based U500
development platform (PCIE_XILINX)
- USB host controller support (OHCI/EHCI/XHCI)
- USB HID (keyboard/mouse) support
- USB mass storage support (bulk and UAS)
- SATA support (AHCI)
- ethernet drivers (MACB for a SoC-internal MAC block, microsemi
ethernet phy, E1000E and R8169 for PCIe-connected external devices)
- DRM and framebuffer console support for PCIe-connected
Radeon graphics chips
Linus Torvalds [Sun, 7 Jan 2018 19:42:57 +0000 (11:42 -0800)]
Merge branch 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Many small fixes to show the real physical addresses of devices
instead of hashed addresses.
- One important fix to unbreak 32-bit SMP support: We forgot to 16-byte
align the spinlocks in the assembler code.
- Qemu support: The host will get a chance to sleep when the parisc
guest is idle. We use the same mechanism as the power architecture by
overlaying the "or %r10,%r10,%r10" instruction which is simply a nop
on real hardware.
* 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: qemu idle sleep support
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
parisc: Show unhashed EISA EEPROM address
parisc: Show unhashed HPA of Dino chip
parisc: Show initial kernel memory layout unhashed
parisc: Show unhashed hardware inventory
Linus Torvalds [Sun, 7 Jan 2018 19:33:12 +0000 (11:33 -0800)]
Merge tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor fix from John Johansen:
"This fixes a regression when the kernel feature set is reported as
supporting mount and policy is pinned to a feature set that does not
support mount mediation"
* tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix regression in mount mediation when feature set is pinned
Linus Torvalds [Sun, 7 Jan 2018 19:01:59 +0000 (11:01 -0800)]
Merge tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski:
"The commit 2b83ff96f51d for 4.15-rc6, which was fixing LED brightness
setting after clearing delay_off broke the behavior on any alteration
of delay_on{off} properties, due to use of a LED core helper that does
too much for this particular case"
* tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: core: Fix regression caused by commit 2b83ff96f51d
leds: core: Fix regression caused by commit 2b83ff96f51d
Commit 2b83ff96f51d ("led: core: Fix brightness setting when setting delay_off=0")
replaced del_timer_sync(&led_cdev->blink_timer) with led_stop_software_blink()
in led_blink_set(), which additionally clears LED_BLINK_SW flag as well as
zeroes blink_delay_on and blink_delay_off properties of the struct led_classdev.
Cleansing of the latter ones wasn't required to fix the original issue but
wasn't considered harmful. It nonetheless turned out to be so in case when
pointer to one or both props is passed to led_blink_set() like in the
ledtrig-timer.c. In such cases zeroes are passed later in delay_on and/or
delay_off arguments to led_blink_setup(), which results either in stopping
the software blinking or setting blinking frequency always to 1Hz.
Avoid using led_stop_software_blink() and add a single call required
to clear LED_BLINK_SW flag, which was the only needed modification to
fix the original issue.
Fixes 2b83ff96f51d ("led: core: Fix brightness setting when setting delay_off=0") Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Linus Torvalds [Sun, 7 Jan 2018 01:13:21 +0000 (17:13 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
- untangle sys_close() abuses in xt_bpf
- deal with register_shrinker() failures in sget()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
sget(): handle failures of register_shrinker()
mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
since commit 82abbf8d2fc4 the verifier rejects the bit-wise
arithmetic on pointers earlier.
The test 'dubious pointer arithmetic' now has less output to match on.
Adjust it.
Fixes: 82abbf8d2fc4 ("bpf: do not allow root to mangle valid pointers") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>