Jens Axboe [Thu, 19 Dec 2019 19:06:02 +0000 (12:06 -0700)]
io_uring: improve poll completion performance
For busy IORING_OP_POLL_ADD workloads, we can have enough contention
on the completion lock that we fail the inline completion path quite
often as we fail the trylock on that lock. Add a list for deferred
completions that we can use in that case. This helps reduce the number
of async offloads we have to do, as if we get multiple completions in
a row, we'll piggy back on to the poll_llist instead of having to queue
our own offload.
Jens Axboe [Thu, 19 Dec 2019 00:12:20 +0000 (17:12 -0700)]
io_uring: split overflow state into SQ and CQ side
We currently check ->cq_overflow_list from both SQ and CQ context, which
causes some bouncing of that cache line. Add separate bits of state for
this instead, so that the SQ side can check using its own state, and
likewise for the CQ side.
This adds ->sq_check_overflow with the SQ state, and ->cq_check_overflow
with the CQ state. If we hit an overflow condition, both of these bits
are set. Likewise for overflow flush clear, we clear both bits. For the
fast path of just checking if there's an overflow condition on either
the SQ or CQ side, we can use our own private bit for this.
Jens Axboe [Wed, 18 Dec 2019 16:50:26 +0000 (09:50 -0700)]
io_uring: add lookup table for various opcode needs
We currently have various switch statements that check if an opcode needs
a file, mm, etc. These are hard to keep in sync as opcodes are added. Add
a struct io_op_def that holds all of this information, so we have just
one spot to update when opcodes are added.
This also enables us to NOT allocate req->io if a deferred command
doesn't need it, and corrects some mistakes we had in terms of what
commands need mm context.
Jens Axboe [Tue, 17 Dec 2019 15:04:44 +0000 (08:04 -0700)]
io_uring: add IOSQE_ASYNC
io_uring defaults to always doing inline submissions, if at all
possible. But for larger copies, even if the data is fully cached, that
can take a long time. Add an IOSQE_ASYNC flag that the application can
set on the SQE - if set, it'll ensure that we always go async for those
kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we
get the concurrency we desire for this case.
Jens Axboe [Tue, 17 Dec 2019 15:46:33 +0000 (08:46 -0700)]
io-wq: support concurrent non-blocking work
io-wq assumes that work will complete fast (and not block), so it
doesn't create a new worker when work is enqueued, if we already have
at least one worker running. This is done on the assumption that if work
is running, then it will complete fast.
Add an option to force io-wq to fork a new worker for work queued. This
is signaled by setting IO_WQ_WORK_CONCURRENT on the work item. For that
case, io-wq will create a new worker, even though workers are already
running.
Jens Axboe [Mon, 9 Dec 2019 18:22:50 +0000 (11:22 -0700)]
io_uring: avoid ring quiesce for fixed file set unregister and update
We currently fully quiesce the ring before an unregister or update of
the fixed fileset. This is very expensive, and we can be a bit smarter
about this.
Add a percpu refcount for the file tables as a whole. Grab a percpu ref
when we use a registered file, and put it on completion. This is cheap
to do. Upon removal of a file from a set, switch the ref count to atomic
mode. When we hit zero ref on the completion side, then we know we can
drop the previously registered files. When the old files have been
dropped, switch the ref back to percpu mode for normal operation.
Since there's a period between doing the update and the kernel being
done with it, add a IORING_OP_FILES_UPDATE opcode that can perform the
same action. The application knows the update has completed when it gets
the CQE for it. Between doing the update and receiving this completion,
the application must continue to use the unregistered fd if submitting
IO on this particular file.
This takes the runtime of test/file-register from liburing from 14s to
about 0.7s.
Jens Axboe [Wed, 11 Dec 2019 21:02:38 +0000 (14:02 -0700)]
io_uring: add support for IORING_OP_CLOSE
This works just like close(2), unsurprisingly. We remove the file
descriptor and post the completion inline, then offload the actual
(potential) last file put to async context.
Mark the async part of this work as uncancellable, as we really must
guarantee that the latter part of the close is run.
Jens Axboe [Thu, 12 Dec 2019 02:29:43 +0000 (19:29 -0700)]
io-wq: add support for uncancellable work
Not all work can be cancelled, some of it we may need to guarantee
that it runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL
on work that must not be cancelled. Note that the caller work function
must also check for IO_WQ_WORK_NO_CANCEL on work that is marked
IO_WQ_WORK_CANCEL.
Jens Axboe [Wed, 11 Dec 2019 18:20:36 +0000 (11:20 -0700)]
io_uring: add support for IORING_OP_OPENAT
This works just like openat(2), except it can be performed async. For
the normal case of a non-blocking path lookup this will complete
inline. If we have to do IO to perform the open, it'll be done from
async context.
io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space. In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it. Also, align the field naturally and check that
no garbage is passed there.
Fixes: f0da3ad60191361a ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 19 Jan 2020 20:10:28 +0000 (12:10 -0800)]
Merge tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Three fixes for RISC-V:
- Don't free and reuse memory containing the code that CPUs parked at
boot reside in.
- Fix rv64 build problems for ubsan and some modules by adding
logical and arithmetic shift helpers for 128-bit values. These are
from libgcc and are similar to what's present for ARM64.
- Fix vDSO builds to clean up their own temporary files"
* tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Less inefficient gcc tishift helpers (and export their symbols)
riscv: delete temporary files
riscv: make sure the cores stay looping in .Lsecondary_park
1) Fix non-blocking connect() in x25, from Martin Schiller.
2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.
3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.
4) Limit size of TSO packets properly in lan78xx driver, from Eric
Dumazet.
5) r8152 probe needs an endpoint sanity check, from Johan Hovold.
6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
John Fastabend.
7) hns3 needs short frames padded on transmit, from Yunsheng Lin.
8) Fix netfilter ICMP header corruption, from Eyal Birger.
9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.
10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.
11) Fix memory leak in act_ctinfo, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
cxgb4: reject overlapped queues in TC-MQPRIO offload
cxgb4: fix Tx multi channel port rate limit
net: sched: act_ctinfo: fix memory leak
bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
bnxt_en: Fix ipv6 RFS filter matching logic.
bnxt_en: Fix NTUPLE firmware command failures.
net: systemport: Fixed queue mapping in internal ring map
net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
net: hns: fix soft lockup when there is not enough memory
net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
net/sched: act_ife: initalize ife->metalist earlier
netfilter: nat: fix ICMP header corruption on ICMP errors
net: wan: lapbether.c: Use built-in RCU list checking
netfilter: nf_tables: fix flowtable list del corruption
netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
netfilter: nft_tunnel: ERSPAN_VERSION must not be null
netfilter: nft_tunnel: fix null-attribute check
...
Rahul Lakkireddy [Fri, 17 Jan 2020 12:51:47 +0000 (18:21 +0530)]
cxgb4: reject overlapped queues in TC-MQPRIO offload
A queue can't belong to multiple traffic classes. So, reject
any such configuration that results in overlapped queues for a
traffic class.
Fixes: b8855f9e9469 ("cxgb4: parse and configure TC-MQPRIO offload") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Fri, 17 Jan 2020 12:53:55 +0000 (18:23 +0530)]
cxgb4: fix Tx multi channel port rate limit
T6 can support 2 egress traffic management channels per port to
double the total number of traffic classes that can be configured.
In this configuration, if the class belongs to the other channel,
then all the queues must be bound again explicitly to the new class,
for the rate limit parameters on the other channel to take effect.
So, always explicitly bind all queues to the port rate limit traffic
class, regardless of the traffic management channel that it belongs
to. Also, only bind queues to port rate limit traffic class, if all
the queues don't already belong to an existing different traffic
class.
Fixes: e60a32413c08 ("cxgb4: add TC-MATCHALL classifier egress offload") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Tue, 17 Dec 2019 04:06:31 +0000 (20:06 -0800)]
riscv: Less inefficient gcc tishift helpers (and export their symbols)
The existing __lshrti3 was really inefficient, and the other two helpers
are also needed to compile some modules.
Add the missing versions, and export all of the symbols like arm64
already does.
This code is based on the assembly generated by libgcc builds.
This fixes a build break triggered by ubsan:
riscv64-unknown-linux-gnu-ld: lib/ubsan.o: in function `.L2':
ubsan.c:(.text.unlikely+0x38): undefined reference to `__ashlti3'
riscv64-unknown-linux-gnu-ld: ubsan.c:(.text.unlikely+0x42): undefined reference to `__ashrti3'
Signed-off-by: Olof Johansson <olof@lixom.net>
[paul.walmsley@sifive.com: use SYM_FUNC_{START,END} instead of
ENTRY/ENDPROC; note libgcc origin] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Linus Torvalds [Sun, 19 Jan 2020 00:34:17 +0000 (16:34 -0800)]
Merge tag 'mtd/fixes-for-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"Raw NAND:
- GPMI: Fix the suspend/resume
SPI-NOR:
- Fix quad enable on Spansion like flashes
- Fix selection of 4-byte addressing opcodes on Spansion"
* tag 'mtd/fixes-for-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume
mtd: rawnand: gpmi: Fix suspend/resume problem
mtd: spi-nor: Fix quad enable for Spansion like flashes
mtd: spi-nor: Fix selection of 4-byte addressing opcodes on Spansion
Linus Torvalds [Sat, 18 Jan 2020 21:57:31 +0000 (13:57 -0800)]
Merge tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Back from LCA2020, fixes wasn't too busy last week, seems to have
quieten down appropriately, some amdgpu, i915, then a core mst fix and
one fix for virtio-gpu and one for rockchip:
core mst:
- serialize down messages and clear timeslots are on unplug
amdgpu:
- Update golden settings for renoir
- eDP fix
i915:
- uAPI fix: Remove dash and colon from PMU names to comply with
tools/perf
- Fix for include file that was indirectly included
- Two fixes to make sure VMA are marked active for error capture
virtio:
- maintain obj reservation lock when submitting cmds
rockchip:
- increase link rate var size to accommodate rates"
* tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Reorder detect_edp_sink_caps before link settings read.
drm/amdgpu: update goldensetting for renoir
drm/dp_mst: Have DP_Tx send one msg at a time
drm/dp_mst: clear time slots for ports invalid
drm/i915/pmu: Do not use colons or dashes in PMU names
drm/rockchip: fix integer type used for storing dp data rate
drm/i915/gt: Mark ring->vma as active while pinned
drm/i915/gt: Mark context->state vma as active while pinned
drm/i915/gt: Skip trying to unbind in restore_ggtt_mappings
drm/i915: Add missing include file <linux/math64.h>
drm/virtio: add missing virtio_gpu_array_lock_resv call
Linus Torvalds [Sat, 18 Jan 2020 21:02:12 +0000 (13:02 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- a resctrl fix for uninitialized objects found by debugobjects
- a resctrl memory leak fix
- fix the unintended re-enabling of the of SME and SEV CPU flags if
memory encryption was disabled at bootup via the MSR space"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
x86/resctrl: Fix potential memory leak
x86/resctrl: Fix an imbalance in domain_remove_cpu()
Linus Torvalds [Sat, 18 Jan 2020 21:00:59 +0000 (13:00 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Three fixes: fix link failure on Alpha, fix a Sparse warning and
annotate/robustify a lockless access in the NOHZ code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/sched: Annotate lockless access to last_jiffies_update
lib/vdso: Make __cvdso_clock_getres() static
time/posix-stubs: Provide compat itimer supoprt for alpha
Linus Torvalds [Sat, 18 Jan 2020 20:55:19 +0000 (12:55 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Tooling fixes, three Intel uncore driver fixes, plus an AUX events fix
uncovered by the perf fuzzer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Remove PCIe3 unit for SNR
perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 Family
perf: Correctly handle failed perf_get_aux_event()
perf hists: Fix variable name's inconsistency in hists__for_each() macro
perf map: Set kmap->kmaps backpointer for main kernel map chunks
perf report: Fix incorrectly added dimensions as switch perf data file
tools lib traceevent: Fix memory leakage in filter_event
Linus Torvalds [Sat, 18 Jan 2020 20:53:28 +0000 (12:53 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Three fixes:
- Fix an rwsem spin-on-owner crash, introduced in v5.4
- Fix a lockdep bug when running out of stack_trace entries,
introduced in v5.4
- Docbook fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
futex: Fix kernel-doc notation warning
locking/lockdep: Fix buffer overrun problem in stack_trace[]
Linus Torvalds [Sat, 18 Jan 2020 20:50:14 +0000 (12:50 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Three EFI fixes:
- Fix a slow-boot-scrolling regression but making sure we use WC for
EFI earlycon framebuffer mappings on x86
- Fix a mixed EFI mode boot crash
- Disable paging explicitly before entering startup_32() in mixed
mode bootup"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efistub: Disable paging at mixed mode entry
efi/libstub/random: Initialize pointer variables to zero for mixed mode
efi/earlycon: Fix write-combine mapping on x86
Linus Torvalds [Sat, 18 Jan 2020 20:29:13 +0000 (12:29 -0800)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
"Two rseq bugfixes:
- CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end
up corrupting the TLS of the parent. Technically a change in the
ABI but the previous behavior couldn't resonably have been relied
on by applications so this looks like a valid exception to the ABI
rule.
- Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
handling of other flags. This is not thought to impact any
applications either"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Unregister rseq for clone CLONE_VM
rseq: Reject unknown flags on rseq unregister
Linus Torvalds [Sat, 18 Jan 2020 20:23:31 +0000 (12:23 -0800)]
Merge tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fixes from Christian Brauner:
"Here is an urgent fix for ptrace_may_access() permission checking.
Commit 00a5e4c05288 ("ptrace: do not audit capability check when
outputing /proc/pid/stat") introduced the ability to opt out of audit
messages for accesses to various proc files since they are not
violations of policy.
While doing so it switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking
the subjective credentials (ktask->cred) of the task to using the
objective credentials (ktask->real_cred). This is appears to be wrong.
ptrace_has_cap() is currently only used in ptrace_may_access() And is
used to check whether the calling task (subject) has the
CAP_SYS_PTRACE capability in the provided user namespace to operate on
the target task (object). According to the cred.h comments this means
the subjective credentials of the calling task need to be used.
With this fix we switch ptrace_has_cap() to use security_capable() and
thus back to using the subjective credentials.
As one example where this might be particularly problematic, Jann
pointed out that in combination with the upcoming IORING_OP_OPENAT{2}
feature, this bug might allow unprivileged users to bypass the
capability checks while asynchronously opening files like /proc/*/mem,
because the capability checks for this would be performed against
kernel credentials.
To illustrate on the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials
of the caller. Later on, when it starts to do work it creates a kernel
thread and registers a callback. The callback runs with kernel creds
for ktask->real_cred and ktask->cred.
To prevent this from becoming a full-blown 0-day io_uring will call
override_cred() and override ktask->cred with the subjective
credentials of the creator of the io_uring instance. With
ptrace_has_cap() currently looking at ktask->real_cred this override
will be ineffective and the caller will be able to open arbitray proc
files as mentioned above.
Luckily, this is currently not exploitable but would be so once
IORING_OP_OPENAT{2} land in v5.6. Let's fix it now.
To minimize potential regressions I successfully ran the criu
testsuite. criu makes heavy use of ptrace() and extensively hits
ptrace_may_access() codepaths and has a good change of detecting any
regressions.
Additionally, I succesfully ran the ptrace and seccomp kernel tests"
* tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
Linus Torvalds [Sat, 18 Jan 2020 20:18:55 +0000 (12:18 -0800)]
Merge tag 's390-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix printing misleading Secure-IPL enabled message when it is not.
- Fix a race condition between host ap bus and guest ap bus doing
device reset in crypto code.
- Fix sanity check in CCA cipher key function (CCA AES cipher key
support), which fails otherwise.
* tag 's390-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/setup: Fix secure ipl message
s390/zcrypt: move ap device reset from bus to driver code
s390/zcrypt: Fix CCA cipher key gen with clear key value function
Linus Torvalds [Sat, 18 Jan 2020 20:12:36 +0000 (12:12 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes in drivers with no impact to core code.
The mptfusion fix is enormous because the driver API had to be
rethreaded to pass down the necessary iocp pointer, but once that's
done a significant chunk of code is deleted.
The other two patches are small"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mptfusion: Fix double fetch bug in ioctl
scsi: storvsc: Correctly set number of hardware queues for IDE disk
scsi: fnic: fix invalid stack access
Linus Torvalds [Sat, 18 Jan 2020 20:08:57 +0000 (12:08 -0800)]
Merge tag 'char-misc-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small fixes for 5.5-rc7
Included here are:
- two lkdtm fixes
- coresight build fix
- Documentation update for the hw process document
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Add Amazon contact for embargoed hardware issues
lkdtm/bugs: fix build error in lkdtm_UNSET_SMEP
lkdtm/bugs: Make double-fault test always available
coresight: etm4x: Fix unused function warning
Linus Torvalds [Sat, 18 Jan 2020 20:06:09 +0000 (12:06 -0800)]
Merge tag 'staging-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here are some small staging and iio driver fixes for 5.5-rc7
All of them are for some small reported issues. Nothing major, full
details in the shortlog.
All have been in linux-next with no reported issues"
* tag 'staging-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: ni_routes: allow partial routing information
staging: comedi: ni_routes: fix null dereference in ni_find_route_source()
iio: light: vcnl4000: Fix scale for vcnl4040
iio: buffer: align the size of scan bytes to size of the largest element
iio: chemical: pms7003: fix unmet triggered buffer dependency
iio: imu: st_lsm6dsx: Fix selection of ST_LSM6DS3_ID
iio: adc: ad7124: Fix DT channel configuration
Aleksa Sarai [Fri, 6 Dec 2019 14:13:38 +0000 (01:13 +1100)]
Documentation: path-lookup: include new LOOKUP flags
Now that we have new LOOKUP flags, we should document them in the
relevant path-walking documentation. And now that we've settled on a
common name for nd_jump_link() style symlinks ("magic links"), use that
term where magic-link semantics are described.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Aleksa Sarai [Sat, 18 Jan 2020 12:08:00 +0000 (23:08 +1100)]
selftests: add openat2(2) selftests
Test all of the various openat2(2) flags. A small stress-test of a
symlink-rename attack is included to show that the protections against
".."-based attacks are sufficient.
The main things these self-tests are enforcing are:
* The struct+usize ABI for openat2(2) and copy_struct_from_user() to
ensure that upgrades will be handled gracefully (in addition,
ensuring that misaligned structures are also handled correctly).
* The -EINVAL checks for openat2(2) are all correctly handled to avoid
userspace passing unknown or conflicting flag sets (most
importantly, ensuring that invalid flag combinations are checked).
* All of the RESOLVE_* semantics (including errno values) are
correctly handled with various combinations of paths and flags.
* RESOLVE_IN_ROOT correctly protects against the symlink rename(2)
attack that has been responsible for several CVEs (and likely will
be responsible for several more).
Aleksa Sarai [Sat, 18 Jan 2020 12:07:59 +0000 (23:07 +1100)]
open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
David S. Miller [Sat, 18 Jan 2020 13:38:30 +0000 (14:38 +0100)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes.
3 small bug fix patches. The 1st two are aRFS fixes and the last one
fixes a fatal driver load failure on some kernels without PCIe
extended config space support enabled.
Please also queue these for -stable. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 17 Jan 2020 05:32:47 +0000 (00:32 -0500)]
bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
DSN read can fail, for example on a kdump kernel without PCIe extended
config space support. If DSN read fails, don't set the
BNXT_FLAG_DSN_VALID flag and continue loading. Check the flag
to see if the stored DSN is valid before using it. Only VF reps
creation should fail without valid DSN.
Fixes: 397645a5e0de ("bnxt: move bp->switch_id initialization to PF probe") Reported-by: Marc Smith <msmith626@gmail.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 17 Jan 2020 05:32:46 +0000 (00:32 -0500)]
bnxt_en: Fix ipv6 RFS filter matching logic.
Fix bnxt_fltr_match() to match ipv6 source and destination addresses.
The function currently only checks ipv4 addresses and will not work
corrently on ipv6 filters.
Fixes: 93658deae41f ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 17 Jan 2020 05:32:45 +0000 (00:32 -0500)]
bnxt_en: Fix NTUPLE firmware command failures.
The NTUPLE related firmware commands are sent to the wrong firmware
channel, causing all these commands to fail on new firmware that
supports the new firmware channel. Fix it by excluding the 3
NTUPLE firmware commands from the list for the new firmware channel.
Fixes: a1fe1fa391a6 ("bnxt_en: Add support for 2nd firmware message channel.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
Commit 00a5e4c05288 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to various
proc files since they are not violations of policy. While doing so it
somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. This
is wrong since. ptrace_has_cap() is currently only used in
ptrace_may_access() And is used to check whether the calling task (subject)
has the CAP_SYS_PTRACE capability in the provided user namespace to operate
on the target task (object). According to the cred.h comments this would
mean the subjective credentials of the calling task need to be used.
This switches ptrace_has_cap() to use security_capable(). Because we only
call ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling task's creds under rcu_read_lock() there's
no need to go through another series of dereferences and rcu locking done
in ns_capable{_noaudit}().
As one example where this might be particularly problematic, Jann pointed
out that in combination with the upcoming IORING_OP_OPENAT feature, this
bug might allow unprivileged users to bypass the capability checks while
asynchronously opening files like /proc/*/mem, because the capability
checks for this would be performed against kernel credentials.
To illustrate on the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials of the
caller. Later on, when it starts to do work it creates a kernel thread and
registers a callback. The callback runs with kernel creds for
ktask->real_cred and ktask->cred. To prevent this from becoming a
full-blown 0-day io_uring will call override_cred() and override
ktask->cred with the subjective credentials of the creator of the io_uring
instance. With ptrace_has_cap() currently looking at ktask->real_cred this
override will be ineffective and the caller will be able to open arbitray
proc files as mentioned above.
Luckily, this is currently not exploitable but will turn into a 0-day once
IORING_OP_OPENAT{2} land in v5.6. Fix it now!
Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Jann Horn <jannh@google.com> Fixes: 00a5e4c05288 ("ptrace: do not audit capability check when outputing /proc/pid/stat") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Dave Airlie [Sat, 18 Jan 2020 02:54:10 +0000 (12:54 +1000)]
Merge tag 'drm-misc-fixes-2020-01-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
virtio: maintain obj reservation lock when submitting cmds (Gerd)
rockchip: increase link rate var size to accommodate rates (Tobias)
mst: serialize down messages and clear timeslots are on unplug (Wayne)
Dave Airlie [Sat, 18 Jan 2020 02:53:53 +0000 (12:53 +1000)]
Merge tag 'drm-intel-fixes-2020-01-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- uAPI fix: Remove dash and colon from PMU names to comply with tools/perf
- Fix for include file that was indirectly included
- Two fixes to make sure VMA are marked active for error capture
Esben Haabendal [Fri, 17 Jan 2020 20:05:37 +0000 (21:05 +0100)]
mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume
As we reset the GPMI block at resume, the timing parameters setup by a
previous exec_op is lost. Rewriting GPMI timing registers on first exec_op
after resume fixes the problem.
Michael Walle [Thu, 16 Jan 2020 09:37:00 +0000 (10:37 +0100)]
mtd: spi-nor: Fix quad enable for Spansion like flashes
The commit 992164719f8c ("mtd: spi-nor: Merge spansion Quad Enable
methods") forgot to actually set the QE bit in some cases. Thus this
breaks quad mode accesses to flashes which support readback of the
status register-2. Fix it.
Linus Torvalds [Fri, 17 Jan 2020 19:25:45 +0000 (11:25 -0800)]
Merge tag 'io_uring-5.5-2020-01-16' of git://git.kernel.dk/linux-block
Pull io_uring fixes form Jens Axboe:
- Ensure ->result is always set when IO is retried (Bijan)
- In conjunction with the above, fix a regression in polled IO issue
when retried (me/Bijan)
- Don't setup async context for read/write fixed, otherwise we may
wrongly map the iovec on retry (me)
- Cancel io-wq work if we fail getting mm reference (me)
- Ensure dependent work is always initialized correctly (me)
- Only allow original task to submit IO, don't allow it from a passed
ring fd (me)
* tag 'io_uring-5.5-2020-01-16' of git://git.kernel.dk/linux-block:
io_uring: only allow submit from owning task
io_uring: ensure workqueue offload grabs ring mutex for poll list
io_uring: clear req->result always before issuing a read/write request
io_uring: be consistent in assigning next work from handler
io-wq: cancel work if we fail getting a mm reference
io_uring: don't setup async context for read/write fixed
Linus Torvalds [Fri, 17 Jan 2020 19:21:05 +0000 (11:21 -0800)]
Merge tag 'for-5.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes that have been in the works during last twp weeks.
All have a user visible effect and are stable material:
- scrub: properly update progress after calling cancel ioctl, calling
'resume' would start from the beginning otherwise
- fix subvolume reference removal, after moving out of the original
path the reference is not recognized and will lead to transaction
abort
- fix reloc root lifetime checks, could lead to crashes when there's
subvolume cleaning running in parallel
- fix memory leak when quotas get disabled in the middle of extent
accounting
- fix transaction abort in case of balance being started on degraded
mount on eg. RAID1"
* tag 'for-5.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: check rw_devices, not num_devices for balance
Btrfs: always copy scrub arguments back to user space
btrfs: relocation: fix reloc_root lifespan and access
btrfs: fix memory leak in qgroup accounting
btrfs: do not delete mismatched root refs
btrfs: fix invalid removal of root ref
btrfs: rework arguments of btrfs_unlink_subvol
Merge tag 'usb-serial-5.5-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.5-rc7
Here are a few fixes for issues related to unbound port devices which
could lead to NULL-pointer dereferences. Notably the bind attributes for
usb-serial (port) drivers are removed as almost none of the drivers can
handle individual ports going away once they've been bound.
Included are also some new device ids.
All but the unbound-port fixes have been in linux-next with no reported
issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.5-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: quatech2: handle unbound ports
USB: serial: keyspan: handle unbound ports
USB: serial: io_edgeport: add missing active-port sanity check
USB: serial: io_edgeport: handle unbound ports on URB completion
USB: serial: ch341: handle unbound port at reset_resume
USB: serial: suppress driver bind attributes
USB: serial: option: add support for Quectel RM500Q in QDL mode
USB: serial: opticon: fix control-message timeouts
USB: serial: option: Add support for Quectel RM500Q
USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
Linus Torvalds [Fri, 17 Jan 2020 16:38:35 +0000 (08:38 -0800)]
Merge tag 'sound-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became bigger than I have hoped for rc7. But, the only large LOC
is for stm32 fixes that are simple rewriting of register access
helpers, while the rest are all nice and small fixes:
- A few ASoC fixes for the remaining probe error handling bugs
- ALSA sequencer core fix for racy proc file accesses
- Revert the option rename of snd-hda-intel to make compatible again
- Various device-specific fixes"
* tag 'sound-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: seq: Fix racy access for queue timer in proc read
ALSA: usb-audio: fix sync-ep altsetting sanity check
ASoC: msm8916-wcd-digital: Reset RX interpolation path after use
ASoC: msm8916-wcd-analog: Fix MIC BIAS Internal1
ASoC: cros_ec_codec: Make the device acpi compatible
ASoC: sti: fix possible sleep-in-atomic
ASoC: msm8916-wcd-analog: Fix selected events for MIC BIAS External1
ASoC: hdac_hda: Fix error in driver removal after failed probe
ASoC: SOF: Intel: fix HDA codec driver probe with multiple controllers
ASoC: SOF: Intel: lower print level to dbg if we will reinit DSP
ALSA: dice: fix fallback from protocol extension into limited functionality
ALSA: firewire-tascam: fix corruption due to spin lock without restoration in SoftIRQ context
ALSA: hda: Rename back to dmic_detect option
ASoC: stm32: dfsdm: fix 16 bits record
ASoC: stm32: sai: fix possible circular locking
ASoC: Fix NULL dereference at freeing
ASoC: Intel: bytcht_es8316: Fix Irbis NB41 netbook quirk
ASoC: rt5640: Fix NULL dereference on module unload
Johan Hovold [Fri, 17 Jan 2020 14:35:26 +0000 (15:35 +0100)]
USB: serial: quatech2: handle unbound ports
Check for NULL port data in the modem- and line-status handlers to avoid
dereferencing a NULL pointer in the unlikely case where a port device
isn't bound to a driver (e.g. after an allocation failure on port
probe).
Note that the other (stubbed) event handlers qt2_process_xmit_empty()
and qt2_process_flush() would need similar sanity checks in case they
are ever implemented.
Fixes: 2aa42ae2e9ed ("USB: serial: add quatech2 usb to serial driver") Cc: stable <stable@vger.kernel.org> # 3.5 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
Johan Hovold [Fri, 17 Jan 2020 09:50:25 +0000 (10:50 +0100)]
USB: serial: keyspan: handle unbound ports
Check for NULL port data in the control URB completion handlers to avoid
dereferencing a NULL pointer in the unlikely case where a port device
isn't bound to a driver (e.g. after an allocation failure on port
probe()).
Fixes: 0e9602c71014 ("USB Serial Keyspan: add support for USA-49WG & USA-28XG") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable <stable@vger.kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
The driver receives the active port number from the device, but never
made sure that the port number was valid. This could lead to a
NULL-pointer dereference or memory corruption in case a device sends
data for an invalid port.
Johan Hovold [Fri, 17 Jan 2020 09:50:23 +0000 (10:50 +0100)]
USB: serial: io_edgeport: handle unbound ports on URB completion
Check for NULL port data in the shared interrupt and bulk completion
callbacks to avoid dereferencing a NULL pointer in case a device sends
data for a port device which isn't bound to a driver (e.g. due to a
malicious device having unexpected endpoints or after an allocation
failure on port probe).
Johan Hovold [Fri, 17 Jan 2020 09:50:22 +0000 (10:50 +0100)]
USB: serial: ch341: handle unbound port at reset_resume
Check for NULL port data in reset_resume() to avoid dereferencing a NULL
pointer in case the port device isn't bound to a driver (e.g. after a
failed control request at port probe).
Fixes: 2070dab40d24 ("USB: ch341 serial: fix port number changed after resume") Cc: stable <stable@vger.kernel.org> # 2.6.30 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
We now use btrfs_can_overcommit() to see if we can flip a block group
read only. Before this would fail because we weren't taking into
account the usable un-allocated space for allocating chunks. With my
patches we were allowed to do the balance, which is technically correct.
The test is trying to start balance on degraded mount. So now we're
trying to allocate a chunk and cannot because we want to allocate a
RAID1 chunk, but there's only 1 device that's available for usage. This
results in an ENOSPC.
But we shouldn't even be making it this far, we don't have enough
devices to restripe. The problem is we're using btrfs_num_devices(),
that also includes missing devices. That's not actually what we want, we
need to use rw_devices.
The chunk_mutex is not needed here, rw_devices changes only in device
add, remove or replace, all are excluded by EXCL_OP mechanism.
Fixes: b36be4a28128 ("Btrfs: implement online profile changing") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ add stacktrace, update changelog, drop chunk_mutex ] Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 16 Jan 2020 11:29:20 +0000 (11:29 +0000)]
Btrfs: always copy scrub arguments back to user space
If scrub returns an error we are not copying back the scrub arguments
structure to user space. This prevents user space to know how much
progress scrub has done if an error happened - this includes -ECANCELED
which is returned when users ask for scrub to stop. A particular use
case, which is used in btrfs-progs, is to resume scrub after it is
canceled, in that case it relies on checking the progress from the scrub
arguments structure and then use that progress in a call to resume
scrub.
So fix this by always copying the scrub arguments structure to user
space, overwriting the value returned to user space with -EFAULT only if
copying the structure failed to let user space know that either that
copying did not happen, and therefore the structure is stale, or it
happened partially and the structure is probably not valid and corrupt
due to the partial copy.
Reported-by: Graham Cobb <g.btrfs@cobb.uk.net> Link: https://lore.kernel.org/linux-btrfs/d0a97688-78be-08de-ca7d-bcb4c7fb397e@cobb.uk.net/ Fixes: 4705f6e0dc28fd ("Btrfs: do not overwrite scrub error with fault error in scrub ioctl") CC: stable@vger.kernel.org # 5.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Graham Cobb <g.btrfs@cobb.uk.net> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Fri, 17 Jan 2020 14:03:11 +0000 (06:03 -0800)]
Merge tag 'gpio-v5.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"This reverts the GPIOLIB_IRQCHIP in the ThunderX driver.
ThunderX is a piece of Arm-based server chip. I converted the driver
to hierarchical gpiochip without access to real silicon and failed
miserably since I didn't take MSI's into account.
Kevin Hao helpfully stepped in and fixed it properly, let's revert it
for v5.5 and put the proper conversion into v5.6"
* tag 'gpio-v5.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio: thunderx: Switch to GPIOLIB_IRQCHIP"
Linus Torvalds [Fri, 17 Jan 2020 13:54:18 +0000 (05:54 -0800)]
Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three fixes that should go into this release:
- The 32-bit segment size fix that I mentioned last week (Ming)
- Use uint for the block size (Mikulas)
- A null_blk zone write handling fix (Damien)"
* tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block:
block: fix an integer overflow in logical block size
null_blk: Fix zone write handling
block: fix get_max_segment_size() overflow on 32bit arch
Florian Fainelli [Thu, 16 Jan 2020 21:08:58 +0000 (13:08 -0800)]
net: systemport: Fixed queue mapping in internal ring map
We would not be transmitting using the correct SYSTEMPORT transmit queue
during ndo_select_queue() which looks up the internal TX ring map
because while establishing the mapping we would be off by 4, so for
instance, when we populate switch port mappings we would be doing:
switch port 0, queue 0 -> ring index #0
switch port 0, queue 1 -> ring index #1
...
switch port 0, queue 3 -> ring index #3
switch port 1, queue 0 -> ring index #8 (4 + 4 * 1)
...
instead of using ring index #4. This would cause our ndo_select_queue()
to use the fallback queue mechanism which would pick up an incorrect
ring for that switch port. Fix this by using the correct switch queue
number instead of SYSTEMPORT queue number.
Fixes: e3f78751c97b ("net: systemport: Simplify queue mapping logic") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 16 Jan 2020 20:55:48 +0000 (12:55 -0800)]
net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
With the implementation of the system reset controller we lost a setting
that is currently applied by the bootloader and which configures the IMP
port for 2Gb/sec, the default is 1Gb/sec. This is needed given the
number of ports and applications we expect to run so bring back that
setting.
Fixes: 01b0ac07589e ("net: dsa: bcm_sf2: Add support for optional reset controller line") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 16 Jan 2020 18:43:27 +0000 (20:43 +0200)]
net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
The sja1105_parse_ports_node function was tested only on device trees
where all ports were enabled. Fix this check so that the driver
continues to probe only with the ports where status is not "disabled",
as expected.
Fixes: d6439dcd9177 ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
According to the Datasheet this bit should be 0 (Normal operation) in
default. With the FORCE_LINK_GOOD bit set, it is not possible to get a
link. This patch sets FORCE_LINK_GOOD to the default value after
resetting the phy.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Kan Liang [Thu, 16 Jan 2020 20:02:10 +0000 (12:02 -0800)]
perf/x86/intel/uncore: Remove PCIe3 unit for SNR
The PCIe Root Port driver for CPU Complex PCIe Root Ports are not
loaded on SNR.
The device ID for SNR PCIe3 unit is used by both uncore driver and the
PCIe Root Port driver. If uncore driver is loaded, the PCIe Root Port
driver never be probed.
Remove the PCIe3 unit for SNR for now. The support for PCIe3 unit will
be added later separately.
Kan Liang [Thu, 16 Jan 2020 20:02:09 +0000 (12:02 -0800)]
perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
An Oops during the boot is found on some SNR machines. It turns out
this is because the snr_uncore_imc_freerunning_events[] array was
missing an end-marker.
Fixes: 9206f7a55359 ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Reported-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Like Xu <like.xu@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200116200210.18937-1-kan.liang@linux.intel.com
| so I was tracking down some odd behavior in the perf_fuzzer which turns
| out to be because perf_even_open() sometimes returns 0 (indicating a file
| descriptor of 0) even though as far as I can tell stdin is still open.
... and further the cause:
| error is triggered if aux_sample_size has non-zero value.
|
| seems to be this line in kernel/events/core.c:
|
| if (perf_need_aux_event(event) && !perf_get_aux_event(event, group_leader))
| goto err_locked;
|
| (note, err is never set)
This seems to be a thinko in commit:
f403171f4e881b27 ("perf: Allow normal events to output AUX data")
... and we should probably return -EINVAL here, as this should only
happen when the new event is mis-configured or does not have a
compatible aux_event group leader.
Fixes: f403171f4e881b27 ("perf: Allow normal events to output AUX data") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Yonglong Liu [Thu, 16 Jan 2020 07:41:17 +0000 (15:41 +0800)]
net: hns: fix soft lockup when there is not enough memory
When there is not enough memory and napi_alloc_skb() return NULL,
the HNS driver will print error message, and than try again, if
the memory is not enough for a while, huge error message and the
retry operation will cause soft lockup.
When napi_alloc_skb() return NULL because of no memory, we can
get a warn_alloc() call trace, so this patch deletes the error
message. We already use polling mode to handle irq, but the
retry operation will render the polling weight inactive, this
patch just return budget when the rx is not completed to avoid
dead loop.
Fixes: deab060e7ea6 ("net: hns: Optimize hns_nic_common_poll for better performance") Fixes: 5fda5763a87e ("net: add Hisilicon Network Subsystem basic ethernet support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 16 Jan 2020 16:07:05 +0000 (17:07 +0100)]
USB: serial: suppress driver bind attributes
USB-serial drivers must not be unbound from their ports before the
corresponding USB driver is unbound from the parent interface so
suppress the bind and unbind attributes.
Unbinding a serial driver while it's port is open is a sure way to
trigger a crash as any driver state is released on unbind while port
hangup is handled on the parent USB interface level. Drivers for
multiport devices where ports share a resource such as an interrupt
endpoint also generally cannot handle individual ports going away.
Cong Wang [Wed, 15 Jan 2020 21:02:38 +0000 (13:02 -0800)]
net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
syzbot reported some bogus lockdep warnings, for example bad unlock
balance in sch_direct_xmit(). They are due to a race condition between
slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
in netdev_update_lockdep_key() on slow path, while we could still
acquire the queue->_xmit_lock on fast path in this small window:
CPU A CPU B
__netif_tx_lock();
lockdep_unregister_key(qdisc_xmit_lock_key);
__netif_tx_unlock();
lockdep_register_key(qdisc_xmit_lock_key);
In fact, unlike the addr_list_lock which has to be reordered when
the master/slave device relationship changes, queue->_xmit_lock is
only acquired on fast path and only when NETIF_F_LLTX is not set,
so there is likely no nested locking for it.
Therefore, we can just get rid of re-registration of
qdisc_xmit_lock_key.
Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com Fixes: dba6ed4c4b1e ("net: core: add generic lockdep keys") Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: dbbf3d381cbd ("net/sched: act_ife: validate the control action inside init()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Waiman Long [Wed, 15 Jan 2020 15:43:36 +0000 (10:43 -0500)]
locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
The commit 23169130ee29 ("locking/rwsem: Make handoff writer
optimistically spin on owner") will allow a recently woken up waiting
writer to spin on the owner. Unfortunately, if the owner happens to be
RWSEM_OWNER_UNKNOWN, the code will incorrectly spin on it leading to a
kernel crash. This is fixed by passing the proper non-spinnable bits
to rwsem_spin_on_owner() so that RWSEM_OWNER_UNKNOWN will be treated
as a non-spinnable target.
Fixes: 23169130ee29 ("locking/rwsem: Make handoff writer optimistically spin on owner") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200115154336.8679-1-longman@redhat.com
Jens Axboe [Fri, 17 Jan 2020 02:00:24 +0000 (19:00 -0700)]
io_uring: only allow submit from owning task
If the credentials or the mm doesn't match, don't allow the task to
submit anything on behalf of this ring. The task that owns the ring can
pass the file descriptor to another task, but we don't want to allow
that task to submit an SQE that then assumes the ring mm and creds if
it needs to go async.
Cc: stable@vger.kernel.org Suggested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 17 Jan 2020 03:42:08 +0000 (19:42 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"I've been sitting on these longer than I meant, so the patch count is
a bit higher than ideal for this part of the release. There's also
some reverts of double-applied patches that brings the diffstat up a
bit.
With that said, the biggest changes are:
- Revert of duplicate i2c device addition on two Aspeed (BMC)
Devicetrees.
- Move of two device nodes that got applied to the wrong part of the
tree on ASpeed G6.
- Regulator fix for Beaglebone X15 (adding 12/5V supplies)
- Use interrupts for keys on Amlogic SM1 to avoid missed polls
In addition to that, there is a collection of smaller DT fixes:
- Power supply assignment fixes for i.MX6
- Fix of interrupt line for magnetometer on i.MX8 Librem5 devkit
- Build fixlets (selects) for davinci/omap2+
- More interrupt number fixes for Stratix10, Amlogic SM1, etc.
- ... and more similar fixes across different platforms
And some non-DT stuff:
- optee fix to register multiple shared pages properly
- Clock calculation fixes for MMP3
- Clock fixes for OMAP as well"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
MAINTAINERS: Add myself as the co-maintainer for Actions Semi platforms
ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support
ARM: dts: imx6sll-evk: Remove incorrect power supply assignment
ARM: dts: imx6sl-evk: Remove incorrect power supply assignment
ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment
ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL
ARM: omap2plus: select RESET_CONTROLLER
ARM: davinci: select CONFIG_RESET_CONTROLLER
ARM: dts: aspeed: rainier: Fix fan fault and presence
ARM: dts: aspeed: rainier: Remove duplicate i2c busses
ARM: dts: aspeed: tacoma: Remove duplicate flash nodes
ARM: dts: aspeed: tacoma: Remove duplicate i2c busses
ARM: dts: aspeed: tacoma: Fix fsi master node
ARM: dts: aspeed-g6: Fix FSI master location
ARM: dts: mmp3: Fix the TWSI ranges
clk: mmp2: Fix the order of timer mux parents
ARM: mmp: do not divide the clock rate
arm64: dts: rockchip: Fix IR on Beelink A1
optee: Fix multi page dynamic shm pool alloc
...
Linus Torvalds [Thu, 16 Jan 2020 23:55:30 +0000 (15:55 -0800)]
Merge tag 'pm-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a coding mistake in the teo cpuidle governor causing data to be
written beyond the last array element (Ikjoon Jang)"
* tag 'pm-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: teo: Fix intervals[] array indexing bug
Tom Lendacky [Wed, 15 Jan 2020 22:05:16 +0000 (16:05 -0600)]
x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
If the SME and SEV features are present via CPUID, but memory encryption
support is not enabled (MSR 0xC001_0010[23]), the feature flags are cleared
using clear_cpu_cap(). However, if get_cpu_cap() is later called, these
feature flags will be reset back to present, which is not desired.
Change from using clear_cpu_cap() to setup_clear_cpu_cap() so that the
clearing of the flags is maintained.
Linus Torvalds [Thu, 16 Jan 2020 18:26:40 +0000 (10:26 -0800)]
Merge tag 'tag-chrome-platform-fixes-for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
"One fix in the wilco_ec keyboard backlight driver to allow the EC
driver to continue loading in the absence of a backlight module"
* tag 'tag-chrome-platform-fixes-for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: wilco_ec: Fix keyboard backlight probing
Eyal Birger [Tue, 14 Jan 2020 08:03:50 +0000 (10:03 +0200)]
netfilter: nat: fix ICMP header corruption on ICMP errors
Commit e2749f1cefa6 ("netfilter: nat: fix spurious connection timeouts")
made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4
manipulation function for the outer packet on ICMP errors.
However, icmp_manip_pkt() assumes the packet has an 'id' field which
is not correct for all types of ICMP messages.
This is not correct for ICMP error packets, and leads to bogus bytes
being written the ICMP header, which can be wrongfully regarded as
'length' bytes by RFC 4884 compliant receivers.
Fix by assigning the 'id' field only for ICMP messages that have this
semantic.
A missing generation check during DELTABLE processing causes it to queue
the DELFLOWTABLE operation a second time, so we corrupt the list here:
case NFT_MSG_DELFLOWTABLE:
list_del_rcu(&nft_trans_flowtable(trans)->list);
nf_tables_flowtable_notify(&trans->ctx,
because we have two different DELFLOWTABLE transactions for the same
flowtable. We then call list_del_rcu() twice for the same flowtable->list.
The object handling seems to suffer from the same bug so add a generation
check too and only queue delete transactions for flowtables/objects that
are still active in the next generation.
Dan Carpenter [Thu, 16 Jan 2020 10:09:31 +0000 (13:09 +0300)]
netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
Syzbot detected a leak in nf_tables_parse_netdev_hooks(). If the hook
already exists, then the error handling doesn't free the newest "hook".
Reported-by: syzbot+f9d4095107fc8749c69c@syzkaller.appspotmail.com Fixes: 07182957e54e ("netfilter: nf_tables: allow netdevice to be used only once per flowtable") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>