All three calls to ksys_mount() in initrd-related kernel code can
be switched over to do_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be,
copied over from userspace;
- the second and fourth argument are passed through anyway.
In devtmpfs, do_mount() can be called directly instead of complex wrapping
by ksys_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be
copied over from userspace;
- the second and fourth argument are passed through anyway.
Linus Torvalds [Thu, 12 Dec 2019 02:10:40 +0000 (18:10 -0800)]
Merge tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Fixes for AFS plus one patch to make debugging easier:
- Fix how addresses are matched to server records. This is currently
incorrect which means cache invalidation callbacks from the server
don't necessarily get delivered correctly. This causes stale data
and metadata to be seen under some circumstances.
- Make the dynamic root superblock R/W so that rpm/dnf can reapply
the SELinux label to it when upgrading the Fedora filesystem-afs
package. If the filesystem is R/O, this fails and the upgrade
fails.
It might be better in future to allow setxattr from an LSM to
bypass the R/O protections, if only for pseudo-filesystems.
- Fix the parsing of mountpoint strings. The mountpoint object has to
have a terminal dot, whereas the source/device string passed to
mount should not. This confuses type-forcing suffix detection
leading to the wrong volume variant being mounted.
- Make lookups in the dynamic root superblock for creation events
(such as mkdir) fail with EOPNOTSUPP rather than something like
EEXIST. The dynamic root only allows implicit creation by the
->lookup() method - and only if the target cell exists.
- Fix the looking up of an AFS superblock to include the cell in the
matching key - otherwise all volumes with the same ID number are
treated as the same thing, irrespective of which cell they're in.
- Show the volume name of each volume in the volume records displayed
in /proc/net/afs/<cell>/volumes. This proved useful in debugging as
it provides a way to map the volume IDs to names, where the names
are what appear in /proc/mounts"
* tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Show volume name in /proc/net/afs/<cell>/volumes
afs: Fix missing cell comparison in afs_test_super()
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
afs: Fix mountpoint parsing
afs: Fix SELinux setting security label on /afs
afs: Fix afs_find_server lookups for ipv4 peers
Linus Torvalds [Wed, 11 Dec 2019 20:25:32 +0000 (12:25 -0800)]
Merge tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Mainly address a regression reported by David recently observed
together with overlayfs due to the improper return value of
listxattr() without xattr. Update outdated expressions in document as
well.
Summary:
- Fix improper return value of listxattr() with no xattr
- Keep up documentation with latest code"
* tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: zero out when listxattr is called with no xattr
Linus Torvalds [Wed, 11 Dec 2019 20:22:38 +0000 (12:22 -0800)]
Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Remove code I accidentally applied when doing a minor fix up to a
patch, and then using "git commit -a --amend", which pulled in some
other changes I was playing with.
- Remove an used variable in trace_events_inject code
- Fix function graph tracer when it traces a ftrace direct function.
It will now ignore tracing a function that has a ftrace direct
tramploine attached. This is needed for eBPF to use the ftrace direct
code.
* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function_graph tracer interaction with BPF trampoline
tracing: remove set but not used variable 'buffer'
module: Remove accidental change of module_enable_x()
Linus Torvalds [Wed, 11 Dec 2019 19:46:19 +0000 (11:46 -0800)]
pipe: simplify signal handling in pipe_read() and add comments
There's no need to separately check for signals while inside the locked
region, since we're going to do "wait_event_interruptible()" right
afterwards anyway, and the error handling is much simpler there.
The check for whether we had already read anything was also redundant,
since we no longer do the odd merging of reads when there are pending
writers.
But perhaps more importantly, this adds commentary about why we still
need to wake up possible writers even though we didn't read any data,
and why we can skip all the finishing touches now if we get a signal (or
had a signal pending) while waiting for more data.
[ This is a split-out cleanup from my "make pipe IO use exclusive wait
queues" thing, which I can't apply because it triggers a nasty bug in
the GNU make jobserver - Linus ]
David Howells [Wed, 11 Dec 2019 08:58:59 +0000 (08:58 +0000)]
afs: Show volume name in /proc/net/afs/<cell>/volumes
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it
easier to work out the name corresponding to a volume ID. This makes it
easier to work out which mounts in /proc/mounts correspond to which volume
ID.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
David Howells [Wed, 11 Dec 2019 08:06:08 +0000 (08:06 +0000)]
afs: Fix missing cell comparison in afs_test_super()
Fix missing cell comparison in afs_test_super(). Without this, any pair
volumes that have the same volume ID will share a superblock, no matter the
cell, unless they're in different network namespaces.
Normally, most users will only deal with a single cell and so they won't
see this. Even if they do look into a second cell, they won't see a
problem unless they happen to hit a volume with the same ID as one they've
already got mounted.
David Howells [Wed, 11 Dec 2019 08:56:04 +0000 (08:56 +0000)]
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
Fix the lookup method on the dynamic root directory such that creation
calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP
rather than failing with some odd error (such as EEXIST).
lookup() itself tries to create automount directories when it is invoked.
These are cached locally in RAM and not committed to storage.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
David Howells [Mon, 9 Dec 2019 15:04:45 +0000 (15:04 +0000)]
afs: Fix mountpoint parsing
Each AFS mountpoint has strings that define the target to be mounted. This
is required to end in a dot that is supposed to be stripped off. The
string can include suffixes of ".readonly" or ".backup" - which are
supposed to come before the terminal dot. To add to the confusion, the "fs
lsmount" afs utility does not show the terminal dot when displaying the
string.
The kernel mount source string parser, however, assumes that the terminal
dot marks the suffix and that the suffix is always "" and is thus ignored.
In most cases, there is no suffix and this is not a problem - but if there
is a suffix, it is lost and this affects the ability to mount the correct
volume.
The command line mount command, on the other hand, is expected not to
include a terminal dot - so the problem doesn't arise there.
Fix this by making sure that the dot exists and then stripping it when
passing the string to the mount configuration.
Fixes: 07676b361027 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
ftrace: Fix function_graph tracer interaction with BPF trampoline
Depending on type of BPF programs served by BPF trampoline it can call original
function. In such case the trampoline will skip one stack frame while
returning. That will confuse function_graph tracer and will cause crashes with
bad RIP. Teach graph tracer to skip functions that have BPF trampoline attached.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
YueHaibing [Sat, 7 Dec 2019 03:44:09 +0000 (11:44 +0800)]
tracing: remove set but not used variable 'buffer'
kernel/trace/trace_events_inject.c: In function trace_inject_entry:
kernel/trace/trace_events_inject.c:20:22: warning: variable buffer set but not used [-Wunused-but-set-variable]
It is never used, so remove it.
Link: http://lkml.kernel.org/r/20191207034409.25668-1-yuehaibing@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
module: Remove accidental change of module_enable_x()
When pulling in Divya Indi's patch, I made a minor fix to remove unneeded
braces. I commited my fix up via "git commit -a --amend". Unfortunately, I
didn't realize I had some changes I was testing in the module code, and
those changes were applied to Divya's patch as well.
This reverts the accidental updates to the module code.
Cc: Jessica Yu <jeyu@kernel.org> Cc: Divya Indi <divya.indi@oracle.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Fixes: 6d25faf626b6 ("tracing: Verify if trace array exists before destroying it.") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Mon, 9 Dec 2019 20:14:31 +0000 (12:14 -0800)]
Merge tag 'for-5.5-rc1-kconfig-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs Kconfig fix from David Sterba:
"This adds the config dependency integrating the crypto code and btrfs
support for blake2b (added in this dev cycle, via different trees).
Without it the option had to be selected manually"
* tag 'for-5.5-rc1-kconfig-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add Kconfig dependency for BLAKE2B
Linus Torvalds [Mon, 9 Dec 2019 19:48:21 +0000 (11:48 -0800)]
Merge tag 'printk-for-5.5-pr-warning-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull pr_warning() removal from Petr Mladek.
- Final removal of the unused pr_warning() alias.
You're supposed to use just "pr_warn()" in the kernel.
* tag 'printk-for-5.5-pr-warning-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
checkpatch: Drop pr_warning check
printk: Drop pr_warning definition
Fix up for "printk: Drop pr_warning definition"
workqueue: Use pr_warn instead of pr_warning
Linus Torvalds [Mon, 9 Dec 2019 18:59:00 +0000 (10:59 -0800)]
Merge tag 'thermal-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal fixes from Zhang Rui:
"Starting from this release cycle, we have Daniel Lezcano work as the
new thermal co-maintainer because Eduardo's email is bouncing for
sometime and we can not reach him. We also have a new shared git tree
so that both Daniel and I can actively working on it.
Specifics:
- Update MAINTAINER file for new thermal co-maintainer and new
thermal git tree address. (Daniel Lezcano, Florian Fainelli, Zhang
Rui)
- Fix a Kconfig warning. (YueHaibing)"
* tag 'thermal-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
MAINTAINERS: thermal: Change the git tree location
MAINTAINERS: thermal: Add Daniel Lezcano as the thermal maintainer
MAINTAINERS: thermal: Eduardo's email is bouncing
thermal: power_allocator: Fix Kconfig warning
David Sterba [Thu, 28 Nov 2019 12:02:32 +0000 (13:02 +0100)]
btrfs: add Kconfig dependency for BLAKE2B
Because the BLAKE2B code went through a different tree, it was not
available at the time the btrfs part was merged. Now that the Kconfig
symbol exists, add it to the list.
David Howells [Mon, 9 Dec 2019 15:04:45 +0000 (15:04 +0000)]
afs: Fix SELinux setting security label on /afs
Make the AFS dynamic root superblock R/W so that SELinux can set the
security label on it. Without this, upgrades to, say, the Fedora
filesystem-afs RPM fail if afs is mounted on it because the SELinux label
can't be (re-)applied.
It might be better to make it possible to bypass the R/O check for LSM
label application through setxattr.
Fixes: d61043e0bf09 ("afs: Support the AFS dynamic root") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: selinux@vger.kernel.org
cc: linux-security-module@vger.kernel.org
Marc Dionne [Mon, 9 Dec 2019 15:04:43 +0000 (15:04 +0000)]
afs: Fix afs_find_server lookups for ipv4 peers
afs_find_server tries to find a server that has an address that
matches the transport address of an rxrpc peer. The code assumes
that the transport address is always ipv6, with ipv4 represented
as ipv4 mapped addresses, but that's not the case. If the transport
family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will
be beyond the actual ipv4 address and will always be 0, and all
ipv4 addresses will be seen as matching.
As a result, the first ipv4 address seen on any server will be
considered a match, and the server returned may be the wrong one.
One of the consequences is that callbacks received over ipv4 will
only be correctly applied for the server that happens to have the
first ipv4 address on the fs_addresses4 list. Callbacks over ipv4
from all other servers are dropped, causing the client to serve stale
data.
This is fixed by looking at the transport family, and comparing ipv4
addresses based on a sockaddr_in structure rather than a sockaddr_in6.
Fixes: b7b4753fe8b8 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
1) More jumbo frame fixes in r8169, from Heiner Kallweit.
2) Fix bpf build in minimal configuration, from Alexei Starovoitov.
3) Use after free in slcan driver, from Jouni Hogander.
4) Flower classifier port ranges don't work properly in the HW offload
case, from Yoshiki Komachi.
5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.
6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.
7) Fix flow dissection in dsa TX path, from Alexander Lobakin.
8) Stale syncookie timestampe fixes from Guillaume Nault.
[ Did an evil merge to silence a warning introduced by this pull - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
r8169: add missing RX enabling for WoL on RTL8125
vhost/vsock: accept only packets with the right dst_cid
net: phy: dp83867: fix hfs boot in rgmii mode
net: ethernet: ti: cpsw: fix extra rx interrupt
inet: protect against too small mtu values.
gre: refetch erspan header from skb->data after pskb_may_pull()
pppoe: remove redundant BUG_ON() check in pppoe_pernet
tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
tcp: tighten acceptance of ACKs not matching a child socket
tcp: fix rejected syncookies due to stale timestamps
lpc_eth: kernel BUG on remove
tcp: md5: fix potential overestimation of TCP option space
net: sched: allow indirect blocks to bind to clsact in TC
net: core: rename indirect block ingress cb function
net-sysfs: Call dev_hold always in netdev_queue_add_kobject
net: dsa: fix flow dissection on Tx path
net/tls: Fix return values to avoid ENOTSUPP
net: avoid an indirect call in ____sys_recvmsg()
...
Linus Torvalds [Sun, 8 Dec 2019 20:23:42 +0000 (12:23 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Eleven patches, all in drivers (no core changes) that are either minor
cleanups or small fixes.
They were late arriving, but still safe for -rc1"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry
scsi: megaraid_sas: Make poll_aen_lock static
scsi: sd_zbc: Improve report zones error printout
scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
scsi: qla2xxx: unregister ports after GPN_FT failure
scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
scsi: pm80xx: Remove unused include of linux/version.h
scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3
scsi: scsi_transport_sas: Fix memory leak when removing devices
scsi: lpfc: size cpu map by last cpu id set
scsi: ibmvscsi_tgt: Remove unneeded variable rc
Linus Torvalds [Sun, 8 Dec 2019 20:12:18 +0000 (12:12 -0800)]
Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Nine cifs/smb3 fixes:
- one fix for stable (oops during oplock break)
- two timestamp fixes including important one for updating mtime at
close to avoid stale metadata caching issue on dirty files (also
improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
wire)
- two fixes for "modefromsid" mount option for file create (now
allows mode bits to be set more atomically and accurately on create
by adding "sd_context" on create when modefromsid specified on
mount)
- two fixes for multichannel found in testing this week against
different servers
- two small cleanup patches"
* tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: improve check for when we send the security descriptor context on create
smb3: fix mode passed in on create for modetosid mount option
cifs: fix possible uninitialized access and race on iface_list
cifs: Fix lookup of SMB connections on multichannel
smb3: query attributes on file close
smb3: remove unused flag passed into close functions
cifs: remove redundant assignment to pointer pneg_ctxt
fs: cifs: Fix atime update check vs mtime
CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
Linus Torvalds [Sun, 8 Dec 2019 19:08:28 +0000 (11:08 -0800)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
"No common topic, just three cleanups".
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make __d_alloc() static
fs/namespace: add __user to open_tree and move_mount syscalls
fs/fnctl: fix missing __user in fcntl_rw_hint()
Linus Torvalds [Sun, 8 Dec 2019 01:07:18 +0000 (17:07 -0800)]
Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
"Fix a race condition and a use-after-free error:
- Fix a UAF when reporting writeback errors
- Fix a race condition when handling page uptodate on fragmented file
with blocksize < pagesize"
* tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: stop using ioend after it's been freed in iomap_finish_ioend()
iomap: fix sub-page uptodate handling
Linus Torvalds [Sun, 8 Dec 2019 01:05:33 +0000 (17:05 -0800)]
Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Fix a couple of resource management errors and a hang:
- fix a crash in the log setup code when log mounting fails
- fix a hang when allocating space on the realtime device
- fix a block leak when freeing space on the realtime device"
* tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix mount failure crash on invalid iclog memory access
xfs: don't check for AG deadlock for realtime files in bunmapi
xfs: fix realtime file data space leak
Linus Torvalds [Sun, 8 Dec 2019 00:59:25 +0000 (16:59 -0800)]
Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
"orangefs: posix open permission checking...
Orangefs has no open, and orangefs checks file permissions on each
file access. Posix requires that file permissions be checked on open
and nowhere else. Orangefs-through-the-kernel needs to seem posix
compliant.
The VFS opens files, even if the filesystem provides no method. We can
see if a file was successfully opened for read and or for write by
looking at file->f_mode.
When writes are flowing from the page cache, file is no longer
available. We can trust the VFS to have checked file->f_mode before
writing to the page cache.
The mode of a file might change between when it is opened and IO
commences, or it might be created with an arbitrary mode.
We'll make sure we don't hit EACCES during the IO stage by using
UID 0"
[ This is "posixish", but not a great solution in the long run, since a
proper secure network server shouldn't really trust the client like this.
But proper and secure POSIX behavior requires an open method and a
resulting cookie for IO of some kind, or similar. - Linus ]
* tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: posix open permission checking...
Linus Torvalds [Sun, 8 Dec 2019 00:56:00 +0000 (16:56 -0800)]
Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"This is a relatively quiet cycle for nfsd, mainly various bugfixes.
Possibly most interesting is Trond's fixes for some callback races
that were due to my incomplete understanding of rpc client shutdown.
Unfortunately at the last minute I've started noticing a new
intermittent failure to send callbacks. As the logic seems basically
correct, I'm leaving Trond's patches in for now, and hope to find a
fix in the next week so I don't have to revert those patches"
* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: depend on CRYPTO_MD5 for legacy client tracking
NFSD fixing possible null pointer derefering in copy offload
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
nfsd: Ensure CLONE persists data and metadata changes to the target file
SUNRPC: Fix backchannel latency metrics
nfsd: restore NFSv3 ACL support
nfsd: v4 support requires CRYPTO_SHA256
nfsd: Fix cld_net->cn_tfm initialization
lockd: remove __KERNEL__ ifdefs
sunrpc: remove __KERNEL__ ifdefs
race in exportfs_decode_fh()
nfsd: Drop LIST_HEAD where the variable it declares is never used.
nfsd: document callback_wq serialization of callback code
nfsd: mark cb path down on unknown errors
nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
nfsd: minor 4.1 callback cleanup
SUNRPC: Fix svcauth_gss_proxy_init()
SUNRPC: Trace gssproxy upcall results
sunrpc: fix crash when cache_head become valid before update
nfsd: remove private bin2hex implementation
...
Linus Torvalds [Sun, 8 Dec 2019 00:50:55 +0000 (16:50 -0800)]
Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv4.2 now supports cross device offloaded copy (i.e. offloaded
copy of a file from one source server to a different target
server).
- New RDMA tracepoints for debugging congestion control and Local
Invalidate WRs.
Bugfixes and cleanups
- Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for
layoutreturn
- Handle bad/dead sessions correctly in nfs41_sequence_process()
- Various bugfixes to the delegation return operation.
- Various bugfixes pertaining to delegations that have been revoked.
- Cleanups to the NFS timespec code to avoid unnecessary conversions
between timespec and timespec64.
- Fix unstable RDMA connections after a reconnect
- Close race between waking an RDMA sender and posting a receive
- Wake pending RDMA tasks if connection fails
- Fix MR list corruption, and clean up MR usage
- Fix another RPCSEC_GSS issue with MIC buffer space"
* tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
SUNRPC: Capture completion of all RPC tasks
SUNRPC: Fix another issue with MIC buffer space
NFS4: Trace lock reclaims
NFS4: Trace state recovery operation
NFSv4.2 fix memory leak in nfs42_ssc_open
NFSv4.2 fix kfree in __nfs42_copy_file_range
NFS: remove duplicated include from nfs4file.c
NFSv4: Make _nfs42_proc_copy_notify() static
NFS: Fallocate should use the nfs4_fattr_bitmap
NFS: Return -ETXTBSY when attempting to write to a swapfile
fs: nfs: sysfs: Remove NULL check before kfree
NFS: remove unneeded semicolon
NFSv4: add declaration of current_stateid
NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
SUNRPC: Avoid RPC delays when exiting suspend
NFS: Add a tracepoint in nfs_fh_to_dentry()
NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
...
Steve French [Sat, 7 Dec 2019 23:38:22 +0000 (17:38 -0600)]
smb3: improve check for when we send the security descriptor context on create
We had cases in the previous patch where we were sending the security
descriptor context on SMB3 open (file create) in cases when we hadn't
mounted with with "modefromsid" mount option.
Add check for that mount flag before calling ad_sd_context in
open init.
Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Linus Torvalds [Sat, 7 Dec 2019 22:51:04 +0000 (14:51 -0800)]
Merge tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Remove hugepage checks for reserved pfns (Ben Luo)
- Fix irq-bypass unregister ordering (Jiang Yi)
* tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio:
vfio/pci: call irq_bypass_unregister_producer() before freeing irq
vfio/type1: remove hugepage checks in is_invalid_reserved_pfn()
Linus Torvalds [Sat, 7 Dec 2019 22:49:20 +0000 (14:49 -0800)]
Merge tag 'for-linus-5.5b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
- a patch to fix a build warning
- a cleanup of no longer needed code in the Xen event handling
- a small series for the Xen grant driver avoiding high order
allocations and replacing an insane global limit by a per-call one
- a small series fixing Xen frontend/backend module referencing
* tag 'for-linus-5.5b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-blkback: allow module to be cleanly unloaded
xen/xenbus: reference count registered modules
xen/gntdev: switch from kcalloc() to kvcalloc()
xen/gntdev: replace global limit of mapped pages by limit per call
xen/gntdev: remove redundant non-zero check on ret
xen/events: remove event handling recursion detection
Linus Torvalds [Sat, 7 Dec 2019 22:43:46 +0000 (14:43 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc Kconfig updates from Andrew Morton:
"A number of changes to Kconfig files under lib/ from Changbin Du and
Krzysztof Kozlowski"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
lib/: fix Kconfig indentation
kernel-hacking: move DEBUG_FS to 'Generic Kernel Debugging Instruments'
kernel-hacking: move DEBUG_BUGVERBOSE to 'printk and dmesg options'
kernel-hacking: create a submenu for scheduler debugging options
kernel-hacking: move SCHED_STACK_END_CHECK after DEBUG_STACK_USAGE
kernel-hacking: move Oops into 'Lockups and Hangs'
kernel-hacking: move kernel testing and coverage options to same submenu
kernel-hacking: group kernel data structures debugging together
kernel-hacking: create submenu for arch special debugging options
kernel-hacking: group sysrq/kgdb/ubsan into 'Generic Kernel Debugging Instruments'
Heiner Kallweit [Sat, 7 Dec 2019 21:21:52 +0000 (22:21 +0100)]
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
In referenced fix we removed the RTL8168e-specific jumbo config for
RTL8168evl in rtl_hw_jumbo_enable(). We have to do the same in
rtl_hw_jumbo_disable().
v2: fix referenced commit id
Fixes: 0b4ab0d1cf54 ("r8169: fix jumbo configuration for RTL8168evl") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 7 Dec 2019 21:53:09 +0000 (13:53 -0800)]
pipe: don't use 'pipe_wait() for basic pipe IO
pipe_wait() may be simple, but since it relies on the pipe lock, it
means that we have to do the wakeup while holding the lock. That's
unfortunate, because the very first thing the waked entity will want to
do is to get the pipe lock for itself.
So get rid of the pipe_wait() usage by simply releasing the pipe lock,
doing the wakeup (if required) and then using wait_event_interruptible()
to wait on the right condition instead.
wait_event_interruptible() handles races on its own by comparing the
wakeup condition before and after adding itself to the wait queue, so
you can use an optimistic unlocked condition for it.
Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 7 Dec 2019 21:21:01 +0000 (13:21 -0800)]
pipe: remove 'waiting_writers' merging logic
This code is ancient, and goes back to when we only had a single page
for the pipe buffers. The exact history is hidden in the mists of time
(ie "before git", and in fact predates the BK repository too).
At that long-ago point in time, it actually helped to try to merge big
back-and-forth pipe reads and writes, and not limit pipe reads to the
single pipe buffer in length just because that was all we had at a time.
However, since then we've expanded the pipe buffers to multiple pages,
and this logic really doesn't seem to make sense. And a lot of it is
somewhat questionable (ie "hmm, the user asked for a non-blocking read,
but we see that there's a writer pending, so let's wait anyway to get
the extra data that the writer will have").
But more importantly, it makes the "go to sleep" logic much less
obvious, and considering the wakeup issues we've had, I want to make for
less of those kinds of things.
Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 7 Dec 2019 20:54:26 +0000 (12:54 -0800)]
pipe: fix and clarify pipe read wakeup logic
This is the read side version of the previous commit: it simplifies the
logic to only wake up waiting writers when necessary, and makes sure to
use a synchronous wakeup. This time not so much for GNU make jobserver
reasons (that pipe never fills up), but simply to get the writer going
quickly again.
A bit less verbose commentary this time, if only because I assume that
the write side commentary isn't going to be ignored if you touch this
code.
Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 7 Dec 2019 20:14:28 +0000 (12:14 -0800)]
pipe: fix and clarify pipe write wakeup logic
The pipe rework ends up having been extra painful, partly becaused of
actual bugs with ordering and caching of the pipe state, but also
because of subtle performance issues.
In particular, the pipe rework caused the kernel build to inexplicably
slow down.
The reason turns out to be that the GNU make jobserver (which limits the
parallelism of the build) uses a pipe to implement a "token" system: a
parallel submake will read a character from the pipe to get the job
token before starting a new job, and will write a character back to the
pipe when it is done. The overall job limit is thus easily controlled
by just writing the appropriate number of initial token characters into
the pipe.
But to work well, that really means that the old behavior of write
wakeups being synchronous (WF_SYNC) is very important - when the pipe
writer wakes up a reader, we want the reader to actually get scheduled
immediately. Otherwise you lose the parallelism of the build.
The pipe rework lost that synchronous wakeup on write, and we had
clearly all forgotten the reasons and rules for it.
This rewrites the pipe write wakeup logic to do the required Wsync
wakeups, but also clarifies the logic and avoids extraneous wakeups.
It also ends up addign a number of comments about what oit does and why,
so that we hopefully don't end up forgetting about this next time we
change this code.
Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiner Kallweit [Fri, 6 Dec 2019 22:27:15 +0000 (23:27 +0100)]
r8169: add missing RX enabling for WoL on RTL8125
RTL8125 also requires to enable RX for WoL.
v2: add missing Fixes tag
Fixes: 80025be3880f ("r8169: add support for RTL8125") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 927de32bb50e ("net: phy: dp83867: move dt parsing to probe")
causes regression on TI dra71x-evm and dra72x-evm, where DP83867 PHY is
used in "rgmii-id" mode - the networking stops working.
Unfortunately, it's not enough to just move DT parsing code to .probe() as
it depends on phydev->interface value, which is set to correct value abter
the .probe() is completed and before calling .config_init(). So, RGMII
configuration can't be loaded from DT.
To fix and issue
- move RGMII validation code to .config_init()
- parse RGMII parameters in dp83867_of_init(), but consider them as
optional.
Fixes: 927de32bb50e ("net: phy: dp83867: move dt parsing to probe") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Now RX interrupt is triggered twice every time, because in
cpsw_rx_interrupt() it is asked first and then disabled. So there will be
pending interrupt always, when RX interrupt is enabled again in NAPI
handler.
Fix it by first disabling IRQ and then do ask.
Fixes: fe5774054a19 ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 6 Dec 2019 04:43:46 +0000 (20:43 -0800)]
inet: protect against too small mtu values.
syzbot was once again able to crash a host by setting a very small mtu
on loopback device.
Let's make inetdev_valid_mtu() available in include/net/ip.h,
and use it in ip_setup_cork(), so that we protect both ip_append_page()
and __ip_append_data()
Also add a READ_ONCE() when the device mtu is read.
Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
even if other code paths might write over this field.
Add a big comment in include/linux/netdevice.h about dev->mtu
needing READ_ONCE()/WRITE_ONCE() annotations.
Hopefully we will add the missing ones in followup patches.
Fixes: 52898418bfe6 ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 6 Dec 2019 03:39:02 +0000 (19:39 -0800)]
gre: refetch erspan header from skb->data after pskb_may_pull()
After pskb_may_pull() we should always refetch the header
pointers from the skb->data in case it got reallocated.
In gre_parse_header(), the erspan header is still fetched
from the 'options' pointer which is fetched before
pskb_may_pull().
Found this during code review of a KMSAN bug report.
Fixes: b10138767b42 ("net: ip_gre: use erspan key field for tunnel lookup") Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: William Tu <u9012063@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Thu, 5 Dec 2019 23:04:49 +0000 (17:04 -0600)]
pppoe: remove redundant BUG_ON() check in pppoe_pernet
Passing NULL to pppoe_pernet causes a crash via BUG_ON.
Dereferencing net in net_generici() also has the same effect. This patch
removes the redundant BUG_ON check on the same parameter.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Changbin Du [Sat, 7 Dec 2019 01:03:42 +0000 (17:03 -0800)]
kernel-hacking: group sysrq/kgdb/ubsan into 'Generic Kernel Debugging Instruments'
Patch series "hacking: make 'kernel hacking' menu better structurized", v3.
This series is a trivial improvment for the layout of 'kernel hacking'
configuration menu. Now we have many items in it which makes takes a
little time to look up them since they are not well structurized yet.
Early discussion is here:
https://lkml.org/lkml/2019/9/1/39
This patch (of 9):
Group generic kernel debugging instruments sysrq/kgdb/ubsan together
into a new submenu.
Link: http://lkml.kernel.org/r/20190909144453.3520-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 7 Dec 2019 18:41:17 +0000 (10:41 -0800)]
pipe: fix poll/select race introduced by the pipe rework
The kernel wait queues have a basic rule to them: you add yourself to
the wait-queue first, and then you check the things that you're going to
wait on. That avoids the races with the event you're waiting for.
The same goes for poll/select logic: the "poll_wait()" goes first, and
then you check the things you're polling for.
Of course, if you use locking, the ordering doesn't matter since the
lock will serialize with anything that changes the state you're looking
at. That's not the case here, though.
So move the poll_wait() first in pipe_poll(), before you start looking
at the pipe state.
Fixes: eb9e4cb12ab3 ("pipe: Use head and tail pointers for the ring, not cursor and length") Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfsd: depend on CRYPTO_MD5 for legacy client tracking
The legacy client tracking infrastructure of nfsd makes use of MD5 to
derive a client's recovery directory name. As the nfsd module doesn't
declare any dependency on CRYPTO_MD5, though, it may fail to allocate
the hash if the kernel was compiled without it. As a result, generation
of client recovery directories will fail with the following error:
NFSD: unable to generate recoverydir name
The explicit dependency on CRYPTO_MD5 was removed as redundant back in 27e240fdf617 (NFSD: Remove redundant "select" clauses in fs/Kconfig
2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit 006c174a09d1 (NFS: Fix the selection of security flavours in Kconfig) at
a later point.
Fix the issue by adding back an explicit dependency on CRYPTO_MD5.
Fixes: 006c174a09d1 (NFS: Fix the selection of security flavours in Kconfig) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSD fixing possible null pointer derefering in copy offload
Static checker revealed possible error path leading to possible
NULL pointer dereferencing.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 43222e02b5ea: ("NFSD introduce async copy feature") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Florian Fainelli [Sat, 23 Nov 2019 15:43:03 +0000 (07:43 -0800)]
MAINTAINERS: thermal: Eduardo's email is bouncing
The last two emails to Eduardo were returned with:
452 4.2.2 The email account that you tried to reach is over quota.
Please direct the recipient to
https://support.google.com/mail/?p=OverQuotaTemp j17sor626162wrq.49 -
gsmtp
YueHaibing [Wed, 13 Nov 2019 10:53:13 +0000 (18:53 +0800)]
thermal: power_allocator: Fix Kconfig warning
When do randbuiding, we got this:
WARNING: unmet direct dependencies detected for THERMAL_GOV_POWER_ALLOCATOR
Depends on [n]: THERMAL [=y] && ENERGY_MODEL [=n]
Selected by [y]:
- THERMAL_DEFAULT_GOV_POWER_ALLOCATOR [=y] && <choice>
The Kconfig option THERMAL_DEFAULT_GOV_POWER_ALLOCATOR selects the
THERMAL_GOV_POWER_ALLOCATOR but this one depends on the ENERGY_MODEL
which is not enabled.
Make THERMAL_DEFAULT_GOV_POWER_ALLOCATOR depend on THERMAL_GOV_POWER_ALLOCATOR
to fix this warning.
Suggested-by: Quentin Perret <qperret@google.com> Fixes: 197f8bff6c4a ("thermal: cpu_cooling: Migrate to using the EM framework") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20191113105313.41616-1-yuehaibing@huawei.com
====================
tcp: fix handling of stale syncookies timestamps
The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are
only refreshed when the syncookie protection triggers. Therefore, their
value can become very far apart from jiffies if no synflood happens for
a long time.
If jiffies grows too much and wraps while the synflood timestamp isn't
refreshed, then time_after32() might consider the later to be in the
future. This can trick tcp_synq_no_recent_overflow() into returning
erroneous values and rejecting valid ACKs.
Patch 1 handles the case of ACKs using legitimate syncookies.
Patch 2 handles the case of stray ACKs.
Patch 3 annotates lockless timestamp operations with READ_ONCE() and
WRITE_ONCE().
Changes from v3:
- Fix description of time_between32() (found by Eric Dumazet).
- Use more accurate Fixes tag in patch 3 (suggested by Eric Dumazet).
Changes from v2:
- Define and use time_between32() instead of a pair of
time_before32/time_after32 (suggested by Eric Dumazet).
- Use 'last_overflow - HZ' as lower bound in
tcp_synq_no_recent_overflow(), to accommodate for concurrent
timestamp updates (found by Eric Dumazet).
- Add a third patch to annotate lockless accesses to .ts_recent_stamp.
Changes from v1:
- Initialising timestamps at socket creation time is not enough
because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet).
Handle stale timestamps in tcp_synq_overflow() and
tcp_synq_no_recent_overflow() instead.
- Rework commit description.
- Add a second patch to handle the case of stray ACKs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 6 Dec 2019 11:38:49 +0000 (12:38 +0100)]
tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.
Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by 8ed8e9f8fe4f ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 6 Dec 2019 11:38:43 +0000 (12:38 +0100)]
tcp: tighten acceptance of ACKs not matching a child socket
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.
That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.
Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().
However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.
In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.
Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate for
potential timestamp growth instead and extend the interval using
'last_overflow - HZ' as lower bound.
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 6 Dec 2019 11:38:36 +0000 (12:38 +0100)]
tcp: fix rejected syncookies due to stale timestamps
If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.
Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.
For example, let's consider the following scenario on a system
with HZ=1000:
* The synflood timestamp is 0, either because that's the timestamp
of the last synflood or, more commonly, because we're working with
a freshly created socket.
* We receive a new SYN, which triggers synflood protection. Let's say
that this happens when jiffies == 2147484649 (that is,
'synflood timestamp' + HZ + 2^31 + 1).
* Then tcp_synq_overflow() doesn't update the synflood timestamp,
because time_after32(2147484649, 1000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 1000: the value of 'last_overflow' + HZ.
* A bit later, we receive the ACK completing the 3WHS. But
cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
says that we're not under synflood. That's because
time_after32(2147484649, 120000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
Of course, in reality jiffies would have increased a bit, but this
condition will last for the next 119 seconds, which is far enough
to accommodate for jiffie's growth.
Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact since the update still happens at most once
per second.
Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.
Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.
For 64 bits architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to 32 bits integer by commit d52d4bbccbc3 ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32 bits architectures.
Fixes: d52d4bbccbc3 ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Please pull and let me know if there is any problem.
For -stable v4.19:
('net/mlx5e: Query global pause state before setting prio2buffer')
For -stable v5.3
('net/mlx5e: Fix SFF 8472 eeprom length')
('net/mlx5e: Fix translation of link mode into speed')
('net/mlx5e: Fix freeing flow with kfree() and not kvfree()')
('net/mlx5e: ethtool, Fix analysis of speed setting')
('net/mlx5e: Fix TXQ indices to be sequential')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We may have found a bug in the nxp/lpc_eth.c driver. The function
platform_set_drvdata() is called twice, the second time it is called,
in lpc_mii_init(), it overwrites the struct net_device which should be
at pdev->dev->driver_data with pldat->mii_bus. When trying to remove
the driver, in lpc_eth_drv_remove(), platform_get_drvdata() will
return the pldat->mii_bus pointer and try to use it as a struct
net_device pointer. This causes unregister_netdev to segfault and
generate a kernel BUG. Is this reproducible?
Signed-off-by: Daniel Martinez <linux@danielsmartinez.com> Signed-off-by: Bruno Carneiro da Cunha <brunocarneirodacunha@usp.br> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 7 Dec 2019 04:45:09 +0000 (20:45 -0800)]
Merge branch 'net-tc-indirect-block-relay'
John Hurley says:
====================
Ensure egress un/bind are relayed with indirect blocks
On register and unregister for indirect blocks, a command is called that
sends a bind/unbind event to the registering driver. This command assumes
that the bind to indirect block will be on ingress. However, drivers such
as NFP have allowed binding to clsact qdiscs as well as ingress qdiscs
from mainline Linux 5.2. A clsact qdisc binds to an ingress and an egress
block.
Rather than assuming that an indirect bind is always ingress, modify the
function names to remove the ingress tag (patch 1). In cls_api, which is
used by NFP to offload TC flower, generate bind/unbind message for both
ingress and egress blocks on the event of indirectly
registering/unregistering from that block. Doing so mimics the behaviour
of both ingress and clsact qdiscs on initialise and destroy.
This now ensures that drivers such as NFP receive the correct binder type
for the indirect block registration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Thu, 5 Dec 2019 17:03:35 +0000 (17:03 +0000)]
net: sched: allow indirect blocks to bind to clsact in TC
When a device is bound to a clsact qdisc, bind events are triggered to
registered drivers for both ingress and egress. However, if a driver
registers to such a device using the indirect block routines then it is
assumed that it is only interested in ingress offload and so only replays
ingress bind/unbind messages.
The NFP driver supports the offload of some egress filters when
registering to a block with qdisc of type clsact. However, on unregister,
if the block is still active, it will not receive an unbind egress
notification which can prevent proper cleanup of other registered
callbacks.
Modify the indirect block callback command in TC to send messages of
ingress and/or egress bind depending on the qdisc in use. NFP currently
supports egress offload for TC flower offload so the changes are only
added to TC.
Fixes: 2217f7813a24 ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Thu, 5 Dec 2019 17:03:34 +0000 (17:03 +0000)]
net: core: rename indirect block ingress cb function
With indirect blocks, a driver can register for callbacks from a device
that is does not 'own', for example, a tunnel device. When registering to
or unregistering from a new device, a callback is triggered to generate
a bind/unbind event. This, in turn, allows the driver to receive any
existing rules or to properly clean up installed rules.
When first added, it was assumed that all indirect block registrations
would be for ingress offloads. However, the NFP driver can, in some
instances, support clsact qdisc binds for egress offload.
Change the name of the indirect block callback command in flow_offload to
remove the 'ingress' identifier from it. While this does not change
functionality, a follow up patch will implement a more more generic
callback than just those currently just supporting ingress offload.
Fixes: 2217f7813a24 ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jouni Hogander [Thu, 5 Dec 2019 13:57:07 +0000 (15:57 +0200)]
net-sysfs: Call dev_hold always in netdev_queue_add_kobject
Dev_hold has to be called always in netdev_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: f20a305eb122 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject") Reported-by: Hulk Robot <hulkci@huawei.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Miller <davem@davemloft.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9125ec315565 ("net-next: dsa: fix flow dissection") added an
ability to override protocol and network offset during flow dissection
for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
in order to fix skb hashing for RPS on Rx path.
However, skb_hash() and added part of code can be invoked not only on
Rx, but also on Tx path if we have a multi-queued device and:
- kernel is running on UP system or
- XPS is not configured.
The call stack in this two cases will be like: dev_queue_xmit() ->
__dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
skb_tx_hash() -> skb_get_hash().
The problem is that skbs queued for Tx have both network offset and
correct protocol already set up even after inserting a CPU tag by DSA
tagger, so calling tag_ops->flow_dissect() on this path actually only
breaks flow dissection and hashing.
This can be observed by adding debug prints just before and right after
tag_ops->flow_dissect() call to the related block of code:
In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
to prevent code from calling tag_ops->flow_dissect() on Tx.
I also decided to initialize 'offset' variable so tagger callbacks can
now safely leave it untouched without provoking a chaos.
Fixes: 9125ec315565 ("net-next: dsa: fix flow dissection") Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Valentin Vidic [Thu, 5 Dec 2019 06:41:18 +0000 (07:41 +0100)]
net/tls: Fix return values to avoid ENOTSUPP
ENOTSUPP is not available in userspace, for example:
setsockopt failed, 524, Unknown error 524
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 7 Dec 2019 00:12:39 +0000 (16:12 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- fix CPU topology setup for SCHED_MC case
- fix VDSO regression
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8947/1: Fix __arch_get_hw_counter() access to CNTVCT
ARM: 8943/1: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC
Linus Torvalds [Fri, 6 Dec 2019 22:19:37 +0000 (14:19 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A set of fixes that we've merged late, but for the most part that have
been sitting in -next for a while through platform maintainer trees:
- Fixes to suspend/resume on Tegra, caused by the added features this
merge window
- Cleanups and minor fixes to TI additions this merge window
- Tee fixes queued up late before the merge window, included here.
- A handful of other fixlets
There's also a refresh of the shareed config files (multi_v* on
32-bit, and defconfig on 64-bit), to avoid conflicts when we get new
contributions"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
ARM: multi_v7_defconfig: Restore debugfs support
ARM: defconfig: re-run savedefconfig on multi_v* configs
arm64: defconfig: re-run savedefconfig
ARM: pxa: Fix resource properties
soc: mediatek: cmdq: fixup wrong input order of write api
soc: aspeed: Fix snoop_file_poll()'s return type
MAINTAINERS: Switch to Marvell addresses
MAINTAINERS: update Cavium ThunderX drivers
Revert "arm64: dts: juno: add dma-ranges property"
MAINTAINERS: Make Nicolas Saenz Julienne the new bcm2835 maintainer
firmware: arm_scmi: Avoid double free in error flow
arm64: dts: juno: Fix UART frequency
ARM: dts: Fix sgx sysconfig register for omap4
arm: socfpga: execute cold reboot by default
ARM: dts: Fix vcsi regulator to be always-on for droid4 to prevent hangs
ARM: dts: dra7: fix cpsw mdio fck clock
ARM: dts: am57xx-beagle-x15: Update pinmux name to ddr_3_3v
ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
soc/tegra: pmc: Add reset sources and levels on Tegra194
soc/tegra: pmc: Add missing IRQ callbacks on Tegra194
...
Linus Torvalds [Fri, 6 Dec 2019 22:18:01 +0000 (14:18 -0800)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- ZONE_DMA32 initialisation fix when memblocks fall entirely within the
first GB (used by ZONE_DMA in 5.5 for Raspberry Pi 4).
- Couple of ftrace fixes following the FTRACE_WITH_REGS patchset.
- access_ok() fix for the Tagged Address ABI when called from from a
kernel thread (asynchronous I/O): the kthread does not have the TIF
flags of the mm owner, so untag the user address unconditionally.
- KVM compute_layout() called before the alternatives code patching.
- Minor clean-ups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: entry: refine comment of stack overflow check
arm64: ftrace: fix ifdeffery
arm64: KVM: Invoke compute_layout() before alternatives are applied
arm64: Validate tagged addresses in access_ok() called from kernel threads
arm64: mm: Fix column alignment for UXN in kernel_page_tables
arm64: insn: consistently handle exit text
arm64: mm: Fix initialisation of DMA zones on non-NUMA systems
David Howells [Fri, 6 Dec 2019 21:34:51 +0000 (21:34 +0000)]
pipe: Fix iteration end check in fuse_dev_splice_write()
Fix the iteration end check in fuse_dev_splice_write(). The iterator
position can only be compared with == or != since wrappage may be involved.
Fixes: eb9e4cb12ab3 ("pipe: Use head and tail pointers for the ring, not cursor and length") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Dec 2019 21:36:31 +0000 (13:36 -0800)]
Merge tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"A few commits splitting the KASAN instrumented bitops header in three,
to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use the generic bitops for the
non-atomic case only, whereas the existing KASAN instrumented bitops
assume all the underlying operations are provided by the arch as
arch_foo() versions.
Thanks to: Daniel Axtens & Christophe Leroy"
* tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
docs/core-api: Remove possibly confusing sub-headings from Bit Operations
powerpc: support KASAN instrumentation of bitops
kasan: support instrumented bitops combined with generic bitops
Linus Torvalds [Fri, 6 Dec 2019 21:34:31 +0000 (13:34 -0800)]
Merge tag 'powerpc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression introduced by our recent rework of cache
flushing on memory hotunplug.
Like several other arches, our VDSO clock_getres() needed a fix to
match the semantics of posix_get_hrtimer_res().
A fix for a boot crash on Power9 LPARs using PCI LSI interrupts.
A commit disabling use of the trace_imc PMU (not the core PMU) on
Power9 systems, because it can lead to checkstops, until a workaround
is developed.
A handful of other minor fixes.
Thanks to: Aneesh Kumar K.V, Anju T Sudhakar, Ard Biesheuvel,
Christophe Leroy, Cédric Le Goater, Madhavan Srinivasan, Vincenzo
Frascino"
* tag 'powerpc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Disable trace_imc pmu
powerpc/powernv: Avoid re-registration of imc debugfs directory
powerpc/pmem: Convert to EXPORT_SYMBOL_GPL
powerpc/archrandom: fix arch_get_random_seed_int()
powerpc: Fix vDSO clock_getres()
powerpc/pmem: Fix kernel crash due to wrong range value usage in flush_dcache_range
powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
powerpc/kasan: Fix boot failure with RELOCATABLE && FSL_BOOKE
Linus Torvalds [Fri, 6 Dec 2019 20:40:35 +0000 (12:40 -0800)]
pipe: fix incorrect caching of pipe state over pipe_wait()
Similarly to commit 175891ce5b7d ("pipe: Fix missing mask update after
pipe_wait()") this fixes a case where the pipe rewrite ended up caching
the pipe state incorrectly over a pipe lock drop event.
It wasn't quite as obvious, because you needed to splice data from a
pipe to a file, which is a fairly unusual operation, but it's completely
wrong.
Make sure we load the pipe head/tail/size information only after we've
waited for there to be data in the pipe.
While in that file, also make one of the splice helper functions use the
canonical arghument order for pipe_empty(). That's syntactic - pipe
emptiness is just that head and tail are equal, and thus mixing up head
and tail doesn't really matter. It's still wrong, though.
Reported-by: David Sterba <dsterba@suse.cz> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve French [Fri, 6 Dec 2019 08:02:38 +0000 (02:02 -0600)]
smb3: fix mode passed in on create for modetosid mount option
When using the special SID to store the mode bits in an ACE (See
http://technet.microsoft.com/en-us/library/hh509017(v=ws.10).aspx)
which is enabled with mount parm "modefromsid" we were not
passing in the mode via SMB3 create (although chmod was enabled).
SMB3 create allows a security descriptor context to be passed
in (which is more atomic and thus preferable to setting the mode
bits after create via a setinfo).
This patch enables setting the mode bits on create when using
modefromsid mount option. In addition it fixes an endian
error in the definition of the Control field flags in the SMB3
security descriptor. It also makes the ACE type of the special
SID better match the documentation (and behavior of servers
which use this to store mode bits in SMB3 ACLs).
Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Eric Dumazet [Fri, 6 Dec 2019 17:38:36 +0000 (09:38 -0800)]
net: avoid an indirect call in ____sys_recvmsg()
CONFIG_RETPOLINE=y made indirect calls expensive.
gcc seems to add an indirect call in ____sys_recvmsg().
Rewriting the code slightly makes sure to avoid this indirection.
Alternative would be to not call sock_recvmsg() and instead
use security_socket_recvmsg() and sock_recvmsg_nosec(),
but this is less readable IMO.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Laight <David.Laight@aculab.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Fri, 6 Dec 2019 05:25:48 +0000 (05:25 +0000)]
tipc: fix ordering of tipc module init and exit routine
In order to set/get/dump, the tipc uses the generic netlink
infrastructure. So, when tipc module is inserted, init function
calls genl_register_family().
After genl_register_family(), set/get/dump commands are immediately
allowed and these callbacks internally use the net_generic.
net_generic is allocated by register_pernet_device() but this
is called after genl_register_family() in the __init function.
So, these callbacks would use un-initialized net_generic.
Test commands:
#SHELL1
while :
do
modprobe tipc
modprobe -rv tipc
done
Fixes: 4406d659b313 ("tipc: fix a slab object leak") Fixes: 93e14f2daba5 ("tipc: make subscriber server support net namespace") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When user runs a command like
tc qdisc add dev eth1 root mqprio
KASAN stack-out-of-bounds warning is emitted.
Currently, NLA_ALIGN macro used in mqprio_dump provides too large
buffer size as argument for nla_put and memcpy down the call stack.
The flow looks like this:
1. nla_put expects exact object size as an argument;
2. Later it provides this size to memcpy;
3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
macro itself.
Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
Otherwise it will lead to out-of-bounds memory access in memcpy.
Fixes: 2077be55ced8 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jongsung Kim [Fri, 6 Dec 2019 11:40:00 +0000 (20:40 +0900)]
net: stmmac: reset Tx desc base address before restarting Tx
Refer to the databook of DesignWare Cores Ethernet MAC Universal:
6.2.1.5 Register 4 (Transmit Descriptor List Address Register
If this register is not changed when the ST bit is set to 0, then
the DMA takes the descriptor address where it was stopped earlier.
The stmmac_tx_err() does zero indices to Tx descriptors, but does
not reset HW current Tx descriptor address. To fix inconsistency,
the base address of the Tx descriptors should be rewritten before
restarting Tx.
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Fri, 6 Dec 2019 09:53:35 +0000 (17:53 +0800)]
enetc: disable EEE autoneg by default
The EEE support has not been enabled on ENETC, but it may connect
to a PHY which supports EEE and advertises EEE by default, while
its link partner also advertises EEE. If this happens, the PHY enters
low power mode when the traffic rate is low and causes packet loss.
This patch disables EEE advertisement by default for any PHY that
ENETC connects to, to prevent the above unwanted outcome.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 6 Dec 2019 18:28:09 +0000 (10:28 -0800)]
Merge tag 'drm-next-2019-12-06' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Dave Airlie:
"Rob pointed out I missed his pull request for msm-next, it's been in
next for a while outside of my tree so shouldn't cause any unexpected
issues, it has some OCMEM support in drivers/soc that is acked by
other maintainers as it's outside my tree.
Otherwise it's a usual fixes pull, i915, amdgpu, the main ones, with
some tegra, omap, mgag200 and one core fix.
Summary:
msm-next:
- OCMEM support for a3xx and a4xx GPUs.
- a510 support + display support
core:
- mst payload deletion fix
i915:
- uapi alignment fix
- fix for power usage regression due to security fixes
- change default preemption timeout to 640ms from 100ms
- EHL voltage level display fixes
- TGL DGL PHY fix
- gvt - MI_ATOMIC cmd parser fix, CFL non-priv warning
- CI spotted deadlock fix
- EHL port D programming fix
amdgpu:
- VRAM lost fixes on BACO for CI/VI
- navi14 DC fixes
- misc SR-IOV, gfx10 fixes
- XGMI fixes for arcturus
- SRIOV fixes
amdkfd:
- KFD on ppc64le enabled
- page table optimisations
radeon:
- fix for r1xx/2xx register checker.
tegra:
- displayport regression fixes
- DMA API regression fixes
mgag200:
- fix devices that can't scanout except at 0 addr
omap:
- fix dma_addr refcounting"
* tag 'drm-next-2019-12-06' of git://anongit.freedesktop.org/drm/drm: (100 commits)
drm/dp_mst: Correct the bug in drm_dp_update_payload_part1()
drm/omap: fix dma_addr refcounting
drm/tegra: Run hub cleanup on ->remove()
drm/tegra: sor: Make the +5V HDMI supply optional
drm/tegra: Silence expected errors on IOMMU attach
drm/tegra: vic: Export module device table
drm/tegra: sor: Implement system suspend/resume
drm/tegra: Use proper IOVA address for cursor image
drm/tegra: gem: Remove premature import restrictions
drm/tegra: gem: Properly pin imported buffers
drm/tegra: hub: Remove bogus connection mutex check
ia64: agp: Replace empty define with do while
agp: Add bridge parameter documentation
agp: remove unused variable num_segments
agp: move AGPGART_MINOR to include/linux/miscdevice.h
agp: remove unused variable size in agp_generic_create_gatt_table
drm/dp_mst: Fix build on systems with STACKTRACE_SUPPORT=n
drm/radeon: fix r1xx/r2xx register checker for POT textures
drm/amdgpu: fix GFX10 missing CSIB set(v3)
drm/amdgpu: should stop GFX ring in hw_fini
...
Linus Torvalds [Fri, 6 Dec 2019 18:08:59 +0000 (10:08 -0800)]
Merge tag 'for-linus-20191205' of git://git.kernel.dk/linux-block
Pull more block and io_uring updates from Jens Axboe:
"I wasn't expecting this to be so big, and if I was, I would have used
separate branches for this. Going forward I'll be doing separate
branches for the current tree, just like for the next kernel version
tree. In any case, this contains:
- Series from Christoph that fixes an inherent race condition with
zoned devices and revalidation.
- null_blk zone size fix (Damien)
- Fix for a regression in this merge window that caused busy spins by
sending empty disk uevents (Eric)
- Fix for a regression in this merge window for bfq stats (Hou)
- Fix for io_uring creds allocation failure handling (me)
- io_uring -ERESTARTSYS send/recvmsg fix (me)
- Series that fixes the need for applications to retain state across
async request punts for io_uring. This one is a bit larger than I
would have hoped, but I think it's important we get this fixed for
5.5.
- connect(2) improvement for io_uring, handling EINPROGRESS instead
of having applications needing to poll for it (me)
- Have io_uring use a hash for poll requests instead of an rbtree.
This turned out to work much better in practice, so I think we
should make the switch now. For some workloads, even with a fair
amount of cancellations, the insertion sort is just too expensive.
(me)
- Various little io_uring fixes (me, Jackie, Pavel, LimingWu)
- Fix for brd unaligned IO, and a warning for the future (Ming)
- Fix for a bio integrity data leak (Justin)
- bvec_iter_advance() improvement (Pavel)
- Xen blkback page unmap fix (SeongJae)
The major items in here are all well tested, and on the liburing side
we continue to add regression and feature test cases. We're up to 50
topic cases now, each with anywhere from 1 to more than 10 cases in
each"
* tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits)
block: fix memleak of bio integrity data
io_uring: fix a typo in a comment
bfq-iosched: Ensure bio->bi_blkg is valid before using it
io_uring: hook all linked requests via link_list
io_uring: fix error handling in io_queue_link_head
io_uring: use hash table for poll command lookups
io-wq: clear node->next on list deletion
io_uring: ensure deferred timeouts copy necessary data
io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT
null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
brd: warn on un-aligned buffer
brd: remove max_hw_sectors queue limit
xen/blkback: Avoid unmapping unmapped grant pages
io_uring: handle connect -EINPROGRESS like -EAGAIN
block: set the zone size in blk_revalidate_disk_zones atomically
block: don't handle bio based drivers in blk_revalidate_disk_zones
block: allocate the zone bitmaps lazily
block: replace seq_zones_bitmap with conv_zones_bitmap
block: simplify blkdev_nr_zones
block: remove the empty line at the end of blk-zoned.c
...
Linus Torvalds [Fri, 6 Dec 2019 17:06:58 +0000 (09:06 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
"Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
Basically, the problem is that negative pinned dentries require
careful treatment - unless ->d_lock is locked or parent is held at
least shared, another thread can make them positive right under us.
Most of the uses turned out to be safe - the main surprises as far as
filesystems are concerned were
- race in dget_parent() fastpath, that might end up with the caller
observing the returned dentry _negative_, due to insufficient
barriers. It is positive in memory, but we could end up seeing the
wrong value of ->d_inode in CPU cache. Fixed.
- manual checks that result of lookup_one_len_unlocked() is positive
(and rejection of negatives). Again, insufficient barriers (we
might end up with inconsistent observed values of ->d_inode and
->d_flags). Fixed by switching to a new primitive that does the
checks itself and returns ERR_PTR(-ENOENT) instead of a negative
dentry. That way we get rid of boilerplate converting negatives
into ERR_PTR(-ENOENT) in the callers and have a single place to
deal with the barrier-related mess - inside fs/namei.c rather than
in every caller out there.
The guts of pathname resolution *do* need to be careful - the race
found by Ritesh is real, as well as several similar races.
Fortunately, it turns out that we can take care of that with fairly
local changes in there.
The tree-wide audit had not been fun, and I hate the idea of repeating
it. I think the right approach would be to annotate the places where
we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
catch regressions. But I'm still not sure what would be the least
invasive way of doing that and it's clearly the next cycle fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/namei.c: fix missing barriers when checking positivity
fix dget_parent() fastpath race
new helper: lookup_positive_unlocked()
fs/namei.c: pull positivity check into follow_managed()
Olof Johansson [Fri, 6 Dec 2019 16:29:54 +0000 (08:29 -0800)]
Merge tag 'arm-soc/for-5.5/devicetree-part2' of https://github.com/Broadcom/stblinux into arm/fixes
This pull request contains the second batch of changes for Broadcom
ARM-based SoCs, please pull the following:
- Nicolas declares a CMA area within the first 1GB of DRAM in order for
it to be guaranteed to reside there, otherwise ARM64's memory
initialization will pick up a CMA area within ZONE_DMA32
- Stefan adds the Device Tree node for the built-in Ethernet controller
(GENET) on the Raspberry Pi 4 model B board
* tag 'arm-soc/for-5.5/devicetree-part2' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm2711-rpi-4: Enable GENET support
ARM: dts: bcm2711: force CMA into first GB of memory