Mathy Vanhoef [Mon, 19 Oct 2020 16:01:13 +0000 (20:01 +0400)]
mac80211: fix regression where EAPOL frames were sent in plaintext
When sending EAPOL frames via NL80211 they are treated as injected
frames in mac80211. Due to commit 02f4d16799f8 ("mac80211: never drop
injected frames even if normally not allowed") these injected frames
were not assigned a sta context in the function ieee80211_tx_dequeue,
causing certain wireless network cards to always send EAPOL frames in
plaintext. This may cause compatibility issues with some clients or
APs, which for instance can cause the group key handshake to fail and
in turn would cause the station to get disconnected.
This commit fixes this regression by assigning a sta context in
ieee80211_tx_dequeue to injected frames as well.
Note that sending EAPOL frames in plaintext is not a security issue
since they contain their own encryption and authentication protection.
Cc: stable@vger.kernel.org Fixes: 02f4d16799f8 ("mac80211: never drop injected frames even if normally not allowed") Reported-by: Thomas Deutschmann <whissi@gentoo.org> Tested-by: Christian Hesse <list@eworm.de> Tested-by: Thomas Deutschmann <whissi@gentoo.org> Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20201019160113.350912-1-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Thu, 29 Oct 2020 20:02:52 +0000 (13:02 -0700)]
Merge tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull fallthrough fix from Gustavo A. R. Silva:
"This fixes a ton of fall-through warnings when building with Clang
12.0.0 and -Wimplicit-fallthrough"
* tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
include: jhash/signal: Fix fall-through warnings for Clang
Linus Torvalds [Thu, 29 Oct 2020 19:55:02 +0000 (12:55 -0700)]
Merge tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff from an
IRQ handler on RT by forcing hard irqs, but this driver shares
legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix to
tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even if there
are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver and
device, after removal of 56G support from the driver 56G was not
cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since it
was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress false
positive audit messages"
* tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
r8169: fix issue with forced threading in combination with shared interrupts
netem: fix zero division in tabledist
ibmvnic: fix ibmvnic_set_mac
mptcp: add missing memory scheduling in the rx path
tipc: fix memory leak caused by tipc_buf_append()
gtp: fix an use-before-init in gtp_newlink()
net: protect tcf_block_unbind with block lock
ibmveth: Fix use of ibmveth in a bridge.
net/sched: act_mpls: Add softdep on mpls_gso.ko
ravb: Fix bit fields checking in ravb_hwtstamp_get()
devlink: Unlock on error in dumpit()
devlink: Fix some error codes
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix deadlock issue
net: hns3: Clear the CMDQ registers before unmapping BAR region
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_en: Re-write PCI BARs after PCI fatal error.
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
...
Linus Torvalds [Thu, 29 Oct 2020 18:50:59 +0000 (11:50 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"The good news is people are testing rc1 in the RDMA world - the bad
news is testing of the for-next area is not as good as I had hoped, as
we really should have caught at least the rdma_connect_locked() issue
before now.
Notable merge window regressions that didn't get caught/fixed in time
for rc1:
- Fix in kernel users of rxe, they were broken by the rapid fix to
undo the uABI breakage in rxe from another patch
- EFA userspace needs to read the GID table but was broken with the
new GID table logic
- Fix user triggerable deadlock in mlx5 using devlink reload
- Fix deadlock in several ULPs using rdma_connect from the CM handler
callbacks
- Memory leak in qedr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/qedr: Fix memory leak in iWARP CM
RDMA: Add rdma_connect_locked()
RDMA/uverbs: Fix false error in query gid IOCTL
RDMA/mlx5: Fix devlink deadlock on net namespace deletion
RDMA/rxe: Fix small problem in network_type patch
Heiner Kallweit [Thu, 29 Oct 2020 09:18:53 +0000 (10:18 +0100)]
r8169: fix issue with forced threading in combination with shared interrupts
As reported by Serge flag IRQF_NO_THREAD causes an error if the
interrupt is actually shared and the other driver(s) don't have this
flag set. This situation can occur if a PCI(e) legacy interrupt is
used in combination with forced threading.
There's no good way to deal with this properly, therefore we have to
remove flag IRQF_NO_THREAD. For fixing the original forced threading
issue switch to napi_schedule().
Aleksandr Nogikh [Wed, 28 Oct 2020 17:07:31 +0000 (17:07 +0000)]
netem: fix zero division in tabledist
Currently it is possible to craft a special netlink RTM_NEWQDISC
command that can result in jitter being equal to 0x80000000. It is
enough to set the 32 bit jitter to 0x02000000 (it will later be
multiplied by 2^6) or just set the 64 bit jitter via
TCA_NETEM_JITTER64. This causes an overflow during the generation of
uniformly distributed numbers in tabledist(), which in turn leads to
division by zero (sigma != 0, but sigma * 2 is 0).
The related fragment of code needs 32-bit division - see commit bed8e2f ("netem: remove unnecessary 64 bit modulus"), so switching to
64 bit is not an option.
Fix the issue by keeping the value of jitter within the range that can
be adequately handled by tabledist() - [0;INT_MAX]. As negative std
deviation makes no sense, take the absolute value of the passed value
and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit
arithmetic in order to prevent overflows.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Tue, 27 Oct 2020 22:04:56 +0000 (17:04 -0500)]
ibmvnic: fix ibmvnic_set_mac
Jakub Kicinski brought up a concern in ibmvnic_set_mac().
ibmvnic_set_mac() does this:
ether_addr_copy(adapter->mac_addr, addr->sa_data);
if (adapter->state != VNIC_PROBED)
rc = __ibmvnic_set_mac(netdev, addr->sa_data);
So if state == VNIC_PROBED, the user can assign an invalid address to
adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
The fix is to validate ethernet address at the beginning of
ibmvnic_set_mac(), and move the ether_addr_copy to
the case of "adapter->state != VNIC_PROBED".
include: jhash/signal: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, explicitly
add break statements instead of letting the code fall through to the
next case.
This patch adds four break statements that, together, fix almost 40,000
warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change
reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang,
such change[1] is meant to be reverted at some point. So, this patch helps
to move in that direction.
Something important to mention is that there is currently a discrepancy
between GCC and Clang when dealing with switch fall-through to empty case
statements or to cases that only contain a break/continue/return
statement[2][3][4].
Now that the -Wimplicit-fallthrough option has been globally enabled[5],
any compiler should really warn on missing either a fallthrough annotation
or any of the other case-terminating statements (break/continue/return/
goto) when falling through to the next case statement. Making exceptions
to this introduces variation in case handling which may continue to lead
to bugs, misunderstandings, and a general lack of robustness. The point
of enabling options like -Wimplicit-fallthrough is to prevent human error
and aid developers in spotting bugs before their code is even built/
submitted/committed, therefore eliminating classes of bugs. So, in order
to really accomplish this, we should, and can, move in the direction of
addressing any error-prone scenarios and get rid of the unintentional
fallthrough bug-class in the kernel, entirely, even if there is some minor
redundancy. Better to have explicit case-ending statements than continue to
have exceptions where one must guess as to the right result. The compiler
will eliminate any actual redundancy.
[1] commit 84bd867e54bef ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit 9efd3326fae8 ("Makefile: Globally enable fall-through warning")
Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Linus Torvalds [Thu, 29 Oct 2020 17:13:09 +0000 (10:13 -0700)]
Merge tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Fix copy_file_range() to an afs file now returning EINVAL if the
splice_write file op isn't supplied.
- Fix a deref-before-check in afs_unuse_cell().
- Fix a use-after-free in afs_xattr_get_acl().
- Fix afs to not try to clear PG_writeback when laundering a page.
- Fix afs to take a ref on a page that it sets PG_private on and to
drop that ref when clearing PG_private. This is done through recently
added helpers.
- Fix a page leak if write_begin() fails.
- Fix afs_write_begin() to not alter the dirty region info stored in
page->private, but rather do this in afs_write_end() instead when we
know what we actually changed.
- Fix afs_invalidatepage() to alter the dirty region info on a page
when partial page invalidation occurs so that we don't inadvertantly
include a span of zeros that will get written back if a page gets
laundered due to a remote 3rd-party induced invalidation.
We mustn't, however, reduce the dirty region if the page has been
seen to be mapped (ie. we got called through the page_mkwrite vector)
as the page might still be mapped and we might lose data if the file
is extended again.
- Fix the dirty region info to have a lower resolution if the size of
the page is too large for this to be encoded (e.g. powerpc32 with 64K
pages).
Note that this might not be the ideal way to handle this, since it
may allow some leakage of undirtied zero bytes to the server's copy
in the case of a 3rd-party conflict.
To aid the last two fixes, two additional changes:
- Wrap the manipulations of the dirty region info stored in
page->private into helper functions.
- Alter the encoding of the dirty region so that the region bounds can
be stored with one fewer bit, making a bit available for the
indication of mappedness.
* tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix dirty-region encoding on ppc32 with 64K pages
afs: Fix afs_invalidatepage to adjust the dirty region
afs: Alter dirty range encoding in page->private
afs: Wrap page->private manipulations in inline functions
afs: Fix where page->private is set during write
afs: Fix page leak on afs_write_begin() failure
afs: Fix to take ref on page when PG_private is set
afs: Fix afs_launder_page to not clear PG_writeback
afs: Fix a use after free in afs_xattr_get_acl()
afs: Fix tracing deref-before-check
afs: Fix copy_file_range()
Tung Nguyen [Tue, 27 Oct 2020 03:24:03 +0000 (10:24 +0700)]
tipc: fix memory leak caused by tipc_buf_append()
Commit 69abd53036dd ("tipc: fix the skb_unshare() in tipc_buf_append()")
replaced skb_unshare() with skb_copy() to not reduce the data reference
counter of the original skb intentionally. This is not the correct
way to handle the cloned skb because it causes memory leak in 2
following cases:
1/ Sending multicast messages via broadcast link
The original skb list is cloned to the local skb list for local
destination. After that, the data reference counter of each skb
in the original list has the value of 2. This causes each skb not
to be freed after receiving ACK:
tipc_link_advance_transmq()
{
...
/* release skb */
__skb_unlink(skb, &l->transmq);
kfree_skb(skb); <-- memory exists after being freed
}
2/ Sending multicast messages via replicast link
Similar to the above case, each skb cannot be freed after purging
the skb list:
tipc_mcast_xmit()
{
...
__skb_queue_purge(pkts); <-- memory exists after being freed
}
This commit fixes this issue by using skb_unshare() instead. Besides,
to avoid use-after-free error reported by KASAN, the pointer to the
fragment is set to NULL before calling skb_unshare() to make sure that
the original skb is not freed after freeing the fragment 2 times in
case skb_unshare() returns NULL.
Fixes: 69abd53036dd ("tipc: fix the skb_unshare() in tipc_buf_append()") Acked-by: Jon Maloy <jmaloy@redhat.com> Reported-by: Thang Hoang Ngo <thang.h.ngo@dektech.com.au> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 29 Oct 2020 16:36:11 +0000 (09:36 -0700)]
Merge tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Bug fixes for the new ext4 fast commit feature, plus a fix for the
'data=journal' bug fix.
Also use the generic casefolding support which has now landed in
fs/libfs.c for 5.10"
* tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
ext4: use generic casefolding support
ext4: do not use extent after put_bh
ext4: use IS_ERR() for error checking of path
ext4: fix mmap write protection for data=journal mode
jbd2: fix a kernel-doc markup
ext4: use s_mount_flags instead of s_mount_state for fast commit state
ext4: make num of fast commit blocks configurable
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4: fix double locking in ext4_fc_commit_dentry_updates()
David Howells [Wed, 28 Oct 2020 12:08:39 +0000 (12:08 +0000)]
afs: Fix dirty-region encoding on ppc32 with 64K pages
The dirty region bounds stored in page->private on an afs page are 15 bits
on a 32-bit box and can, at most, represent a range of up to 32K within a
32K page with a resolution of 1 byte. This is a problem for powerpc32 with
64K pages enabled.
Further, transparent huge pages may get up to 2M, which will be a problem
for the afs filesystem on all 32-bit arches in the future.
Fix this by decreasing the resolution. For the moment, a 64K page will
have a resolution determined from PAGE_SIZE. In the future, the page will
need to be passed in to the helper functions so that the page size can be
assessed and the resolution determined dynamically.
Note that this might not be the ideal way to handle this, since it may
allow some leakage of undirtied zero bytes to the server's copy in the case
of a 3rd-party conflict. Fixing that would require a separately allocated
record and is a more complicated fix.
Fixes: 19554d2d2903 ("afs: Get rid of the afs_writeback record") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
David Howells [Thu, 22 Oct 2020 13:08:23 +0000 (14:08 +0100)]
afs: Fix afs_invalidatepage to adjust the dirty region
Fix afs_invalidatepage() to adjust the dirty region recorded in
page->private when truncating a page. If the dirty region is entirely
removed, then the private data is cleared and the page dirty state is
cleared.
Without this, if the page is truncated and then expanded again by truncate,
zeros from the expanded, but no-longer dirty region may get written back to
the server if the page gets laundered due to a conflicting 3rd-party write.
It mustn't, however, shorten the dirty region of the page if that page is
still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
stored in page->private to record this.
Fixes: 19554d2d2903 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 13:57:44 +0000 (13:57 +0000)]
afs: Alter dirty range encoding in page->private
Currently, page->private on an afs page is used to store the range of
dirtied data within the page, where the range includes the lower bound, but
excludes the upper bound (e.g. 0-1 is a range covering a single byte).
This, however, requires a superfluous bit for the last-byte bound so that
on a 4KiB page, it can say 0-4096 to indicate the whole page, the idea
being that having both numbers the same would indicate an empty range.
This is unnecessary as the PG_private bit is clear if it's an empty range
(as is PG_dirty).
Alter the way the dirty range is encoded in page->private such that the
upper bound is reduced by 1 (e.g. 0-0 is then specified the same single
byte range mentioned above).
Applying this to both bounds frees up two bits, one of which can be used in
a future commit.
This allows the afs filesystem to be compiled on ppc32 with 64K pages;
without this, the following warnings are seen:
../fs/afs/internal.h: In function 'afs_page_dirty_to':
../fs/afs/internal.h:881:15: warning: right shift count >= width of type [-Wshift-count-overflow]
881 | return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
| ^~
../fs/afs/internal.h: In function 'afs_page_dirty':
../fs/afs/internal.h:886:28: warning: left shift count >= width of type [-Wshift-count-overflow]
886 | return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
| ^~
Fixes: 19554d2d2903 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 13:22:47 +0000 (13:22 +0000)]
afs: Wrap page->private manipulations in inline functions
The afs filesystem uses page->private to store the dirty range within a
page such that in the event of a conflicting 3rd-party write to the server,
we write back just the bits that got changed locally.
However, there are a couple of problems with this:
(1) I need a bit to note if the page might be mapped so that partial
invalidation doesn't shrink the range.
(2) There aren't necessarily sufficient bits to store the entire range of
data altered (say it's a 32-bit system with 64KiB pages or transparent
huge pages are in use).
So wrap the accesses in inline functions so that future commits can change
how this works.
Also move them out of the tracing header into the in-directory header.
There's not really any need for them to be in the tracing header.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 14:05:33 +0000 (14:05 +0000)]
afs: Fix where page->private is set during write
In afs, page->private is set to indicate the dirty region of a page. This
is done in afs_write_begin(), but that can't take account of whether the
copy into the page actually worked.
Fix this by moving the change of page->private into afs_write_end().
Fixes: 19554d2d2903 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 21 Oct 2020 12:22:19 +0000 (13:22 +0100)]
afs: Fix to take ref on page when PG_private is set
Fix afs to take a ref on a page when it sets PG_private on it and to drop
the ref when removing the flag.
Note that in afs_write_begin(), a lot of the time, PG_private is already
set on a page to which we're going to add some data. In such a case, we
leave the bit set and mustn't increment the page count.
As suggested by Matthew Wilcox, use attach/detach_page_private() where
possible.
Fixes: 6f37bcf4b639 ("AFS: implement basic file write support") Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Linus Torvalds [Wed, 28 Oct 2020 19:05:14 +0000 (12:05 -0700)]
Merge tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix synthetic event "strcat" overrun
New synthetic event code used strcat() and miscalculated the ending,
causing the concatenation to write beyond the allocated memory.
Instead of using strncat(), the code is switched over to seq_buf which
has all the mechanisms in place to protect against writing more than
what is allocated, and cleans up the code a bit"
* tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing, synthetic events: Replace buggy strcat() with seq_buf operations
Daniel Rosenberg [Wed, 28 Oct 2020 05:08:20 +0000 (05:08 +0000)]
ext4: use generic casefolding support
This switches ext4 over to the generic support provided in libfs.
Since casefolded dentries behave the same in ext4 and f2fs, we decrease
the maintenance burden by unifying them, and any optimizations will
immediately apply to both.
yangerkun [Wed, 28 Oct 2020 05:56:17 +0000 (13:56 +0800)]
ext4: do not use extent after put_bh
ext4_ext_search_right() will read more extent blocks and call put_bh
after we get the information we need. However, ret_ex will break this
and may cause use-after-free once pagecache has been freed. Fix it by
copying the extent structure if needed.
Jan Kara [Tue, 27 Oct 2020 13:27:51 +0000 (14:27 +0100)]
ext4: fix mmap write protection for data=journal mode
Commit 0779ed61efad "ext4: data=journal: write-protect pages on
j_submit_inode_data_buffers()") added calls ext4_jbd2_inode_add_write()
to track inode ranges whose mappings need to get write-protected during
transaction commits. However the added calls use wrong start of a range
(0 instead of page offset) and so write protection is not necessarily
effective. Use correct range start to fix the problem.
Fixes: 0779ed61efad ("ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201027132751.29858-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch reserves a field in the jbd2 superblock for number of fast
commit blocks. When this value is non-zero, Ext4 uses this field to
set the number of fast commit blocks.
Andrea Righi [Tue, 27 Oct 2020 04:49:13 +0000 (21:49 -0700)]
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4_inode_datasync_dirty() needs to return 'true' if the inode is
dirty, 'false' otherwise, but the logic seems to be incorrectly changed
by commit 95d2b4933dd9 ("ext4: main fast-commit commit path").
This introduces a problem with swap files that are always failing to be
activated, showing this error in dmesg:
Jason Gunthorpe [Mon, 26 Oct 2020 14:25:49 +0000 (11:25 -0300)]
RDMA: Add rdma_connect_locked()
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Fixes: e9d48fde9bd9 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky [Mon, 26 Oct 2020 12:33:27 +0000 (14:33 +0200)]
net: protect tcf_block_unbind with block lock
The tcf_block_unbind() expects that the caller will take block->cb_lock
before calling it, however the code took RTNL lock and dropped cb_lock
instead. This causes to the following kernel panic.
The check for src mac address in ibmveth_is_packet_unsupported is wrong.
Commit 7dfc1cc642fe wanted to shut down messages for loopback packets,
but now suppresses bridged frames, which are accepted by the hypervisor
otherwise bridging won't work at all.
Fixes: 7dfc1cc642fe ("ibmveth: Detect unsupported packets before sending to the hypervisor") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrew Gabbasov [Mon, 26 Oct 2020 10:21:30 +0000 (05:21 -0500)]
ravb: Fix bit fields checking in ravb_hwtstamp_get()
In the function ravb_hwtstamp_get() in ravb_main.c with the existing
values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL
(0x6)
if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
config.rx_filter = HWTSTAMP_FILTER_ALL;
if the test on RAVB_RXTSTAMP_TYPE_ALL should be true,
it will never be reached.
This issue can be verified with 'hwtstamp_config' testing program
(tools/testing/selftests/net/hwtstamp_config.c). Setting filter type
to ALL and subsequent retrieving it gives incorrect value:
$ hwtstamp_config eth0 OFF ALL
flags = 0
tx_type = OFF
rx_filter = ALL
$ hwtstamp_config eth0
flags = 0
tx_type = OFF
rx_filter = PTP_V2_L2_EVENT
Correct this by converting if-else's to switch.
Fixes: bfafa9ee6042 ("Renesas Ethernet AVB driver proper") Reported-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Mon, 26 Oct 2020 08:01:27 +0000 (11:01 +0300)]
devlink: Unlock on error in dumpit()
This needs to unlock before returning.
Fixes: af97fc7e2a5d ("net: devlink: Add support for port regions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201026080127.GB1628785@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Mon, 26 Oct 2020 08:00:59 +0000 (11:00 +0300)]
devlink: Fix some error codes
These paths don't set the error codes. It's especially important in
devlink_nl_region_notify_build() where it leads to a NULL dereference in
the caller.
Fixes: af97fc7e2a5d ("net: devlink: Add support for port regions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201026080059.GA1628785@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CPL handler functions chtls_pass_open_rpl() and
chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
so that caller function will do skb free to avoid leak.
In chtls_pass_establish() we hold child socket lock using bh_lock_sock
and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock.
Remove bh_lock_sock in add_to_reap_list() as lock is already held.
Dan Carpenter [Mon, 24 Aug 2020 08:58:12 +0000 (11:58 +0300)]
afs: Fix a use after free in afs_xattr_get_acl()
The "op" pointer is freed earlier when we call afs_put_operation().
Fixes: 590fdde7416c ("afs: Build an abstraction around an "operation" concept") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Colin Ian King <colin.king@canonical.com>
David Howells [Tue, 27 Oct 2020 10:42:56 +0000 (10:42 +0000)]
afs: Fix tracing deref-before-check
The patch c7e7548c3de8: "afs: Add tracing for cell refcount and active user
count" from Oct 13, 2020, leads to the following Smatch complaint:
fs/afs/cell.c:596 afs_unuse_cell()
warn: variable dereferenced before check 'cell' (see line 592)
Fix this by moving the retrieval of the cell debug ID to after the check of
the validity of the cell pointer.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: c7e7548c3de8 ("afs: Add tracing for cell refcount and active user count") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
David Howells [Tue, 27 Oct 2020 09:39:04 +0000 (09:39 +0000)]
afs: Fix copy_file_range()
The prevention of splice-write without explicit ops made the
copy_file_write() syscall to an afs file (as done by the generic/112
xfstest) fail with EINVAL.
Fix by using iter_file_splice_write() for afs.
Fixes: ec25677859f2 ("fs: don't allow splice read/write without explicit ops") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Tue, 27 Oct 2020 21:39:29 +0000 (14:39 -0700)]
Merge tag 'x86-urgent-2020-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A couple of x86 fixes which missed rc1 due to my stupidity:
- Drop lazy TLB mode before switching to the temporary address space
for text patching.
text_poke() switches to the temporary mm which clears the lazy mode
and restores the original mm afterwards. Due to clearing lazy mode
this might restore a already dead mm if exit_mmap() runs in
parallel on another CPU.
- Document the x32 syscall design fail vs. syscall numbers 512-547
properly.
- Fix the ORC unwinder to handle the inactive task frame correctly.
This was unearthed due to the slightly different code generation of
gcc-10.
- Use an up to date screen_info for the boot params of kexec instead
of the possibly stale and invalid version which happened to be
valid when the kexec kernel was loaded"
* tag 'x86-urgent-2020-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: Don't call text_poke() in lazy TLB mode
x86/syscalls: Document the fact that syscalls 512-547 are a legacy mistake
x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
hyperv_fb: Update screen_info after removing old framebuffer
x86/kexec: Use up-to-dated screen_info copy to fill boot params
Linus Torvalds [Tue, 27 Oct 2020 19:42:44 +0000 (12:42 -0700)]
Merge tag 'orphan-handling-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull orphan section fixes from Kees Cook:
"A couple corner cases were found from the link-time orphan section
handling series:
- arm: handle .ARM.exidx and .ARM.extab sections (Nathan Chancellor)
- x86: collect .ctors.* with .ctors (Kees Cook)"
* tag 'orphan-handling-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm/build: Always handle .ARM.exidx and .ARM.extab sections
vmlinux.lds.h: Keep .ctors.* with .ctors
mm/process_vm_access.c: In function ‘process_vm_rw’:
mm/process_vm_access.c:277:5: error: implicit declaration of function ‘in_compat_syscall’ [-Werror=implicit-function-declaration]
277 | in_compat_syscall());
| ^~~~~~~~~~~~~~~~~
arm/build: Always handle .ARM.exidx and .ARM.extab sections
After turning on warnings for orphan section placement, enabling
CONFIG_UNWINDER_FRAME_POINTER instead of CONFIG_UNWINDER_ARM causes
thousands of warnings when clang + ld.lld are used:
$ scripts/config --file arch/arm/configs/multi_v7_defconfig \
-d CONFIG_UNWINDER_ARM \
-e CONFIG_UNWINDER_FRAME_POINTER
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- LLVM=1 defconfig zImage
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab.ref.text) is being placed in '.ARM.extab.ref.text'
ld.lld: warning: init/built-in.a(do_mounts.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(do_mounts.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(do_mounts_rd.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(do_mounts_rd.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(do_mounts_initrd.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(initramfs.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(initramfs.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(calibrate.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(calibrate.o):(.ARM.extab) is being placed in '.ARM.extab'
These sections are handled by the ARM_UNWIND_SECTIONS define, which is
only added to the list of sections when CONFIG_ARM_UNWIND is set.
CONFIG_ARM_UNWIND is a hidden symbol that is only selected when
CONFIG_UNWINDER_ARM is set so CONFIG_UNWINDER_FRAME_POINTER never
handles these sections. According to the help text of
CONFIG_UNWINDER_ARM, these sections should be discarded so that the
kernel image size is not affected.
Fixes: 619624aacd7a ("arm/build: Warn on orphan section placement") Link: https://github.com/ClangBuiltLinux/linux/issues/1152 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Review-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
[kees: Made the discard slightly more specific] Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200928224854.3224862-1-natechancellor@gmail.com
Kees Cook [Mon, 5 Oct 2020 02:57:20 +0000 (19:57 -0700)]
vmlinux.lds.h: Keep .ctors.* with .ctors
Under some circumstances, the compiler generates .ctors.* sections. This
is seen doing a cross compile of x86_64 from a powerpc64el host:
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being
placed in section `.ctors.65435'
Include these orphans along with the regular .ctors section.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 29e52d949ed2 ("x86/build: Warn on orphan section placement") Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org
Jens Axboe [Tue, 27 Oct 2020 00:03:18 +0000 (18:03 -0600)]
Fix compat regression in process_vm_rw()
The removal of compat_process_vm_{readv,writev} didn't change
process_vm_rw(), which always assumes it's not doing a compat syscall.
Instead of passing in 'false' unconditionally for 'compat', make it
conditional on in_compat_syscall().
[ Both Al and Christoph point out that trying to access a 64-bit process
from a 32-bit one cannot work anyway, and is likely better prohibited,
but that's a separate issue - Linus ]
The cause came down to a use of strcat() that was adding an string that was
shorten, but the strcat() did not take that into account.
strcat() is extremely dangerous as it does not care how big the buffer is.
Replace it with seq_buf operations that prevent the buffer from being
overwritten if what is being written is bigger than the buffer.
Fixes: 1e00fdd5b892 ("tracing: Handle synthetic event array field type checking correctly") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It looks like the BAR memory region had already been unmapped before we
start clearing CMDQ registers in it, which is pretty bad and the kernel
happily kills itself because of a Current EL Data Abort (on arm64).
Moving the CMDQ uninitialization a bit early fixes the issue for me.
Jakub Kicinski [Tue, 27 Oct 2020 01:26:38 +0000 (18:26 -0700)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes.
These 5 bug fixes are all related to the firmware reset or AER recovery.
2 patches fix the cleanup logic for the workqueue used to handle firmware
reset and recovery. 1 patch ensures that the chip will have the proper
BAR addresses latched after fatal AER recovery. 1 patch fixes the
open path to check for firmware reset abort error. The last one
sends the fw reset command unconditionally to fix the AER reset logic.
====================
In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.
However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it. In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.
Fixes: 2f31cee463a3 ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 26 Oct 2020 04:18:20 +0000 (00:18 -0400)]
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_open_nic() is called during configuration changes that require
the NIC to be closed and then opened. This call is protected by
rtnl_lock. Firmware reset can be happening at the same time. Only
critical portions of the entire firmware reset sequence are protected
by the rtnl_lock. It is possible that bnxt_open_nic() can be called
when the firmware reset sequence is aborting. In that case,
bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
abort if it is. The configuration change that resulted in the
bnxt_open_nic() call will fail but the NIC will be brought to a
consistent IF_DOWN state.
Without this patch, if bnxt_open_nic() were to continue in this error
state, it may crash like this:
Vasundhara Volam [Mon, 26 Oct 2020 04:18:19 +0000 (00:18 -0400)]
bnxt_en: Re-write PCI BARs after PCI fatal error.
When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.
pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid. So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses. These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.
Fixes: d71693f4506f ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vasundhara Volam [Mon, 26 Oct 2020 04:18:18 +0000 (00:18 -0400)]
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
As part of the commit 5541df70f7cd
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.
As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.
Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().
Fixes: 5541df70f7cd ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vasundhara Volam [Mon, 26 Oct 2020 04:18:17 +0000 (00:18 -0400)]
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one(). This caused a
regression because the workqueue can be restarted if the device is
still open. Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.
Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that. This fixes the
regession and the original NULL ptr dereference issue.
Fixes: 3fe04b785a8e ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 26 Oct 2020 23:45:53 +0000 (16:45 -0700)]
Merge branch 'mlxsw-various-fixes'
Ido Schimmel says:
====================
mlxsw: Various fixes
This patch set contains various fixes for mlxsw.
Patch #1 ensures that only link modes that are supported by both the
device and the driver are advertised. When a link mode that is not
supported by the driver is negotiated by the device, it will be
presented as an unknown speed by ethtool, causing the bond driver to
wrongly assume that the link is down.
Patch #2 fixes a trivial memory leak upon module removal.
Patch #3 fixes a use-after-free that syzkaller was able to trigger once
on a slow emulator after a few months of fuzzing.
====================
Amit Cohen [Sat, 24 Oct 2020 13:37:33 +0000 (16:37 +0300)]
mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).
The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.
In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].
Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.
[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004
Memory state around the buggy address: ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Fixes: 4117a57bc99e6 ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Sat, 24 Oct 2020 13:37:31 +0000 (16:37 +0300)]
mlxsw: Only advertise link modes supported by both driver and device
During port creation the driver instructs the device to advertise all
the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are
supported by the driver. This can result in the device negotiating a
link mode that is not recognized by the driver causing ethtool to show
an unsupported speed:
$ ethtool swp1
...
Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as
the bond driver uses unknown speed as an indication that the link is
down:
[13048.900895] net_ratelimit: 86 callbacks suppressed
[13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
[13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both
the device and the driver are advertised.
Fixes: 7b43c123382b ("mlxsw: Remove 56G speed support") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 26 Oct 2020 23:29:38 +0000 (16:29 -0700)]
Merge branch 'net-smc-fixes-2020-10-23'
Karsten Graul says:
====================
net/smc: fixes 2020-10-23
Patch 1 fixes a potential null pointer dereference. Patch 2 takes care
of a suppressed return code and patch 3 corrects the system EID in the
ISM driver.
====================
Karsten Graul [Fri, 23 Oct 2020 18:48:30 +0000 (20:48 +0200)]
s390/ism: fix incorrect system EID
The system EID that is defined by the ISM driver is not correct. Using
an incorrect system EID allows to communicate with remote Linux systems
that use the same incorrect system EID, but when it comes to
interoperability with other operating systems then the system EIDs do
never match which prevents SMC-Dv2 communication.
Using the correct system EID fixes this problem.
Fixes: ff4dc9aea0e0 ("net/smc: introduce System Enterprise ID (SEID)") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul [Fri, 23 Oct 2020 18:48:29 +0000 (20:48 +0200)]
net/smc: fix suppressed return code
The patch that repaired the invalid return code in smcd_new_buf_create()
missed to take care of errno ENOSPC which has a special meaning that no
more DMBEs can be registered on the device. Fix that by keeping this
errno value during the translation of the return code.
Fixes: 8cc5b24941b5 ("net/smc: fix invalid return code in smcd_new_buf_create()") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul [Fri, 23 Oct 2020 18:48:28 +0000 (20:48 +0200)]
net/smc: fix null pointer dereference in smc_listen_decline()
smc_listen_work() calls smc_listen_decline() on label out_decl,
providing the ini pointer variable. But this pointer can still be null
when the label out_decl is reached.
Fix this by checking the ini variable in smc_listen_work() and call
smc_listen_decline() with the result directly.
During __vsock_create() CAP_NET_ADMIN is used to determine if the
vsock_sock->trusted should be set to true. This value is used later
for determing if a remote connection should be allowed to connect
to a restricted VM. Unfortunately, if the caller doesn't have
CAP_NET_ADMIN, an audit message such as an selinux denial is
generated even if the caller does not want a trusted socket.
Logging errors on success is confusing. To avoid this, switch the
capable(CAP_NET_ADMIN) check to the noaudit version.
Reported-by: Roman Kiryanov <rkir@google.com>
https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/ Signed-off-by: Jeff Vander Stoep <jeffv@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raju Rangoju [Fri, 23 Oct 2020 11:58:52 +0000 (17:28 +0530)]
cxgb4: set up filter action after rewrites
The current code sets up the filter action field before
rewrites are set up. When the action 'switch' is used
with rewrites, this may result in initial few packets
that get switched out don't have rewrites applied
on them.
So, make sure filter action is set up along with rewrites
or only after everything else is set up for rewrites.
Dan Carpenter [Fri, 23 Oct 2020 11:22:12 +0000 (14:22 +0300)]
net: hns3: clean up a return in hclge_tm_bp_setup()
Smatch complains that "ret" might be uninitialized if we don't enter
the loop. We do always enter the loop so it's a false positive, but
it's cleaner to just return a literal zero and that silences the
warning as well.
Linus Torvalds [Mon, 26 Oct 2020 22:45:22 +0000 (15:45 -0700)]
scsi: qla2xxx: remove incorrect sparse #ifdef
The code to try to shut up sparse warnings about questionable locking
didn't shut up sparse: it made the result not parse as valid C at all,
since the end result now has a label with no statement.
The proper fix is to just always lock the hardware, the same way Bart
did in commit 1062d17d72e7 ("scsi: qla2xxx: Simplify the functions for
dumping firmware"). That avoids the whole problem with having locking
that is not statically obvious.
But in the meantime, just remove the incorrect attempt at trying to
avoid a sparse warning that just made things worse.
This was exposed by commit bb21802ff80f ("scsi: qla2xxx: Fix reset of
MPI firmware"), very similarly to how commit 3d0dcc038878 ("scsi:
qla2xxx: Fix MPI failure AEN (8200) handling") exposed the same problem
in another place, and caused that commit 1062d17d72e7.
Please don't add code to just shut up sparse without actually fixing
what sparse complains about.
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Arun Easi <aeasi@marvell.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 26 Oct 2020 22:39:37 +0000 (15:39 -0700)]
arch/um: partially revert the conversion to __section() macro
A couple of um files ended up not including the header file that defines
the __section() macro, and the simplest fix is to just revert the change
for those files.
Fixes: 513c7e624a0f treewide: Convert macro and uses of __section(foo) to __section("foo") Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parav Pandit [Mon, 26 Oct 2020 13:43:59 +0000 (15:43 +0200)]
RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 78080f553ea3 ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 16 Oct 2020 21:13:44 +0000 (16:13 -0500)]
RDMA/rxe: Fix small problem in network_type patch
The patch referenced below has a typo that results in using the wrong L2
header size for outbound traffic. (V4 <-> V6).
It also breaks kernel-side RC traffic because they use AVs that use
RDMA_NETWORK_XXX enums instead of RXE_NETWORK_TYPE_XXX enums. Fix this by
transcoding between these enum types.
Rob Herring [Mon, 5 Oct 2020 18:38:29 +0000 (13:38 -0500)]
dt-bindings: Explicitly allow additional properties in board/SoC schemas
In order to add meta-schema checks for additional/unevaluatedProperties
being present, all schema need to make this explicit. As the top-level
board/SoC schemas always have additional properties, add
'additionalProperties: true'.
Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201005183830.486085-4-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Tue, 21 Apr 2020 02:24:47 +0000 (21:24 -0500)]
dt-bindings: More whitespace clean-ups in schema files
Clean-up incorrect indentation, extra spaces, and missing EOF newline in
schema files. Most of the clean-ups are for list indentation which
should always be 2 spaces more than the preceding keyword.
Found with yamllint (now integrated into the checks).
Cc: linux-arm-kernel@lists.infradead.org Cc: dri-devel@lists.freedesktop.org Cc: linux-gpio@vger.kernel.org Cc: linux-i2c@vger.kernel.org Cc: linux-iio@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: alsa-devel@alsa-project.org Cc: linux-mmc@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-serial@vger.kernel.org Cc: linux-usb@vger.kernel.org Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Acked-by: Sam Ravnborg <sam@ravnborg.org> # for display Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #for-iio Signed-off-by: Rob Herring <robh@kernel.org>
Linus Torvalds [Mon, 26 Oct 2020 17:36:21 +0000 (10:36 -0700)]
Merge tag 's390-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fix from Heiko Carstens:
"Fix s390 compile breakage caused by commit 513c7e624a0f ("treewide:
Convert macro and uses of __section(foo) to __section("foo")")"
* tag 's390-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct __bootdata / __bootdata_preserved macros
SECTCMP .boot.data
error: section .boot.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.data] Error 1
SECTCMP .boot.preserved.data
error: section .boot.preserved.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.preserved.data] Error 1
make[1]: *** [bzImage] Error 2
Commit 513c7e624a0f ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") converted all __section(foo) to __section("foo").
This is wrong for __bootdata / __bootdata_preserved macros which want
variable names to be a part of intermediate section names .boot.data.<var
name> and .boot.preserved.data.<var name>. Those sections are later
sorted by alignment + name and merged together into final .boot.data
/ .boot.preserved.data sections. Those sections must be identical in
the decompressor and the decompressed kernel (that is checked during
the build).
Fixes: 513c7e624a0f ("treewide: Convert macro and uses of __section(foo) to __section("foo")") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The reserved-memory overlap detection code fails to detect overlaps if
either of the regions starts at address 0x0. The code explicitly checks
for and ignores such regions, apparently in order to ignore dynamically
allocated regions which have an address of 0x0 at this point. These
dynamically allocated regions also have a size of 0x0 at this point, so
fix this by removing the check and sorting the dynamically allocated
regions ahead of any static regions at address 0x0.
For example, there are two overlaps in this case but they are not
currently reported:
Fabien Parent [Sun, 18 Oct 2020 19:30:16 +0000 (21:30 +0200)]
dt-bindings: mailbox: mtk-gce: fix incorrect mbox-cells value
As the binding documentation says, #mbox-cells must have a value of 2,
but the example use a value 3. The MT8173 device tree correctly use
mbox-cells = <2>. This commit fixes the example.
Joe Perches [Thu, 22 Oct 2020 02:36:07 +0000 (19:36 -0700)]
treewide: Convert macro and uses of __section(foo) to __section("foo")
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Rasmus Villemoes [Sat, 24 Oct 2020 01:04:26 +0000 (03:04 +0200)]
kernel/sys.c: fix prototype of prctl_get_tid_address()
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().
Eric Biggers [Fri, 23 Oct 2020 23:27:16 +0000 (16:27 -0700)]
mm: remove kzfree() compatibility definition
Commit 877ae2a622de ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.
Since then a few more instances of kzfree() have slipped in.
Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 Oct 2020 18:28:49 +0000 (11:28 -0700)]
Merge tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A time namespace fix and a matching selftest. The futex absolute
timeouts which are based on CLOCK_MONOTONIC require time namespace
corrected. This was missed in the original time namesapce support"
* tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/timens: Add a test for futex()
futex: Adjust absolute futex timeouts with per time namespace offset
Linus Torvalds [Sun, 25 Oct 2020 18:25:16 +0000 (11:25 -0700)]
Merge tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two scheduler fixes:
- A trivial build fix for sched_feat() to compile correctly with
CONFIG_JUMP_LABEL=n
- Replace a zero lenght array with a flexible array"
* tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/features: Fix !CONFIG_JUMP_LABEL case
sched: Replace zero-length array with flexible-array
Linus Torvalds [Sun, 25 Oct 2020 18:12:31 +0000 (11:12 -0700)]
Merge tag 'ntb-5.10' of git://github.com/jonmason/ntb
Pull NTB fixes from Jon Mason.
* tag 'ntb-5.10' of git://github.com/jonmason/ntb:
NTB: Use struct_size() helper in devm_kzalloc()
ntb: intel: Fix memleak in intel_ntb_pci_probe
NTB: hw: amd: fix an issue about leak system resources
Linus Torvalds [Sun, 25 Oct 2020 18:10:23 +0000 (11:10 -0700)]
Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Regression fix for rc1 and stable kernels as well"
* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs