Per Forlin [Thu, 13 Feb 2020 14:37:09 +0000 (15:37 +0100)]
net: dsa: tag_qca: Make sure there is headroom for tag
Passing tag size to skb_cow_head will make sure
there is enough headroom for the tag data.
This change does not introduce any overhead in case there
is already available headroom for tag.
Signed-off-by: Per Forlin <perfn@axis.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
William Dauchy [Thu, 13 Feb 2020 17:19:22 +0000 (18:19 +0100)]
net, ip6_tunnel: enhance tunnel locate with link check
With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:
# ip link show tunl0
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
# ip link add tunl1 type ipip dev eth0
# ip link show tunl1
6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
But it is not possible with ip6tnl:
# ip link show ip6tnl0
5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd ::
# ip link add ip6tnl1 type ip6tnl dev eth0
RTNETLINK answers: File exists
This patch aims to make it possible by adding link comparaison in both
tunnel locate and lookup functions; we also modify mtu calculation when
attached to an interface with a lower mtu.
This permits to make use of x-netns communication by moving the newly
created tunnel in a given netns.
Signed-off-by: William Dauchy <w.dauchy@criteo.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Feb 2020 15:16:08 +0000 (07:16 -0800)]
Merge tag 'mac80211-for-net-2020-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a few fixes:
* avoid running out of tracking space for frames that need
to be reported to userspace by using more bits
* fix beacon handling suppression by adding some relevant
elements to the CRC calculation
* fix quiet mode in action frames
* fix crash in ethtool for virt_wifi and similar
* add a missing policy entry
* fix 160 & 80+80 bandwidth to take local capabilities into
account
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 14 Feb 2020 07:59:00 +0000 (08:59 +0100)]
net/smc: no peer ID in CLC decline for SMCD
Just SMCR requires a CLC Peer ID, but not SMCD. The field should be
zero for SMCD.
Fixes: 00b917c03235 ("net/smc: add SMC-D support in CLC messages") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 14 Feb 2020 07:58:59 +0000 (08:58 +0100)]
net/smc: transfer fasync_list in case of fallback
SMC does not work together with FASTOPEN. If sendmsg() is called with
flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to
fallback mode. To handle the previous ioctl FIOASYNC call correctly
in this case, it is necessary to transfer the socket wait queue
fasync_list to the internal TCP socket.
Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com Fixes: 8d716f2265c89 ("net/smc: handle sockopts forcing fallback") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Feb 2020 15:05:18 +0000 (07:05 -0800)]
Merge branch 'hns3-fixes'
Huazhong Tan says:
====================
net: hns3: fixes for -net
This series includes three bugfixes for the HNS3 ethernet driver.
[patch 1] fixes a management table lost issue after IMP reset.
[patch 2] fixes a VF bandwidth configuration not work problem.
[patch 3] fixes a problem related to IPv6 address copying.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Fri, 14 Feb 2020 01:53:43 +0000 (09:53 +0800)]
net: hns3: fix a copying IPv6 address error in hclge_fd_get_flow_tuples()
The IPv6 address defined in struct in6_addr is specified as
big endian, but there is no specified endian in struct
hclge_fd_rule_tuples, so it will cause a problem if directly
use memcpy() to copy ipv6 address between these two structures
since this field in struct hclge_fd_rule_tuples is little endian.
This patch fixes this problem by using be32_to_cpu() to convert
endian of IPv6 address of struct in6_addr before copying.
Fixes: 311b0277f0f2 ("net: hns3: add aRFS support for PF") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 14 Feb 2020 01:53:42 +0000 (09:53 +0800)]
net: hns3: fix VF bandwidth does not take effect in some case
When enabling 4 TC after setting the bandwidth of VF, the bandwidth
of VF will resume to default value, because of the qset resources
changed in this case.
This patch fixes it by using a fixed VF's qset resources according to
HNAE3_MAX_TC macro.
Fixes: 959fbccef807 ("net: hns3: add support for configuring bandwidth of VF on the host") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 14 Feb 2020 01:53:41 +0000 (09:53 +0800)]
net: hns3: add management table after IMP reset
In the current process, the management table is missing after the
IMP reset. This patch adds the management table to the reset process.
Fixes: a36633ff8784 ("net: hns3: add manager table initialization for hardware") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shay Bar [Mon, 10 Feb 2020 13:07:28 +0000 (15:07 +0200)]
mac80211: fix wrong 160/80+80 MHz setting
Before this patch, STA's would set new width of 160/80+80 MHz based on AP capability only.
This is wrong because STA may not support > 80MHz BW.
Fix is to verify STA has 160/80+80 MHz capability before increasing its width to > 80MHz.
The "support_80_80" and "support_160" setting is based on:
"Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW
Support subfield at a STA transmitting the VHT Capabilities Information field"
From "Draft P802.11REVmd_D3.0.pdf"
====================
icmp: account for NAT when sending icmps from ndo layer
The ICMP routines use the source address for two reasons:
1. Rate-limiting ICMP transmissions based on source address, so
that one source address cannot provoke a flood of replies. If
the source address is wrong, the rate limiting will be
incorrectly applied.
2. Choosing the interface and hence new source address of the
generated ICMP packet. If the original packet source address
is wrong, ICMP replies will be sent from the wrong source
address, resulting in either a misdelivery, infoleak, or just
general network admin confusion.
Most of the time, the icmp_send and icmpv6_send routines can just reach
down into the skb's IP header to determine the saddr. However, if
icmp_send or icmpv6_send is being called from a network device driver --
there are a few in the tree -- then it's possible that by the time
icmp_send or icmpv6_send looks at the packet, the packet's source
address has already been transformed by SNAT or MASQUERADE or some other
transformation that CONNTRACK knows about. In this case, the packet's
source address is most certainly the *wrong* source address to be used
for the purpose of ICMP replies.
Rather, the source address we want to use for ICMP replies is the
original one, from before the transformation occurred.
Fortunately, it's very easy to just ask CONNTRACK if it knows about this
packet, and if so, how to fix it up. The saddr is the only field in the
header we need to fix up, for the purposes of the subsequent processing
in the icmp_send and icmpv6_send functions, so we do the lookup very
early on, so that the rest of the ICMP machinery can progress as usual.
Changes v3->v4:
- Add back the skb_shared checking, since the previous assumption isn't
actually true [Eric]. This implies dropping the additional patches v3 had
for removing skb_share_check from various drivers. We can revisit that
general set of ideas later, but that's probably better suited as a net-next
patchset rather than this stable one which is geared at fixing bugs. So,
this implements things in the safe conservative way.
Changes v2->v3:
- Add selftest to ensure this actually does what we want and never regresses.
- Check the size of the skb header before operating on it.
- Use skb_ensure_writable to ensure we can modify the cloned skb [Florian].
- Conditionalize this on IPS_SRC_NAT so we don't do anything unnecessarily
[Florian].
- It turns out that since we're calling these from the xmit path,
skb_share_check isn't required, so remove that [Florian]. This simplifes the
code a bit too. **The supposition here is that skbs passed to ndo_start_xmit
are _never_ shared. If this is not correct NOW IS THE TIME TO PIPE UP, for
doom awaits us later.**
- While investigating the shared skb business, several drivers appeared to be
calling it incorrectly in the xmit path, so this series also removes those
unnecessary calls, based on the supposition mentioned in the previous point.
Changes v1->v2:
- icmpv6 takes subtly different types than icmpv4, like u32 instead of be32,
u8 instead of int.
- Since we're technically writing to the skb, we need to make sure it's not
a shared one [Dave, 2017].
- Restore the original skb data after icmp_send returns. All current users
are freeing the packet right after, so it doesn't matter, but future users
might not.
- Remove superfluous route lookup in sunvnet [Dave].
- Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
- Include this cover letter [Dave].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Because xfrmi is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Because wireguard is calling icmp from network device context, it should
use the ndo helper so that the rate limiting applies correctly. This
commit adds a small test to the wireguard test suite to ensure that the
new functions continue doing the right thing in the context of
wireguard. It does this by setting up a condition that will definately
evoke an icmp error message from the driver, but along a nat'd path.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Because sunvnet is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly. While we're
at it, doing the additional route lookup before calling icmp_ndo_send is
superfluous, since this is the job of the icmp code in the first place.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
icmp: introduce helper for nat'd source address in network device context
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Tue, 11 Feb 2020 18:33:40 +0000 (19:33 +0100)]
net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
to fl_policy.
Fixes: 609f1261c1c5 ("net/flower: Introduce hardware offload support") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Tue, 11 Feb 2020 18:33:39 +0000 (19:33 +0100)]
net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
entry to mall_policy.
Fixes: 2385c6d25b98 ("net/sched: Add match-all classifier hw offloading.") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 11 Feb 2020 10:31:54 +0000 (18:31 +0800)]
net/flow_dissector: remove unexist field description
@thoff has moved to struct flow_dissector_key_control.
Fixes: 327dc61f00df ("net: Get skb hash over flow_keys structure") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Tue, 11 Feb 2020 02:13:44 +0000 (10:13 +0800)]
page_pool: refill page when alloc.count of pool is zero
"do {} while" in page_pool_refill_alloc_cache will always
refill page once whether refill is true or false, and whether
alloc.count of pool is less than PP_ALLOC_CACHE_REFILL or not
this is wrong, and will cause overflow of pool->alloc.cache
the caller of __page_pool_get_cached should provide guarantee
that pool->alloc.cache is safe to access, so in_serving_softirq
should be removed as suggested by Jesper Dangaard Brouer in
https://patchwork.ozlabs.org/patch/1233713/
so fix this issue by calling page_pool_refill_alloc_cache()
only when pool->alloc.count is zero
Fixes: 03f9ed77591e ("page_pool: handle page recycle for NUMA_NO_NODE condition") Signed-off-by: Li RongQing <lirongqing@baidu.com> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Feb 2020 22:10:11 +0000 (14:10 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2020-02-12
This series contains fixes to only the ice driver.
Dave fixes logic flaws in the DCB rebuild function which is used after a
reset. Also fixed a configuration issue when switching between firmware
and software LLDP mode where the number of TLV's configured was getting
out of sync with what lldpad thinks is configured.
Paul fixes how the driver displayed all the supported and advertised
link modes by basing it on the PHY capabilities, and in the process
cleaned up a lot of code.
Brett fixes duplicate receive tail bumps by comparing the value we are
writing to tail with the previously written tail value. Also cleaned up
workarounds that are no longer needed with the latest NVM images.
Anirudh cleaned up unnecessary CONFIG_PCI_IOV wrappers. Updated the
driver to use ice_pf_to_dev() instead of &pf->pdev->dev or
&vsi->back->pdev->dev. Cleaned up the string format in print function
calls to remove newlines where applicable.
Akeem updates the link message logging to include "Full Duplex" and
"Negotiated", to help distinguish from "Requested" for FEC.
Bruce fixes and consolidates the logging of firmware/NVM information
during driver load, since the information is duplicate of what is
available via ethtool. Fixed the checking of the Unit Load Status bits
after reset to ensure they are 0x7FF before continuing, by updating the
mask. Cleanup up possible NULL dereferences that were created by a
previous commit.
Ben fixes the driver to use the correct netif_msg_tx/rx_error() to
determine whether to print the MDD event type.
Tony provides several trivial fixes, which include whitespace, typos,
function header comments, reverse Christmas tree issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen [Thu, 6 Feb 2020 09:20:13 +0000 (01:20 -0800)]
ice: Trivial fixes
This is a collection of trivial fixes including fixing whitespace, typos,
function headers, reverse Christmas tree, etc.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Shelton [Thu, 6 Feb 2020 09:20:12 +0000 (01:20 -0800)]
ice: Use correct netif error function
Use the correct netif_msg_[tx,rx]_error() function to determine whether to
print the MDD event type.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Formatting strings in print function calls (like dev_info, dev_err, etc.)
can exceed 80 columns without making checkpatch unhappy. So remove
newlines where applicable and make print statements more compact.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use ice_pf_to_dev(pf) instead of &pf->pdev->dev
Use ice_pf_to_dev(vsi->back) instead of &vsi->back->pdev->dev
When a pointer to the pf instance is available, use ice_pf_to_dev
instead of ice_hw_to_dev
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 6 Feb 2020 09:20:08 +0000 (01:20 -0800)]
ice: Remove possible null dereference
Commit b60675373518 ("ice: add extra check for null Rx descriptor") moved
the call to ice_construct_skb() under a null check as Coverity reported a
possible use of null skb. However, the original call was not deleted, do so
now.
Fixes: b60675373518 ("ice: add extra check for null Rx descriptor") Reported-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 6 Feb 2020 09:20:07 +0000 (01:20 -0800)]
ice: update Unit Load Status bitmask to check after reset
After a reset the Unit Load Status bits in the GLNVM_ULD register to check
for completion should be 0x7FF before continuing. Update the mask to check
(minus the three reserved bits that are always set).
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 6 Feb 2020 09:20:06 +0000 (01:20 -0800)]
ice: fix and consolidate logging of NVM/firmware version information
Logging the firmware/NVM information during driver load is redundant since
that information is also available via ethtool. Move the functionality
found in ice_nvm_version_str() directly into ice_get_drvinfo() and remove
calling the former and logging that info during driver probe. This also
gets rid of a bug in ice_nvm_version_str() where it returns a pointer to
a buffer which is free'ed when that function exits.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch modifies link message logging to include "Full Duplex" and
"Negotiated" for FEC, so as to distinguish it from "Requested" FEC.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: Remove CONFIG_PCI_IOV wrap in ice_set_pf_caps
Remove unnecessary CONFIG_PCI_IOV wrapping in ice_set_pf_caps. None
of the data structures accessed within the block are wrapped with
this flag. When CONFIG_PCI_IOV is undefined, pf->num_vfs_supported
will be 0 anyway.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Brett Creeley [Thu, 6 Feb 2020 09:20:03 +0000 (01:20 -0800)]
ice: Remove ice_dev_onetime_setup()
ice_dev_onetime_setup contains driver workarounds needed for
firmware limitations. These issues have now been resolved in newer
NVMs so remove the function.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Brett Creeley [Thu, 6 Feb 2020 09:20:02 +0000 (01:20 -0800)]
ice: Don't allow same value for Rx tail to be written twice
Currently we compare the value we are about to write to the Rx tail
register with the previous value of next_to_use. The problem with this
is we only write tail on 8 descriptor boundaries, but next_to_use is
updated whenever we clean Rx descriptors. Fix this by comparing the
value we are about to write to tail with the previously written tail
value. This will prevent duplicate Rx tail bumps.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Paul Greenwalt [Thu, 6 Feb 2020 09:20:01 +0000 (01:20 -0800)]
ice: display supported and advertised link modes
Display all of the supported and advertised link modes based on the PHY
capability with media.
Displaying all supported modes is more informative then only displaying
the current link mode.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Thu, 6 Feb 2020 09:20:00 +0000 (01:20 -0800)]
ice: Fix switch between FW and SW LLDP
When switching between FW and SW LLDP mode, the
number of configured TLV apps in the driver's
DCB configuration is getting out of synch with
what lldpad thinks is configured. This is causing
a problem when shutting down lldpad. The cleanup
is trying to delete TLV apps that are not defined
in the kernel.
Since the driver is keeping an accurate account
of the apps defined, use the drivers number of
apps to determine if there is an app to delete.
If the number of apps is <= 1, then do not
attempt to delete.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Thu, 6 Feb 2020 09:19:59 +0000 (01:19 -0800)]
ice: Fix DCB rebuild after reset
The function ice_dcb_rebuild had some logic
flaws in it, and also didn't differentiate
between FW and SW modes needs.
For FW flow, the willing setting was being
forced to OFF and left that way. Unwilling
in DCB FW mode is not a supported model.
Leave the config alone and use the return value
from the set command to determine if setting the
config was successful.
The SW DCB flow does not need to need to register
for MIB change events (as they are not used in
SW mode).
Use !is_sw_lldp checks to only perform FW specific
task while in FW mode.
Also adding a reapplication of the current DCB
config after a link event. Some NVMs are not
maintaining their DCB configs across link events.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kunihiko Hayashi [Wed, 12 Feb 2020 10:55:34 +0000 (19:55 +0900)]
net: ethernet: ave: Add capability of rgmii-id mode
This allows you to specify the type of rgmii-id that will enable phy
internal delay in ethernet phy-mode.
This adds all RGMII cases to all of get_pinmode() except LD11, because LD11
SoC doesn't support RGMII due to the constraint of the hardware. When RGMII
phy mode is specified in the devicetree for LD11, the driver will abort
with an error.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Firo Yang [Wed, 12 Feb 2020 05:09:17 +0000 (06:09 +0100)]
enic: prevent waking up stopped tx queues over watchdog reset
Recent months, our customer reported several kernel crashes all
preceding with following message:
NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
Error message of one of those crashes:
BUG: unable to handle kernel paging request at ffffffffa007e090
After analyzing severl vmcores, I found that most of crashes are
caused by memory corruption. And all the corrupted memory areas
are overwritten by data of network packets. Moreover, I also found
that the tx queues were enabled over watchdog reset.
After going through the source code, I found that in enic_stop(),
the tx queues stopped by netif_tx_disable() could be woken up over
a small time window between netif_tx_disable() and the
napi_disable() by the following code path:
napi_poll->
enic_poll_msix_wq->
vnic_cq_service->
enic_wq_service->
netif_wake_subqueue(enic->netdev, q_number)->
test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
In turn, upper netowrk stack could queue skb to ENIC NIC though
enic_hard_start_xmit(). And this might introduce some race condition.
Our customer comfirmed that this kind of kernel crash doesn't occur over
90 days since they applied this patch.
Signed-off-by: Firo Yang <firo.yang@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Feb 2020 01:08:31 +0000 (17:08 -0800)]
Merge branch 'Bug-fixes-for-ENA-Ethernet-driver'
Sameeh Jubran says:
====================
Bug fixes for ENA Ethernet driver
Difference from V1:
* Started using netdev_rss_key_fill()
* Dropped superflous changes that are not related to bug fixes as
requested by Jakub
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
comp_ctx can be NULL in a very rare case when an admin command is executed
during the execution of ena_remove().
The bug scenario is as follows:
* ena_destroy_device() sets the comp_ctx to be NULL
* An admin command is executed before executing unregister_netdev(),
this can still happen because our device can still receive callbacks
from the netdev infrastructure such as ethtool commands.
* When attempting to access the comp_ctx, the bug occurs since it's set
to NULL
Fix:
Added a check that comp_ctx is not NULL
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Feb 2020 15:17:50 +0000 (15:17 +0000)]
net: ena: ethtool: use correct value for crc32 hash
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool,
thus the xor enum was used for supporting crc32.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ena: make ena rxfh support ETH_RSS_HASH_NO_CHANGE
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing
the key or indirection table using ethtool while keeping the same hash
function.
Also add a function for retrieving the current hash function from
the ena-com layer.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Saeed Bshara <saeedb@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The function ena_com_ind_tbl_convert_from_device() has an overflow
bug as explained below. Either way, this function is not needed at
all since we don't retrieve the indirection table from the device
at any point which means that this conversion is not needed.
The bug:
The for loop iterates over all io_sq_queues, when passing the actual
number of used queues the io_sq_queues[i].idx equals 0 since they are
uninitialized which results in the following code to be executed till
the end of the loop:
dev_idx_to_host_tbl[0] = i;
This results dev_idx_to_host_tbl[0] in being equal to
ENA_TOTAL_NUM_QUEUES - 1.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The indirection table has the indices of the Rx queues. When we store it
during set indirection operation, we convert the indices to our internal
representation of the indices.
Our internal representation of the indices is: even indices for Tx and
uneven indices for Rx, where every Tx/Rx pair are in a consecutive order
starting from 0. For example if the driver has 3 queues (3 for Tx and 3
for Rx) then the indices are as follows:
0 1 2 3 4 5
Tx Rx Tx Rx Tx Rx
The BUG:
The issue is that when we satisfy a get request for the indirection
table, we don't convert the indices back to the original representation.
The FIX:
Simply apply the inverse function for the indices of the indirection
table after we set it.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ena: rss: store hash function as values and not bits
The device receives, stores and retrieves the hash function value as bits
and not as their enum value.
The bug:
* In ena_com_set_hash_function() we set
cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func.
(1 << rss->hash_func)
* In ena_com_get_hash_function() we retrieve the hash function and store
it's bit value in rss->hash_func. (Now the bit value of rss->hash_func
is stored in rss->hash_func instead of it's enum value)
The fix:
This commit fixes the issue by converting the retrieved hash function
values from the device to the matching enum value of the set bit using
ffs(). ffs() finds the first set bit's index in a word. Since the function
returns 1 for the LSB's index, we need to subtract 1 from the returned
value (note that BIT(0) is 1).
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Feb 2020 15:17:45 +0000 (15:17 +0000)]
net: ena: rss: fix failure to get indirection table
On old hardware, getting / setting the hash function is not supported while
gettting / setting the indirection table is.
This commit enables us to still show the indirection table on older
hardwares by setting the hash function and key to NULL.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Feb 2020 15:17:44 +0000 (15:17 +0000)]
net: ena: rss: do not allocate key when not supported
Currently we allocate the key whether the device supports setting the
key or not. This commit adds a check to the allocation function and
handles the error accordingly.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bug description:
When running "ethtool -x <if_name>" the key shows up as all zeros.
When we use "ethtool -X <if_name> hfunc toeplitz hkey <some:random:key>" to
set the key and then try to retrieve it using "ethtool -x <if_name>" then
we return the correct key because we return the one we saved.
Bug cause:
We don't fetch the key from the device but instead return the key
that we have saved internally which is by default set to zero upon
allocation.
Fix:
This commit fixes the issue by initializing the key to a random value
using netdev_rss_key_fill().
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation of the driver calls skb_tx_timestamp()to add a
software tx timestamp to the skb, however the software-transmit capability
is not reported in ethtool -T.
This commit updates the ethtool structure to report the software-transmit
capability in ethtool -T using the standard ethtool_op_get_ts_info().
This function reports all software timestamping capabilities (tx and rx),
as well as setting phc_index = -1. phc_index is the index of the PTP
hardware clock device that will be used for hardware timestamps. Since we
don't have such a device in ENA, using the default -1 value is the correct
setting.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
>From the documentation of round_jiffies():
"Rounds a time delta in the future (in jiffies) up or down to
(approximately) full seconds. This is useful for timers for which
the exact time they fire does not matter too much, as long as
they fire approximately every X seconds.
By rounding these timers to whole seconds, all such timers will fire
at the same time, rather than at various times spread out. The goal
of this is to have the CPU wake up less, which saves power."
There are 2 parts to this patch:
================================
Part 1:
-------
In our case we need timer_service to be called approximately every
X=1 seconds, and the exact time does not matter, so using round_jiffies()
is the right way to go.
Therefore we add round_jiffies() to the mod_timer() in ena_timer_service().
Part 2:
-------
round_jiffies() is used in check_for_missing_keep_alive() when
getting the jiffies of the expiration of the keep_alive timeout. Here it
is actually a mistake to use round_jiffies() because we want the exact
time when keep_alive should expire and not an approximate rounded time,
which can cause early, false positive, timeouts.
Therefore we remove round_jiffies() in the calculation of
keep_alive_expired() in check_for_missing_keep_alive().
Fixes: b80096fb13fd ("net: ena: add hardware hints capability to the driver") Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ena: fix potential crash when rxfh key is NULL
When ethtool -X is called without an hkey, ena_com_fill_hash_function()
is called with key=NULL, which is passed to memcpy causing a crash.
This commit fixes this issue by checking key is not NULL.
Fixes: 42904aea461f ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 10 Feb 2020 19:36:13 +0000 (11:36 -0800)]
net/smc: fix leak of kernel memory to user space
As nlmsg_put() does not clear the memory that is reserved,
it this the caller responsability to make sure all of this
memory will be written, in order to not reveal prior content.
While we are at it, we can provide the socket cookie even
if clsock is not set.
syzbot reported :
BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline]
BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
__arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
__fswab32 include/uapi/linux/swab.h:59 [inline]
__swab32p include/uapi/linux/swab.h:179 [inline]
__be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
Fixes: 334382adc2dc ("smc: netlink interface for SMC sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Mon, 10 Feb 2020 18:59:18 +0000 (10:59 -0800)]
i40e: Fix the conditional for i40e_vc_validate_vqs_bitmaps
Commit 69f8dcc06e67 ("i40e: Fix virtchnl_queue_select bitmap
validation") introduced a necessary change for verifying how queue
bitmaps from the iavf driver get validated. Unfortunately, the
conditional was reversed. Fix this.
Fixes: 69f8dcc06e67 ("i40e: Fix virtchnl_queue_select bitmap validation") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
core: Don't skip generic XDP program execution for cloned SKBs
The current generic XDP handler skips execution of XDP programs entirely if
an SKB is marked as cloned. This leads to some surprising behaviour, as
packets can end up being cloned in various ways, which will make an XDP
program not see all the traffic on an interface.
This was discovered by a simple test case where an XDP program that always
returns XDP_DROP is installed on a veth device. When combining this with
the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending
side, SKBs reliably end up in the cloned state, causing them to be passed
through to the receiving interface instead of being dropped. A minimal
reproducer script for this is included below.
This patch fixed the issue by simply triggering the existing linearisation
code for cloned SKBs instead of skipping the XDP program execution. This
behaviour is in line with the behaviour of the native XDP implementation
for the veth driver, which will reallocate and copy the SKB data if the SKB
is marked as shared.
Reproducer Python script (requires BCC and Scapy):
from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
from bcc import BPF
import time, sys, subprocess, shlex
s = sniff(iface="a_to_b", count=10, timeout=15)
if len(s):
print(f"Got {len(s)} packets - should have gotten 0")
return 1
else:
print("Got no packets - as expected")
return 0
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
if sys.argv[1] == "client":
sys.exit(client())
elif sys.argv[1] == "server":
mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
sys.exit(server(mode))
else:
try:
mode = sys.argv[1]
if mode not in ('skb', 'drv'):
print(f"Usage: {sys.argv[0]} <skb|drv>")
sys.exit(1)
print(f"Running in {mode} mode")
for cmd in [
'ip netns add netns_a',
'ip netns add netns_b',
'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
# Disable ipv6 to make sure there's no address autoconf traffic
'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
'ip -n netns_a link set dev a_to_b up',
'ip -n netns_b link set dev b_to_a up']:
subprocess.check_call(shlex.split(cmd))
server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))
Fixes: 2a2f3de2862e ("net: xdp: support xdp generic on virtual devices") Reported-by: Stepan Horacek <shoracek@redhat.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sat, 8 Feb 2020 15:55:04 +0000 (16:55 +0100)]
qmi_wwan: unconditionally reject 2 ep interfaces
We have been using the fact that the QMI and DIAG functions
usually are the only ones with class/subclass/protocol being
ff/ff/ff on Quectel modems. This has allowed us to match the
QMI function without knowing the exact interface number,
which can vary depending on firmware configuration.
The ability to silently reject the DIAG function, which is
usually handled by the option driver, is important for this
method to work. This is done based on the knowledge that it
has exactly 2 bulk endpoints. QMI function control interfaces
will have either 3 or 1 endpoint. This rule is universal so
the quirk condition can be removed.
The fixed layouts known from the Gobi1k and Gobi2k modems
have been gradually replaced by more dynamic layouts, and
many vendors now use configurable layouts without changing
device IDs. Renaming the class/subclass/protocol matching
macro makes it more obvious that this is now not Quectel
specific anymore.
Cc: Kristian Evensen <kristian.evensen@gmail.com> Cc: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 8 Feb 2020 15:54:32 +0000 (16:54 +0100)]
net: dsa: mv88e6xxx: Prevent truncation of longer interrupt names
When adding support for unique interrupt names, after testing on a few
devices, it was assumed 32 characters would be sufficient. This
assumption turned out to be incorrect, ZII RDU2 for example uses a
device base name of mv88e6xxx-30be0000.ethernet-1:0, leaving no space
for post fixes such as -g1-atu-prob and -watchdog. The names then
become identical, defeating the point of the patch.
Increase the length of the string to 64 charactoes.
Reported-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 597aed14ebbe ("net: dsa: mv88e6xxx: Unique IRQ name") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sat, 8 Feb 2020 14:50:36 +0000 (15:50 +0100)]
qmi_wwan: re-add DW5821e pre-production variant
Commit 3a1321147627 removed the support for the pre-production variant
of the Dell DW5821e to avoid probing another USB interface unnecessarily.
However, the pre-production samples are found in the wild, and this lack
of support is causing problems for users of such samples. It is therefore
necessary to support both variants.
Matching on both interfaces 0 and 1 is not expected to cause any problem
with either variant, as only the QMI function will be probed successfully
on either. Interface 1 will be rejected based on the HID class for the
production variant:
Fixes: 3a1321147627 ("qmi_wwan: fix interface number for DW5821e production firmware") Link: https://whrl.pl/Rf0vNk Reported-by: Lars Melin <larsm17@gmail.com> Cc: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Tuong Lien [Mon, 10 Feb 2020 08:35:44 +0000 (15:35 +0700)]
tipc: fix successful connect() but timed out
In commit 6a6522ed8905 ("tipc: fix wrong connect() return code"), we
fixed the issue with the 'connect()' that returns zero even though the
connecting has failed by waiting for the connection to be 'ESTABLISHED'
really. However, the approach has one drawback in conjunction with our
'lightweight' connection setup mechanism that the following scenario
can happen:
Upon the receipt of the server 'ACK', the client becomes 'ESTABLISHED'
and the 'wait_for_conn()' process is woken up but not run. Meanwhile,
the server starts to send a number of data following by a 'close()'
shortly without waiting any response from the client, which then forces
the client socket to be 'DISCONNECTING' immediately. When the wait
process is switched to be running, it continues to wait until the timer
expires because of the unexpected socket state. The client 'connect()'
will finally get ‘-ETIMEDOUT’ and force to release the socket whereas
there remains the messages in its receive queue.
Obviously the issue would not happen if the server had some delay prior
to its 'close()' (or the number of 'DATA' messages is large enough),
but any kind of delay would make the connection setup/shutdown "heavy".
We solve this by simply allowing the 'connect()' returns zero in this
particular case. The socket is already 'DISCONNECTING', so any further
write will get '-EPIPE' but the socket is still able to read the
messages existing in its receive queue.
Note: This solution doesn't break the previous one as it deals with a
different situation that the socket state is 'DISCONNECTING' but has no
error (i.e. sk->sk_err = 0).
Fixes: 6a6522ed8905 ("tipc: fix wrong connect() return code") Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Wandun [Mon, 10 Feb 2020 08:27:59 +0000 (16:27 +0800)]
mptcp: make the symbol 'mptcp_sk_clone_lock' static
Fix the following sparse warning:
net/mptcp/protocol.c:646:13: warning: symbol 'mptcp_sk_clone_lock' was not declared. Should it be static?
Fixes: 7cfd10502042 ("mptcp: fix use-after-free for ipv6") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Wandun [Mon, 10 Feb 2020 08:11:09 +0000 (16:11 +0800)]
tipc: make three functions static
Fix the following sparse warning:
net/tipc/node.c:281:6: warning: symbol 'tipc_node_free' was not declared. Should it be static?
net/tipc/node.c:2801:5: warning: symbol '__tipc_nl_node_set_key' was not declared. Should it be static?
net/tipc/node.c:2878:5: warning: symbol '__tipc_nl_node_flush_key' was not declared. Should it be static?
Fixes: 09a2b8a4df52 ("tipc: introduce TIPC encryption & authentication") Fixes: 117a81d94c0d ("tipc: add support for AEAD key setting via netlink") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Unbalanced locking in mwifiex_process_country_ie, from Brian Norris.
2) Fix thermal zone registration in iwlwifi, from Andrei
Otcheretianski.
3) Fix double free_irq in sgi ioc3 eth, from Thomas Bogendoerfer.
4) Use after free in mptcp, from Florian Westphal.
5) Use after free in wireguard's root_remove_peer_lists, from Eric
Dumazet.
6) Properly access packets heads in bonding alb code, from Eric
Dumazet.
7) Fix data race in skb_queue_len(), from Qian Cai.
8) Fix regression in r8169 on some chips, from Heiner Kallweit.
9) Fix XDP program ref counting in hv_netvsc, from Haiyang Zhang.
10) Certain kinds of set link netlink operations can cause a NULL deref
in the ipv6 addrconf code. Fix from Eric Dumazet.
11) Don't cancel uninitialized work queue in drop monitor, from Ido
Schimmel.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: thunderx: use proper interface type for RGMII
mt76: mt7615: fix max_nss in mt7615_eeprom_parse_hw_cap
bpf: Improve bucket_log calculation logic
selftests/bpf: Test freeing sockmap/sockhash with a socket in it
bpf, sockhash: Synchronize_rcu before free'ing map
bpf, sockmap: Don't sleep while holding RCU lock on tear-down
bpftool: Don't crash on missing xlated program instructions
bpf, sockmap: Check update requirements after locking
drop_monitor: Do not cancel uninitialized work item
mlxsw: spectrum_dpipe: Add missing error path
mlxsw: core: Add validation of hardware device types for MGPIR register
mlxsw: spectrum_router: Clear offload indication from IPv6 nexthops on abort
selftests: mlxsw: Add test cases for local table route replacement
mlxsw: spectrum_router: Prevent incorrect replacement of local table routes
net: dsa: microchip: enable module autoprobe
ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()
dpaa_eth: support all modes with rate adapting PHYs
net: stmmac: update pci platform data to use phy_interface
net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter
net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter
...
Linus Torvalds [Sat, 8 Feb 2020 22:28:26 +0000 (14:28 -0800)]
Merge tag 'powerpc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix an existing bug in our user access handling, exposed by one of
the bug fixes we merged this cycle.
- A fix for a boot hang on 32-bit with CONFIG_TRACE_IRQFLAGS and the
recently added CONFIG_VMAP_STACK.
Thanks to: Christophe Leroy, Guenter Roeck.
* tag 'powerpc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK
powerpc/futex: Fix incorrect user access blocking
Linus Torvalds [Sat, 8 Feb 2020 22:19:39 +0000 (14:19 -0800)]
Fix up remaining devm_ioremap_nocache() in SGI IOC3 8250 UART driver
This is a merge error on my part - the driver was merged into mainline
by commit 0686f032cfe3 ("Merge tag 'mips_5.6' of git://../mips/linux")
over a week ago, but nobody apparently noticed that it didn't actually
build due to still having a reference to the devm_ioremap_nocache()
function, removed a few days earlier through commit e2b45a40eb4f ("Merge
tag 'ioremap-5.6' of git://../ioremap").
Apparently this didn't get any build testing anywhere. Not perhaps all
that surprising: it's restricted to 64-bit MIPS only, and only with the
new SGI_MFD_IOC3 support enabled.
I only noticed because the ioremap conflicts in the ARM SoC driver
update made me check there weren't any others hiding, and I found this
one.
Linus Torvalds [Sat, 8 Feb 2020 22:17:27 +0000 (14:17 -0800)]
Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC late updates from Olof Johansson:
"This is some material that we picked up into our tree late, or that
had more complex dependencies on more than one topic branch that makes
sense to keep separately.
- TI support for secure accelerators and hwrng on OMAP4/5
- TI camera changes for dra7 and am437x and SGX improvement due to
better reset control support on am335x, am437x and dra7
- Davinci moves to proper clocksource on DM365, and regulator/audio
improvements for DM365 and DM644x eval boards"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
ARM: dts: omap4-droid4: Enable hdq for droid4 ds250x 1-wire battery nvmem
ARM: dts: motorola-cpcap-mapphone: Configure calibration interrupt
ARM: dts: Configure interconnect target module for am437x sgx
ARM: dts: Configure sgx for dra7
ARM: dts: Configure rstctrl reset for am335x SGX
ARM: dts: dra7: Add ti-sysc node for VPE
ARM: dts: dra7: add vpe clkctrl node
ARM: dts: am43x-epos-evm: Add VPFE and OV2659 entries
ARM: dts: am437x-sk-evm: Add VPFE and OV2659 entries
ARM: dts: am43xx: add support for clkout1 clock
arm: dts: dra76-evm: Add CAL and OV5640 nodes
arm: dtsi: dra76x: Add CAL dtsi node
arm: dts: dra72-evm-common: Add entries for the CSI2 cameras
ARM: dts: DRA72: Add CAL dtsi node
ARM: dts: dra7-l4: Add ti-sysc node for CAM
ARM: OMAP: DRA7xx: Make CAM clock domain SWSUP only
ARM: dts: dra7: add cam clkctrl node
ARM: OMAP2+: Drop legacy platform data for omap4 des
ARM: OMAP2+: Drop legacy platform data for omap4 sham
ARM: OMAP2+: Drop legacy platform data for omap4 aes
...
Linus Torvalds [Sat, 8 Feb 2020 22:15:41 +0000 (14:15 -0800)]
Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC defconfig updates from Olof Johansson:
"We keep this in a separate branch to avoid cross-branch conflicts, but
most of the material here is fairly boring -- some new drivers turned
on for hardware since they were merged, and some refreshed files due
to time having moved a lot of entries around"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (38 commits)
ARM: configs: at91: enable MMC_SDHCI_OF_AT91 and MICROCHIP_PIT64B
arm64: defconfig: Enable Broadcom's GENET Ethernet controller
ARM: multi_v7_defconfig: Enable devfreq thermal integration
ARM: exynos_defconfig: Enable devfreq thermal integration
ARM: multi_v7_defconfig: Enable NFS v4.1 and v4.2
ARM: exynos_defconfig: Enable NFS v4.1 and v4.2
arm64: defconfig: Enable Actions Semi specific drivers
arm64: defconfig: Enable Broadcom's STB PCIe controller
arm64: defconfig: Enable CONFIG_CLK_IMX8MP by default
ARM: configs: at91: enable config flags for sam9x60 SoC
ARM: configs: at91: use savedefconfig
arm64: defconfig: Enable tegra XUDC support
ARM: defconfig: gemini: Update defconfig
arm64: defconfig: enable CONFIG_ARM_QCOM_CPUFREQ_NVMEM
arm64: defconfig: enable CONFIG_QCOM_CPR
arm64: defconfig: Enable HFPLL
arm64: defconfig: Enable CRYPTO_DEV_FSL_CAAM
ARM: imx_v6_v7_defconfig: Select the TFP410 driver
ARM: imx_v6_v7_defconfig: Enable NFS_V4_1 and NFS_V4_2 support
arm64: defconfig: Enable ATH10K_SNOC
...
Linus Torvalds [Sat, 8 Feb 2020 22:04:19 +0000 (14:04 -0800)]
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC-related driver updates from Olof Johansson:
"Various driver updates for platforms:
- Nvidia: Fuse support for Tegra194, continued memory controller
pieces for Tegra30
- NXP/FSL: Refactorings of QuickEngine drivers to support
ARM/ARM64/PPC
- NXP/FSL: i.MX8MP SoC driver pieces
- TI Keystone: ring accelerator driver
- Qualcomm: SCM driver cleanup/refactoring + support for new SoCs.
- Xilinx ZynqMP: feature checking interface for firmware. Mailbox
communication for power management
- Overall support patch set for cpuidle on more complex hierarchies
(PSCI-based)
and misc cleanups, refactorings of Marvell, TI, other platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (166 commits)
drivers: soc: xilinx: Use mailbox IPI callback
dt-bindings: power: reset: xilinx: Add bindings for ipi mailbox
drivers: soc: ti: knav_qmss_queue: Pass lockdep expression to RCU lists
MAINTAINERS: Add brcmstb PCIe controller entry
soc/tegra: fuse: Unmap registers once they are not needed anymore
soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
soc/tegra: fuse: Warn if straps are not ready
soc/tegra: fuse: Cache values of straps and Chip ID registers
memory: tegra30-emc: Correct error message for timed out auto calibration
memory: tegra30-emc: Firm up hardware programming sequence
memory: tegra30-emc: Firm up suspend/resume sequence
soc/tegra: regulators: Do nothing if voltage is unchanged
memory: tegra: Correct reset value of xusb_hostr
soc/tegra: fuse: Add APB DMA dependency for Tegra20
bus: tegra-aconnect: Remove PM_CLK dependency
dt-bindings: mediatek: add MT6765 power dt-bindings
soc: mediatek: cmdq: delete not used define
memory: tegra: Add support for the Tegra194 memory controller
memory: tegra: Only include support for enabled SoCs
memory: tegra: Support DVFS on Tegra186 and later
...
Linus Torvalds [Sat, 8 Feb 2020 21:55:25 +0000 (13:55 -0800)]
Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC platform updates from Olof Johansson:
"Most of these are smaller fixes that have accrued, and some continued
cleanup of OMAP platforms towards shared frameworks.
One new SoC from Atmel/Microchip: sam9x60"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
ARM: OMAP2+: Fix undefined reference to omap_secure_init
ARM: s3c64xx: Drop unneeded select of TIMER_OF
ARM: exynos: Drop unneeded select of MIGHT_HAVE_CACHE_L2X0
ARM: s3c24xx: Switch to atomic pwm API in rx1950
ARM: OMAP2+: sleep43xx: Call secure suspend/resume handlers
ARM: OMAP2+: Use ARM SMC Calling Convention when OP-TEE is available
ARM: OMAP2+: Introduce check for OP-TEE in omap_secure_init()
ARM: OMAP2+: Add omap_secure_init callback hook for secure initialization
ARM: at91: Documentation: add sam9x60 product and datasheet
ARM: at91: pm: use of_device_id array to find the proper shdwc node
ARM: at91: pm: use SAM9X60 PMC's compatible
ARM: imx: only select ARM_ERRATA_814220 for ARMv7-A
ARM: zynq: use physical cpuid in zynq_slcr_cpu_stop/start
ARM: tegra: Use clk_m CPU on Tegra124 LP1 resume
ARM: tegra: Modify reshift divider during LP1
ARM: tegra: Enable PLLP bypass during Tegra124 LP1
ARM: samsung: Rename Samsung and Exynos to lowercase
ARM: exynos: Correct the help text for platform Kconfig option
ARM: bcm: Select ARM_AMBA for ARCH_BRCMSTB
ARM: brcmstb: Add debug UART entry for 7216
...
Linus Torvalds [Sat, 8 Feb 2020 21:44:41 +0000 (13:44 -0800)]
Merge tag 'compat-ioctl-fix' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull compat-ioctl fix from Arnd Bergmann:
"One patch in the compat-ioctl series broke 32-bit rootfs for multiple
people testing on 64-bit kernels. Let's fix it in -rc1 before others
run into the same issue"
* tag 'compat-ioctl-fix' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground:
compat_ioctl: fix FIONREAD on devices
Linus Torvalds [Sat, 8 Feb 2020 21:26:41 +0000 (13:26 -0800)]
Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
"Saner fs_parser.c guts and data structures. The system-wide registry
of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
the horror switch() in fs_parse() that would have to grow another case
every time something got added to that system-wide registry.
New syntax types can be added by filesystems easily now, and their
namespace is that of functions - not of system-wide enum members. IOW,
they can be shared or kept private and if some turn out to be widely
useful, we can make them common library helpers, etc., without having
to do anything whatsoever to fs_parse() itself.
And we already get that kind of requests - the thing that finally
pushed me into doing that was "oh, and let's add one for timeouts -
things like 15s or 2h". If some filesystem really wants that, let them
do it. Without somebody having to play gatekeeper for the variants
blessed by direct support in fs_parse(), TYVM.
Quite a bit of boilerplate is gone. And IMO the data structures make a
lot more sense now. -200LoC, while we are at it"
* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
tmpfs: switch to use of invalfc()
cgroup1: switch to use of errorfc() et.al.
procfs: switch to use of invalfc()
hugetlbfs: switch to use of invalfc()
cramfs: switch to use of errofc() et.al.
gfs2: switch to use of errorfc() et.al.
fuse: switch to use errorfc() et.al.
ceph: use errorfc() and friends instead of spelling the prefix out
prefix-handling analogues of errorf() and friends
turn fs_param_is_... into functions
fs_parse: handle optional arguments sanely
fs_parse: fold fs_parameter_desc/fs_parameter_spec
fs_parser: remove fs_parameter_description name field
add prefix to fs_context->log
ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
new primitive: __fs_parse()
switch rbd and libceph to p_log-based primitives
struct p_log, variants of warnf() et.al. taking that one instead
teach logfc() to handle prefices, give it saner calling conventions
get rid of cg_invalf()
...
Linus Torvalds [Sat, 8 Feb 2020 21:04:49 +0000 (13:04 -0800)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
- bmap series from cmaiolino
- getting rid of convolutions in copy_mount_options() (use a couple of
copy_from_user() instead of the __get_user() crap)
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
saner copy_mount_options()
fibmap: Reject negative block numbers
fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
ecryptfs: drop direct calls to ->bmap
cachefiles: drop direct usage of ->bmap method.
fs: Enable bmap() function to properly return errors
Linus Torvalds [Sat, 8 Feb 2020 19:44:02 +0000 (11:44 -0800)]
Merge branch 'pipe-exclusive-wakeup'
Merge thundering herd avoidance on pipe IO.
This would have been applied for 5.5 already, but got delayed because of
a user-space race condition in the GNU make jobserver code. Now that
there's a new GNU make 4.3 release, and most distributions seem to have
at least applied the (almost three year old) fix for the problem, let's
see if people notice.
And it might have been just bad random timing luck on my machine.
If you do hit the race condition, things will still work, but the
symptom is that you don't get nearly the expected parallelism when using
"make -j<N>".
The jobserver bug can definitely happen without this patch too, but
seems to be easier to trigger when we no longer wake up pipe waiters
unnecessarily.
* pipe-exclusive-wakeup:
pipe: use exclusive waits when reading or writing
Linus Torvalds [Mon, 9 Dec 2019 17:48:27 +0000 (09:48 -0800)]
pipe: use exclusive waits when reading or writing
This makes the pipe code use separate wait-queues and exclusive waiting
for readers and writers, avoiding a nasty thundering herd problem when
there are lots of readers waiting for data on a pipe (or, less commonly,
lots of writers waiting for a pipe to have space).
While this isn't a common occurrence in the traditional "use a pipe as a
data transport" case, where you typically only have a single reader and
a single writer process, there is one common special case: using a pipe
as a source of "locking tokens" rather than for data communication.
In particular, the GNU make jobserver code ends up using a pipe as a way
to limit parallelism, where each job consumes a token by reading a byte
from the jobserver pipe, and releases the token by writing a byte back
to the pipe.
This pattern is fairly traditional on Unix, and works very well, but
will waste a lot of time waking up a lot of processes when only a single
reader needs to be woken up when a writer releases a new token.
A simplified test-case of just this pipe interaction is to create 64
processes, and then pass a single token around between them (this
test-case also intentionally passes another token that gets ignored to
test the "wake up next" logic too, in case anybody wonders about it):
#include <unistd.h>
int main(int argc, char **argv)
{
int fd[2], counters[2];
do {
int i;
read(fd[0], &i, sizeof(i));
if (i < 0)
continue;
counters[0] = i+1;
write(fd[1], counters, (1+(i & 1)) *sizeof(int));
} while (counters[0] < 1000000);
return 0;
}
and in a perfect world, passing that token around should only cause one
context switch per transfer, when the writer of a token causes a
directed wakeup of just a single reader.
But with the "writer wakes all readers" model we traditionally had, on
my test box the above case causes more than an order of magnitude more
scheduling: instead of the expected ~1M context switches, "perf stat"
shows
[ Note! This kernel improvement seems to be very good at triggering a
race condition in the make jobserver (in GNU make 4.2.1) for me. It's
a long known bug that was fixed back in June 2017 by GNU make commit b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to
avoid hangs.").
But there wasn't a new release of GNU make until 4.3 on Jan 19 2020,
so a number of distributions may still have the buggy version. Some
have backported the fix to their 4.2.1 release, though, and even
without the fix it's quite timing-dependent whether the bug actually
is hit. ]
Josh Triplett says:
"I've been hammering on your pipe fix patch (switching to exclusive
wait queues) for a month or so, on several different systems, and I've
run into no issues with it. The patch *substantially* improves
parallel build times on large (~100 CPU) systems, both with parallel
make and with other things that use make's pipe-based jobserver.
All current distributions (including stable and long-term stable
distributions) have versions of GNU make that no longer have the
jobserver bug"
Arnd Bergmann [Fri, 7 Feb 2020 16:55:48 +0000 (17:55 +0100)]
compat_ioctl: fix FIONREAD on devices
My final cleanup patch for sys_compat_ioctl() introduced a regression on
the FIONREAD ioctl command, which is used for both regular and special
files, but only works on regular files after my patch, as I had missed
the warning that Al Viro put into a comment right above it.
Change it back so it can work on any file again by moving the implementation
to do_vfs_ioctl() instead.
Fixes: f956e418bf78 ("compat_ioctl: simplify the implementation") Reported-and-tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Reported-and-tested-by: youling257 <youling257@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tim Harvey [Fri, 7 Feb 2020 20:40:26 +0000 (12:40 -0800)]
net: thunderx: use proper interface type for RGMII
The configuration of the OCTEONTX XCV_DLL_CTL register via
xcv_init_hw() is such that the RGMII RX delay is bypassed
leaving the RGMII TX delay enabled in the MAC:
The following pull-request contains BPF updates for your *net* tree.
We've added 15 non-merge commits during the last 10 day(s) which contain
a total of 12 files changed, 114 insertions(+), 31 deletions(-).
The main changes are:
1) Various BPF sockmap fixes related to RCU handling in the map's tear-
down code, from Jakub Sitnicki.
2) Fix macro state explosion in BPF sk_storage map when calculating its
bucket_log on allocation, from Martin KaFai Lau.
3) Fix potential BPF sockmap update race by rechecking socket's established
state under lock, from Lorenz Bauer.
4) Fix crash in bpftool on missing xlated instructions when kptr_restrict
sysctl is set, from Toke Høiland-Jørgensen.
5) Fix i40e's XSK wakeup code to return proper error in busy state and
various misc fixes in xdpsock BPF sample code, from Maciej Fijalkowski.
6) Fix the way modifiers are skipped in BTF in the verifier while walking
pointers to avoid program rejection, from Alexei Starovoitov.
7) Fix Makefile for runqslower BPF tool to i) rebuild on libbpf changes and
ii) to fix undefined reference linker errors for older gcc version due to
order of passed gcc parameters, from Yulia Kartseva and Song Liu.
8) Fix a trampoline_count BPF kselftest warning about missing braces around
initializer, from Andrii Nakryiko.
9) Fix up redundant "HAVE" prefix from large INSN limit kernel probe in
bpftool, from Michal Rostecki.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK
When CONFIG_PROVE_LOCKING is selected together with (now default)
CONFIG_VMAP_STACK, kernel enter deadlock during boot.
At the point of checking whether interrupts are enabled or not, the
value of MSR saved on stack is read using the physical address of the
stack. But at this point, when using VMAP stack the DATA MMU
translation has already been re-enabled, leading to deadlock.
Don't use the physical address of the stack when
CONFIG_VMAP_STACK is set.
The early versions of our kernel user access prevention (KUAP) were
written by Russell and Christophe, and didn't have separate
read/write access.
At some point I picked up the series and added the read/write access,
but I failed to update the usages in futex.h to correctly allow read
and write.
However we didn't notice because of another bug which was causing the
low-level code to always enable read and write. That bug was fixed
recently in commit 07017385bb77 ("powerpc/kuap: Fix set direction in
allow/prevent_user_access()").
futex_atomic_cmpxchg_inatomic() is passed the user address as %3 and
does:
mt76: mt7615: fix max_nss in mt7615_eeprom_parse_hw_cap
Fix u8 cast reading max_nss from MT_TOP_STRAP_STA register in
mt7615_eeprom_parse_hw_cap routine
Fixes: db866a179ee5b ("mt76: mt7615: read {tx,rx} mask from eeprom") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Linus Torvalds [Sat, 8 Feb 2020 01:59:07 +0000 (17:59 -0800)]
Merge tag 'fuse-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix a regression introduced in v5.1 that triggers WARNINGs for some
fuse filesystems
- Fix an xfstest failure
- Allow overlayfs to be used on top of fuse/virtiofs
- Code and documentation cleanups
* tag 'fuse-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use true,false for bool variable
Documentation: filesystems: convert fuse to RST
fuse: Support RENAME_WHITEOUT flag
fuse: don't overflow LLONG_MAX with end offset
fix up iter on short count in fuse_direct_io()
Linus Torvalds [Sat, 8 Feb 2020 01:52:38 +0000 (17:52 -0800)]
Merge tag 'for-linus-5.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fix from Mike Marshall:
"Debugfs fix for orangefs.
Vasliy Averin noticed that 'if seq_file .next function does not change
position index, read after some lseek can generate unexpected output'
and sent in this fix"
* tag 'for-linus-5.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
help_next should increase position index
Linus Torvalds [Sat, 8 Feb 2020 01:50:21 +0000 (17:50 -0800)]
Merge tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- Server-to-server copy code from Olga.
To use it, client and both servers must have support, the target
server must be able to access the source server over NFSv4.2, and
the target server must have the inter_copy_offload_enable module
parameter set.
- Improvements and bugfixes for the new filehandle cache, especially
in the container case, from Trond
- Also from Trond, better reporting of write errors.
- Y2038 work from Arnd"
* tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux: (55 commits)
sunrpc: expiry_time should be seconds not timeval
nfsd: make nfsd_filecache_wq variable static
nfsd4: fix double free in nfsd4_do_async_copy()
nfsd: convert file cache to use over/underflow safe refcount
nfsd: Define the file access mode enum for tracing
nfsd: Fix a perf warning
nfsd: Ensure sampling of the write verifier is atomic with the write
nfsd: Ensure sampling of the commit verifier is atomic with the commit
sunrpc: clean up cache entry add/remove from hashtable
sunrpc: Fix potential leaks in sunrpc_cache_unhash()
nfsd: Ensure exclusion between CLONE and WRITE errors
nfsd: Pass the nfsd_file as arguments to nfsd4_clone_file_range()
nfsd: Update the boot verifier on stable writes too.
nfsd: Fix stable writes
nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create()
nfsd: Reduce the number of calls to nfsd_file_gc()
nfsd: Schedule the laundrette regularly irrespective of file errors
nfsd: Remove unused constant NFSD_FILE_LRU_RESCAN
nfsd: Containerise filecache laundrette
...
Linus Torvalds [Sat, 8 Feb 2020 01:39:56 +0000 (17:39 -0800)]
Merge tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Puyll NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Fix memory leaks and corruption in readdir # v2.6.37+
- Directory page cache needs to be locked when read # v2.6.37+
New features:
- Convert NFS to use the new mount API
- Add "softreval" mount option to let clients use cache if server goes down
- Add a config option to compile without UDP support
- Limit the number of inactive delegations the client can cache at once
- Improved readdir concurrency using iterate_shared()
Other bugfixes and cleanups:
- More 64-bit time conversions
- Add additional diagnostic tracepoints
- Check for holes in swapfiles, and add dependency on CONFIG_SWAP
- Various xprtrdma cleanups to prepare for 5.7's changes
- Several fixes for NFS writeback and commit handling
- Fix acls over krb5i/krb5p mounts
- Recover from premature loss of openstateids
- Fix NFS v3 chacl and chmod bug
- Compare creds using cred_fscmp()
- Use kmemdup_nul() in more places
- Optimize readdir cache page invalidation
- Lease renewal and recovery fixes"
* tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
NFSv4: try lease recovery on NFS4ERR_EXPIRED
NFS: Fix memory leaks
nfs: optimise readdir cache page invalidation
NFS: Switch readdir to using iterate_shared()
NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
NFS: Directory page cache pages need to be locked when read
NFS: Fix memory leaks and corruption in readdir
SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
NFSv4: Limit the total number of cached delegations
NFSv4: Add accounting for the number of active delegations held
NFSv4: Try to return the delegation immediately when marked for return on close
NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
NFS: nfs_find_open_context() should use cred_fscmp()
NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
NFS: remove unused macros
nfs: Return EINVAL rather than ERANGE for mount parse errors
...
That's one line (the assignment line) that is 2,686,974 characters in
length.
Now, sparse does happen to react particularly badly to that (I didn't
look to why, but I suspect it's just that evaluating all the types
that don't actually ever end up getting used ends up being much more
expensive than it should be), but I bet it's not good for gcc either.
Fixes: c8c09b4eed3c ("bpf: Introduce bpf sk local storage") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Link: https://lore.kernel.org/bpf/20200207081810.3918919-1-kafai@fb.com
Jakub Sitnicki [Thu, 6 Feb 2020 11:16:52 +0000 (12:16 +0100)]
selftests/bpf: Test freeing sockmap/sockhash with a socket in it
Commit 50f5bbf92168 ("bpf: Sockmap, ensure sock lock held during tear
down") introduced sleeping issues inside RCU critical sections and while
holding a spinlock on sockmap/sockhash tear-down. There has to be at least
one socket in the map for the problem to surface.
This adds a test that triggers the warnings for broken locking rules. Not a
fix per se, but rather tooling to verify the accompanying fixes. Run on a
VM with 1 vCPU to reproduce the warnings.
Fixes: 50f5bbf92168 ("bpf: Sockmap, ensure sock lock held during tear down") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200206111652.694507-4-jakub@cloudflare.com
Jakub Sitnicki [Thu, 6 Feb 2020 11:16:51 +0000 (12:16 +0100)]
bpf, sockhash: Synchronize_rcu before free'ing map
We need to have a synchronize_rcu before free'ing the sockhash because any
outstanding psock references will have a pointer to the map and when they
use it, this could trigger a use after free.
This is a sister fix for sockhash, following commit e4b40f977294 ("bpf:
sockmap, synchronize_rcu before free'ing map") which addressed sockmap,
which comes from a manual audit.
Fixes: eaf66302e4568 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
Jakub Sitnicki [Thu, 6 Feb 2020 11:16:50 +0000 (12:16 +0100)]
bpf, sockmap: Don't sleep while holding RCU lock on tear-down
rcu_read_lock is needed to protect access to psock inside sock_map_unref
when tearing down the map. However, we can't afford to sleep in lock_sock
while in RCU read-side critical section. Grab the RCU lock only after we
have locked the socket.
This fixes RCU warnings triggerable on a VM with 1 vCPU when free'ing a
sockmap/sockhash that contains at least one socket:
Fixes: 50f5bbf92168 ("bpf: Sockmap, ensure sock lock held during tear down") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200206111652.694507-2-jakub@cloudflare.com
bpftool: Don't crash on missing xlated program instructions
Turns out the xlated program instructions can also be missing if
kptr_restrict sysctl is set. This means that the previous fix to check the
jited_prog_insns pointer was insufficient; add another check of the
xlated_prog_insns pointer as well.
Fixes: 7ff3995b6e18 ("bpftool: Don't crash on missing jited insns or ksyms") Fixes: f007a2f8d7e3 ("bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200206102906.112551-1-toke@redhat.com
Lorenz Bauer [Fri, 7 Feb 2020 10:37:12 +0000 (10:37 +0000)]
bpf, sockmap: Check update requirements after locking
It's currently possible to insert sockets in unexpected states into
a sockmap, due to a TOCTTOU when updating the map from a syscall.
sock_map_update_elem checks that sk->sk_state == TCP_ESTABLISHED,
locks the socket and then calls sock_map_update_common. At this
point, the socket may have transitioned into another state, and
the earlier assumptions don't hold anymore. Crucially, it's
conceivable (though very unlikely) that a socket has become unhashed.
This breaks the sockmap's assumption that it will get a callback
via sk->sk_prot->unhash.
Fix this by checking the (fixed) sk_type and sk_protocol without the
lock, followed by a locked check of sk_state.
Unfortunately it's not possible to push the check down into
sock_(map|hash)_update_common, since BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
run before the socket has transitioned from TCP_SYN_RECV into
TCP_ESTABLISHED.
Fixes: eaf66302e456 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20200207103713.28175-1-lmb@cloudflare.com
Linus Torvalds [Fri, 7 Feb 2020 21:03:10 +0000 (13:03 -0800)]
Merge tag 'docs-5.6-2' of git://git.lwn.net/linux
Pull Documentation fixes from Jonathan Corbet:
"A handful of small documentation fixes that wandered in"
* tag 'docs-5.6-2' of git://git.lwn.net/linux:
Allow git builds of Sphinx
Documentation: changes.rst: update several outdated project URLs
Documentation: build warnings related to missing blank lines after explicit markups has been fixed
mailmap: add entry for Tiezhu Yang
Documentation/ko_KR/howto: Update a broken link
Documentation/ko_KR/howto: Update broken web addresses
docs/locking: Fix outdated section names
Linus Torvalds [Fri, 7 Feb 2020 20:51:54 +0000 (12:51 -0800)]
Merge tag 'acpi-5.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"Add Hisilicon Hip08-Lite I2C controller clock frequency support to the
ACPI driver for AMD SoCs (APD) and to the Designware I2C driver
(Hanjun Guo)"
* tag 'acpi-5.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
i2c: designware: Add ACPI HID for Hisilicon Hip08-Lite I2C controller
ACPI / APD: Add clock frequency for Hisilicon Hip08-Lite I2C controller
Linus Torvalds [Fri, 7 Feb 2020 20:49:10 +0000 (12:49 -0800)]
Merge tag 'pm-5.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
- Update the recently merged CPR (Core Power Reduction) support in the
AVS (Adaptive Voltage Scaling) subsystem (Brendan Higgins, Nathan
Chancellor, Niklas Cassel)
- Update the rockchip-io AVS driver (Heiko Stuebner)
- Add two more module parameters to intel_idle on top of the recently
merged material (Rafael Wysocki)
- Clean up a piece of cpuidle documentation and consolidate system
sleep states documentation (Rafael Wysocki)
* tag 'pm-5.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Documentation: Clean up PM QoS description
Documentation: admin-guide: PM: Update sleep states documentation
intel_idle: Introduce 'states_off' module parameter
intel_idle: Introduce 'use_acpi' module parameter
power: avs: qcom-cpr: Avoid clang -Wsometimes-uninitialized in cpr_scale
power: avs: qcom-cpr: add unspecified HAS_IOMEM dependency
PM / AVS: rockchip-io: fix the supply naming for the emmc supply on px30
power: avs: qcom-cpr: add a printout after the driver has been initialized
Linus Torvalds [Fri, 7 Feb 2020 20:46:08 +0000 (12:46 -0800)]
Merge tag 'drm-next-2020-02-07' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just some fixes for this merge window: the tegra changes fix some
regressions in the merge, nouveau has a few modesetting fixes.
The amdgpu fixes are bit bigger, but they contain a couple of weeks of
fixes, and don't seem to contain anything that isn't really a fix.
Summary:
tegra:
- merge window regression fixes
nouveau:
- couple of volta/turing modesetting fixes
amdgpu:
- EDC fixes for Arcturus
- GDDR6 memory training fixe
- Fix for reading gfx clockgating registers while in GFXOFF state
- i2c freq fixes
- Misc display fixes
- TLB invalidation fix when using semaphores
- VCN 2.5 instancing fixes
- Switch raven1 gfxoff to a blacklist
- Coreboot workaround for KV/KB
- Root cause dongle fixes for display and revert workaround
- Enable GPU reset for renoir and navi
- Navi overclocking fixes
- Fix up confusing warnings in display clock validation on raven
amdkfd:
- SDMA fix
radeon:
- Misc LUT fixes"
* tag 'drm-next-2020-02-07' of git://anongit.freedesktop.org/drm/drm: (90 commits)
gpu: host1x: Set DMA direction only for DMA-mapped buffer objects
drm/tegra: Reuse IOVA mapping where possible
drm/tegra: Relax IOMMU usage criteria on old Tegra
drm/amd/dm/mst: Ignore payload update failures
drm/amdgpu: update default voltage for boot od table for navi1x
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
drm/amdgpu: fetch default VDDC curve voltages (v2)
drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)
drm/amdgpu/navi10: add OD_RANGE for navi overclocking
drm/amdgpu/navi: fix index for OD MCLK
drm/amd/display: Fix HW/SW state mismatch
drm/amd/display: Fix a typo when computing dsc configuration
drm/amd/powerplay: fix navi10 system intermittent reboot issue V2
drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
drm/amd/display: Only enable cursor on pipes that need it
drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset
drm/nouveau/kms/gv100-: move window ownership setup into modesetting path
drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms
...