IRQs are requested during driver's ndo_open() and then later
freed up in disable_interrupts() during driver unload.
A race exists where driver can set the CXGB4_FULL_INIT_DONE
flag in ndo_open() after the disable_interrupts() in driver
unload path checks it, and hence misses calling free_irq().
Fix by unregistering netdevice first and sync with driver's
ndo_open(). This ensures disable_interrupts() checks the flag
correctly and frees up the IRQs properly.
Fixes: d716f24e66f1 ("cxgb4: Disable interrupts and napi before unregistering netdev") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Ma [Thu, 8 Jul 2021 13:17:10 +0000 (21:17 +0800)]
mt76: mt7921: continue to probe driver when fw already downloaded
When reboot system, no power cycles, firmware is already downloaded,
return -EIO will break driver as error:
mt7921e: probe of 0000:03:00.0 failed with error -5
Skip firmware download and continue to probe.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Fixes: 7ea08e64dc06a ("mt76: mt7921: add MCU support") Signed-off-by: David S. Miller <davem@davemloft.net>
Since Mikrotik 10/25G NIC MDIO op emulation is not 100% reliable,
on rare occasions it can happen that some physical functions of
the NIC do not get initialized due to timeouted early MDIO op.
This changes the atl1c probe on Mikrotik 10/25G NIC not to
depend on MDIO op emulation.
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Lemon [Thu, 8 Jul 2021 18:04:08 +0000 (11:04 -0700)]
ptp: Relocate lookup cookie to correct block.
An earlier commit set the pps_lookup cookie, but the line
was somehow added to the wrong code block. Correct this.
Fixes: a8c1e90a0c6c ("ptp: Set lookup cookie when creating a PTP PPS source.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Dario Binacchi <dariobin@libero.it> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 8 Jul 2021 07:21:09 +0000 (00:21 -0700)]
ipv6: tcp: drop silly ICMPv6 packet too big messages
While TCP stack scales reasonably well, there is still one part that
can be used to DDOS it.
IPv6 Packet too big messages have to lookup/insert a new route,
and if abused by attackers, can easily put hosts under high stress,
with many cpus contending on a spinlock while one is stuck in fib6_run_gc()
Some of our servers have been hit by malicious ICMPv6 packets
trying to _increase_ the MTU/MSS of TCP flows.
We believe these ICMPv6 packets are a result of a bug in one ISP stack,
since they were blindly sent back for _every_ (small) packet sent to them.
These packets are for one TCP flow:
09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
TCP stack can filter some silly requests :
1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err()
2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS.
This tests happen before the IPv6 routing stack is entered, thus
removing the potential contention and route exhaustion.
Note that IPv6 stack was performing these checks, but too late
(ie : after the route has been added, and after the potential
garbage collect war)
v2: fix typo caught by Martin, thanks !
v3: exports tcp_mtu_to_mss(), caught by David, thanks !
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We will fail to build with CONFIG_SKB_EXTENSIONS disabled after 01eebbaeeb36 ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.
Fixes: Fixes: 01eebbaeeb36 ("skbuff: Release nfct refcount on napi stolen or re-used skbs") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 7 Jul 2021 10:01:00 +0000 (13:01 +0300)]
sock: unlock on error in sock_setsockopt()
If copy_from_sockptr() then we need to unlock before returning.
Fixes: e9964564dc86 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 7 Jul 2021 08:15:30 +0000 (16:15 +0800)]
selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
After redirecting, it's already a new path. So the old PMTU info should
be cleared. The IPv6 test "mtu exception plus redirect" should only
has redirect info without old PMTU.
The IPv4 test can not be changed because of legacy.
Fixes: 6985dbf5fdb9 ("selftests: Add redirect tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 7 Jul 2021 08:15:29 +0000 (16:15 +0800)]
selftests: icmp_redirect: remove from checking for IPv6 route get
If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
info will not be exported to userspace in rt6_fill_node(). And ip cmd will
not print "from ::" to the route output. So remove this check.
Fixes: 6985dbf5fdb9 ("selftests: Add redirect tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
The "plat->phy_interface" variable is an enum and in this context GCC
will treat it as an unsigned int so the error handling is never
triggered.
Fixes: 09faf2653485 ("net: stmmac: platform: fix probe for ACPI devices") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-loongson: Fix unsigned comparison to zero
plat->phy_interface is unsigned integer, so the condition
can't be less than zero and the warning will never printed.
Fixes: 8faa7848cdb4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 6 Jul 2021 09:13:35 +0000 (11:13 +0200)]
ipv6: fix 'disable_policy' for fwd packets
The goal of commit 1a506b452042 ("ipv6: Provide ipv6 version of
"disable_policy" sysctl") was to have the disable_policy from ipv4
available on ipv6.
However, it's not exactly the same mechanism. On IPv4, all packets coming
from an interface, which has disable_policy set, bypass the policy check.
For ipv6, this is done only for local packets, ie for packets destinated to
an address configured on the incoming interface.
Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
effect for both protocols.
My first approach was to create a new kind of route cache entries, to be
able to set DST_NOPOLICY without modifying routes. This would have added a
lot of code. Because the local delivery path is already handled, I choose
to focus on the forwarding path to minimize code churn.
Fixes: 1a506b452042 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 6 Jul 2021 11:18:02 +0000 (12:18 +0100)]
octeontx2-pf: Fix assigned error return value that is never used
Currently when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update fails
the error return variable rc is being assigned -ENOMEM and does not
return early. rc is then re-assigned and the error case is not handled
correctly. Fix this by returning -ENOMEM rather than assigning rc.
Addresses-Coverity: ("Unused value") Fixes: d10266c4143f ("octeontx2-pf: offload DMAC filters to CGX/RPM block") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series fixes some problems related to bonding ipsec offload.
The 1, 5, and 8th patches are to add a missing rcu_read_lock().
The 2nd patch is to add null check code to bond_ipsec_add_sa.
When bonding interface doesn't have an active real interface, the
bond->curr_active_slave pointer is null.
But bond_ipsec_add_sa() uses that pointer without null check.
So that it results in null-ptr-deref.
The 3 and 4th patches are to replace xs->xso.dev with xs->xso.real_dev.
The 6th patch is to disallow to set ipsec offload if a real interface
type is bonding.
The 7th patch is to add struct bond_ipsec to manage SA.
If bond mode is changed, or active real interface is changed, SA should
be removed from old current active real interface then it should be added
to new active real interface.
But it can't, because it doesn't manage SA.
The 9th patch is to fix incorrect return value of bond_ipsec_offload_ok().
v1 -> v2:
- Add 9th patch.
- Do not print warning when there is no SA in bond_ipsec_add_sa_all().
- Add comment for ipsec_lock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix incorrect return value of bond_ipsec_offload_ok()
bond_ipsec_offload_ok() is called to check whether the interface supports
ipsec offload or not.
bonding interface support ipsec offload only in active-backup mode.
So, if a bond interface is not in active-backup mode, it should return
false but it returns true.
Fixes: 4a36c3eccd30 ("bonding: allow xfrm offload setup post-module-load") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
To dereference bond->curr_active_slave, it uses rcu_dereference().
But it and the caller doesn't acquire RCU so a warning occurs.
So add rcu_read_lock().
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding has been supporting ipsec offload.
When SA is added, bonding just passes SA to its own active real interface.
But it doesn't manage SA.
So, when events(add/del real interface, active real interface change, etc)
occur, bonding can't handle that well because It doesn't manage SA.
So some problems(panic, UAF, refcnt leak)occur.
In order to make it stable, it should manage SA.
That's the reason why struct bond_ipsec is added.
When a new SA is added to bonding interface, it is stored in the
bond_ipsec list. And the SA is passed to a current active real interface.
If events occur, it uses bond_ipsec data to handle these events.
bond->ipsec_list is protected by bond->ipsec_lock.
If a current active real interface is changed, the following logic works.
1. delete all SAs from old active real interface
2. Add all SAs to the new active real interface.
3. If a new active real interface doesn't support ipsec offload or SA's
option, it sets real_dev to NULL.
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding interface can be nested and it supports ipsec offload.
So, it allows setting the nested bonding + ipsec scenario.
But code does not support this scenario.
So, it should be disallowed.
interface graph:
bond2
|
bond1
|
eth0
The nested bonding + ipsec offload may not a real usecase.
So, disallowing this scenario is fine.
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
To dereference bond->curr_active_slave, it uses rcu_dereference().
But it and the caller doesn't acquire RCU so a warning occurs.
So add rcu_read_lock().
Test commands:
ip netns add A
ip netns exec A bash
modprobe netdevsim
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add bond0 type bond
ip link set eth0 master bond0
ip link set eth0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
ip x s f
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
There are two pointers in struct xfrm_state_offload, *dev, *real_dev.
These are used in callback functions of struct xfrmdev_ops.
The *dev points whether bonding interface or real interface.
If bonding ipsec offload is used, it points bonding interface If not,
it points real interface.
And real_dev always points real interface.
So, ixgbevf should always use real_dev instead of dev.
Of course, real_dev always not be null.
Test commands:
ip link add bond0 type bond
#eth0 is ixgbevf interface
ip link set eth0 master bond0
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Fixes: 4556eb15dbc4 ("xfrm: bail early on slave pass over skb") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
There are two pointers in struct xfrm_state_offload, *dev, *real_dev.
These are used in callback functions of struct xfrmdev_ops.
The *dev points whether bonding interface or real interface.
If bonding ipsec offload is used, it points bonding interface If not,
it points real interface.
And real_dev always points real interface.
So, netdevsim should always use real_dev instead of dev.
Of course, real_dev always not be null.
Test commands:
ip netns add A
ip netns exec A bash
modprobe netdevsim
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add bond0 type bond mode active-backup
ip link set eth0 master bond0
ip link set eth0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Fixes: 4556eb15dbc4 ("xfrm: bail early on slave pass over skb") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix null dereference in bond_ipsec_add_sa()
If bond doesn't have real device, bond->curr_active_slave is null.
But bond_ipsec_add_sa() dereferences bond->curr_active_slave without
null checking.
So, null-ptr-deref would occur.
Test commands:
ip link add bond0 type bond
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
To dereference bond->curr_active_slave, it uses rcu_dereference().
But it and the caller doesn't acquire RCU so a warning occurs.
So add rcu_read_lock().
Test commands:
ip link add dummy0 type dummy
ip link add bond0 type bond
ip link set dummy0 master bond0
ip link set dummy0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
mode transport \
reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel \
src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
dev bond0 dir in
Fixes: 4bd94cbd3531 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized
This commit fixes a bug (found by syzkaller) that could cause spurious
double-initializations for congestion control modules, which could cause
memory leaks or other problems for congestion control modules (like CDG)
that allocate memory in their init functions.
The buggy scenario constructed by syzkaller was something like:
(1) create a TCP socket
(2) initiate a TFO connect via sendto()
(3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
which calls:
tcp_set_congestion_control() ->
tcp_reinit_congestion_control() ->
tcp_init_congestion_control()
(4) receive ACK, connection is established, call tcp_init_transfer(),
set icsk_ca_initialized=0 (without first calling cc->release()),
call tcp_init_congestion_control() again.
Note that in this sequence tcp_init_congestion_control() is called
twice without a cc->release() call in between. Thus, for CC modules
that allocate memory in their init() function, e.g, CDG, a memory leak
may occur. The syzkaller tool managed to find a reproducer that
triggered such a leak in CDG.
The bug was introduced when that commit a435903dcfb8 ("tcp: Only init
congestion control if not initialized already")
introduced icsk_ca_initialized and set icsk_ca_initialized to 0 in
tcp_init_transfer(), missing the possibility for a sequence like the
one above, where a process could call setsockopt(TCP_CONGESTION) in
state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()),
which would call tcp_init_congestion_control(). It did not intend to
reset any initialization that the user had already explicitly made;
it just missed the possibility of that particular sequence (which
syzkaller managed to find).
Fixes: a435903dcfb8 ("tcp: Only init congestion control if not initialized already") Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Mon, 5 Jul 2021 10:54:51 +0000 (13:54 +0300)]
skbuff: Release nfct refcount on napi stolen or re-used skbs
When multiple SKBs are merged to a new skb under napi GRO,
or SKB is re-used by napi, if nfct was set for them in the
driver, it will not be released while freeing their stolen
head state or on re-use.
Release nfct on napi's stolen or re-used SKBs, and
in gro_list_prepare, check conntrack metadata diff.
Fixes: 70da334c62b6 ("net/mlx5e: CT: Handle misses after executing CT action") Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This guarantees that nf_conntrack_locks_all() prevents any
concurrent nf_conntrack_lock() owners:
If a thread past a1), then b2) will block until that thread releases
the lock.
If the threat is before a1, then b3)+a1) ensure the write b1) is
visible, thus a2) is guaranteed to see the updated value.
But: This is only the latest time when b1) becomes visible.
It may also happen that b1) is visible an undefined amount of time
before the b3). And thus KCSAN will notice a data race.
In addition, the compiler might be too clever.
Solution: Use WRITE_ONCE().
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ali Abdallah [Thu, 20 May 2021 10:53:11 +0000 (12:53 +0200)]
netfilter: conntrack: improve RST handling when tuple is re-used
If we receive a SYN packet in original direction on an existing
connection tracking entry, we let this SYN through because conntrack
might be out-of-sync.
Conntrack gets back in sync when server responds with SYN/ACK and state
gets updated accordingly.
However, if server replies with RST, this packet might be marked as
INVALID because td_maxack value reflects the *old* conntrack state
and not the state of the originator of the RST.
Avoid td_maxack-based checks if previous packet was a SYN.
Unfortunately that is not be enough: an out of order ACK in original
direction updates last_index, so we still end up marking valid RST.
Thus disable the sequence check when we are not in established state and
the received RST has a sequence of 0.
Because marking RSTs as invalid usually leads to unwanted timeouts,
also skip RST sequence checks if a conntrack entry is already closing.
Such entries can already be evicted via GC in case the table is full.
Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Ali Abdallah <aabdallah@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Mon, 5 Jul 2021 17:16:18 +0000 (10:16 -0700)]
Merge branch 'stmmac-ptp'
Xiaoliang Yang says:
====================
net: stmmac: re-configure tas basetime after ptp time adjust
If the DWMAC Ethernet device has already set the Qbv EST configuration
before using ptp to synchronize the time adjustment, the Qbv base time
may change to be the past time of the new current time. This is not
allowed by hardware.
This patch calculates and re-configures the Qbv basetime after ptp time
adjustment.
v1->v2:
Update est mutex lock to protect btr/ctr r/w to be atomic.
Add btr_reserve to store basetime from qopt and used as origin base
time in Qbv re-configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Mon, 5 Jul 2021 10:26:55 +0000 (18:26 +0800)]
net: stmmac: ptp: update tas basetime after ptp adjust
After adjusting the ptp time, the Qbv base time may be the past time
of the new current time. dwmac5 hardware limited the base time cannot
be set as past time. This patch add a btr_reserve to store the base
time get from qopt, then calculate the base time and reset the Qbv
configuration after ptp time adjust.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Mon, 5 Jul 2021 09:46:17 +0000 (17:46 +0800)]
ptp: fix format string mismatch in ptp_sysfs.c
Fix format string mismatch in ptp_sysfs.c. Use %u for unsigned int.
Fixes: 52c449a86022 ("ptp: support ptp physical/virtual clocks conversion") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Mon, 5 Jul 2021 08:53:06 +0000 (16:53 +0800)]
ptp: fix NULL pointer dereference in ptp_clock_register
Fix NULL pointer dereference in ptp_clock_register. The argument
"parent" of ptp_clock_register may be NULL pointer.
Fixes: 52c449a86022 ("ptp: support ptp physical/virtual clocks conversion") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: marvell: always set skb_shared_info in mvneta_swbm_add_rx_fragment
Always set skb_shared_info data structure in mvneta_swbm_add_rx_fragment
routine even if the fragment contains only the ethernet FCS.
Fixes: b8466baf25f7 ("net: mvneta: alloc skb_shared_info on the mvneta_rx_swbm stack") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Jul 2021 22:38:43 +0000 (00:38 +0200)]
udp: properly flush normal packet at GRO time
If an UDP packet enters the GRO engine but is not eligible
for aggregation and is not targeting an UDP tunnel,
udp_gro_receive() will not set the flush bit, and packet
could delayed till the next napi flush.
Fix the issue ensuring non GROed packets traverse
skb_gro_flush_final().
Reported-and-tested-by: Matthias Treydte <mt@waldheinz.de> Fixes: e511ff7d416f ("udp: skip L4 aggregation for UDP tunnel packets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3: fix cksum offload issues for tunnels with non-default udp ports
Commit bcef8f5256d6 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, the inner
offload capability is to be restricted to UDP tunnels with default
Vxlan and Geneve ports.
This patch fixes the issue for tunnels with non-default ports using
features check capability and filtering appropriate features for such
tunnels.
Fixes: bcef8f5256d6 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Fri, 2 Jul 2021 09:21:39 +0000 (11:21 +0200)]
nfp: flower-ct: remove callback delete deadlock
The current location of the callback delete can lead to a race
condition where deleting the callback requires a write_lock on
the nf_table, but at the same time another thread from netfilter
could have taken a read lock on the table before trying to offload.
Since the driver is taking a rtnl_lock this can lead into a deadlock
situation, where the netfilter offload will wait for the cls_flower
rtnl_lock to be released, but this cannot happen since this is
waiting for the nf_table read_lock to be released before it can
delete the callback.
Solve this by completely removing the nf_flow_table_offload_del_cb
call, as this will now be cleaned up by act_ct itself when cleaning
up the specific nf_table.
Fixes: 8c5122f345f6 ("nfp: flower-ct: add nft callback stubs") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Fri, 2 Jul 2021 09:21:38 +0000 (11:21 +0200)]
net/sched: act_ct: remove and free nf_table callbacks
When cleaning up the nf_table in tcf_ct_flow_table_cleanup_work
there is no guarantee that the callback list, added to by
nf_flow_table_offload_add_cb, is empty. This means that it is
possible that the flow_block_cb memory allocated will be lost.
Fix this by iterating the list and freeing the flow_block_cb entries
before freeing the nf_table entry (via freeing ct_ft).
Fixes: f933e40df605 ("netfilter: flowtable: Add API for registering to flow table events") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: sync fdb to new unicast-filtering ports
Since commit ec4f933fb97e ("bridge: Automatically manage
port promiscuous mode.")
bridges with `vlan_filtering 1` and only 1 auto-port don't
set IFF_PROMISC for unicast-filtering-capable ports.
Normally on port changes `br_manage_promisc` is called to
update the promisc flags and unicast filters if necessary,
but it cannot distinguish between *new* ports and ones
losing their promisc flag, and new ports end up not
receiving the MAC address list.
Fix this by calling `br_fdb_sync_static` in `br_add_if`
after the port promisc flags are updated and the unicast
filter was supposed to have been filled.
Fixes: ec4f933fb97e ("bridge: Automatically manage port promiscuous mode.") Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Jul 2021 14:41:01 +0000 (07:41 -0700)]
sock: fix error in sock_setsockopt()
Some tests are failing, John bisected the issue to a recent commit.
sock_set_timestamp() parameters should be :
1) sk
2) optname
3) valbool
Fixes: 1e98a495fa9f ("sock: expose so_timestamp options for mptcp") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: John Sperbeck <jsperbeck@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Florian Westphal <fw@strlen.de> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Jul 2021 20:09:03 +0000 (13:09 -0700)]
tcp: annotate data races around tp->mtu_info
While tp->mtu_info is read while socket is owned, the write
sides happen from err handlers (tcp_v[46]_mtu_reduced)
which only own the socket spinlock.
Fixes: 26ef99d7a34d ("tcp: dont drop MTU reduction indications") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3863014e58fb ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced
ip6_skb_dst_mtu with return value of signed int which is inconsistent
with actually returned values. Also 2 users of this function actually
assign its value to unsigned int variable and only __xfrm6_output
assigns result of this function to signed variable but actually uses
as unsigned in further comparisons and calls. Change this function
to return unsigned int value.
Fixes: 3863014e58fb ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
gve: Simplify code and axe the use of a deprecated API
The wrappers in include/linux/pci-dma-compat.h should go away.
Replace 'pci_set_dma_mask/pci_set_consistent_dma_mask' by an equivalent
and less verbose 'dma_set_mask_and_coherent()' call.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfo
Two patches listed below removed ctnetlink_dump_helpinfo call from under
rcu_read_lock. Now its rcu_dereference generates following warning:
=============================
WARNING: suspicious RCU usage
5.13.0+ #5 Not tainted
-----------------------------
net/netfilter/nf_conntrack_netlink.c:221 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
stack backtrace:
CPU: 1 PID: 2251 Comm: conntrack Not tainted 5.13.0+ #5
Call Trace:
dump_stack+0x7f/0xa1
ctnetlink_dump_helpinfo+0x134/0x150 [nf_conntrack_netlink]
ctnetlink_fill_info+0x2c2/0x390 [nf_conntrack_netlink]
ctnetlink_dump_table+0x13f/0x370 [nf_conntrack_netlink]
netlink_dump+0x10c/0x370
__netlink_dump_start+0x1a7/0x260
ctnetlink_get_conntrack+0x1e5/0x250 [nf_conntrack_netlink]
nfnetlink_rcv_msg+0x613/0x993 [nfnetlink]
netlink_rcv_skb+0x50/0x100
nfnetlink_rcv+0x55/0x120 [nfnetlink]
netlink_unicast+0x181/0x260
netlink_sendmsg+0x23f/0x460
sock_sendmsg+0x5b/0x60
__sys_sendto+0xf1/0x160
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x36/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
nf_ct_gre_keymap_flush() is useless.
It is called from nf_conntrack_cleanup_net_list() only and tries to remove
nf_ct_gre_keymap entries from pernet gre keymap list. Though:
a) at this point the list should already be empty, all its entries were
deleted during the conntracks cleanup, because
nf_conntrack_cleanup_net_list() executes nf_ct_iterate_cleanup(kill_all)
before nf_conntrack_proto_pernet_fini():
nf_conntrack_cleanup_net_list
+- nf_ct_iterate_cleanup
| nf_ct_put
| nf_conntrack_put
| nf_conntrack_destroy
| destroy_conntrack
| destroy_gre_conntrack
| nf_ct_gre_keymap_destroy
`- nf_conntrack_proto_pernet_fini
nf_ct_gre_keymap_flush
b) Let's say we find that the keymap list is not empty. This means netns
still has a conntrack associated with gre, in which case we should not free
its memory, because this will lead to a double free and related crashes.
However I doubt it could have gone unnoticed for years, obviously
this does not happen in real life. So I think we can remove
both nf_ct_gre_keymap_flush() and nf_conntrack_proto_pernet_fini().
Colin Ian King [Thu, 24 Jun 2021 19:57:18 +0000 (20:57 +0100)]
netfilter: nf_tables: Fix dereference of null pointer flow
In the case where chain->flags & NFT_CHAIN_HW_OFFLOAD is false then
nft_flow_rule_create is not called and flow is NULL. The subsequent
error handling execution via label err_destroy_flow_rule will lead
to a null pointer dereference on flow when calling nft_flow_rule_destroy.
Since the error path to err_destroy_flow_rule has to cater for null
and non-null flows, only call nft_flow_rule_destroy if flow is non-null
to fix this issue.
Addresses-Coverity: ("Explicity null dereference") Fixes: 3e5b5ae70952 ("netfilter: nf_tables: memleak in hw offload abort path") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 24 Jun 2021 10:36:42 +0000 (12:36 +0200)]
netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
Consider:
client -----> conntrack ---> Host
client sends a SYN, but $Host is unreachable/silent.
Client eventually gives up and the conntrack entry will time out.
However, if the client is restarted with same addr/port pair, it
may prevent the conntrack entry from timing out.
This is noticeable when the existing conntrack entry has no NAT
transformation or an outdated one and port reuse happens either
on client or due to a NAT middlebox.
This change prevents refresh of the timeout for SYN retransmits,
so entry is going away after nf_conntrack_tcp_timeout_syn_sent
seconds (default: 60).
Entry will be re-created on next connection attempt, but then
nat rules will be evaluated again.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 24 Jun 2021 10:36:41 +0000 (12:36 +0200)]
selftest: netfilter: add test case for unreplied tcp connections
TCP connections in UNREPLIED state (only SYN seen) can be kept alive
indefinitely, as each SYN re-sets the timeout.
This means that even if a peer has closed its socket the entry
never times out.
This also prevents re-evaluation of configured NAT rules.
Add a test case that sets SYN timeout to 10 seconds, then check
that the nat redirection added later eventually takes effect.
This is based off a repro script from Antonio Ojea.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
Add a wrapping struct to serve as the memcpy() source so the compiler
can perform appropriate bounds checking, avoiding this future warning:
In function '__fortify_memcpy',
inlined from 'iucv_message_pending' at net/iucv/iucv.c:1663:4:
./include/linux/fortify-string.h:246:4: error: call to '__read_overflow2_field' declared with attribute error: detected read beyond size of field (2nd parameter)
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If 'gve_probe()' fails, we should propagate the error code, instead of
hard coding a -ENXIO value.
Make sure that all error handling paths set a correct value for 'err'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the 'register_netdev() call fails, we must release the resources
allocated by the previous 'gve_init_priv()' call, as already done in the
remove function.
Add a new label and the missing 'gve_teardown_priv_resources()' in the
error handling path.
Fixes: 0fb435173bd4 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add stmmac_fpe_stop_wq() in stmmac_suspend() to terminate FPE workqueue
during suspend. So, in suspend mode, there will be no FPE workqueue
available. Without this fix, new additional FPE workqueue will be created
in every suspend->resume cycle.
Fixes: 1d0a218e559c ("net: stmmac: support FPE link partner hand-shaking procedure") Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Jul 2021 20:27:11 +0000 (13:27 -0700)]
Merge branch 'sms911x-dts'
Geert Uytterhoeven says:
====================
sms911x: DTS fixes and DT binding to json-schema conversion
This patch series converts the Smart Mixed-Signal Connectivity (SMSC)
LAN911x/912x Controller Device Tree binding documentation to
json-schema, after fixing a few issues in DTS files.
Changed compared to v1[1]:
- Dropped applied patches,
- Add Reviewed-by,
- Drop bogus double quotes in compatible values,
- Add comment explaining why "additionalProperties: true" is needed.
[1] [PATCH 0/5] sms911x: DTS fixes and DT binding to json-schema conversion
https://lore.kernel.org/r/cover.1621518686.git.geert+renesas@glider.be
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the Smart Mixed-Signal Connectivity (SMSC) LAN911x/912x
Controller Device Tree binding documentation to json-schema.
Document missing properties.
Make "phy-mode" not required, as many DTS files do not have it, and the
Linux drivers falls back to PHY_INTERFACE_MODE_NA.
Correct nodename in example.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
ARM: dts: qcom-apq8060: Correct Ethernet node name and drop bogus irq property
make dtbs_check:
ethernet-ebi2@2,0: $nodename:0: 'ethernet-ebi2@2,0' does not match '^ethernet(@.*)?$'
ethernet-ebi2@2,0: 'smsc,irq-active-low' does not match any of the regexes: 'pinctrl-[0-9]+'
There is no "smsc,irq-active-low" property, as active low is the
default.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 30 Jun 2021 16:42:44 +0000 (09:42 -0700)]
udp: annotate data races around unix_sk(sk)->gso_size
Accesses to unix_sk(sk)->gso_size are lockless.
Add READ_ONCE()/WRITE_ONCE() around them.
BUG: KCSAN: data-race in udp_lib_setsockopt / udpv6_sendmsg
write to 0xffff88812d78f47c of 2 bytes by task 10849 on cpu 1:
udp_lib_setsockopt+0x3b3/0x710 net/ipv4/udp.c:2696
udpv6_setsockopt+0x63/0x90 net/ipv6/udp.c:1630
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3265
__sys_setsockopt+0x18f/0x200 net/socket.c:2104
__do_sys_setsockopt net/socket.c:2115 [inline]
__se_sys_setsockopt net/socket.c:2112 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88812d78f47c of 2 bytes by task 10852 on cpu 0:
udpv6_sendmsg+0x161/0x16b0 net/ipv6/udp.c:1299
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2337
___sys_sendmsg net/socket.c:2391 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2477
__do_sys_sendmmsg net/socket.c:2506 [inline]
__se_sys_sendmmsg net/socket.c:2503 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2503
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000 -> 0x0005
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10852 Comm: syz-executor.0 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: a388728846a0 ("udp: generate gso with UDP_SEGMENT") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 30 Jun 2021 11:42:13 +0000 (13:42 +0200)]
tcp: consistently disable header prediction for mptcp
The MPTCP receive path is hooked only into the TCP slow-path.
The DSS presence allows plain MPTCP traffic to hit that
consistently.
Since commit c5be4bc9a7f5 ("net: mptcp: improve fallback to TCP"),
when an MPTCP socket falls back to TCP, it can hit the TCP receive
fast-path, and delay or stop triggering the event notification.
Address the issue explicitly disabling the header prediction
for MPTCP sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/200 Fixes: c5be4bc9a7f5 ("net: mptcp: improve fallback to TCP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The caif_hsi driver relies on a cfhsi_get_ops symbol using symbol_get,
but this symbol is not provided anywhere in the kernel tree. Remove
this driver given that it is dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, if a reset fails due to failover or other communication error
there is another reset (eg: FAILOVER) in the queue and we would process
that reset. But if we are unable to communicate with PHYP or VIOS after
H_FREE_CRQ, there would be no other resets in the queue and the adapter
would be in an undefined state even though it was in the OPEN state
earlier. While starting the reset we set the carrier to off state so
we won't even get the timeout resets.
If the last queued reset fails, retry it as a hard reset (after the
usual 60 second settling time).
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ptp: support virtual clocks and timestamping
Current PTP driver exposes one PTP device to user which binds network
interface/interfaces to provide timestamping. Actually we have a way
utilizing timecounter/cyclecounter to virtualize any number of PTP
clocks based on a same free running physical clock for using.
The purpose of having multiple PTP virtual clocks is for user space
to directly/easily use them for multiple domains synchronization.
user
space: ^ ^
| SO_TIMESTAMPING new flag: | Packets with
| SOF_TIMESTAMPING_BIND_PHC | TX/RX HW timestamps
v v
+--------------------------------------------+
sock: | sock (new member sk_bind_phc) |
+--------------------------------------------+
^ ^
| ethtool_get_phc_vclocks | Convert HW timestamps
| | to sk_bind_phc
v v
+--------------+--------------+--------------+
vclock: | ptp1 | ptp2 | ptpN |
+--------------+--------------+--------------+
pclock: | ptp0 free running |
+--------------------------------------------+
The block diagram may explain how it works. Besides the PTP virtual
clocks, the packet HW timestamp converting to the bound PHC is also
done in sock driver. For user space, PTP virtual clocks can be
created via sysfs, and extended SO_TIMESTAMPING API (new flag
SOF_TIMESTAMPING_BIND_PHC) can be used to bind one PTP virtual clock
for timestamping.
The test tool timestamping.c (together with linuxptp phc_ctl tool) can
be used to verify:
Changes for v2:
- Converted to num_vclocks for creating virtual clocks.
- Guranteed physical clock free running when using virtual
clocks.
- Fixed build warning.
- Updated copyright.
Changes for v3:
- Supported PTP virtual clock in default in PTP driver.
- Protected concurrency of ptp->num_vclocks accessing.
- Supported PHC vclocks query via ethtool.
- Extended SO_TIMESTAMPING API for PHC binding.
- Converted HW timestamps to PHC bound, instead of previous
binding domain value to PHC idea.
- Other minor fixes.
Changes for v4:
- Used do_aux_work callback for vclock refreshing instead.
- Used unsigned int for vclocks number, and max_vclocks
for limitiation.
- Fixed mutex locking.
- Dynamically allocated memory for vclock index storage.
- Removed ethtool ioctl command for vclocks getting.
- Updated doc for ethtool phc vclocks get.
- Converted to mptcp_setsockopt_sol_socket_timestamping().
- Passed so_timestamping for sock_set_timestamping.
- Fixed checkpatch/build.
- Other minor fixed.
Changes for v5:
- Fixed checkpatch/build/bug reported by test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Wed, 30 Jun 2021 08:12:00 +0000 (16:12 +0800)]
net: socket: support hardware timestamp conversion to PHC bound
This patch is to support hardware timestamp conversion to
PHC bound. This applies to both RX and TX since their skb
handling (for TX, it's skb clone in error queue) all goes
through __sock_recv_timestamp.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Wed, 30 Jun 2021 08:11:59 +0000 (16:11 +0800)]
net: sock: extend SO_TIMESTAMPING for PHC binding
Since PTP virtual clock support is added, there can be
several PTP virtual clocks based on one PTP physical
clock for timestamping.
This patch is to extend SO_TIMESTAMPING API to support
PHC (PTP Hardware Clock) binding by adding a new flag
SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are
in use, user space can configure to bind one for
timestamping, but PTP physical clock is not supported
and not needed to bind.
This patch is preparation for timestamp conversion from
raw timestamp to a specific PTP virtual clock time in
core net.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Wed, 30 Jun 2021 08:11:58 +0000 (16:11 +0800)]
mptcp: setsockopt: convert to mptcp_setsockopt_sol_socket_timestamping()
Split timestamping handling into a new function
mptcp_setsockopt_sol_socket_timestamping().
This is preparation for extending SO_TIMESTAMPING
for PHC binding, since optval will no longer be
integer.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Wed, 30 Jun 2021 08:11:56 +0000 (16:11 +0800)]
ethtool: add a new command for getting PHC virtual clocks
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Wed, 30 Jun 2021 08:11:53 +0000 (16:11 +0800)]
ptp: support ptp physical/virtual clocks conversion
Support ptp physical/virtual clocks conversion via sysfs.
There will be a new attribute n_vclocks under ptp physical
clock sysfs.
- In default, the value is 0 meaning only ptp physical clock
is in use.
- Setting the value can create corresponding number of ptp
virtual clocks to use. But current physical clock is guaranteed
to stay free running.
- Setting the value back to 0 can delete virtual clocks and back
use physical clock again.
Another new attribute max_vclocks control the maximum number of
ptp vclocks.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Jul 2021 20:04:41 +0000 (13:04 -0700)]
Merge branch 'wwan-iosm-fixes'
M Chetan Kumar says:
====================
net: wwan: iosm: fixes
This patch series contains IOSM Driver fixes and details are
are mentioned below.
Patch1: Corrects uevent reporting format key=value pair.
Patch2: Removes redundant IP session checks.
Patch3: Correct link-Id number to be in sycn with MBIM session Id.
Patch4: Update netdev tx stats.
Patch5: Set netdev default mtu size.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Link ID to be kept intact with MBIM session ID
Ex: ID 0 should be associated to MBIM session ID 0.
Reported-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Jul 2021 20:02:38 +0000 (13:02 -0700)]
Merge branch 'octeontx2-dmasc-filtering'
Hariprasad Kelam says:
====================
DMAC based packet filtering
Each MAC block supports 32 DMAC filters which can be configured to accept
or drop packets based on address match This patch series adds mbox
handlers and extends ntuple filter callbacks to accomdate DMAC filters
such that user can install DMAC based filters on interface from ethtool.
Patch1 adds necessary mbox handlers such that mbox consumers like PF netdev
can add/delete/update DMAC filters and Patch2 adds debugfs support to dump
current list of installed filters. Patch3 adds support to call mbox
handlers upon receiving DMAC filters from ethtool ntuple commands.
Hariprasad Kelam [Wed, 30 Jun 2021 10:10:59 +0000 (15:40 +0530)]
octeontx2-pf: offload DMAC filters to CGX/RPM block
DMAC filtering can be achieved by either NPC MCAM rules or
CGX/RPM MAC filters. Currently we are achieving this by NPC
MCAM rules. This patch offloads DMAC filters to CGX/RPM MAC
filters instead of NPC MCAM rules. Offloading DMAC filter to
CGX/RPM block helps in reducing traffic to NPC block and
save MCAM rules
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Kumar Kori [Wed, 30 Jun 2021 10:10:57 +0000 (15:40 +0530)]
octeontx2-af: DMAC filter support in MAC block
MAC block supports 32 dmac filters which are logically
divided among all attached LMACS.
For example MAC block0 having one LMAC then maximum supported
filters are 32 where as MAC block1 having 4 enabled LMACS
them maximum supported filteres are 8 for each LMAC.
This patch adds mbox handlers to add/delete/update mac entry
in DMAC filter table.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Jul 2021 18:51:36 +0000 (11:51 -0700)]
Merge branch 'dsa-mv88e6xxx-topaz-fixes'
Marek Behún says:
====================
dsa: mv88e6xxx: Topaz fixes
here comes some fixes for the Topaz family (Marvell 88E6141 / 88E6341)
which I found out about when I compared the Topaz' operations
structure with that one of Peridot (6390).
This is v2. In v1, I accidentally sent patches generated from wrong
branch and the 5th patch does not contain a necessary change in
serdes.c.
Changes from v1:
- the fifth patch, "enable SerDes RX stats for Topaz", needs another
change in serdes.c
- Andrew's Reviewed-by to 1,2,3,4 and 6
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Wed, 30 Jun 2021 22:22:31 +0000 (00:22 +0200)]
net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
Commit 9acebbc033335 ("net: dsa: mv88e6xxx: Add 6390 family PCS
registers to ethtool -d") added support for dumping SerDes PCS registers
via ethtool -d for Peridot.
The same implementation is also valid for Topaz, but was not
enabled at the time.
Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 9acebbc033335 ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Wed, 30 Jun 2021 22:22:30 +0000 (00:22 +0200)]
net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
Commit 60eb18f71003d ("mv88e6xxx: Add serdes Rx statistics") added
support for RX statistics on SerDes ports for Peridot.
This same implementation is also valid for Topaz, but was not enabled
at the time.
We need to use the generic .serdes_get_lane() method instead of the
Peridot specific one in the stats methods so that on Topaz the proper
one is used.
Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 60eb18f71003d ("mv88e6xxx: Add serdes Rx statistics") Signed-off-by: David S. Miller <davem@davemloft.net>